- Add local-storage.ts with smart folder structure:
  `/storage/products/{brand}/{state}/{product_id}/`
- Add storage-adapter.ts unified abstraction
- Add docker-compose.local.yml (NO MinIO)
- Add start-local.sh convenience script
- Update CLAUDE.md with:
  - PERMANENT RULES section (no data deletion)
  - DEPLOYMENT AUTHORIZATION requirements
  - LOCAL DEVELOPMENT defaults
  - STORAGE BEHAVIOR documentation
  - FORBIDDEN ACTIONS list
  - UI ANONYMIZATION rules
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Claude Guidelines for this Project
PERMANENT RULES (NEVER VIOLATE)
1. NO DELETION OF DATA — EVER
CannaiQ is a historical analytics system. Data retention is permanent by design.
NEVER delete:
- Product records
- Crawled snapshots
- Images
- Directories
- Logs
- Orchestrator traces
- Profiles
- Selector configs
- Crawl outcomes
- Store data
- Brand data
NEVER automate cleanup:
- No cron or scheduled job may `rm`, `unlink`, `delete`, `purge`, `prune`, `clean`, or `reset` any storage directory or DB row
- No migration may DELETE data — only add/update/alter columns
- If cleanup is required, ONLY the user may issue a manual command
Code enforcement (see the sketch below):
- `local-storage.ts` must only: write files, create directories, read files
- No `deleteImage`, `deleteProductImages`, or similar functions
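A minimal sketch of what a write-only `local-storage.ts` could look like under these rules. The function names and signatures here are illustrative, not the repo's actual API; `STORAGE_BASE_PATH` comes from the local environment settings below.

```typescript
// local-storage.ts (sketch): write/read only, no delete functions by design
import { promises as fs } from 'fs';
import path from 'path';

const BASE = process.env.STORAGE_BASE_PATH || './storage';

export async function saveProductImage(
  brand: string,
  state: string,
  productId: string,
  fileName: string,
  data: Buffer
): Promise<string> {
  const dir = path.join(BASE, 'products', brand, state, productId);
  await fs.mkdir(dir, { recursive: true }); // create directories
  const filePath = path.join(dir, fileName);
  await fs.writeFile(filePath, data);       // write files
  return filePath;
}

export async function readStoredFile(relPath: string): Promise<Buffer> {
  return fs.readFile(path.join(BASE, relPath)); // read files
}

// Intentionally absent: deleteImage, deleteProductImages, purge, clean, reset.
```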
2. DEPLOYMENT AUTHORIZATION REQUIRED
NEVER deploy to production unless the user explicitly says:
"CLAUDE — DEPLOYMENT IS NOW AUTHORIZED."
Until then:
- All work is LOCAL ONLY
- No `kubectl apply`, `docker push`, or remote operations
- No port-forwarding to production
- No connecting to Kubernetes clusters
3. LOCAL DEVELOPMENT BY DEFAULT
In local mode:
- Use `docker-compose.local.yml` (NO MinIO)
- Use local filesystem storage at `./storage`
- Connect to local PostgreSQL at `localhost:54320`
- Backend runs at `localhost:3010`
- NO remote connections, NO Kubernetes, NO MinIO
Environment:
```
STORAGE_DRIVER=local
STORAGE_BASE_PATH=./storage
DATABASE_URL=postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus
# MINIO_ENDPOINT is NOT set (forces local storage)
```
STORAGE BEHAVIOR
Local Storage Structure
```
/storage/products/{brand}/{state}/{product_id}/
  image-{hash}.webp
  image-{hash}-medium.webp
  image-{hash}-thumb.webp
/storage/brands/{brand}/
  logo-{hash}.webp
```
Storage Adapter
```typescript
import { saveImage, getImageUrl } from '../utils/storage-adapter';
// Automatically uses local storage when STORAGE_DRIVER=local
```
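A hedged sketch of how `storage-adapter.ts` might dispatch on `STORAGE_DRIVER`. The exported names match the import above; the internal helper (`saveProductImage`) and the exact signatures are assumptions.

```typescript
// storage-adapter.ts (sketch): unified abstraction over local filesystem storage
import { saveProductImage } from './local-storage'; // hypothetical local driver helper

const driver = process.env.STORAGE_DRIVER || 'local';

export async function saveImage(
  brand: string,
  state: string,
  productId: string,
  fileName: string,
  data: Buffer
): Promise<string> {
  if (driver === 'local') {
    return saveProductImage(brand, state, productId, fileName, data);
  }
  // No other drivers in local development; MINIO_ENDPOINT is unset by design.
  throw new Error(`Unsupported STORAGE_DRIVER: ${driver}`);
}

export function getImageUrl(relPath: string): string {
  // Local images are served by the backend's static middleware.
  return `/images/${relPath}`;
}
```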
Files
| File | Purpose |
|---|---|
| `backend/src/utils/local-storage.ts` | Local filesystem adapter |
| `backend/src/utils/storage-adapter.ts` | Unified storage abstraction |
| `docker-compose.local.yml` | Local stack without MinIO |
| `start-local.sh` | Convenience startup script |
FORBIDDEN ACTIONS
- Deleting any data (products, snapshots, images, logs, traces)
- Deploying without explicit authorization
- Connecting to Kubernetes without authorization
- Port-forwarding to production without authorization
- Starting MinIO in local development
- Using S3/MinIO SDKs when `STORAGE_DRIVER=local`
- Automating cleanup of any kind
- Dropping database tables or columns
- Overwriting historical records (always append snapshots)
UI ANONYMIZATION RULES
- No vendor names in forward-facing URLs: use `/api/az/...`, `/az`, `/az-schedule`
- No "dutchie", "treez", "jane", "weedmaps", "leafly" visible in consumer UIs
- Internal admin tools may show provider names for debugging
FUTURE TODO / PENDING FEATURES
- Orchestrator observability dashboard
- Crawl profile management UI
- State machine sandbox (disabled until authorized)
- Multi-state expansion beyond AZ
Multi-Site Architecture (CRITICAL)
This project has 5 working locations - always clarify which one before making changes:
| Folder | Domain | Type | Purpose |
|---|---|---|---|
| `backend/` | (shared) | Express API | Single backend serving all frontends |
| `frontend/` | dispos.crawlsy.com | React SPA (Vite) | Legacy admin dashboard (internal use) |
| `cannaiq/` | cannaiq.co | React SPA + PWA | NEW admin dashboard / B2B analytics |
| `findadispo/` | findadispo.com | React SPA + PWA | Consumer dispensary finder |
| `findagram/` | findagram.co | React SPA + PWA | Consumer delivery marketplace |
IMPORTANT: frontend/ vs cannaiq/ confusion:
- `frontend/` = OLD/legacy dashboard design, deployed to dispos.crawlsy.com (internal admin)
- `cannaiq/` = NEW dashboard design, deployed to cannaiq.co (customer-facing B2B)
- These are DIFFERENT codebases - do NOT confuse them!
Before any frontend work, ASK: "Which site? cannaiq, findadispo, findagram, or legacy (frontend/)?"
All four frontends share:
- Same backend API (port 3010)
- Same PostgreSQL database
- Same Kubernetes deployment for backend
Each frontend has:
- Its own folder, package.json, Dockerfile
- Its own domain and branding
- Its own PWA manifest and service worker (cannaiq, findadispo, findagram)
- Separate Docker containers in production
Multi-Domain Hosting Architecture
The three public-facing frontends (cannaiq.co, findadispo.com, findagram.co) are served from the same IP using host-based routing:
Kubernetes Ingress (Production):
```yaml
# Each domain routes to its own frontend service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: multi-site-ingress
spec:
  rules:
    - host: cannaiq.co
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: cannaiq-frontend
                port:
                  number: 80
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: scraper # shared backend
                port:
                  number: 3010
    - host: findadispo.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: findadispo-frontend
                port:
                  number: 80
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: scraper
                port:
                  number: 3010
    - host: findagram.co
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: findagram-frontend
                port:
                  number: 80
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: scraper
                port:
                  number: 3010
```
Key Points:
- DNS A records for all 3 domains point to same IP
- Ingress controller routes based on `Host` header
- Each frontend is a separate Docker container (nginx serving static files)
- All frontends share the same backend API at `/api/*`
- SSL/TLS handled at ingress level (cert-manager)
PWA Setup Requirements
Each frontend is a Progressive Web App (PWA). Required files in each public/ folder:
- manifest.json - App metadata, icons, theme colors
- service-worker.js - Offline caching, background sync
- Icons - 192x192 and 512x512 PNG icons
Vite PWA Plugin Setup (in each frontend's vite.config.ts):
```typescript
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import { VitePWA } from 'vite-plugin-pwa'

export default defineConfig({
  plugins: [
    react(),
    VitePWA({
      registerType: 'autoUpdate',
      manifest: {
        name: 'Site Name',
        short_name: 'Short',
        theme_color: '#10b981',
        icons: [
          { src: '/icon-192.png', sizes: '192x192', type: 'image/png' },
          { src: '/icon-512.png', sizes: '512x512', type: 'image/png' }
        ]
      },
      workbox: {
        globPatterns: ['**/*.{js,css,html,ico,png,svg,woff2}']
      }
    })
  ]
})
```
Core Rules Summary
- DB: Use the single consolidated DB (CRAWLSY_DATABASE_URL → DATABASE_URL); no dual pools; schema_migrations must exist; apply migrations 031/032/033.
- Images: No MinIO. Save to local `/images/products/<dispensary_id>/<product_id>-<hash>.webp` (and brands); preserve original URL; serve via backend static.
- Dutchie GraphQL: Endpoint https://dutchie.com/api-3/graphql. Variables must use productsFilter.dispensaryId (platform_dispensary_id). Mode A: Status="Active". Mode B: Status=null/activeOnly:false. No dispensaryFilter.cNameOrID.
- cName/slug: Derive cName from each store's menu_url (/embedded-menu/ or /dispensary/). No hardcoded defaults. Each location must have its own valid menu_url and platform_dispensary_id; do not reuse IDs across locations. If slug is invalid/missing, mark not crawlable and log; resolve ID before crawling.
- Dual-mode always: useBothModes:true to get pricing (Mode A) + full coverage (Mode B).
- Batch DB writes: Chunk products/snapshots/missing (100–200) to avoid OOM.
- OOS/missing: Include inactive/OOS in Mode B. Union A+B, dedupe by external_product_id+dispensary_id. Insert snapshots with stock_status; if absent from both modes, insert missing_from_feed. Do not filter OOS by default.
- API/Frontend: Use /api/az/... endpoints (stores/products/brands/categories/summary/dashboard). Rebuild frontend with VITE_API_URL pointing to the backend.
- Scheduling: Crawl only menu_type='dutchie' AND platform_dispensary_id IS NOT NULL. 4-hour crawl with jitter; detection job to set menu_type and resolve platform IDs.
- Monitor: /scraper-monitor (and /az-schedule) should show active/recent jobs from job_run_logs/crawl_jobs, with auto-refresh.
- No slug guessing: Never use defaults like "AZ-Deeply-Rooted." Always derive per store from menu_url and resolve platform IDs per location.
Detailed Rules
- Use the consolidated DB everywhere
  - Preferred env: `CRAWLSY_DATABASE_URL` (fallback `DATABASE_URL`).
  - Do NOT create dutchie tables in the legacy DB. Apply migrations 031/032/033 to the consolidated DB and restart.
- Dispensary vs Store
  - Dutchie pipeline uses `dispensaries` (not legacy `stores`). For dutchie crawls, always work with dispensary ID.
  - Ignore legacy fields like `dutchie_plus_id` and slug guessing. Use the record's `menu_url` and `platform_dispensary_id`.
- Menu detection and platform IDs
  - Set `menu_type` from `menu_url` detection; resolve `platform_dispensary_id` for `menu_type='dutchie'`.
  - Admin should have "refresh detection" and "resolve ID" actions; schedule/crawl only when `menu_type='dutchie'` AND `platform_dispensary_id` is set.
- Queries and mapping
  - The DB returns snake_case; code expects camelCase. Always alias/map: `platform_dispensary_id AS "platformDispensaryId"`
  - Map via `mapDbRowToDispensary` when loading dispensaries (scheduler, crawler, admin crawl); see the sketch after this item.
  - Avoid `SELECT *`; explicitly select and/or map fields.
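A minimal sketch of the snake_case to camelCase mapping described above. The real `mapDbRowToDispensary` in the repo likely maps more fields; this only illustrates the pattern.

```typescript
// Sketch: map a snake_case DB row to the camelCase shape the crawler/scheduler expects.
interface DispensaryRow {
  id: number;
  name: string;
  menu_url: string | null;
  menu_type: string | null;
  platform_dispensary_id: string | null;
}

interface Dispensary {
  id: number;
  name: string;
  menuUrl: string | null;
  menuType: string | null;
  platformDispensaryId: string | null;
}

export function mapDbRowToDispensary(row: DispensaryRow): Dispensary {
  return {
    id: row.id,
    name: row.name,
    menuUrl: row.menu_url,
    menuType: row.menu_type,
    platformDispensaryId: row.platform_dispensary_id,
  };
}
```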
- Scheduling
  - `/scraper-schedule` should accept filters/search (All vs AZ-only, name).
  - "Run Now"/scheduler must skip or warn if `menu_type != 'dutchie'` or `platform_dispensary_id` missing.
  - Use `dispensary_crawl_status` view; show reason when not crawlable. (A query sketch follows this item.)
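As a sketch, the crawlability check the scheduler applies could look like the query below. The table and column names follow the rules above; the `pg` pool setup and exact query shape are assumptions.

```typescript
// Sketch: only enqueue crawls for dispensaries that are actually crawlable.
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.CRAWLSY_DATABASE_URL || process.env.DATABASE_URL });

export async function getCrawlableDispensaries() {
  const { rows } = await pool.query(`
    SELECT id,
           name,
           platform_dispensary_id AS "platformDispensaryId",
           menu_type              AS "menuType"
    FROM dispensaries
    WHERE menu_type = 'dutchie'
      AND platform_dispensary_id IS NOT NULL
  `);
  return rows;
}
```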
- Crawling
  - Trigger dutchie crawls by dispensary ID (e.g., `/api/az/admin/crawl/:id` or `runDispensaryOrchestrator(id)`).
  - Update existing products (by stable product ID), append snapshots for history (every 4h cadence), download images locally (`/images/...`), store local URLs.
  - Use dutchie GraphQL pipeline only for `menu_type='dutchie'`.
- Frontend
  - Forward-facing URLs: `/api/az`, `/az`, `/az-schedule`; no vendor names.
  - `/scraper-schedule`: add filters/search, keep as master view for all schedules; reflect platform ID/menu_type status and controls (resolve ID, run now, enable/disable/delete).
- No slug guessing
  - Do not guess slugs; use the DB record's `menu_url` and ID. Resolve platform ID from the URL/cName; if set, crawl directly by ID.
- Verify locally before pushing
  - Apply migrations, restart backend, ensure auth (`users` table) exists, run a dutchie crawl for a known dispensary (e.g., Deeply Rooted), check `/api/az/dashboard`, `/api/az/stores/:id/products`, `/az`, `/scraper-schedule`.
- Image storage (no MinIO)
  - Save images to local filesystem only. Do not create or use MinIO in Docker.
  - Product images: `/images/products/<dispensary_id>/<product_id>-<hash>.webp` (+medium/+thumb).
  - Brand images: `/images/brands/<brand_slug_or_sku>-<hash>.webp`.
  - Store local URLs in DB fields (keep original URLs as fallback only).
  - Serve `/images` via backend static middleware. (See the sketch after this item.)
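A hedged sketch of the image-save flow for the rule above. Using `sharp` for webp conversion and the exact helper name are assumptions; the path layout and "keep original URL as fallback" behavior follow the rule.

```typescript
// Sketch: download a product image, convert to webp variants, save under /images, return local URLs.
import axios from 'axios';
import sharp from 'sharp';
import { promises as fs } from 'fs';
import path from 'path';
import crypto from 'crypto';

const IMAGES_ROOT = process.env.IMAGES_ROOT || './images';

export async function saveProductImageLocally(
  dispensaryId: number,
  productId: string,
  originalUrl: string
): Promise<{ full: string; medium: string; thumb: string }> {
  const { data } = await axios.get(originalUrl, { responseType: 'arraybuffer' });
  const buf = Buffer.from(data);
  const hash = crypto.createHash('sha1').update(buf).digest('hex').slice(0, 12);

  const dir = path.join(IMAGES_ROOT, 'products', String(dispensaryId));
  await fs.mkdir(dir, { recursive: true });

  const base = path.join(dir, `${productId}-${hash}`);
  await sharp(buf).webp().toFile(`${base}.webp`);
  await sharp(buf).resize({ width: 600 }).webp().toFile(`${base}-medium.webp`);
  await sharp(buf).resize({ width: 150 }).webp().toFile(`${base}-thumb.webp`);

  // Store these local URLs in the DB; keep originalUrl only as a fallback.
  const urlBase = `/images/products/${dispensaryId}/${productId}-${hash}`;
  return { full: `${urlBase}.webp`, medium: `${urlBase}-medium.webp`, thumb: `${urlBase}-thumb.webp` };
}
```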
- Dutchie GraphQL fetch rules
  - Endpoint: `https://dutchie.com/api-3/graphql` (NOT `api-gw.dutchie.com`, which no longer exists).
  - Variables: Use `productsFilter.dispensaryId` = `platform_dispensary_id` (MongoDB ObjectId, e.g., `6405ef617056e8014d79101b`).
  - Do NOT use `dispensaryFilter.cNameOrID` - that's outdated. cName (e.g., `AZ-Deeply-Rooted`) is only for Referer/Origin headers and Puppeteer session bootstrapping.
  - Mode A: `Status: "Active"` - returns active products with pricing
  - Mode B: `Status: null` / `activeOnly: false` - returns all products including OOS/inactive
  - Example payload:

    ```json
    {
      "operationName": "FilteredProducts",
      "variables": {
        "productsFilter": {
          "dispensaryId": "6405ef617056e8014d79101b",
          "pricingType": "rec",
          "Status": "Active"
        }
      },
      "extensions": {
        "persistedQuery": { "version": 1, "sha256Hash": "<hash>" }
      }
    }
    ```

  - Headers (server-side axios only): Chrome UA, `Origin: https://dutchie.com`, `Referer: https://dutchie.com/embedded-menu/<cName>`, `accept: application/json`, `content-type: application/json`.
  - If local DNS can't resolve, run the fetch from an environment that can (K8s pod/remote host), not from the browser.
  - Use server-side axios with embedded-menu headers; include CF/session cookie from Puppeteer if needed. (See the sketch after this item.)
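As a sketch, a server-side Mode A fetch following these rules might look like this. The persisted-query hash is passed in rather than invented, the header values are the ones listed above, and everything else (function name, error handling) is illustrative.

```typescript
// Sketch: server-side Dutchie GraphQL fetch (Mode A) using productsFilter.dispensaryId.
import axios from 'axios';

export async function fetchActiveProducts(platformDispensaryId: string, cName: string, sha256Hash: string) {
  const payload = {
    operationName: 'FilteredProducts',
    variables: {
      productsFilter: {
        dispensaryId: platformDispensaryId, // platform_dispensary_id (MongoDB ObjectId)
        pricingType: 'rec',
        Status: 'Active',                   // Mode A; use Status: null / activeOnly: false for Mode B
      },
    },
    extensions: { persistedQuery: { version: 1, sha256Hash } },
  };

  const { data, status } = await axios.post('https://dutchie.com/api-3/graphql', payload, {
    headers: {
      'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0 Safari/537.36',
      origin: 'https://dutchie.com',
      referer: `https://dutchie.com/embedded-menu/${cName}`,
      accept: 'application/json',
      'content-type': 'application/json',
      // If Cloudflare blocks the request, add the cf_clearance/session cookie captured via Puppeteer.
    },
  });

  if (data?.errors?.length) {
    // Log status + first GraphQL error, per the troubleshooting rule below.
    console.error(`GraphQL error (HTTP ${status}):`, data.errors[0]);
  }
  return data;
}
```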
- Stop over-prep; run the crawl
  - To seed/refresh a store, run a one-off crawl by dispensary ID (example for Deeply Rooted):

    ```bash
    DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" \
    npx tsx -e "const { crawlDispensaryProducts } = require('./src/dutchie-az/services/product-crawler'); const d={id:112,name:'Deeply Rooted',platform:'dutchie',platformDispensaryId:'6405ef617056e8014d79101b',menuType:'dutchie'}; crawlDispensaryProducts(d,'rec',{useBothModes:true}).then(r=>{console.log(r);process.exit(0);}).catch(e=>{console.error(e);process.exit(1);});"
    ```

    If local DNS is blocked, run the same command inside the scraper pod via `kubectl exec ... -- bash -lc '...'`.
  - After crawl, verify counts via `dutchie_products`, `dutchie_product_snapshots`, and `dispensaries.last_crawl_at`. Do not inspect the legacy `products` table for Dutchie.
- Fetch troubleshooting
  - If 403 or empty data: log status + first GraphQL error; include cf_clearance/session cookie from Puppeteer; ensure headers match a real Chrome request; ensure variables use `productsFilter.dispensaryId`.
  - If DNS fails locally, do NOT debug DNS; run the fetch from an environment that resolves (K8s/remote) or via Puppeteer-captured headers/cookies. No browser/CORS attempts.
- Views and metrics
  - Keep v_brands/v_categories/v_brand_history based on `dutchie_products` and preserve brand_count metrics. Do not drop brand_count.
- Batch DB writes to avoid OOM
  - Do NOT build one giant upsert/insert payload for products/snapshots/missing marks.
  - Chunk arrays (e.g., 100–200 items) and upsert/insert in a loop; drop references after each chunk.
  - Apply to products, product snapshots, and any "mark missing" logic to keep memory low during crawls. (See the chunking sketch after this item.)
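A minimal sketch of the chunking pattern. The write callback stands in for whatever batch upsert helpers the repo actually uses; those names are hypothetical.

```typescript
// Sketch: write products/snapshots in small chunks instead of one giant payload.
export async function writeInChunks<T>(
  items: T[],
  write: (chunk: T[]) => Promise<void>, // e.g. a multi-row upsert for one chunk
  chunkSize = 150                       // stays within the 100-200 range above
): Promise<void> {
  for (let i = 0; i < items.length; i += chunkSize) {
    await write(items.slice(i, i + chunkSize));
    // The chunk goes out of scope after each await, keeping memory flat during large crawls.
  }
}

// Usage (hypothetical helpers):
//   await writeInChunks(products, upsertProductChunk);
//   await writeInChunks(snapshots, insertSnapshotChunk);
```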
- Use dual-mode crawls by default
  - Always run with `useBothModes:true` to combine:
    - Mode A (active feed with pricing/stock)
    - Mode B (max coverage including OOS/inactive)
  - Union/dedupe by product ID so you keep full coverage and pricing in one run (see the sketch after this item).
  - If you only run Mode B, prices will be null; dual-mode fills pricing while retaining OOS items.
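A hedged sketch of the union/dedupe step, keyed by external_product_id + dispensary_id as the summary rule specifies. The product shape is simplified and the merge order (Mode A wins for pricing) is the assumption here.

```typescript
// Sketch: union Mode A (pricing) and Mode B (full coverage), dedupe by product key.
interface CrawledProduct {
  external_product_id: string;
  dispensary_id: number;
  price: number | null;
  stock_status: string;
}

export function unionModes(modeA: CrawledProduct[], modeB: CrawledProduct[]): CrawledProduct[] {
  const byKey = new Map<string, CrawledProduct>();
  // Mode B first for maximum coverage (includes OOS/inactive)...
  for (const p of modeB) {
    byKey.set(`${p.external_product_id}:${p.dispensary_id}`, p);
  }
  // ...then Mode A overwrites matching keys so pricing/stock from the active feed wins.
  for (const p of modeA) {
    byKey.set(`${p.external_product_id}:${p.dispensary_id}`, p);
  }
  return [...byKey.values()];
}
```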
- Capture OOS and missing items
  - GraphQL variables must include inactive/OOS (Status: All / activeOnly:false). Mode B already returns OOS/inactive; union with Mode A to keep pricing.
  - After unioning Mode A/B, upsert products and insert snapshots with stock_status from the feed. If an existing product is absent from both Mode A and Mode B for the run, insert a snapshot with is_present_in_feed=false and stock_status='missing_from_feed'. (See the sketch after this item.)
  - Do not filter out OOS/missing in the API; only filter when the user requests (e.g., stockStatus=in_stock). Expose stock_status/in_stock from the latest snapshot (fallback to product).
  - Verify with `/api/az/stores/:id/products?stockStatus=out_of_stock` and `?stockStatus=missing_from_feed`.
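As a sketch, the missing-from-feed check could compare existing product IDs against the unioned crawl result. Table names and the stock_status/is_present_in_feed values come from the rules above; the other snapshot column names and the query text are assumptions, and production code would batch these inserts per the chunking rule.

```typescript
// Sketch: append missing_from_feed snapshots for products absent from both modes this run.
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function markMissingFromFeed(dispensaryId: number, seenExternalIds: Set<string>) {
  const { rows } = await pool.query(
    'SELECT id, external_product_id FROM dutchie_products WHERE dispensary_id = $1',
    [dispensaryId]
  );

  const missing = rows.filter(r => !seenExternalIds.has(r.external_product_id));

  // Append snapshots only; never delete or overwrite the product rows themselves.
  for (const row of missing) {
    await pool.query(
      `INSERT INTO dutchie_product_snapshots (product_id, is_present_in_feed, stock_status, captured_at)
       VALUES ($1, false, 'missing_from_feed', NOW())`,
      [row.id]
    );
  }
}
```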
- Menu discovery must crawl the website when menu_url is null
  - For dispensaries with no menu_url or unknown menu_type, crawl the dispensary.website (if present) to find provider links (dutchie, treez, jane, weedmaps, leafly, etc.). Follow "menu/order/shop" links up to a shallow depth with timeouts/rate limits. (See the sketch after this item.)
  - If a provider link is found, set menu_url, set menu_type, and store detection metadata; if dutchie, derive cName from menu_url and resolve platform_dispensary_id; store resolved_at and detection details.
  - Do NOT mark a dispensary not_crawlable solely because menu_url is null; only mark not_crawlable if the website crawl fails to find a menu or returns 403/404/invalid. Log the reason in provider_detection_data and crawl_status_reason.
  - Keep this as the menu discovery job (separate from product crawls); log successes/errors to job_run_logs. Only schedule product crawls for stores with menu_type='dutchie' AND platform_dispensary_id IS NOT NULL.
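A hedged sketch of the provider-link detection step, fetching the homepage only. The real discovery job would also follow "menu/order/shop" links with depth limits and rate limiting; the function name and href-scraping approach are illustrative.

```typescript
// Sketch: detect a menu provider link on a dispensary website.
import axios from 'axios';

const PROVIDERS = ['dutchie', 'treez', 'jane', 'weedmaps', 'leafly'];

export async function detectMenuProvider(websiteUrl: string): Promise<{ provider: string; menuUrl: string } | null> {
  const { data: html } = await axios.get<string>(websiteUrl, { timeout: 15000 });

  // Collect href values and look for a known provider domain or embedded-menu path.
  const hrefs = [...html.matchAll(/href=["']([^"']+)["']/gi)].map(m => m[1]);
  for (const href of hrefs) {
    const provider = PROVIDERS.find(p => href.toLowerCase().includes(p));
    if (provider) {
      return { provider, menuUrl: href };
    }
  }
  return null; // caller decides whether to follow "menu/order/shop" links deeper
}
```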
- Preserve all stock statuses (including unknown)
  - Do not filter or drop stock_status values in API/UI; pass through whatever is stored on the latest snapshot/product. Expected values include: in_stock, out_of_stock (if exposed), missing_from_feed, unknown. Only apply filters when explicitly requested by the user.
- Never delete or overwrite historical data
  - Do not delete products/snapshots or overwrite historical records. Always append snapshots for changes (price/stock/qty), and mark missing_from_feed instead of removing records. Historical data must remain intact for analytics.
- Deployment via CI/CD only
  - Test locally, commit clean changes, and let CI/CD build and deploy to Kubernetes at code.cannabrands.app. Do NOT manually build/push images or tweak prod pods. Deploy backend first, smoke-test APIs, then frontend; roll back via CI/CD if needed.
- Per-location cName and platform_dispensary_id resolution
  - For each dispensary, menu_url and cName must be valid for that exact location; no hardcoded defaults and no sharing platform_dispensary_id across locations.
  - Derive cName from menu_url per store: `/embedded-menu/<cName>` or `/dispensary/<cName>`. (See the derivation sketch after this item.)
  - Resolve platform_dispensary_id from that cName using GraphQL GetAddressBasedDispensaryData.
  - If the slug is invalid/missing, mark the store not crawlable and log it; do not crawl with a mismatched cName/ID. Store the error in `provider_detection_data.resolution_error`.
  - Before crawling, validate that the cName from menu_url matches the resolved platform ID; if mismatched, re-resolve before proceeding.
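A minimal sketch of the cName derivation from menu_url. The GetAddressBasedDispensaryData resolution call is left out; this only shows the URL parsing for the two path shapes named above.

```typescript
// Sketch: derive the per-store cName from its menu_url (no hardcoded defaults).
export function deriveCName(menuUrl: string): string | null {
  try {
    const { pathname } = new URL(menuUrl);
    const match = pathname.match(/\/(?:embedded-menu|dispensary)\/([^/?#]+)/);
    return match ? match[1] : null;
  } catch {
    return null; // invalid/missing menu_url: mark the store not crawlable and log it
  }
}

// deriveCName('https://dutchie.com/embedded-menu/AZ-Deeply-Rooted') === 'AZ-Deeply-Rooted'
```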
- API endpoints (AZ pipeline)
  - Use /api/az/... endpoints: stores, products, brands, categories, summary, dashboard
  - Rebuild frontend with VITE_API_URL pointing to the backend
  - Dispensary Detail and analytics must use AZ endpoints
- Monitoring and logging
  - /scraper-monitor (and /az-schedule) should show active/recent jobs from job_run_logs/crawl_jobs
  - Auto-refresh every 30 seconds
  - System Logs page should show real log data, not just startup messages
- Dashboard Architecture - CRITICAL
  - Frontend: If you see old labels like "Active Proxies" or "Active Stores", it means the old dashboard bundle is being served. Rebuild the frontend with `VITE_API_URL` pointing to the correct backend and redeploy. Clear browser cache. Confirm new labels show up.
  - Backend: `/api/dashboard/stats` MUST use the consolidated DB (same pool as dutchie-az module). Use the correct tables: `dutchie_products`, `dispensaries`, and views like `v_dashboard_stats`, `v_latest_snapshots`. Do NOT use a separate legacy connection. Do NOT query `az_products` (doesn't exist) or legacy `stores`/`products` tables.
  - DB Connectivity: Use the proper DB host/role. Errors like `role "dutchie" does not exist` mean you're exec'ing into the wrong Postgres pod or using wrong credentials. Confirm the correct `DATABASE_URL` and test: `kubectl exec deployment/scraper -n dispensary-scraper -- psql $DATABASE_URL -c '\dt'`
  - After fixing: Dashboard should show real data (e.g., 777 products) instead of zeros. Do NOT revert to legacy tables; point dashboard queries to the consolidated DB/views.
  - Checklist:
    - Rebuild/redeploy frontend with correct API URL, clear cache
    - Fix `/api/dashboard/*` to use the consolidated DB pool and dutchie views/tables
    - Test `/api/dashboard/stats` from the scraper pod; then reload the UI
- Deployment (Gitea + Kubernetes)
  - Registry: Gitea at `code.cannabrands.app/creationshop/dispensary-scraper`
  - Build and push (from backend directory):

    ```bash
    # Login to Gitea container registry
    docker login code.cannabrands.app

    # Build the image
    cd backend
    docker build -t code.cannabrands.app/creationshop/dispensary-scraper:latest .

    # Push to registry
    docker push code.cannabrands.app/creationshop/dispensary-scraper:latest
    ```

  - Deploy to Kubernetes:

    ```bash
    # Restart deployments to pull new image
    kubectl rollout restart deployment/scraper -n dispensary-scraper
    kubectl rollout restart deployment/scraper-worker -n dispensary-scraper

    # Watch rollout status
    kubectl rollout status deployment/scraper -n dispensary-scraper
    kubectl rollout status deployment/scraper-worker -n dispensary-scraper
    ```

  - Check pods:

    ```bash
    kubectl get pods -n dispensary-scraper
    kubectl logs -f deployment/scraper -n dispensary-scraper
    kubectl logs -f deployment/scraper-worker -n dispensary-scraper
    ```

  - K8s manifests are in `/k8s/` folder (scraper.yaml, scraper-worker.yaml, etc.)
  - imagePullSecrets use `regcred` secret for Gitea registry auth
- Crawler Architecture
  - Scraper pod (1 replica): Runs the Express API server + scheduler. The scheduler enqueues detection and crawl jobs to the database queue (`crawl_jobs` table).
  - Scraper-worker pods (5 replicas): Each worker runs `dist/dutchie-az/services/worker.js`, polling the job queue and processing jobs. (A poll-loop sketch follows this item.)
  - Job types processed by workers:
    - `menu_detection` / `menu_detection_single`: Detect menu provider type and resolve platform_dispensary_id from menu_url
    - `dutchie_product_crawl`: Crawl products from Dutchie GraphQL API for dispensaries with valid platform IDs
  - Job schedules (managed in `job_schedules` table):
    - `dutchie_az_menu_detection`: Runs daily with 60-min jitter, detects menu type for dispensaries with unknown menu_type
    - `dutchie_az_product_crawl`: Runs every 4 hours with 30-min jitter, crawls products from all detected Dutchie dispensaries
  - Trigger schedules manually: `curl -X POST /api/az/admin/schedules/{id}/trigger`
  - Check schedule status: `curl /api/az/admin/schedules`
  - Worker logs: `kubectl logs -f deployment/scraper-worker -n dispensary-scraper`
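A hedged sketch of the worker poll loop described above. Claiming jobs with `FOR UPDATE SKIP LOCKED` and the `created_at` ordering are assumptions about how the queue is read; the table name, status values, and job types follow the rules in this document.

```typescript
// Sketch: a worker polling the crawl_jobs queue and dispatching by job type.
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function claimNextJob() {
  // Assumption: pending jobs are claimed atomically so 5 workers don't grab the same row.
  const { rows } = await pool.query(`
    UPDATE crawl_jobs
    SET status = 'running', started_at = NOW()
    WHERE id = (
      SELECT id FROM crawl_jobs
      WHERE status = 'pending'
      ORDER BY created_at
      FOR UPDATE SKIP LOCKED
      LIMIT 1
    )
    RETURNING id, job_type, dispensary_id
  `);
  return rows[0] ?? null;
}

export async function workerLoop() {
  for (;;) {
    const job = await claimNextJob();
    if (!job) {
      await new Promise(resolve => setTimeout(resolve, 5000)); // idle poll interval
      continue;
    }
    // Dispatch by job type: menu_detection(_single) or dutchie_product_crawl,
    // send heartbeats while running, then mark the job completed or failed.
  }
}
```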
- Crawler Maintenance Procedure (Check Jobs, Requeue, Restart)

  When crawlers are stuck or jobs aren't processing, follow this procedure:

  Step 1: Check Job Status

  ```bash
  # Port-forward to production
  kubectl port-forward -n dispensary-scraper deployment/scraper 3099:3010 &

  # Check active/stuck jobs
  curl -s http://localhost:3099/api/az/monitor/active-jobs | jq .

  # Check recent job history
  curl -s "http://localhost:3099/api/az/monitor/jobs?limit=20" | jq '.jobs[] | {id, job_type, status, dispensary_id, started_at, products_found, duration_min: (.duration_ms/60000 | floor)}'

  # Check schedule status
  curl -s http://localhost:3099/api/az/admin/schedules | jq '.schedules[] | {id, jobName, enabled, lastRunAt, lastStatus, nextRunAt}'
  ```

  Step 2: Reset Stuck Jobs

  Jobs are considered stuck if they have `status='running'` but no heartbeat in >30 minutes:

  ```bash
  # Via API (if endpoint exists)
  curl -s -X POST http://localhost:3099/api/az/admin/reset-stuck-jobs

  # Via direct DB (if API not available)
  kubectl exec -n dispensary-scraper deployment/scraper -- psql $DATABASE_URL -c "
    UPDATE dispensary_crawl_jobs
    SET status = 'failed',
        error_message = 'Job timed out - worker stopped sending heartbeats',
        completed_at = NOW()
    WHERE status = 'running'
      AND (last_heartbeat_at < NOW() - INTERVAL '30 minutes' OR last_heartbeat_at IS NULL);
  "
  ```

  Step 3: Requeue Jobs (Trigger Fresh Crawl)

  ```bash
  # Trigger product crawl schedule (typically ID 1)
  curl -s -X POST http://localhost:3099/api/az/admin/schedules/1/trigger

  # Trigger menu detection schedule (typically ID 2)
  curl -s -X POST http://localhost:3099/api/az/admin/schedules/2/trigger

  # Or crawl a specific dispensary
  curl -s -X POST http://localhost:3099/api/az/admin/crawl/112
  ```

  Step 4: Restart Crawler Workers

  ```bash
  # Restart scraper-worker pods (clears any stuck processes)
  kubectl rollout restart deployment/scraper-worker -n dispensary-scraper

  # Watch rollout progress
  kubectl rollout status deployment/scraper-worker -n dispensary-scraper

  # Optionally restart main scraper pod too
  kubectl rollout restart deployment/scraper -n dispensary-scraper
  ```

  Step 5: Monitor Recovery

  ```bash
  # Watch worker logs
  kubectl logs -f deployment/scraper-worker -n dispensary-scraper --tail=50

  # Check dashboard for product counts
  curl -s http://localhost:3099/api/az/dashboard | jq '{totalStores, totalProducts, storesByType}'

  # Verify jobs are processing
  curl -s http://localhost:3099/api/az/monitor/active-jobs | jq .
  ```

  Quick One-Liner for Full Reset:

  ```bash
  # Reset stuck jobs and restart workers
  kubectl exec -n dispensary-scraper deployment/scraper -- psql $DATABASE_URL -c "UPDATE dispensary_crawl_jobs SET status='failed', completed_at=NOW() WHERE status='running' AND (last_heartbeat_at < NOW() - INTERVAL '30 minutes' OR last_heartbeat_at IS NULL);" && kubectl rollout restart deployment/scraper-worker -n dispensary-scraper && kubectl rollout status deployment/scraper-worker -n dispensary-scraper
  ```

  Cleanup port-forwards when done:

  ```bash
  pkill -f "port-forward.*dispensary-scraper"
  ```
- Frontend Architecture - AVOID OVER-ENGINEERING

  Key Principles:
  - ONE BACKEND serves ALL domains (cannaiq.co, findadispo.com, findagram.co)
  - Do NOT create separate backend services for each domain
  - The existing `dispensary-scraper` backend handles everything

  Frontend Build Differences:
  - `frontend/` uses Vite (outputs to `dist/`, uses `VITE_` env vars) → dispos.crawlsy.com (legacy)
  - `cannaiq/` uses Vite (outputs to `dist/`, uses `VITE_` env vars) → cannaiq.co (NEW)
  - `findadispo/` uses Create React App (outputs to `build/`, uses `REACT_APP_` env vars) → findadispo.com
  - `findagram/` uses Create React App (outputs to `build/`, uses `REACT_APP_` env vars) → findagram.co
  CRA vs Vite Dockerfile Differences:

  ```dockerfile
  # Vite (frontend, cannaiq)
  ENV VITE_API_URL=https://api.domain.com
  RUN npm run build
  COPY --from=builder /app/dist /usr/share/nginx/html

  # CRA (findadispo, findagram)
  ENV REACT_APP_API_URL=https://api.domain.com
  RUN npm run build
  COPY --from=builder /app/build /usr/share/nginx/html
  ```

  lucide-react Icon Gotchas:
  - Not all icons exist in older versions (e.g., `Cannabis` doesn't exist)
  - Use `Leaf` as a substitute for cannabis-related icons
  - When doing search/replace for icon names, be careful not to replace text content
  - Example: "Cannabis-infused food" should NOT become "Leaf-infused food"
  Deployment Options:
  - Separate containers (current): Each frontend in its own nginx container
  - Single container (better): One nginx with multi-domain config serving all frontends
  Single Container Multi-Domain Approach:

  ```dockerfile
  # Build all frontends
  FROM node:20-slim AS builder-cannaiq
  WORKDIR /app/cannaiq
  COPY cannaiq/package*.json ./
  RUN npm install
  COPY cannaiq/ ./
  RUN npm run build

  FROM node:20-slim AS builder-findadispo
  WORKDIR /app/findadispo
  COPY findadispo/package*.json ./
  RUN npm install
  COPY findadispo/ ./
  RUN npm run build

  FROM node:20-slim AS builder-findagram
  WORKDIR /app/findagram
  COPY findagram/package*.json ./
  RUN npm install
  COPY findagram/ ./
  RUN npm run build

  # Production nginx with multi-domain routing
  # Note: Vite builds output dist/, CRA builds (findadispo, findagram) output build/
  FROM nginx:alpine
  COPY --from=builder-cannaiq /app/cannaiq/dist /var/www/cannaiq
  COPY --from=builder-findadispo /app/findadispo/build /var/www/findadispo
  COPY --from=builder-findagram /app/findagram/build /var/www/findagram
  COPY nginx-multi-domain.conf /etc/nginx/conf.d/default.conf
  ```

  nginx-multi-domain.conf:

  ```nginx
  server {
    listen 80;
    server_name cannaiq.co www.cannaiq.co;
    root /var/www/cannaiq;
    location / { try_files $uri $uri/ /index.html; }
  }

  server {
    listen 80;
    server_name findadispo.com www.findadispo.com;
    root /var/www/findadispo;
    location / { try_files $uri $uri/ /index.html; }
  }

  server {
    listen 80;
    server_name findagram.co www.findagram.co;
    root /var/www/findagram;
    location / { try_files $uri $uri/ /index.html; }
  }
  ```

  Common Mistakes to AVOID:
  - Creating a FastAPI/Express backend just for findagram or findadispo
  - Creating separate Docker images per domain when one would work
  - Replacing icon names with sed without checking for text content collisions
  - Using `npm ci` in Dockerfiles when package-lock.json doesn't exist (use `npm install`)