Claude Guidelines for this Project


PERMANENT RULES (NEVER VIOLATE)

1. NO DELETION OF DATA — EVER

CannaiQ is a historical analytics system. Data retention is permanent by design.

NEVER delete:

  • Product records
  • Crawled snapshots
  • Images
  • Directories
  • Logs
  • Orchestrator traces
  • Profiles
  • Selector configs
  • Crawl outcomes
  • Store data
  • Brand data

NEVER automate cleanup:

  • No cron or scheduled job may rm, unlink, delete, purge, prune, clean, or reset any storage directory or DB row
  • No migration may DELETE data — only add/update/alter columns
  • If cleanup is required, ONLY the user may issue a manual command

Code enforcement:

  • local-storage.ts must only: write files, create directories, read files
  • No deleteImage, deleteProductImages, or similar functions
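
A minimal sketch of the write-only surface this rule implies (function names are illustrative, not necessarily the actual exports of local-storage.ts):

// local-storage sketch: only write, mkdir, and read; no delete/unlink anywhere
import { promises as fs } from 'fs';
import path from 'path';

export async function writeFileSafe(relPath: string, data: Buffer): Promise<string> {
  const basePath = process.env.STORAGE_BASE_PATH ?? './storage';
  const fullPath = path.join(basePath, relPath);
  await fs.mkdir(path.dirname(fullPath), { recursive: true }); // create directories
  await fs.writeFile(fullPath, data);                          // write files
  return fullPath;
}

export async function readFileSafe(relPath: string): Promise<Buffer> {
  const basePath = process.env.STORAGE_BASE_PATH ?? './storage';
  return fs.readFile(path.join(basePath, relPath));            // read files
}
// Intentionally no deleteImage / deleteProductImages / rm helpers.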

2. DEPLOYMENT AUTHORIZATION REQUIRED

NEVER deploy to production unless the user explicitly says:

"CLAUDE — DEPLOYMENT IS NOW AUTHORIZED."

Until then:

  • All work is LOCAL ONLY
  • No kubectl apply, docker push, or remote operations
  • No port-forwarding to production
  • No connecting to Kubernetes clusters

3. LOCAL DEVELOPMENT BY DEFAULT

In local mode:

  • Use docker-compose.local.yml (NO MinIO)
  • Use local filesystem storage at ./storage
  • Connect to local PostgreSQL at localhost:54320
  • Backend runs at localhost:3010
  • NO remote connections, NO Kubernetes, NO MinIO

Environment:

STORAGE_DRIVER=local
STORAGE_BASE_PATH=./storage
DATABASE_URL=postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus
# MINIO_ENDPOINT is NOT set (forces local storage)

STORAGE BEHAVIOR

Local Storage Structure

/storage/products/{brand}/{state}/{product_id}/
  image-{hash}.webp
  image-{hash}-medium.webp
  image-{hash}-thumb.webp

/storage/brands/{brand}/
  logo-{hash}.webp
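
For reference, a hypothetical helper that builds these paths (the real logic lives in local-storage.ts; the name here is an assumption):

// Build the on-disk location for a product image variant
function productImagePath(
  brand: string,
  state: string,
  productId: string,
  hash: string,
  variant: '' | '-medium' | '-thumb' = ''
): string {
  return `/storage/products/${brand}/${state}/${productId}/image-${hash}${variant}.webp`;
}

// e.g. productImagePath('wyld', 'az', '12345', 'a1b2c3', '-thumb')
//   => /storage/products/wyld/az/12345/image-a1b2c3-thumb.webp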

Storage Adapter

import { saveImage, getImageUrl } from '../utils/storage-adapter';

// Automatically uses local storage when STORAGE_DRIVER=local
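
Conceptually, the adapter picks a driver from the environment. A rough sketch (not the exact implementation):

// storage-adapter sketch: local filesystem by default; no MinIO in local development
import { writeFileSafe } from './local-storage'; // hypothetical export, see sketch above

export async function saveImage(relPath: string, data: Buffer): Promise<string> {
  const driver = process.env.STORAGE_DRIVER ?? 'local';
  if (driver === 'local') {
    return writeFileSafe(relPath, data); // writes under STORAGE_BASE_PATH
  }
  // Any non-local driver would be handled here; per the rules above, MinIO is not used.
  throw new Error(`Unsupported STORAGE_DRIVER: ${driver}`);
}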

Files

| File | Purpose |
| --- | --- |
| backend/src/utils/local-storage.ts | Local filesystem adapter |
| backend/src/utils/storage-adapter.ts | Unified storage abstraction |
| docker-compose.local.yml | Local stack without MinIO |
| start-local.sh | Convenience startup script |

FORBIDDEN ACTIONS

  1. Deleting any data (products, snapshots, images, logs, traces)
  2. Deploying without explicit authorization
  3. Connecting to Kubernetes without authorization
  4. Port-forwarding to production without authorization
  5. Starting MinIO in local development
  6. Using S3/MinIO SDKs when STORAGE_DRIVER=local
  7. Automating cleanup of any kind
  8. Dropping database tables or columns
  9. Overwriting historical records (always append snapshots)

UI ANONYMIZATION RULES

  • No vendor names in forward-facing URLs: use /api/az/..., /az, /az-schedule
  • No "dutchie", "treez", "jane", "weedmaps", "leafly" visible in consumer UIs
  • Internal admin tools may show provider names for debugging
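
In practice this just means mounting routes under vendor-neutral prefixes; a hypothetical Express example (router name is illustrative):

// Consumer-facing paths stay vendor-neutral
import express from 'express';
import { azRouter } from './routes/az'; // hypothetical module

const app = express();
app.use('/api/az', azRouter);      // OK: neutral prefix
// app.use('/api/dutchie', ...)    // NOT OK: vendor name exposed in a public URL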

FUTURE TODO / PENDING FEATURES

  • Orchestrator observability dashboard
  • Crawl profile management UI
  • State machine sandbox (disabled until authorized)
  • Multi-state expansion beyond AZ

Multi-Site Architecture (CRITICAL)

This project has 5 working locations - always clarify which one before making changes:

| Folder | Domain | Type | Purpose |
| --- | --- | --- | --- |
| backend/ | (shared) | Express API | Single backend serving all frontends |
| frontend/ | dispos.crawlsy.com | React SPA (Vite) | Legacy admin dashboard (internal use) |
| cannaiq/ | cannaiq.co | React SPA + PWA | NEW admin dashboard / B2B analytics |
| findadispo/ | findadispo.com | React SPA + PWA | Consumer dispensary finder |
| findagram/ | findagram.co | React SPA + PWA | Consumer delivery marketplace |

IMPORTANT: frontend/ vs cannaiq/ confusion:

  • frontend/ = OLD/legacy dashboard design, deployed to dispos.crawlsy.com (internal admin)
  • cannaiq/ = NEW dashboard design, deployed to cannaiq.co (customer-facing B2B)
  • These are DIFFERENT codebases - do NOT confuse them!

Before any frontend work, ASK: "Which site? cannaiq, findadispo, findagram, or legacy (frontend/)?"

All four frontends share:

  • Same backend API (port 3010)
  • Same PostgreSQL database
  • Same Kubernetes deployment for backend

Each frontend has:

  • Its own folder, package.json, Dockerfile
  • Its own domain and branding
  • Its own PWA manifest and service worker (cannaiq, findadispo, findagram)
  • Separate Docker containers in production

Multi-Domain Hosting Architecture

The three public-facing domains (cannaiq.co, findadispo.com, findagram.co) are served from the same IP using host-based routing:

Kubernetes Ingress (Production):

# Each domain routes to its own frontend service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: multi-site-ingress
spec:
  rules:
  - host: cannaiq.co
    http:
      paths:
      - path: /
        backend:
          service:
            name: cannaiq-frontend
            port: 80
      - path: /api
        backend:
          service:
            name: scraper  # shared backend
            port: 3010
  - host: findadispo.com
    http:
      paths:
      - path: /
        backend:
          service:
            name: findadispo-frontend
            port: 80
      - path: /api
        backend:
          service:
            name: scraper
            port: 3010
  - host: findagram.co
    http:
      paths:
      - path: /
        backend:
          service:
            name: findagram-frontend
            port: 80
      - path: /api
        backend:
          service:
            name: scraper
            port: 3010

Key Points:

  • DNS A records for all 3 domains point to the same IP
  • Ingress controller routes based on Host header
  • Each frontend is a separate Docker container (nginx serving static files)
  • All frontends share the same backend API at /api/*
  • SSL/TLS handled at ingress level (cert-manager)

PWA Setup Requirements

cannaiq, findadispo, and findagram are Progressive Web Apps (PWAs). Required files in each app's public/ folder:

  1. manifest.json - App metadata, icons, theme colors
  2. service-worker.js - Offline caching, background sync
  3. Icons - 192x192 and 512x512 PNG icons

Vite PWA Plugin Setup (in each Vite-based frontend's vite.config.ts):

import { VitePWA } from 'vite-plugin-pwa'

export default defineConfig({
  plugins: [
    react(),
    VitePWA({
      registerType: 'autoUpdate',
      manifest: {
        name: 'Site Name',
        short_name: 'Short',
        theme_color: '#10b981',
        icons: [
          { src: '/icon-192.png', sizes: '192x192', type: 'image/png' },
          { src: '/icon-512.png', sizes: '512x512', type: 'image/png' }
        ]
      },
      workbox: {
        globPatterns: ['**/*.{js,css,html,ico,png,svg,woff2}']
      }
    })
  ]
})

Core Rules Summary

  • DB: Use the single consolidated DB (CRAWLSY_DATABASE_URL → DATABASE_URL); no dual pools; schema_migrations must exist; apply migrations 031/032/033.
  • Images: No MinIO. Save to local /images/products/<dispensary_id>/<product_id>-<hash>.webp (and brands); preserve original URL; serve via backend static.
  • Dutchie GraphQL: Endpoint https://dutchie.com/api-3/graphql. Variables must use productsFilter.dispensaryId (platform_dispensary_id). Mode A: Status="Active". Mode B: Status=null/activeOnly:false. No dispensaryFilter.cNameOrID.
  • cName/slug: Derive cName from each store's menu_url (/embedded-menu/ or /dispensary/). No hardcoded defaults. Each location must have its own valid menu_url and platform_dispensary_id; do not reuse IDs across locations. If slug is invalid/missing, mark not crawlable and log; resolve ID before crawling.
  • Dual-mode always: useBothModes:true to get pricing (Mode A) + full coverage (Mode B).
  • Batch DB writes: Chunk products/snapshots/missing (100–200) to avoid OOM.
  • OOS/missing: Include inactive/OOS in Mode B. Union A+B, dedupe by external_product_id+dispensary_id. Insert snapshots with stock_status; if absent from both modes, insert missing_from_feed. Do not filter OOS by default.
  • API/Frontend: Use /api/az/... endpoints (stores/products/brands/categories/summary/dashboard). Rebuild frontend with VITE_API_URL pointing to the backend.
  • Scheduling: Crawl only menu_type='dutchie' AND platform_dispensary_id IS NOT NULL. 4-hour crawl with jitter; detection job to set menu_type and resolve platform IDs.
  • Monitor: /scraper-monitor (and /az-schedule) should show active/recent jobs from job_run_logs/crawl_jobs, with auto-refresh.
  • No slug guessing: Never use defaults like "AZ-Deeply-Rooted." Always derive per store from menu_url and resolve platform IDs per location.

Detailed Rules

  1. Use the consolidated DB everywhere

    • Preferred env: CRAWLSY_DATABASE_URL (fallback DATABASE_URL).
    • Do NOT create dutchie tables in the legacy DB. Apply migrations 031/032/033 to the consolidated DB and restart.
  2. Dispensary vs Store

    • Dutchie pipeline uses dispensaries (not legacy stores). For dutchie crawls, always work with dispensary ID.
    • Ignore legacy fields like dutchie_plus_id and slug guessing. Use the record's menu_url and platform_dispensary_id.
  3. Menu detection and platform IDs

    • Set menu_type from menu_url detection; resolve platform_dispensary_id for menu_type='dutchie'.
    • Admin should have "refresh detection" and "resolve ID" actions; schedule/crawl only when menu_type='dutchie' AND platform_dispensary_id is set.
  4. Queries and mapping

    • The DB returns snake_case; code expects camelCase. Always alias/map:
      • platform_dispensary_id AS "platformDispensaryId"
      • Map via mapDbRowToDispensary when loading dispensaries (scheduler, crawler, admin crawl).
    • Avoid SELECT *; explicitly select and/or map fields.
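
    The mapping is mechanical; a rough sketch of what mapDbRowToDispensary does (field list abbreviated, not the full implementation):

      // Map a snake_case DB row to the camelCase shape the scheduler/crawler expect
      interface DispensaryRow {
        id: number;
        name: string;
        menu_url: string | null;
        menu_type: string | null;
        platform_dispensary_id: string | null;
      }

      function mapDbRowToDispensary(row: DispensaryRow) {
        return {
          id: row.id,
          name: row.name,
          menuUrl: row.menu_url,
          menuType: row.menu_type,
          platformDispensaryId: row.platform_dispensary_id,
        };
      }
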
  5. Scheduling

    • /scraper-schedule should accept filters/search (All vs AZ-only, name).
    • "Run Now"/scheduler must skip or warn if menu_type!='dutchie' or platform_dispensary_id missing.
    • Use dispensary_crawl_status view; show reason when not crawlable.
  6. Crawling

    • Trigger dutchie crawls by dispensary ID (e.g., /api/az/admin/crawl/:id or runDispensaryOrchestrator(id)).
    • Update existing products (by stable product ID), append snapshots for history (every 4h cadence), download images locally (/images/...), store local URLs.
    • Use dutchie GraphQL pipeline only for menu_type='dutchie'.
  7. Frontend

    • Forward-facing URLs: /api/az, /az, /az-schedule; no vendor names.
    • /scraper-schedule: add filters/search, keep as master view for all schedules; reflect platform ID/menu_type status and controls (resolve ID, run now, enable/disable/delete).
  8. No slug guessing

    • Do not guess slugs; use the DB record's menu_url and ID. Resolve platform ID from the URL/cName; if set, crawl directly by ID.
  9. Verify locally before pushing

    • Apply migrations, restart backend, ensure auth (users table) exists, run dutchie crawl for a known dispensary (e.g., Deeply Rooted), check /api/az/dashboard, /api/az/stores/:id/products, /az, /scraper-schedule.
  10. Image storage (no MinIO)

    • Save images to local filesystem only. Do not create or use MinIO in Docker.
    • Product images: /images/products/<dispensary_id>/<product_id>-<hash>.webp (+medium/+thumb).
    • Brand images: /images/brands/<brand_slug_or_sku>-<hash>.webp.
    • Store local URLs in DB fields (keep original URLs as fallback only).
    • Serve /images via backend static middleware.
  11. Dutchie GraphQL fetch rules

    • Endpoint: https://dutchie.com/api-3/graphql (NOT api-gw.dutchie.com which no longer exists).
    • Variables: Use productsFilter.dispensaryId = platform_dispensary_id (MongoDB ObjectId, e.g., 6405ef617056e8014d79101b).
    • Do NOT use dispensaryFilter.cNameOrID - that's outdated.
    • cName (e.g., AZ-Deeply-Rooted) is only for Referer/Origin headers and Puppeteer session bootstrapping.
    • Mode A: Status: "Active" - returns active products with pricing
    • Mode B: Status: null / activeOnly: false - returns all products including OOS/inactive
    • Example payload:
      {
        "operationName": "FilteredProducts",
        "variables": {
          "productsFilter": {
            "dispensaryId": "6405ef617056e8014d79101b",
            "pricingType": "rec",
            "Status": "Active"
          }
        },
        "extensions": {
          "persistedQuery": { "version": 1, "sha256Hash": "<hash>" }
        }
      }
      
    • Headers (server-side axios only): Chrome UA, Origin: https://dutchie.com, Referer: https://dutchie.com/embedded-menu/<cName>, accept: application/json, content-type: application/json.
    • If local DNS can't resolve, run fetch from an environment that can (K8s pod/remote host), not from browser.
    • Use server-side axios with embedded-menu headers; include CF/session cookie from Puppeteer if needed.
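
    Putting these pieces together, a hedged server-side fetch sketch (the persisted-query hash stays a placeholder; take the exact query shape from the real crawler code):

      import axios from 'axios';

      // Mode A fetch for one dispensary; pass status = null for Mode B
      async function fetchFilteredProducts(platformDispensaryId: string, cName: string, status: 'Active' | null) {
        const res = await axios.post(
          'https://dutchie.com/api-3/graphql',
          {
            operationName: 'FilteredProducts',
            variables: {
              productsFilter: {
                dispensaryId: platformDispensaryId, // platform_dispensary_id, NOT cNameOrID
                pricingType: 'rec',
                Status: status,
              },
            },
            extensions: { persistedQuery: { version: 1, sha256Hash: '<hash>' } },
          },
          {
            headers: {
              'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
              origin: 'https://dutchie.com',
              referer: `https://dutchie.com/embedded-menu/${cName}`,
              accept: 'application/json',
              'content-type': 'application/json',
              // cookie: cf_clearance / session cookie from Puppeteer, if required
            },
          }
        );
        return res.data;
      }
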
  12. Stop over-prep; run the crawl

    • To seed/refresh a store, run a one-off crawl by dispensary ID (example for Deeply Rooted):
      DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" \
      npx tsx -e "const { crawlDispensaryProducts } = require('./src/dutchie-az/services/product-crawler'); const d={id:112,name:'Deeply Rooted',platform:'dutchie',platformDispensaryId:'6405ef617056e8014d79101b',menuType:'dutchie'}; crawlDispensaryProducts(d,'rec',{useBothModes:true}).then(r=>{console.log(r);process.exit(0);}).catch(e=>{console.error(e);process.exit(1);});"
      
      If local DNS is blocked, run the same command inside the scraper pod via kubectl exec ... -- bash -lc '...'.
    • After crawl, verify counts via dutchie_products, dutchie_product_snapshots, and dispensaries.last_crawl_at. Do not inspect the legacy products table for Dutchie.
  13. Fetch troubleshooting

    • If 403 or empty data: log status + first GraphQL error; include cf_clearance/session cookie from Puppeteer; ensure headers match a real Chrome request; ensure variables use productsFilter.dispensaryId.
    • If DNS fails locally, do NOT debug DNS—run the fetch from an environment that resolves (K8s/remote) or via Puppeteer-captured headers/cookies. No browser/CORS attempts.
  14. Views and metrics

    • Keep v_brands/v_categories/v_brand_history based on dutchie_products and preserve brand_count metrics. Do not drop brand_count.
  15. Batch DB writes to avoid OOM

    • Do NOT build one giant upsert/insert payload for products/snapshots/missing marks.
    • Chunk arrays (e.g., 100–200 items) and upsert/insert in a loop; drop references after each chunk.
    • Apply to products, product snapshots, and any "mark missing" logic to keep memory low during crawls.
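
    A minimal chunking sketch (chunk size and upsert callback are illustrative):

      // Upsert in chunks of ~100–200 rows instead of one giant payload
      async function upsertInChunks<T>(rows: T[], upsertChunk: (chunk: T[]) => Promise<void>, size = 150) {
        for (let i = 0; i < rows.length; i += size) {
          const chunk = rows.slice(i, i + size);
          await upsertChunk(chunk); // one bounded DB round-trip per chunk
          // chunk goes out of scope here, keeping memory flat during large crawls
        }
      }
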
  16. Use dual-mode crawls by default

    • Always run with useBothModes:true to combine:
      • Mode A (active feed with pricing/stock)
      • Mode B (max coverage including OOS/inactive)
    • Union/dedupe by product ID so you keep full coverage and pricing in one run.
    • If you only run Mode B, prices will be null; dual-mode fills pricing while retaining OOS items.
  17. Capture OOS and missing items

    • GraphQL variables must include inactive/OOS (Status: All / activeOnly:false). Mode B already returns OOS/inactive; union with Mode A to keep pricing.
    • After unioning Mode A/B, upsert products and insert snapshots with stock_status from the feed. If an existing product is absent from both Mode A and Mode B for the run, insert a snapshot with is_present_in_feed=false and stock_status='missing_from_feed'.
    • Do not filter out OOS/missing in the API; only filter when the user requests (e.g., stockStatus=in_stock). Expose stock_status/in_stock from the latest snapshot (fallback to product).
    • Verify with /api/az/stores/:id/products?stockStatus=out_of_stock and ?stockStatus=missing_from_feed.
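
    A sketch of the union/dedupe step (field names follow the rules above; the real types live in the crawler):

      interface FeedProduct {
        externalProductId: string;
        dispensaryId: number;
        price?: number | null;
        stockStatus?: string;
      }

      // Mode A rows win on conflicts so pricing is preserved; Mode B adds OOS/inactive coverage
      function unionModes(modeA: FeedProduct[], modeB: FeedProduct[]): FeedProduct[] {
        const byKey = new Map<string, FeedProduct>();
        for (const p of modeB) byKey.set(`${p.externalProductId}:${p.dispensaryId}`, p);
        for (const p of modeA) byKey.set(`${p.externalProductId}:${p.dispensaryId}`, p);
        return Array.from(byKey.values());
      }
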
  18. Menu discovery must crawl the website when menu_url is null

    • For dispensaries with no menu_url or unknown menu_type, crawl the dispensary.website (if present) to find provider links (dutchie, treez, jane, weedmaps, leafly, etc.). Follow “menu/order/shop” links up to a shallow depth with timeouts/rate limits.
    • If a provider link is found, set menu_url, set menu_type, and store detection metadata; if dutchie, derive cName from menu_url and resolve platform_dispensary_id; store resolved_at and detection details.
    • Do NOT mark a dispensary not_crawlable solely because menu_url is null; only mark not_crawlable if the website crawl fails to find a menu or returns 403/404/invalid. Log the reason in provider_detection_data and crawl_status_reason.
    • Keep this as the menu discovery job (separate from product crawls); log successes/errors to job_run_logs. Only schedule product crawls for stores with menu_type='dutchie' AND platform_dispensary_id IS NOT NULL.
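
    A hedged sketch of the provider-link check used during discovery (the URL patterns are illustrative, not exhaustive):

      // Identify a menu provider from an href found on the dispensary's website
      const PROVIDER_PATTERNS: Array<[string, RegExp]> = [
        ['dutchie', /dutchie\.com/i],
        ['treez', /treez\.io/i],
        ['jane', /iheartjane\.com/i],
        ['weedmaps', /weedmaps\.com/i],
        ['leafly', /leafly\.com/i],
      ];

      function detectProvider(href: string): string | null {
        const match = PROVIDER_PATTERNS.find(([, re]) => re.test(href));
        return match ? match[0] : null;
      }
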
  19. Preserve all stock statuses (including unknown)

    • Do not filter or drop stock_status values in API/UI; pass through whatever is stored on the latest snapshot/product. Expected values include: in_stock, out_of_stock (if exposed), missing_from_feed, unknown. Only apply filters when explicitly requested by the user.
  20. Never delete or overwrite historical data

    • Do not delete products/snapshots or overwrite historical records. Always append snapshots for changes (price/stock/qty), and mark missing_from_feed instead of removing records. Historical data must remain intact for analytics.
  21. Deployment via CI/CD only

    • Test locally, commit clean changes, and let CI/CD build and deploy to Kubernetes at code.cannabrands.app. Do NOT manually build/push images or tweak prod pods. Deploy backend first, smoke-test APIs, then frontend; roll back via CI/CD if needed.
  22. Per-location cName and platform_dispensary_id resolution

    • For each dispensary, menu_url and cName must be valid for that exact location; no hardcoded defaults and no sharing platform_dispensary_id across locations.
    • Derive cName from menu_url per store: /embedded-menu/<cName> or /dispensary/<cName>.
    • Resolve platform_dispensary_id from that cName using GraphQL GetAddressBasedDispensaryData.
    • If the slug is invalid/missing, mark the store not crawlable and log it; do not crawl with a mismatched cName/ID. Store the error in provider_detection_data.resolution_error.
    • Before crawling, validate that the cName from menu_url matches the resolved platform ID; if mismatched, re-resolve before proceeding.
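
    A sketch of per-store cName derivation from menu_url (the regex follows the two URL shapes above):

      // Extract cName from /embedded-menu/<cName> or /dispensary/<cName>
      function deriveCName(menuUrl: string): string | null {
        const match = menuUrl.match(/\/(?:embedded-menu|dispensary)\/([^\/?#]+)/);
        return match ? match[1] : null;
      }

      // e.g. deriveCName('https://dutchie.com/embedded-menu/AZ-Deeply-Rooted') => 'AZ-Deeply-Rooted'
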
  23. API endpoints (AZ pipeline)

    • Use /api/az/... endpoints: stores, products, brands, categories, summary, dashboard
    • Rebuild frontend with VITE_API_URL pointing to the backend
    • Dispensary Detail and analytics must use AZ endpoints
  24. Monitoring and logging

    • /scraper-monitor (and /az-schedule) should show active/recent jobs from job_run_logs/crawl_jobs
    • Auto-refresh every 30 seconds
    • System Logs page should show real log data, not just startup messages
  25. Dashboard Architecture - CRITICAL

    • Frontend: If you see old labels like "Active Proxies" or "Active Stores", it means the old dashboard bundle is being served. Rebuild the frontend with VITE_API_URL pointing to the correct backend and redeploy. Clear browser cache. Confirm new labels show up.
    • Backend: /api/dashboard/stats MUST use the consolidated DB (same pool as dutchie-az module). Use the correct tables: dutchie_products, dispensaries, and views like v_dashboard_stats, v_latest_snapshots. Do NOT use a separate legacy connection. Do NOT query az_products (doesn't exist) or legacy stores/products tables.
    • DB Connectivity: Use the proper DB host/role. Errors like role "dutchie" does not exist mean you're exec'ing into the wrong Postgres pod or using wrong credentials. Confirm the correct DATABASE_URL and test: kubectl exec deployment/scraper -n dispensary-scraper -- psql $DATABASE_URL -c '\dt'
    • After fixing: Dashboard should show real data (e.g., 777 products) instead of zeros. Do NOT revert to legacy tables; point dashboard queries to the consolidated DB/views.
    • Checklist:
      1. Rebuild/redeploy frontend with correct API URL, clear cache
      2. Fix /api/dashboard/* to use the consolidated DB pool and dutchie views/tables
      3. Test /api/dashboard/stats from the scraper pod; then reload the UI
  26. Deployment (Gitea + Kubernetes)

    • Registry: Gitea at code.cannabrands.app/creationshop/dispensary-scraper
    • Build and push (from backend directory):
      # Login to Gitea container registry
      docker login code.cannabrands.app
      
      # Build the image
      cd backend
      docker build -t code.cannabrands.app/creationshop/dispensary-scraper:latest .
      
      # Push to registry
      docker push code.cannabrands.app/creationshop/dispensary-scraper:latest
      
    • Deploy to Kubernetes:
      # Restart deployments to pull new image
      kubectl rollout restart deployment/scraper -n dispensary-scraper
      kubectl rollout restart deployment/scraper-worker -n dispensary-scraper
      
      # Watch rollout status
      kubectl rollout status deployment/scraper -n dispensary-scraper
      kubectl rollout status deployment/scraper-worker -n dispensary-scraper
      
    • Check pods:
      kubectl get pods -n dispensary-scraper
      kubectl logs -f deployment/scraper -n dispensary-scraper
      kubectl logs -f deployment/scraper-worker -n dispensary-scraper
      
    • K8s manifests are in /k8s/ folder (scraper.yaml, scraper-worker.yaml, etc.)
    • imagePullSecrets use regcred secret for Gitea registry auth
  27. Crawler Architecture

    • Scraper pod (1 replica): Runs the Express API server + scheduler. The scheduler enqueues detection and crawl jobs to the database queue (crawl_jobs table).
    • Scraper-worker pods (5 replicas): Each worker runs dist/dutchie-az/services/worker.js, polling the job queue and processing jobs.
    • Job types processed by workers:
      • menu_detection / menu_detection_single: Detect menu provider type and resolve platform_dispensary_id from menu_url
      • dutchie_product_crawl: Crawl products from Dutchie GraphQL API for dispensaries with valid platform IDs
    • Job schedules (managed in job_schedules table):
      • dutchie_az_menu_detection: Runs daily with 60-min jitter, detects menu type for dispensaries with unknown menu_type
      • dutchie_az_product_crawl: Runs every 4 hours with 30-min jitter, crawls products from all detected Dutchie dispensaries
    • Trigger schedules manually: curl -X POST /api/az/admin/schedules/{id}/trigger
    • Check schedule status: curl /api/az/admin/schedules
    • Worker logs: kubectl logs -f deployment/scraper-worker -n dispensary-scraper
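
    Conceptually, each worker runs a simple claim/heartbeat loop. A rough sketch only (column names follow the maintenance queries below; the 'queued' status value and exact table shape are assumptions, see the real worker.js):

      // Worker loop: claim a queued job, process it, heartbeat while running
      async function workerLoop(pool: import('pg').Pool) {
        for (;;) {
          const { rows } = await pool.query(
            `UPDATE crawl_jobs SET status='running', started_at=NOW(), last_heartbeat_at=NOW()
             WHERE id = (SELECT id FROM crawl_jobs WHERE status='queued' ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED)
             RETURNING *`
          );
          if (rows.length === 0) {
            await new Promise((r) => setTimeout(r, 5000)); // idle: poll again shortly
            continue;
          }
          // ...process the job, updating last_heartbeat_at periodically,
          //    then mark it completed or failed...
        }
      }
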
  28. Crawler Maintenance Procedure (Check Jobs, Requeue, Restart)

    When crawlers are stuck or jobs aren't processing, follow this procedure:

    Step 1: Check Job Status

    # Port-forward to production
    kubectl port-forward -n dispensary-scraper deployment/scraper 3099:3010 &
    
    # Check active/stuck jobs
    curl -s http://localhost:3099/api/az/monitor/active-jobs | jq .
    
    # Check recent job history
    curl -s "http://localhost:3099/api/az/monitor/jobs?limit=20" | jq '.jobs[] | {id, job_type, status, dispensary_id, started_at, products_found, duration_min: (.duration_ms/60000 | floor)}'
    
    # Check schedule status
    curl -s http://localhost:3099/api/az/admin/schedules | jq '.schedules[] | {id, jobName, enabled, lastRunAt, lastStatus, nextRunAt}'
    

    Step 2: Reset Stuck Jobs

    Jobs are considered stuck if they have status='running' but no heartbeat in >30 minutes:

    # Via API (if endpoint exists)
    curl -s -X POST http://localhost:3099/api/az/admin/reset-stuck-jobs
    
    # Via direct DB (if API not available)
    kubectl exec -n dispensary-scraper deployment/scraper -- psql $DATABASE_URL -c "
      UPDATE dispensary_crawl_jobs
      SET status = 'failed',
          error_message = 'Job timed out - worker stopped sending heartbeats',
          completed_at = NOW()
      WHERE status = 'running'
        AND (last_heartbeat_at < NOW() - INTERVAL '30 minutes' OR last_heartbeat_at IS NULL);
    "
    

    Step 3: Requeue Jobs (Trigger Fresh Crawl)

    # Trigger product crawl schedule (typically ID 1)
    curl -s -X POST http://localhost:3099/api/az/admin/schedules/1/trigger
    
    # Trigger menu detection schedule (typically ID 2)
    curl -s -X POST http://localhost:3099/api/az/admin/schedules/2/trigger
    
    # Or crawl a specific dispensary
    curl -s -X POST http://localhost:3099/api/az/admin/crawl/112
    

    Step 4: Restart Crawler Workers

    # Restart scraper-worker pods (clears any stuck processes)
    kubectl rollout restart deployment/scraper-worker -n dispensary-scraper
    
    # Watch rollout progress
    kubectl rollout status deployment/scraper-worker -n dispensary-scraper
    
    # Optionally restart main scraper pod too
    kubectl rollout restart deployment/scraper -n dispensary-scraper
    

    Step 5: Monitor Recovery

    # Watch worker logs
    kubectl logs -f deployment/scraper-worker -n dispensary-scraper --tail=50
    
    # Check dashboard for product counts
    curl -s http://localhost:3099/api/az/dashboard | jq '{totalStores, totalProducts, storesByType}'
    
    # Verify jobs are processing
    curl -s http://localhost:3099/api/az/monitor/active-jobs | jq .
    

    Quick One-Liner for Full Reset:

    # Reset stuck jobs and restart workers
    kubectl exec -n dispensary-scraper deployment/scraper -- psql $DATABASE_URL -c "UPDATE dispensary_crawl_jobs SET status='failed', completed_at=NOW() WHERE status='running' AND (last_heartbeat_at < NOW() - INTERVAL '30 minutes' OR last_heartbeat_at IS NULL);" && kubectl rollout restart deployment/scraper-worker -n dispensary-scraper && kubectl rollout status deployment/scraper-worker -n dispensary-scraper
    

    Cleanup port-forwards when done:

    pkill -f "port-forward.*dispensary-scraper"
    
  29. Frontend Architecture - AVOID OVER-ENGINEERING

    Key Principles:

    • ONE BACKEND serves ALL domains (cannaiq.co, findadispo.com, findagram.co)
    • Do NOT create separate backend services for each domain
    • The existing dispensary-scraper backend handles everything

    Frontend Build Differences:

    • frontend/ uses Vite (outputs to dist/, uses VITE_ env vars) → dispos.crawlsy.com (legacy)
    • cannaiq/ uses Vite (outputs to dist/, uses VITE_ env vars) → cannaiq.co (NEW)
    • findadispo/ uses Create React App (outputs to build/, uses REACT_APP_ env vars) → findadispo.com
    • findagram/ uses Create React App (outputs to build/, uses REACT_APP_ env vars) → findagram.co

    CRA vs Vite Dockerfile Differences:

    # Vite (frontend, cannaiq)
    ENV VITE_API_URL=https://api.domain.com
    RUN npm run build
    COPY --from=builder /app/dist /usr/share/nginx/html
    
    # CRA (findadispo, findagram)
    ENV REACT_APP_API_URL=https://api.domain.com
    RUN npm run build
    COPY --from=builder /app/build /usr/share/nginx/html
    

    lucide-react Icon Gotchas:

    • Not all icons exist in older versions (e.g., Cannabis doesn't exist)
    • Use Leaf as a substitute for cannabis-related icons
    • When doing search/replace for icon names, be careful not to replace text content
    • Example: "Cannabis-infused food" should NOT become "Leaf-infused food"

    Deployment Options:

    1. Separate containers (current): Each frontend in its own nginx container
    2. Single container (better): One nginx with multi-domain config serving all frontends

    Single Container Multi-Domain Approach:

    # Build all frontends
    FROM node:20-slim AS builder-cannaiq
    WORKDIR /app/cannaiq
    COPY cannaiq/package*.json ./
    RUN npm install
    COPY cannaiq/ ./
    RUN npm run build
    
    FROM node:20-slim AS builder-findadispo
    WORKDIR /app/findadispo
    COPY findadispo/package*.json ./
    RUN npm install
    COPY findadispo/ ./
    RUN npm run build
    
    FROM node:20-slim AS builder-findagram
    WORKDIR /app/findagram
    COPY findagram/package*.json ./
    RUN npm install
    COPY findagram/ ./
    RUN npm run build
    
    # Production nginx with multi-domain routing
    FROM nginx:alpine
    COPY --from=builder-cannaiq /app/cannaiq/dist /var/www/cannaiq
    COPY --from=builder-findadispo /app/findadispo/build /var/www/findadispo
    COPY --from=builder-findagram /app/findagram/build /var/www/findagram
    COPY nginx-multi-domain.conf /etc/nginx/conf.d/default.conf
    

    nginx-multi-domain.conf:

    server {
        listen 80;
        server_name cannaiq.co www.cannaiq.co;
        root /var/www/cannaiq;
        location / { try_files $uri $uri/ /index.html; }
    }
    
    server {
        listen 80;
        server_name findadispo.com www.findadispo.com;
        root /var/www/findadispo;
        location / { try_files $uri $uri/ /index.html; }
    }
    
    server {
        listen 80;
        server_name findagram.co www.findagram.co;
        root /var/www/findagram;
        location / { try_files $uri $uri/ /index.html; }
    }
    

    Common Mistakes to AVOID:

    • Creating a FastAPI/Express backend just for findagram or findadispo
    • Creating separate Docker images per domain when one would work
    • Replacing icon names with sed without checking for text content collisions
    • Using npm ci in Dockerfiles when package-lock.json doesn't exist (use npm install)