Claude Guidelines for this Project


PERMANENT RULES (NEVER VIOLATE)

1. NO DELETION OF DATA — EVER

CannaiQ is a historical analytics system. Data retention is permanent by design.

NEVER delete:

  • Product records
  • Crawled snapshots
  • Images
  • Directories
  • Logs
  • Orchestrator traces
  • Profiles
  • Selector configs
  • Crawl outcomes
  • Store data
  • Brand data

NEVER automate cleanup:

  • No cron or scheduled job may rm, unlink, delete, purge, prune, clean, or reset any storage directory or DB row
  • No migration may DELETE data — only add/update/alter columns
  • If cleanup is required, ONLY the user may issue a manual command

Code enforcement:

  • local-storage.ts must only: write files, create directories, read files
  • No deleteImage, deleteProductImages, or similar functions
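
A minimal sketch of the write-only surface this rule implies (function names are illustrative, not necessarily the actual exports of local-storage.ts):

// local-storage sketch: only write, mkdir, and read; no delete/unlink anywhere
import { promises as fs } from 'fs';
import path from 'path';

export async function writeFileSafe(relPath: string, data: Buffer): Promise<string> {
  const basePath = process.env.STORAGE_BASE_PATH ?? './storage';
  const fullPath = path.join(basePath, relPath);
  await fs.mkdir(path.dirname(fullPath), { recursive: true }); // create directories
  await fs.writeFile(fullPath, data);                          // write files
  return fullPath;
}

export async function readFileSafe(relPath: string): Promise<Buffer> {
  const basePath = process.env.STORAGE_BASE_PATH ?? './storage';
  return fs.readFile(path.join(basePath, relPath));            // read files
}
// Intentionally no deleteImage / deleteProductImages / rm helpers.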

2. DEPLOYMENT AUTHORIZATION REQUIRED

NEVER deploy to production unless the user explicitly says:

"CLAUDE — DEPLOYMENT IS NOW AUTHORIZED."

Until then:

  • All work is LOCAL ONLY
  • No kubectl apply, docker push, or remote operations
  • No port-forwarding to production
  • No connecting to Kubernetes clusters

3. LOCAL DEVELOPMENT BY DEFAULT

In local mode:

  • Use docker-compose.local.yml (NO MinIO)
  • Use local filesystem storage at ./storage
  • Connect to local PostgreSQL at localhost:54320
  • Backend runs at localhost:3010
  • NO remote connections, NO Kubernetes, NO MinIO

Environment:

STORAGE_DRIVER=local
STORAGE_BASE_PATH=./storage
DATABASE_URL=postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus
# MINIO_ENDPOINT is NOT set (forces local storage)

STORAGE BEHAVIOR

Local Storage Structure

/storage/products/{brand}/{state}/{product_id}/
  image-{hash}.webp
  image-{hash}-medium.webp
  image-{hash}-thumb.webp

/storage/brands/{brand}/
  logo-{hash}.webp
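
For reference, a hypothetical helper that builds these paths (the real logic lives in local-storage.ts; the name here is an assumption):

// Build the on-disk location for a product image variant
function productImagePath(
  brand: string,
  state: string,
  productId: string,
  hash: string,
  variant: '' | '-medium' | '-thumb' = ''
): string {
  return `/storage/products/${brand}/${state}/${productId}/image-${hash}${variant}.webp`;
}

// e.g. productImagePath('wyld', 'az', '12345', 'a1b2c3', '-thumb')
//   => /storage/products/wyld/az/12345/image-a1b2c3-thumb.webp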

Storage Adapter

import { saveImage, getImageUrl } from '../utils/storage-adapter';

// Automatically uses local storage when STORAGE_DRIVER=local
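
Conceptually, the adapter picks a driver from the environment. A rough sketch (not the exact implementation):

// storage-adapter sketch: local filesystem by default; no MinIO in local development
import { writeFileSafe } from './local-storage'; // hypothetical export, see sketch above

export async function saveImage(relPath: string, data: Buffer): Promise<string> {
  const driver = process.env.STORAGE_DRIVER ?? 'local';
  if (driver === 'local') {
    return writeFileSafe(relPath, data); // writes under STORAGE_BASE_PATH
  }
  // Any non-local driver would be handled here; per the rules above, MinIO is not used.
  throw new Error(`Unsupported STORAGE_DRIVER: ${driver}`);
}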

Files

| File | Purpose |
| --- | --- |
| backend/src/utils/local-storage.ts | Local filesystem adapter |
| backend/src/utils/storage-adapter.ts | Unified storage abstraction |
| docker-compose.local.yml | Local stack without MinIO |
| start-local.sh | Convenience startup script |

FORBIDDEN ACTIONS

  1. Deleting any data (products, snapshots, images, logs, traces)
  2. Deploying without explicit authorization
  3. Connecting to Kubernetes without authorization
  4. Port-forwarding to production without authorization
  5. Starting MinIO in local development
  6. Using S3/MinIO SDKs when STORAGE_DRIVER=local
  7. Automating cleanup of any kind
  8. Dropping database tables or columns
  9. Overwriting historical records (always append snapshots)

UI ANONYMIZATION RULES

  • No vendor names in forward-facing URLs: use /api/az/..., /az, /az-schedule
  • No "dutchie", "treez", "jane", "weedmaps", "leafly" visible in consumer UIs
  • Internal admin tools may show provider names for debugging
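
In practice this just means mounting routes under vendor-neutral prefixes; a hypothetical Express example (router name is illustrative):

// Consumer-facing paths stay vendor-neutral
import express from 'express';
import { azRouter } from './routes/az'; // hypothetical module

const app = express();
app.use('/api/az', azRouter);      // OK: neutral prefix
// app.use('/api/dutchie', ...)    // NOT OK: vendor name exposed in a public URL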

FUTURE TODO / PENDING FEATURES

  • Orchestrator observability dashboard
  • Crawl profile management UI
  • State machine sandbox (disabled until authorized)
  • Multi-state expansion beyond AZ

Multi-Site Architecture (CRITICAL)

This project has 5 working locations - always clarify which one before making changes:

| Folder | Domain | Type | Purpose |
| --- | --- | --- | --- |
| backend/ | (shared) | Express API | Single backend serving all frontends |
| frontend/ | dispos.crawlsy.com | React SPA (Vite) | Legacy admin dashboard (internal use) |
| cannaiq/ | cannaiq.co | React SPA + PWA | NEW admin dashboard / B2B analytics |
| findadispo/ | findadispo.com | React SPA + PWA | Consumer dispensary finder |
| findagram/ | findagram.co | React SPA + PWA | Consumer delivery marketplace |

IMPORTANT: frontend/ vs cannaiq/ confusion:

  • frontend/ = OLD/legacy dashboard design, deployed to dispos.crawlsy.com (internal admin)
  • cannaiq/ = NEW dashboard design, deployed to cannaiq.co (customer-facing B2B)
  • These are DIFFERENT codebases - do NOT confuse them!

Before any frontend work, ASK: "Which site? cannaiq, findadispo, findagram, or legacy (frontend/)?"

All four frontends share:

  • Same backend API (port 3010)
  • Same PostgreSQL database
  • Same Kubernetes deployment for backend

Each frontend has:

  • Its own folder, package.json, Dockerfile
  • Its own domain and branding
  • Its own PWA manifest and service worker (cannaiq, findadispo, findagram)
  • Separate Docker containers in production

Multi-Domain Hosting Architecture

The three public-facing domains (cannaiq.co, findadispo.com, findagram.co) are served from the same IP using host-based routing:

Kubernetes Ingress (Production):

# Each domain routes to its own frontend service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: multi-site-ingress
spec:
  rules:
  - host: cannaiq.co
    http:
      paths:
      - path: /
        backend:
          service:
            name: cannaiq-frontend
            port: 80
      - path: /api
        backend:
          service:
            name: scraper  # shared backend
            port: 3010
  - host: findadispo.com
    http:
      paths:
      - path: /
        backend:
          service:
            name: findadispo-frontend
            port: 80
      - path: /api
        backend:
          service:
            name: scraper
            port: 3010
  - host: findagram.co
    http:
      paths:
      - path: /
        backend:
          service:
            name: findagram-frontend
            port: 80
      - path: /api
        backend:
          service:
            name: scraper
            port: 3010

Key Points:

  • DNS A records for all 3 domains point to the same IP
  • Ingress controller routes based on Host header
  • Each frontend is a separate Docker container (nginx serving static files)
  • All frontends share the same backend API at /api/*
  • SSL/TLS handled at ingress level (cert-manager)

PWA Setup Requirements

cannaiq, findadispo, and findagram are Progressive Web Apps (PWAs). Required files in each app's public/ folder:

  1. manifest.json - App metadata, icons, theme colors
  2. service-worker.js - Offline caching, background sync
  3. Icons - 192x192 and 512x512 PNG icons

Vite PWA Plugin Setup (in each Vite-based frontend's vite.config.ts):

import { VitePWA } from 'vite-plugin-pwa'

export default defineConfig({
  plugins: [
    react(),
    VitePWA({
      registerType: 'autoUpdate',
      manifest: {
        name: 'Site Name',
        short_name: 'Short',
        theme_color: '#10b981',
        icons: [
          { src: '/icon-192.png', sizes: '192x192', type: 'image/png' },
          { src: '/icon-512.png', sizes: '512x512', type: 'image/png' }
        ]
      },
      workbox: {
        globPatterns: ['**/*.{js,css,html,ico,png,svg,woff2}']
      }
    })
  ]
})

Core Rules Summary

  • DB: Use the single consolidated DB (CRAWLSY_DATABASE_URL → DATABASE_URL); no dual pools; schema_migrations must exist; apply migrations 031/032/033.
  • Images: No MinIO. Save to local /images/products/<dispensary_id>/<product_id>-<hash>.webp (and brands); preserve original URL; serve via backend static.
  • Dutchie GraphQL: Endpoint https://dutchie.com/api-3/graphql. Variables must use productsFilter.dispensaryId (platform_dispensary_id). Mode A: Status="Active". Mode B: Status=null/activeOnly:false. No dispensaryFilter.cNameOrID.
  • cName/slug: Derive cName from each store's menu_url (/embedded-menu/ or /dispensary/). No hardcoded defaults. Each location must have its own valid menu_url and platform_dispensary_id; do not reuse IDs across locations. If slug is invalid/missing, mark not crawlable and log; resolve ID before crawling.
  • Dual-mode always: useBothModes:true to get pricing (Mode A) + full coverage (Mode B).
  • Batch DB writes: Chunk products/snapshots/missing (100–200) to avoid OOM.
  • OOS/missing: Include inactive/OOS in Mode B. Union A+B, dedupe by external_product_id+dispensary_id. Insert snapshots with stock_status; if absent from both modes, insert missing_from_feed. Do not filter OOS by default.
  • API/Frontend: Use /api/az/... endpoints (stores/products/brands/categories/summary/dashboard). Rebuild frontend with VITE_API_URL pointing to the backend.
  • Scheduling: Crawl only menu_type='dutchie' AND platform_dispensary_id IS NOT NULL. 4-hour crawl with jitter; detection job to set menu_type and resolve platform IDs.
  • Monitor: /scraper-monitor (and /az-schedule) should show active/recent jobs from job_run_logs/crawl_jobs, with auto-refresh.
  • No slug guessing: Never use defaults like "AZ-Deeply-Rooted." Always derive per store from menu_url and resolve platform IDs per location.

Detailed Rules

  1. Use the consolidated DB everywhere

    • Preferred env: CRAWLSY_DATABASE_URL (fallback DATABASE_URL).
    • Do NOT create dutchie tables in the legacy DB. Apply migrations 031/032/033 to the consolidated DB and restart.
  2. Dispensary vs Store

    • Dutchie pipeline uses dispensaries (not legacy stores). For dutchie crawls, always work with dispensary ID.
    • Ignore legacy fields like dutchie_plus_id and slug guessing. Use the record's menu_url and platform_dispensary_id.
  3. Menu detection and platform IDs

    • Set menu_type from menu_url detection; resolve platform_dispensary_id for menu_type='dutchie'.
    • Admin should have "refresh detection" and "resolve ID" actions; schedule/crawl only when menu_type='dutchie' AND platform_dispensary_id is set.
  4. Queries and mapping

    • The DB returns snake_case; code expects camelCase. Always alias/map:
      • platform_dispensary_id AS "platformDispensaryId"
      • Map via mapDbRowToDispensary when loading dispensaries (scheduler, crawler, admin crawl).
    • Avoid SELECT *; explicitly select and/or map fields.
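
    The mapping is mechanical; a rough sketch of what mapDbRowToDispensary does (field list abbreviated, not the full implementation):

      // Map a snake_case DB row to the camelCase shape the scheduler/crawler expect
      interface DispensaryRow {
        id: number;
        name: string;
        menu_url: string | null;
        menu_type: string | null;
        platform_dispensary_id: string | null;
      }

      function mapDbRowToDispensary(row: DispensaryRow) {
        return {
          id: row.id,
          name: row.name,
          menuUrl: row.menu_url,
          menuType: row.menu_type,
          platformDispensaryId: row.platform_dispensary_id,
        };
      }
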
  5. Scheduling

    • /scraper-schedule should accept filters/search (All vs AZ-only, name).
    • "Run Now"/scheduler must skip or warn if menu_type!='dutchie' or platform_dispensary_id missing.
    • Use dispensary_crawl_status view; show reason when not crawlable.
  6. Crawling

    • Trigger dutchie crawls by dispensary ID (e.g., /api/az/admin/crawl/:id or runDispensaryOrchestrator(id)).
    • Update existing products (by stable product ID), append snapshots for history (every 4h cadence), download images locally (/images/...), store local URLs.
    • Use dutchie GraphQL pipeline only for menu_type='dutchie'.
  7. Frontend

    • Forward-facing URLs: /api/az, /az, /az-schedule; no vendor names.
    • /scraper-schedule: add filters/search, keep as master view for all schedules; reflect platform ID/menu_type status and controls (resolve ID, run now, enable/disable/delete).
  8. No slug guessing

    • Do not guess slugs; use the DB record's menu_url and ID. Resolve platform ID from the URL/cName; if set, crawl directly by ID.
  9. Verify locally before pushing

    • Apply migrations, restart backend, ensure auth (users table) exists, run dutchie crawl for a known dispensary (e.g., Deeply Rooted), check /api/az/dashboard, /api/az/stores/:id/products, /az, /scraper-schedule.
  10. Image storage (no MinIO)

    • Save images to local filesystem only. Do not create or use MinIO in Docker.
    • Product images: /images/products/<dispensary_id>/<product_id>-<hash>.webp (+medium/+thumb).
    • Brand images: /images/brands/<brand_slug_or_sku>-<hash>.webp.
    • Store local URLs in DB fields (keep original URLs as fallback only).
    • Serve /images via backend static middleware.
  11. Dutchie GraphQL fetch rules

    • Endpoint: https://dutchie.com/api-3/graphql (NOT api-gw.dutchie.com which no longer exists).
    • Variables: Use productsFilter.dispensaryId = platform_dispensary_id (MongoDB ObjectId, e.g., 6405ef617056e8014d79101b).
    • Do NOT use dispensaryFilter.cNameOrID - that's outdated.
    • cName (e.g., AZ-Deeply-Rooted) is only for Referer/Origin headers and Puppeteer session bootstrapping.
    • Mode A: Status: "Active" - returns active products with pricing
    • Mode B: Status: null / activeOnly: false - returns all products including OOS/inactive
    • Example payload:
      {
        "operationName": "FilteredProducts",
        "variables": {
          "productsFilter": {
            "dispensaryId": "6405ef617056e8014d79101b",
            "pricingType": "rec",
            "Status": "Active"
          }
        },
        "extensions": {
          "persistedQuery": { "version": 1, "sha256Hash": "<hash>" }
        }
      }
      
    • Headers (server-side axios only): Chrome UA, Origin: https://dutchie.com, Referer: https://dutchie.com/embedded-menu/<cName>, accept: application/json, content-type: application/json.
    • If local DNS can't resolve, run fetch from an environment that can (K8s pod/remote host), not from browser.
    • Use server-side axios with embedded-menu headers; include CF/session cookie from Puppeteer if needed.
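
    Putting these pieces together, a hedged server-side fetch sketch (the persisted-query hash stays a placeholder; take the exact query shape from the real crawler code):

      import axios from 'axios';

      // Mode A fetch for one dispensary; pass status = null for Mode B
      async function fetchFilteredProducts(platformDispensaryId: string, cName: string, status: 'Active' | null) {
        const res = await axios.post(
          'https://dutchie.com/api-3/graphql',
          {
            operationName: 'FilteredProducts',
            variables: {
              productsFilter: {
                dispensaryId: platformDispensaryId, // platform_dispensary_id, NOT cNameOrID
                pricingType: 'rec',
                Status: status,
              },
            },
            extensions: { persistedQuery: { version: 1, sha256Hash: '<hash>' } },
          },
          {
            headers: {
              'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
              origin: 'https://dutchie.com',
              referer: `https://dutchie.com/embedded-menu/${cName}`,
              accept: 'application/json',
              'content-type': 'application/json',
              // cookie: cf_clearance / session cookie from Puppeteer, if required
            },
          }
        );
        return res.data;
      }
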
  12. Stop over-prep; run the crawl

    • To seed/refresh a store, run a one-off crawl by dispensary ID (example for Deeply Rooted):
      DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" \
      npx tsx -e "const { crawlDispensaryProducts } = require('./src/dutchie-az/services/product-crawler'); const d={id:112,name:'Deeply Rooted',platform:'dutchie',platformDispensaryId:'6405ef617056e8014d79101b',menuType:'dutchie'}; crawlDispensaryProducts(d,'rec',{useBothModes:true}).then(r=>{console.log(r);process.exit(0);}).catch(e=>{console.error(e);process.exit(1);});"
      
      If local DNS is blocked, run the same command inside the scraper pod via kubectl exec ... -- bash -lc '...'.
    • After crawl, verify counts via dutchie_products, dutchie_product_snapshots, and dispensaries.last_crawl_at. Do not inspect the legacy products table for Dutchie.
  13. Fetch troubleshooting

    • If 403 or empty data: log status + first GraphQL error; include cf_clearance/session cookie from Puppeteer; ensure headers match a real Chrome request; ensure variables use productsFilter.dispensaryId.
    • If DNS fails locally, do NOT debug DNS—run the fetch from an environment that resolves (K8s/remote) or via Puppeteer-captured headers/cookies. No browser/CORS attempts.
  14. Views and metrics

    • Keep v_brands/v_categories/v_brand_history based on dutchie_products and preserve brand_count metrics. Do not drop brand_count.
  15. Batch DB writes to avoid OOM

    • Do NOT build one giant upsert/insert payload for products/snapshots/missing marks.
    • Chunk arrays (e.g., 100–200 items) and upsert/insert in a loop; drop references after each chunk.
    • Apply to products, product snapshots, and any "mark missing" logic to keep memory low during crawls.
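
    A minimal chunking sketch (chunk size and upsert callback are illustrative):

      // Upsert in chunks of ~100–200 rows instead of one giant payload
      async function upsertInChunks<T>(rows: T[], upsertChunk: (chunk: T[]) => Promise<void>, size = 150) {
        for (let i = 0; i < rows.length; i += size) {
          const chunk = rows.slice(i, i + size);
          await upsertChunk(chunk); // one bounded DB round-trip per chunk
          // chunk goes out of scope here, keeping memory flat during large crawls
        }
      }
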
  16. Use dual-mode crawls by default

    • Always run with useBothModes:true to combine:
      • Mode A (active feed with pricing/stock)
      • Mode B (max coverage including OOS/inactive)
    • Union/dedupe by product ID so you keep full coverage and pricing in one run.
    • If you only run Mode B, prices will be null; dual-mode fills pricing while retaining OOS items.
  17. Capture OOS and missing items

    • GraphQL variables must include inactive/OOS (Status: All / activeOnly:false). Mode B already returns OOS/inactive; union with Mode A to keep pricing.
    • After unioning Mode A/B, upsert products and insert snapshots with stock_status from the feed. If an existing product is absent from both Mode A and Mode B for the run, insert a snapshot with is_present_in_feed=false and stock_status='missing_from_feed'.
    • Do not filter out OOS/missing in the API; only filter when the user requests (e.g., stockStatus=in_stock). Expose stock_status/in_stock from the latest snapshot (fallback to product).
    • Verify with /api/az/stores/:id/products?stockStatus=out_of_stock and ?stockStatus=missing_from_feed.
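
    A sketch of the union/dedupe step (field names follow the rules above; the real types live in the crawler):

      interface FeedProduct {
        externalProductId: string;
        dispensaryId: number;
        price?: number | null;
        stockStatus?: string;
      }

      // Mode A rows win on conflicts so pricing is preserved; Mode B adds OOS/inactive coverage
      function unionModes(modeA: FeedProduct[], modeB: FeedProduct[]): FeedProduct[] {
        const byKey = new Map<string, FeedProduct>();
        for (const p of modeB) byKey.set(`${p.externalProductId}:${p.dispensaryId}`, p);
        for (const p of modeA) byKey.set(`${p.externalProductId}:${p.dispensaryId}`, p);
        return Array.from(byKey.values());
      }
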
  18. Menu discovery must crawl the website when menu_url is null

    • For dispensaries with no menu_url or unknown menu_type, crawl the dispensary.website (if present) to find provider links (dutchie, treez, jane, weedmaps, leafly, etc.). Follow “menu/order/shop” links up to a shallow depth with timeouts/rate limits.
    • If a provider link is found, set menu_url, set menu_type, and store detection metadata; if dutchie, derive cName from menu_url and resolve platform_dispensary_id; store resolved_at and detection details.
    • Do NOT mark a dispensary not_crawlable solely because menu_url is null; only mark not_crawlable if the website crawl fails to find a menu or returns 403/404/invalid. Log the reason in provider_detection_data and crawl_status_reason.
    • Keep this as the menu discovery job (separate from product crawls); log successes/errors to job_run_logs. Only schedule product crawls for stores with menu_type='dutchie' AND platform_dispensary_id IS NOT NULL.
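
    A hedged sketch of the provider-link check used during discovery (the URL patterns are illustrative, not exhaustive):

      // Identify a menu provider from an href found on the dispensary's website
      const PROVIDER_PATTERNS: Array<[string, RegExp]> = [
        ['dutchie', /dutchie\.com/i],
        ['treez', /treez\.io/i],
        ['jane', /iheartjane\.com/i],
        ['weedmaps', /weedmaps\.com/i],
        ['leafly', /leafly\.com/i],
      ];

      function detectProvider(href: string): string | null {
        const match = PROVIDER_PATTERNS.find(([, re]) => re.test(href));
        return match ? match[0] : null;
      }
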
  19. Preserve all stock statuses (including unknown)

    • Do not filter or drop stock_status values in API/UI; pass through whatever is stored on the latest snapshot/product. Expected values include: in_stock, out_of_stock (if exposed), missing_from_feed, unknown. Only apply filters when explicitly requested by the user.
  20. Never delete or overwrite historical data

    • Do not delete products/snapshots or overwrite historical records. Always append snapshots for changes (price/stock/qty), and mark missing_from_feed instead of removing records. Historical data must remain intact for analytics.
  21. Deployment via CI/CD only

    • Test locally, commit clean changes, and let CI/CD build and deploy to Kubernetes at code.cannabrands.app. Do NOT manually build/push images or tweak prod pods. Deploy backend first, smoke-test APIs, then frontend; roll back via CI/CD if needed.
  22. Per-location cName and platform_dispensary_id resolution

    • For each dispensary, menu_url and cName must be valid for that exact location; no hardcoded defaults and no sharing platform_dispensary_id across locations.
    • Derive cName from menu_url per store: /embedded-menu/<cName> or /dispensary/<cName>.
    • Resolve platform_dispensary_id from that cName using GraphQL GetAddressBasedDispensaryData.
    • If the slug is invalid/missing, mark the store not crawlable and log it; do not crawl with a mismatched cName/ID. Store the error in provider_detection_data.resolution_error.
    • Before crawling, validate that the cName from menu_url matches the resolved platform ID; if mismatched, re-resolve before proceeding.
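
    A sketch of per-store cName derivation from menu_url (the regex follows the two URL shapes above):

      // Extract cName from /embedded-menu/<cName> or /dispensary/<cName>
      function deriveCName(menuUrl: string): string | null {
        const match = menuUrl.match(/\/(?:embedded-menu|dispensary)\/([^\/?#]+)/);
        return match ? match[1] : null;
      }

      // e.g. deriveCName('https://dutchie.com/embedded-menu/AZ-Deeply-Rooted') => 'AZ-Deeply-Rooted'
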
  23. API endpoints (AZ pipeline)

    • Use /api/az/... endpoints: stores, products, brands, categories, summary, dashboard
    • Rebuild frontend with VITE_API_URL pointing to the backend
    • Dispensary Detail and analytics must use AZ endpoints
  24. Monitoring and logging

    • /scraper-monitor (and /az-schedule) should show active/recent jobs from job_run_logs/crawl_jobs
    • Auto-refresh every 30 seconds
    • System Logs page should show real log data, not just startup messages
  25. Dashboard Architecture - CRITICAL

    • Frontend: If you see old labels like "Active Proxies" or "Active Stores", it means the old dashboard bundle is being served. Rebuild the frontend with VITE_API_URL pointing to the correct backend and redeploy. Clear browser cache. Confirm new labels show up.
    • Backend: /api/dashboard/stats MUST use the consolidated DB (same pool as dutchie-az module). Use the correct tables: dutchie_products, dispensaries, and views like v_dashboard_stats, v_latest_snapshots. Do NOT use a separate legacy connection. Do NOT query az_products (doesn't exist) or legacy stores/products tables.
    • DB Connectivity: Use the proper DB host/role. Errors like role "dutchie" does not exist mean you're exec'ing into the wrong Postgres pod or using wrong credentials. Confirm the correct DATABASE_URL and test: kubectl exec deployment/scraper -n dispensary-scraper -- psql $DATABASE_URL -c '\dt'
    • After fixing: Dashboard should show real data (e.g., 777 products) instead of zeros. Do NOT revert to legacy tables; point dashboard queries to the consolidated DB/views.
    • Checklist:
      1. Rebuild/redeploy frontend with correct API URL, clear cache
      2. Fix /api/dashboard/* to use the consolidated DB pool and dutchie views/tables
      3. Test /api/dashboard/stats from the scraper pod; then reload the UI
  26. Deployment (Gitea + Kubernetes)

    • Registry: Gitea at code.cannabrands.app/creationshop/dispensary-scraper
    • Build and push (from backend directory):
      # Login to Gitea container registry
      docker login code.cannabrands.app
      
      # Build the image
      cd backend
      docker build -t code.cannabrands.app/creationshop/dispensary-scraper:latest .
      
      # Push to registry
      docker push code.cannabrands.app/creationshop/dispensary-scraper:latest
      
    • Deploy to Kubernetes:
      # Restart deployments to pull new image
      kubectl rollout restart deployment/scraper -n dispensary-scraper
      kubectl rollout restart deployment/scraper-worker -n dispensary-scraper
      
      # Watch rollout status
      kubectl rollout status deployment/scraper -n dispensary-scraper
      kubectl rollout status deployment/scraper-worker -n dispensary-scraper
      
    • Check pods:
      kubectl get pods -n dispensary-scraper
      kubectl logs -f deployment/scraper -n dispensary-scraper
      kubectl logs -f deployment/scraper-worker -n dispensary-scraper
      
    • K8s manifests are in /k8s/ folder (scraper.yaml, scraper-worker.yaml, etc.)
    • imagePullSecrets use regcred secret for Gitea registry auth
  27. Crawler Architecture

    • Scraper pod (1 replica): Runs the Express API server + scheduler. The scheduler enqueues detection and crawl jobs to the database queue (crawl_jobs table).
    • Scraper-worker pods (5 replicas): Each worker runs dist/dutchie-az/services/worker.js, polling the job queue and processing jobs.
    • Job types processed by workers:
      • menu_detection / menu_detection_single: Detect menu provider type and resolve platform_dispensary_id from menu_url
      • dutchie_product_crawl: Crawl products from Dutchie GraphQL API for dispensaries with valid platform IDs
    • Job schedules (managed in job_schedules table):
      • dutchie_az_menu_detection: Runs daily with 60-min jitter, detects menu type for dispensaries with unknown menu_type
      • dutchie_az_product_crawl: Runs every 4 hours with 30-min jitter, crawls products from all detected Dutchie dispensaries
    • Trigger schedules manually: curl -X POST /api/az/admin/schedules/{id}/trigger
    • Check schedule status: curl /api/az/admin/schedules
    • Worker logs: kubectl logs -f deployment/scraper-worker -n dispensary-scraper
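
    Conceptually, each worker runs a simple claim/heartbeat loop. A rough sketch only (column names follow the maintenance queries below; the 'queued' status value and exact table shape are assumptions, see the real worker.js):

      // Worker loop: claim a queued job, process it, heartbeat while running
      async function workerLoop(pool: import('pg').Pool) {
        for (;;) {
          const { rows } = await pool.query(
            `UPDATE crawl_jobs SET status='running', started_at=NOW(), last_heartbeat_at=NOW()
             WHERE id = (SELECT id FROM crawl_jobs WHERE status='queued' ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED)
             RETURNING *`
          );
          if (rows.length === 0) {
            await new Promise((r) => setTimeout(r, 5000)); // idle: poll again shortly
            continue;
          }
          // ...process the job, updating last_heartbeat_at periodically,
          //    then mark it completed or failed...
        }
      }
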
  28. Crawler Maintenance Procedure (Check Jobs, Requeue, Restart)

    When crawlers are stuck or jobs aren't processing, follow this procedure:

    Step 1: Check Job Status

    # Port-forward to production
    kubectl port-forward -n dispensary-scraper deployment/scraper 3099:3010 &
    
    # Check active/stuck jobs
    curl -s http://localhost:3099/api/az/monitor/active-jobs | jq .
    
    # Check recent job history
    curl -s "http://localhost:3099/api/az/monitor/jobs?limit=20" | jq '.jobs[] | {id, job_type, status, dispensary_id, started_at, products_found, duration_min: (.duration_ms/60000 | floor)}'
    
    # Check schedule status
    curl -s http://localhost:3099/api/az/admin/schedules | jq '.schedules[] | {id, jobName, enabled, lastRunAt, lastStatus, nextRunAt}'
    

    Step 2: Reset Stuck Jobs

    Jobs are considered stuck if they have status='running' but no heartbeat in >30 minutes:

    # Via API (if endpoint exists)
    curl -s -X POST http://localhost:3099/api/az/admin/reset-stuck-jobs
    
    # Via direct DB (if API not available)
    kubectl exec -n dispensary-scraper deployment/scraper -- psql $DATABASE_URL -c "
      UPDATE dispensary_crawl_jobs
      SET status = 'failed',
          error_message = 'Job timed out - worker stopped sending heartbeats',
          completed_at = NOW()
      WHERE status = 'running'
        AND (last_heartbeat_at < NOW() - INTERVAL '30 minutes' OR last_heartbeat_at IS NULL);
    "
    

    Step 3: Requeue Jobs (Trigger Fresh Crawl)

    # Trigger product crawl schedule (typically ID 1)
    curl -s -X POST http://localhost:3099/api/az/admin/schedules/1/trigger
    
    # Trigger menu detection schedule (typically ID 2)
    curl -s -X POST http://localhost:3099/api/az/admin/schedules/2/trigger
    
    # Or crawl a specific dispensary
    curl -s -X POST http://localhost:3099/api/az/admin/crawl/112
    

    Step 4: Restart Crawler Workers

    # Restart scraper-worker pods (clears any stuck processes)
    kubectl rollout restart deployment/scraper-worker -n dispensary-scraper
    
    # Watch rollout progress
    kubectl rollout status deployment/scraper-worker -n dispensary-scraper
    
    # Optionally restart main scraper pod too
    kubectl rollout restart deployment/scraper -n dispensary-scraper
    

    Step 5: Monitor Recovery

    # Watch worker logs
    kubectl logs -f deployment/scraper-worker -n dispensary-scraper --tail=50
    
    # Check dashboard for product counts
    curl -s http://localhost:3099/api/az/dashboard | jq '{totalStores, totalProducts, storesByType}'
    
    # Verify jobs are processing
    curl -s http://localhost:3099/api/az/monitor/active-jobs | jq .
    

    Quick One-Liner for Full Reset:

    # Reset stuck jobs and restart workers
    kubectl exec -n dispensary-scraper deployment/scraper -- psql $DATABASE_URL -c "UPDATE dispensary_crawl_jobs SET status='failed', completed_at=NOW() WHERE status='running' AND (last_heartbeat_at < NOW() - INTERVAL '30 minutes' OR last_heartbeat_at IS NULL);" && kubectl rollout restart deployment/scraper-worker -n dispensary-scraper && kubectl rollout status deployment/scraper-worker -n dispensary-scraper
    

    Cleanup port-forwards when done:

    pkill -f "port-forward.*dispensary-scraper"
    
  29. Frontend Architecture - AVOID OVER-ENGINEERING

    Key Principles:

    • ONE BACKEND serves ALL domains (cannaiq.co, findadispo.com, findagram.co)
    • Do NOT create separate backend services for each domain
    • The existing dispensary-scraper backend handles everything

    Frontend Build Differences:

    • frontend/ uses Vite (outputs to dist/, uses VITE_ env vars) → dispos.crawlsy.com (legacy)
    • cannaiq/ uses Vite (outputs to dist/, uses VITE_ env vars) → cannaiq.co (NEW)
    • findadispo/ uses Create React App (outputs to build/, uses REACT_APP_ env vars) → findadispo.com
    • findagram/ uses Create React App (outputs to build/, uses REACT_APP_ env vars) → findagram.co

    CRA vs Vite Dockerfile Differences:

    # Vite (frontend, cannaiq)
    ENV VITE_API_URL=https://api.domain.com
    RUN npm run build
    COPY --from=builder /app/dist /usr/share/nginx/html
    
    # CRA (findadispo, findagram)
    ENV REACT_APP_API_URL=https://api.domain.com
    RUN npm run build
    COPY --from=builder /app/build /usr/share/nginx/html
    

    lucide-react Icon Gotchas:

    • Not all icons exist in older versions (e.g., Cannabis doesn't exist)
    • Use Leaf as a substitute for cannabis-related icons
    • When doing search/replace for icon names, be careful not to replace text content
    • Example: "Cannabis-infused food" should NOT become "Leaf-infused food"

    Deployment Options:

    1. Separate containers (current): Each frontend in its own nginx container
    2. Single container (better): One nginx with multi-domain config serving all frontends

    Single Container Multi-Domain Approach:

    # Build all frontends
    FROM node:20-slim AS builder-cannaiq
    WORKDIR /app/cannaiq
    COPY cannaiq/package*.json ./
    RUN npm install
    COPY cannaiq/ ./
    RUN npm run build
    
    FROM node:20-slim AS builder-findadispo
    WORKDIR /app/findadispo
    COPY findadispo/package*.json ./
    RUN npm install
    COPY findadispo/ ./
    RUN npm run build
    
    FROM node:20-slim AS builder-findagram
    WORKDIR /app/findagram
    COPY findagram/package*.json ./
    RUN npm install
    COPY findagram/ ./
    RUN npm run build
    
    # Production nginx with multi-domain routing
    FROM nginx:alpine
    COPY --from=builder-cannaiq /app/cannaiq/dist /var/www/cannaiq
    COPY --from=builder-findadispo /app/findadispo/build /var/www/findadispo
    COPY --from=builder-findagram /app/findagram/build /var/www/findagram
    COPY nginx-multi-domain.conf /etc/nginx/conf.d/default.conf
    

    nginx-multi-domain.conf:

    server {
        listen 80;
        server_name cannaiq.co www.cannaiq.co;
        root /var/www/cannaiq;
        location / { try_files $uri $uri/ /index.html; }
    }
    
    server {
        listen 80;
        server_name findadispo.com www.findadispo.com;
        root /var/www/findadispo;
        location / { try_files $uri $uri/ /index.html; }
    }
    
    server {
        listen 80;
        server_name findagram.co www.findagram.co;
        root /var/www/findagram;
        location / { try_files $uri $uri/ /index.html; }
    }
    

    Common Mistakes to AVOID:

    • Creating a FastAPI/Express backend just for findagram or findadispo
    • Creating separate Docker images per domain when one would work
    • Replacing icon names with sed without checking for text content collisions
    • Using npm ci in Dockerfiles when package-lock.json doesn't exist (use npm install)