## Claude Guidelines for this Project
---
## PERMANENT RULES (NEVER VIOLATE)
### 1. NO DELETION OF DATA — EVER
CannaiQ is a **historical analytics system**. Data retention is **permanent by design**.
**NEVER delete:**
- Product records
- Crawled snapshots
- Images
- Directories
- Logs
- Orchestrator traces
- Profiles
- Selector configs
- Crawl outcomes
- Store data
- Brand data
**NEVER automate cleanup:**
- No cron or scheduled job may `rm`, `unlink`, `delete`, `purge`, `prune`, `clean`, or `reset` any storage directory or DB row
- No migration may DELETE data — only add/update/alter columns
- If cleanup is required, ONLY the user may issue a manual command
**Code enforcement:**
- `local-storage.ts` must only: write files, create directories, read files
- No `deleteImage`, `deleteProductImages`, or similar functions
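A minimal sketch of what that write/read-only surface can look like (the helper names below are illustrative, not the actual exports of `local-storage.ts`):
```typescript
// Illustrative write/read-only adapter surface: files can be written and read,
// directories created, but nothing is ever deleted. Names are hypothetical.
import { promises as fs } from 'fs';
import path from 'path';

const BASE_PATH = process.env.STORAGE_BASE_PATH ?? './storage';

// Write a file, creating parent directories as needed.
export async function writeFile(relativePath: string, data: Buffer): Promise<string> {
  const absolutePath = path.join(BASE_PATH, relativePath);
  await fs.mkdir(path.dirname(absolutePath), { recursive: true });
  await fs.writeFile(absolutePath, data);
  return absolutePath;
}

// Read a previously written file.
export function readFile(relativePath: string): Promise<Buffer> {
  return fs.readFile(path.join(BASE_PATH, relativePath));
}
```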
### 2. DEPLOYMENT AUTHORIZATION REQUIRED
**NEVER deploy to production unless the user explicitly says:**
> "CLAUDE — DEPLOYMENT IS NOW AUTHORIZED."
Until then:
- All work is LOCAL ONLY
- No `kubectl apply`, `docker push`, or remote operations
- No port-forwarding to production
- No connecting to Kubernetes clusters
### 3. LOCAL DEVELOPMENT BY DEFAULT
**In local mode:**
- Use `docker-compose.local.yml` (NO MinIO)
- Use local filesystem storage at `./storage`
- Connect to local PostgreSQL at `localhost:54320`
- Backend runs at `localhost:3010`
- NO remote connections, NO Kubernetes, NO MinIO
**Environment:**
```bash
STORAGE_DRIVER=local
STORAGE_BASE_PATH=./storage
DATABASE_URL=postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus
# MINIO_ENDPOINT is NOT set (forces local storage)
```
### 4. MANDATORY LOCAL MODE FOR ALL CRAWLS AND TESTS
**Before running ANY of the following, CONFIRM local mode is active:**
- Crawler execution
- Orchestrator flows
- Sandbox tests
- Image scrape tests
- Module import tests
**Pre-execution checklist:**
1. ✅ `./start-local.sh` or `docker-compose -f docker-compose.local.yml up` is running
2. ✅ `STORAGE_DRIVER=local`
3. ✅ `STORAGE_BASE_PATH=./storage`
4. ✅ NO MinIO, NO S3
5. ✅ NO port-forward
6. ✅ NO Kubernetes connection
7. ✅ Storage writes go to `/storage/products/{brand}/{state}/{product_id}/`
**If any condition is not met, DO NOT proceed with the crawl or test.**
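A minimal guard sketch for this checklist (the function name and exact checks are illustrative assumptions, not existing project code):
```typescript
// Hypothetical pre-flight guard: call this at the top of any crawler,
// orchestrator, sandbox, or image-scrape entry point before doing work.
export function assertLocalMode(): void {
  if (process.env.STORAGE_DRIVER !== 'local') {
    throw new Error('Local mode required: STORAGE_DRIVER must be "local"');
  }
  if (process.env.STORAGE_BASE_PATH !== './storage') {
    throw new Error('Local mode required: STORAGE_BASE_PATH must be ./storage');
  }
  if (process.env.MINIO_ENDPOINT) {
    throw new Error('Local mode required: MINIO_ENDPOINT must not be set');
  }
  if (!process.env.DATABASE_URL?.includes('localhost:54320')) {
    throw new Error('Local mode required: DATABASE_URL must point at localhost:54320');
  }
}
```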
---
## STORAGE BEHAVIOR
### Local Storage Structure
```
/storage/products/{brand}/{state}/{product_id}/
  image-{hash}.webp
  image-{hash}-medium.webp
  image-{hash}-thumb.webp
/storage/brands/{brand}/
  logo-{hash}.webp
```
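A small hypothetical helper that produces paths matching this layout (not existing project code):
```typescript
// Builds the local product image path described above.
export function productImagePath(
  brand: string,
  state: string,
  productId: string,
  hash: string,
  variant: '' | '-medium' | '-thumb' = ''
): string {
  return `/storage/products/${brand}/${state}/${productId}/image-${hash}${variant}.webp`;
}
```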
### Storage Adapter
```typescript
import { saveImage, getImageUrl } from '../utils/storage-adapter';
// Automatically uses local storage when STORAGE_DRIVER=local
```
### Files
| File | Purpose |
|------|---------|
| `backend/src/utils/local-storage.ts` | Local filesystem adapter |
| `backend/src/utils/storage-adapter.ts` | Unified storage abstraction |
| `docker-compose.local.yml` | Local stack without MinIO |
| `start-local.sh` | Convenience startup script |
---
## FORBIDDEN ACTIONS
1. **Deleting any data** (products, snapshots, images, logs, traces)
2. **Deploying without explicit authorization**
3. **Connecting to Kubernetes** without authorization
4. **Port-forwarding to production** without authorization
5. **Starting MinIO** in local development
6. **Using S3/MinIO SDKs** when `STORAGE_DRIVER=local`
7. **Automating cleanup** of any kind
8. **Dropping database tables or columns**
9. **Overwriting historical records** (always append snapshots)
---
## UI ANONYMIZATION RULES
- No vendor names in forward-facing URLs: use `/api/az/...`, `/az`, `/az-schedule`
- No "dutchie", "treez", "jane", "weedmaps", "leafly" visible in consumer UIs
- Internal admin tools may show provider names for debugging
---
## FUTURE TODO / PENDING FEATURES
- [ ] Orchestrator observability dashboard
- [ ] Crawl profile management UI
- [ ] State machine sandbox (disabled until authorized)
- [ ] Multi-state expansion beyond AZ
---
### Multi-Site Architecture (CRITICAL)
This project has **5 working locations** - always clarify which one before making changes:
| Folder | Domain | Type | Purpose |
|--------|--------|------|---------|
| `backend/` | (shared) | Express API | Single backend serving all frontends |
| `frontend/` | dispos.crawlsy.com | React SPA (Vite) | Legacy admin dashboard (internal use) |
| `cannaiq/` | cannaiq.co | React SPA + PWA | NEW admin dashboard / B2B analytics |
| `findadispo/` | findadispo.com | React SPA + PWA | Consumer dispensary finder |
| `findagram/` | findagram.co | React SPA + PWA | Consumer delivery marketplace |
**IMPORTANT: `frontend/` vs `cannaiq/` confusion:**
- `frontend/` = OLD/legacy dashboard design, deployed to `dispos.crawlsy.com` (internal admin)
- `cannaiq/` = NEW dashboard design, deployed to `cannaiq.co` (customer-facing B2B)
- These are DIFFERENT codebases - do NOT confuse them!
**Before any frontend work, ASK: "Which site? cannaiq, findadispo, findagram, or legacy (frontend/)?"**
All four frontends share:
- Same backend API (port 3010)
- Same PostgreSQL database
- Same Kubernetes deployment for backend
Each frontend has:
- Its own folder, package.json, Dockerfile
- Its own domain and branding
- Its own PWA manifest and service worker (cannaiq, findadispo, findagram)
- Separate Docker containers in production
---
### Multi-Domain Hosting Architecture
The three public-domain frontends (cannaiq.co, findadispo.com, findagram.co) are served from the **same IP** using **host-based routing**:
**Kubernetes Ingress (Production):**
```yaml
# Each domain routes to its own frontend service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: multi-site-ingress
spec:
  rules:
    - host: cannaiq.co
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: cannaiq-frontend
                port:
                  number: 80
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: scraper   # shared backend
                port:
                  number: 3010
    - host: findadispo.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: findadispo-frontend
                port:
                  number: 80
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: scraper
                port:
                  number: 3010
    - host: findagram.co
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: findagram-frontend
                port:
                  number: 80
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: scraper
                port:
                  number: 3010
```
**Key Points:**
- DNS A records for all 3 domains point to same IP
- Ingress controller routes based on `Host` header
- Each frontend is a separate Docker container (nginx serving static files)
- All frontends share the same backend API at `/api/*`
- SSL/TLS handled at ingress level (cert-manager)
---
### PWA Setup Requirements
Each consumer-facing frontend (cannaiq, findadispo, findagram) is a **Progressive Web App (PWA)**. Required files in each `public/` folder:
1. **manifest.json** - App metadata, icons, theme colors
2. **service-worker.js** - Offline caching, background sync
3. **Icons** - 192x192 and 512x512 PNG icons
**Vite PWA Plugin Setup** (in each frontend's vite.config.ts):
```typescript
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import { VitePWA } from 'vite-plugin-pwa'

export default defineConfig({
  plugins: [
    react(),
    VitePWA({
      registerType: 'autoUpdate',
      manifest: {
        name: 'Site Name',
        short_name: 'Short',
        theme_color: '#10b981',
        icons: [
          { src: '/icon-192.png', sizes: '192x192', type: 'image/png' },
          { src: '/icon-512.png', sizes: '512x512', type: 'image/png' }
        ]
      },
      workbox: {
        globPatterns: ['**/*.{js,css,html,ico,png,svg,woff2}']
      }
    })
  ]
})
```
---
### Core Rules Summary
- **DB**: Use the single consolidated DB (CRAWLSY_DATABASE_URL → DATABASE_URL); no dual pools; schema_migrations must exist; apply migrations 031/032/033.
- **Images**: No MinIO. Save to local /images/products/<disp>/<prod>-<hash>.webp (and brands); preserve original URL; serve via backend static.
- **Dutchie GraphQL**: Endpoint https://dutchie.com/api-3/graphql. Variables must use productsFilter.dispensaryId (platform_dispensary_id). Mode A: Status="Active". Mode B: Status=null/activeOnly:false. No dispensaryFilter.cNameOrID.
- **cName/slug**: Derive cName from each store's menu_url (/embedded-menu/<cName> or /dispensary/<slug>). No hardcoded defaults. Each location must have its own valid menu_url and platform_dispensary_id; do not reuse IDs across locations. If slug is invalid/missing, mark not crawlable and log; resolve ID before crawling.
- **Dual-mode always**: useBothModes:true to get pricing (Mode A) + full coverage (Mode B).
- **Batch DB writes**: Chunk products/snapshots/missing (100-200) to avoid OOM.
- **OOS/missing**: Include inactive/OOS in Mode B. Union A+B, dedupe by external_product_id+dispensary_id. Insert snapshots with stock_status; if absent from both modes, insert missing_from_feed. Do not filter OOS by default.
- **API/Frontend**: Use /api/az/... endpoints (stores/products/brands/categories/summary/dashboard). Rebuild frontend with VITE_API_URL pointing to the backend.
- **Scheduling**: Crawl only menu_type='dutchie' AND platform_dispensary_id IS NOT NULL. 4-hour crawl with jitter; detection job to set menu_type and resolve platform IDs.
- **Monitor**: /scraper-monitor (and /az-schedule) should show active/recent jobs from job_run_logs/crawl_jobs, with auto-refresh.
- **No slug guessing**: Never use defaults like "AZ-Deeply-Rooted." Always derive per store from menu_url and resolve platform IDs per location.
---
### Detailed Rules
1) **Use the consolidated DB everywhere**
- Preferred env: `CRAWLSY_DATABASE_URL` (fallback `DATABASE_URL`).
- Do NOT create dutchie tables in the legacy DB. Apply migrations 031/032/033 to the consolidated DB and restart.
2) **Dispensary vs Store**
- Dutchie pipeline uses `dispensaries` (not legacy `stores`). For dutchie crawls, always work with dispensary ID.
- Ignore legacy fields like `dutchie_plus_id` and slug guessing. Use the record's `menu_url` and `platform_dispensary_id`.
3) **Menu detection and platform IDs**
- Set `menu_type` from `menu_url` detection; resolve `platform_dispensary_id` for `menu_type='dutchie'`.
- Admin should have "refresh detection" and "resolve ID" actions; schedule/crawl only when `menu_type='dutchie'` AND `platform_dispensary_id` is set.
4) **Queries and mapping**
- The DB returns snake_case; code expects camelCase. Always alias/map:
- `platform_dispensary_id AS "platformDispensaryId"`
- Map via `mapDbRowToDispensary` when loading dispensaries (scheduler, crawler, admin crawl).
- Avoid `SELECT *`; explicitly select and/or map fields.
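A sketch of that mapping pattern; the real `mapDbRowToDispensary` likely covers more fields, and the row type here is assumed:
```typescript
// Assumed shape of a dispensaries row (snake_case, as returned by the DB).
interface DispensaryRow {
  id: number;
  name: string;
  menu_url: string | null;
  menu_type: string | null;
  platform_dispensary_id: string | null;
}

// Map a snake_case DB row to the camelCase shape the code expects.
function mapDbRowToDispensary(row: DispensaryRow) {
  return {
    id: row.id,
    name: row.name,
    menuUrl: row.menu_url,
    menuType: row.menu_type,
    platformDispensaryId: row.platform_dispensary_id,
  };
}
```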
5) **Scheduling**
- `/scraper-schedule` should accept filters/search (All vs AZ-only, name).
- "Run Now"/scheduler must skip or warn if `menu_type!='dutchie'` or `platform_dispensary_id` missing.
- Use `dispensary_crawl_status` view; show reason when not crawlable.
6) **Crawling**
- Trigger dutchie crawls by dispensary ID (e.g., `/api/az/admin/crawl/:id` or `runDispensaryOrchestrator(id)`).
- Update existing products (by stable product ID), append snapshots for history (every 4h cadence), download images locally (`/images/...`), store local URLs.
- Use dutchie GraphQL pipeline only for `menu_type='dutchie'`.
7) **Frontend**
- Forward-facing URLs: `/api/az`, `/az`, `/az-schedule`; no vendor names.
- `/scraper-schedule`: add filters/search, keep as master view for all schedules; reflect platform ID/menu_type status and controls (resolve ID, run now, enable/disable/delete).
8) **No slug guessing**
- Do not guess slugs; use the DB record's `menu_url` and ID. Resolve platform ID from the URL/cName; if set, crawl directly by ID.
9) **Verify locally before pushing**
- Apply migrations, restart backend, ensure auth (`users` table) exists, run dutchie crawl for a known dispensary (e.g., Deeply Rooted), check `/api/az/dashboard`, `/api/az/stores/:id/products`, `/az`, `/scraper-schedule`.
10) **Image storage (no MinIO)**
- Save images to local filesystem only. Do not create or use MinIO in Docker.
- Product images: `/images/products/<dispensary_id>/<product_id>-<hash>.webp` (+medium/+thumb).
- Brand images: `/images/brands/<brand_slug_or_sku>-<hash>.webp`.
- Store local URLs in DB fields (keep original URLs as fallback only).
- Serve `/images` via backend static middleware.
11) **Dutchie GraphQL fetch rules**
- **Endpoint**: `https://dutchie.com/api-3/graphql` (NOT `api-gw.dutchie.com` which no longer exists).
- **Variables**: Use `productsFilter.dispensaryId` = `platform_dispensary_id` (MongoDB ObjectId, e.g., `6405ef617056e8014d79101b`).
- Do NOT use `dispensaryFilter.cNameOrID` - that's outdated.
- `cName` (e.g., `AZ-Deeply-Rooted`) is only for Referer/Origin headers and Puppeteer session bootstrapping.
- **Mode A**: `Status: "Active"` - returns active products with pricing
- **Mode B**: `Status: null` / `activeOnly: false` - returns all products including OOS/inactive
- **Example payload**:
```json
{
  "operationName": "FilteredProducts",
  "variables": {
    "productsFilter": {
      "dispensaryId": "6405ef617056e8014d79101b",
      "pricingType": "rec",
      "Status": "Active"
    }
  },
  "extensions": {
    "persistedQuery": { "version": 1, "sha256Hash": "<hash>" }
  }
}
```
- **Headers** (server-side axios only): Chrome UA, `Origin: https://dutchie.com`, `Referer: https://dutchie.com/embedded-menu/<cName>`, `accept: application/json`, `content-type: application/json`.
- If local DNS can't resolve, run fetch from an environment that can (K8s pod/remote host), not from browser.
- Use server-side axios with embedded-menu headers; include CF/session cookie from Puppeteer if needed.
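Putting rule 11 together, a hedged sketch of the server-side fetch (the persisted-query hash, cookie handling, and exact header set are assumptions; the real crawler code may differ):
```typescript
import axios from 'axios';

// Fetch Mode A (Status: "Active") products for one dispensary via the
// dutchie GraphQL endpoint. sha256Hash is the persisted-query hash.
async function fetchFilteredProducts(platformDispensaryId: string, cName: string, sha256Hash: string) {
  const payload = {
    operationName: 'FilteredProducts',
    variables: {
      productsFilter: {
        dispensaryId: platformDispensaryId, // platform_dispensary_id (Mongo ObjectId)
        pricingType: 'rec',
        Status: 'Active', // Mode B: Status null / activeOnly false
      },
    },
    extensions: { persistedQuery: { version: 1, sha256Hash } },
  };

  const { data } = await axios.post('https://dutchie.com/api-3/graphql', payload, {
    headers: {
      'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
      origin: 'https://dutchie.com',
      referer: `https://dutchie.com/embedded-menu/${cName}`,
      accept: 'application/json',
      'content-type': 'application/json',
    },
  });
  return data;
}
```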
12) **Stop over-prep; run the crawl**
- To seed/refresh a store, run a one-off crawl by dispensary ID (example for Deeply Rooted):
```bash
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" \
npx tsx -e "const { crawlDispensaryProducts } = require('./src/dutchie-az/services/product-crawler'); const d={id:112,name:'Deeply Rooted',platform:'dutchie',platformDispensaryId:'6405ef617056e8014d79101b',menuType:'dutchie'}; crawlDispensaryProducts(d,'rec',{useBothModes:true}).then(r=>{console.log(r);process.exit(0);}).catch(e=>{console.error(e);process.exit(1);});"
```
If local DNS is blocked, run the same command inside the scraper pod via `kubectl exec ... -- bash -lc '...'`.
- After crawl, verify counts via `dutchie_products`, `dutchie_product_snapshots`, and `dispensaries.last_crawl_at`. Do not inspect the legacy `products` table for Dutchie.
13) **Fetch troubleshooting**
- If 403 or empty data: log status + first GraphQL error; include cf_clearance/session cookie from Puppeteer; ensure headers match a real Chrome request; ensure variables use `productsFilter.dispensaryId`.
- If DNS fails locally, do NOT debug DNS—run the fetch from an environment that resolves (K8s/remote) or via Puppeteer-captured headers/cookies. No browser/CORS attempts.
14) **Views and metrics**
- Keep v_brands/v_categories/v_brand_history based on `dutchie_products` and preserve brand_count metrics. Do not drop brand_count.
15) **Batch DB writes to avoid OOM**
- Do NOT build one giant upsert/insert payload for products/snapshots/missing marks.
- Chunk arrays (e.g., 100-200 items) and upsert/insert in a loop; drop references after each chunk.
- Apply to products, product snapshots, and any "mark missing" logic to keep memory low during crawls.
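A minimal sketch of the chunking pattern (the writer callback stands in for whatever upsert/insert helper the crawler uses):
```typescript
// Split a large array into 100-200 item chunks and write each chunk separately
// instead of building one giant payload.
async function writeInChunks<T>(
  items: T[],
  writeChunk: (chunk: T[]) => Promise<void>,
  chunkSize = 150
): Promise<void> {
  for (let i = 0; i < items.length; i += chunkSize) {
    const chunk = items.slice(i, i + chunkSize);
    await writeChunk(chunk); // e.g. upsert products, insert snapshots, mark missing
    // the chunk reference is dropped each iteration, keeping memory flat
  }
}
```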
16) **Use dual-mode crawls by default**
- Always run with `useBothModes:true` to combine:
- Mode A (active feed with pricing/stock)
- Mode B (max coverage including OOS/inactive)
- Union/dedupe by product ID so you keep full coverage and pricing in one run.
- If you only run Mode B, prices will be null; dual-mode fills pricing while retaining OOS items.
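A sketch of that union/dedupe step, keyed by `external_product_id` + `dispensary_id` and preferring the Mode A record because it carries pricing (field names are assumptions based on the rules above):
```typescript
interface FeedProduct {
  external_product_id: string;
  dispensary_id: number;
  price?: number | null;
}

// Union Mode A and Mode B results, deduping so each product appears once.
function unionModes(modeA: FeedProduct[], modeB: FeedProduct[]): FeedProduct[] {
  const byKey = new Map<string, FeedProduct>();
  for (const p of modeB) byKey.set(`${p.external_product_id}:${p.dispensary_id}`, p);
  for (const p of modeA) byKey.set(`${p.external_product_id}:${p.dispensary_id}`, p); // Mode A wins (has pricing)
  return [...byKey.values()];
}
```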
17) **Capture OOS and missing items**
- GraphQL variables must include inactive/OOS (Status: All / activeOnly:false). Mode B already returns OOS/inactive; union with Mode A to keep pricing.
- After unioning Mode A/B, upsert products and insert snapshots with stock_status from the feed. If an existing product is absent from both Mode A and Mode B for the run, insert a snapshot with is_present_in_feed=false and stock_status='missing_from_feed'.
- Do not filter out OOS/missing in the API; only filter when the user requests (e.g., stockStatus=in_stock). Expose stock_status/in_stock from the latest snapshot (fallback to product).
- Verify with `/api/az/stores/:id/products?stockStatus=out_of_stock` and `?stockStatus=missing_from_feed`.
18) **Menu discovery must crawl the website when menu_url is null**
- For dispensaries with no menu_url or unknown menu_type, crawl the dispensary.website (if present) to find provider links (dutchie, treez, jane, weedmaps, leafly, etc.). Follow “menu/order/shop” links up to a shallow depth with timeouts/rate limits.
- If a provider link is found, set menu_url, set menu_type, and store detection metadata; if dutchie, derive cName from menu_url and resolve platform_dispensary_id; store resolved_at and detection details.
- Do NOT mark a dispensary not_crawlable solely because menu_url is null; only mark not_crawlable if the website crawl fails to find a menu or returns 403/404/invalid. Log the reason in provider_detection_data and crawl_status_reason.
- Keep this as the menu discovery job (separate from product crawls); log successes/errors to job_run_logs. Only schedule product crawls for stores with menu_type='dutchie' AND platform_dispensary_id IS NOT NULL.
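An illustrative provider check for discovered links (depth limits, rate limiting, and persistence of detection metadata are omitted):
```typescript
// Providers the menu discovery job looks for in candidate links.
const PROVIDERS = ['dutchie', 'treez', 'jane', 'weedmaps', 'leafly'] as const;

// Return the matching provider for a URL, or null if none matches.
function detectProvider(url: string): string | null {
  const lower = url.toLowerCase();
  for (const provider of PROVIDERS) {
    if (lower.includes(provider)) return provider;
  }
  return null;
}
```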
19) **Preserve all stock statuses (including unknown)**
- Do not filter or drop stock_status values in API/UI; pass through whatever is stored on the latest snapshot/product. Expected values include: in_stock, out_of_stock (if exposed), missing_from_feed, unknown. Only apply filters when explicitly requested by the user.
20) **Never delete or overwrite historical data**
- Do not delete products/snapshots or overwrite historical records. Always append snapshots for changes (price/stock/qty), and mark missing_from_feed instead of removing records. Historical data must remain intact for analytics.
21) **Deployment via CI/CD only**
- Test locally, commit clean changes, and let CI/CD build and deploy to Kubernetes at code.cannabrands.app. Do NOT manually build/push images or tweak prod pods. Deploy backend first, smoke-test APIs, then frontend; roll back via CI/CD if needed.
22) **Per-location cName and platform_dispensary_id resolution**
- For each dispensary, menu_url and cName must be valid for that exact location; no hardcoded defaults and no sharing platform_dispensary_id across locations.
- Derive cName from menu_url per store: `/embedded-menu/<cName>` or `/dispensary/<cName>`.
- Resolve platform_dispensary_id from that cName using GraphQL GetAddressBasedDispensaryData.
- If the slug is invalid/missing, mark the store not crawlable and log it; do not crawl with a mismatched cName/ID. Store the error in `provider_detection_data.resolution_error`.
- Before crawling, validate that the cName from menu_url matches the resolved platform ID; if mismatched, re-resolve before proceeding.
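A sketch of the cName derivation for the two URL shapes above; a null result means the store should be marked not crawlable rather than guessed:
```typescript
// Derive cName from a store's menu_url (/embedded-menu/<cName> or /dispensary/<slug>).
function deriveCName(menuUrl: string): string | null {
  const match = menuUrl.match(/\/(?:embedded-menu|dispensary)\/([^/?#]+)/);
  return match ? match[1] : null;
}

// deriveCName('https://dutchie.com/embedded-menu/AZ-Deeply-Rooted') === 'AZ-Deeply-Rooted'
```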
23) **API endpoints (AZ pipeline)**
- Use /api/az/... endpoints: stores, products, brands, categories, summary, dashboard
- Rebuild frontend with VITE_API_URL pointing to the backend
- Dispensary Detail and analytics must use AZ endpoints
24) **Monitoring and logging**
- /scraper-monitor (and /az-schedule) should show active/recent jobs from job_run_logs/crawl_jobs
- Auto-refresh every 30 seconds
- System Logs page should show real log data, not just startup messages
25) **Dashboard Architecture - CRITICAL**
- **Frontend**: If you see old labels like "Active Proxies" or "Active Stores", it means the old dashboard bundle is being served. Rebuild the frontend with `VITE_API_URL` pointing to the correct backend and redeploy. Clear browser cache. Confirm new labels show up.
- **Backend**: `/api/dashboard/stats` MUST use the consolidated DB (same pool as dutchie-az module). Use the correct tables: `dutchie_products`, `dispensaries`, and views like `v_dashboard_stats`, `v_latest_snapshots`. Do NOT use a separate legacy connection. Do NOT query `az_products` (doesn't exist) or legacy `stores`/`products` tables.
- **DB Connectivity**: Use the proper DB host/role. Errors like `role "dutchie" does not exist` mean you're exec'ing into the wrong Postgres pod or using wrong credentials. Confirm the correct `DATABASE_URL` and test: `kubectl exec deployment/scraper -n dispensary-scraper -- psql $DATABASE_URL -c '\dt'`
- **After fixing**: Dashboard should show real data (e.g., 777 products) instead of zeros. Do NOT revert to legacy tables; point dashboard queries to the consolidated DB/views.
- **Checklist**:
1. Rebuild/redeploy frontend with correct API URL, clear cache
2. Fix `/api/dashboard/*` to use the consolidated DB pool and dutchie views/tables
3. Test `/api/dashboard/stats` from the scraper pod; then reload the UI
26) **Deployment (Gitea + Kubernetes)**
- **Registry**: Gitea at `code.cannabrands.app/creationshop/dispensary-scraper`
- **Build and push** (from backend directory):
```bash
# Login to Gitea container registry
docker login code.cannabrands.app
# Build the image
cd backend
docker build -t code.cannabrands.app/creationshop/dispensary-scraper:latest .
# Push to registry
docker push code.cannabrands.app/creationshop/dispensary-scraper:latest
```
- **Deploy to Kubernetes**:
```bash
# Restart deployments to pull new image
kubectl rollout restart deployment/scraper -n dispensary-scraper
kubectl rollout restart deployment/scraper-worker -n dispensary-scraper
# Watch rollout status
kubectl rollout status deployment/scraper -n dispensary-scraper
kubectl rollout status deployment/scraper-worker -n dispensary-scraper
```
- **Check pods**:
```bash
kubectl get pods -n dispensary-scraper
kubectl logs -f deployment/scraper -n dispensary-scraper
kubectl logs -f deployment/scraper-worker -n dispensary-scraper
```
- K8s manifests are in `/k8s/` folder (scraper.yaml, scraper-worker.yaml, etc.)
- imagePullSecrets use `regcred` secret for Gitea registry auth
27) **Crawler Architecture**
- **Scraper pod (1 replica)**: Runs the Express API server + scheduler. The scheduler enqueues detection and crawl jobs to the database queue (`crawl_jobs` table).
- **Scraper-worker pods (5 replicas)**: Each worker runs `dist/dutchie-az/services/worker.js`, polling the job queue and processing jobs.
- **Job types processed by workers**:
- `menu_detection` / `menu_detection_single`: Detect menu provider type and resolve platform_dispensary_id from menu_url
- `dutchie_product_crawl`: Crawl products from Dutchie GraphQL API for dispensaries with valid platform IDs
- **Job schedules** (managed in `job_schedules` table):
- `dutchie_az_menu_detection`: Runs daily with 60-min jitter, detects menu type for dispensaries with unknown menu_type
- `dutchie_az_product_crawl`: Runs every 4 hours with 30-min jitter, crawls products from all detected Dutchie dispensaries
- **Trigger schedules manually**: `curl -X POST /api/az/admin/schedules/{id}/trigger`
- **Check schedule status**: `curl /api/az/admin/schedules`
- **Worker logs**: `kubectl logs -f deployment/scraper-worker -n dispensary-scraper`
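The worker loop, in rough outline (the claim/process helpers, poll interval, and queue-table details are assumptions; the real implementation is `dist/dutchie-az/services/worker.js`):
```typescript
interface QueuedJob {
  id: number;
  job_type: 'menu_detection' | 'menu_detection_single' | 'dutchie_product_crawl';
}

// Poll the job queue, process one job at a time, and idle briefly when empty.
async function workerLoop(
  claimNextJob: () => Promise<QueuedJob | null>,
  processJob: (job: QueuedJob) => Promise<void>
): Promise<void> {
  while (true) {
    const job = await claimNextJob(); // e.g. SELECT ... FOR UPDATE SKIP LOCKED on the queue table
    if (!job) {
      await new Promise((resolve) => setTimeout(resolve, 5_000)); // idle poll interval (illustrative)
      continue;
    }
    await processJob(job);
  }
}
```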
28) **Crawler Maintenance Procedure (Check Jobs, Requeue, Restart)**
When crawlers are stuck or jobs aren't processing, follow this procedure:
**Step 1: Check Job Status**
```bash
# Port-forward to production
kubectl port-forward -n dispensary-scraper deployment/scraper 3099:3010 &
# Check active/stuck jobs
curl -s http://localhost:3099/api/az/monitor/active-jobs | jq .
# Check recent job history
curl -s "http://localhost:3099/api/az/monitor/jobs?limit=20" | jq '.jobs[] | {id, job_type, status, dispensary_id, started_at, products_found, duration_min: (.duration_ms/60000 | floor)}'
# Check schedule status
curl -s http://localhost:3099/api/az/admin/schedules | jq '.schedules[] | {id, jobName, enabled, lastRunAt, lastStatus, nextRunAt}'
```
**Step 2: Reset Stuck Jobs**
Jobs are considered stuck if they have `status='running'` but no heartbeat in >30 minutes:
```bash
# Via API (if endpoint exists)
curl -s -X POST http://localhost:3099/api/az/admin/reset-stuck-jobs
# Via direct DB (if API not available)
kubectl exec -n dispensary-scraper deployment/scraper -- psql $DATABASE_URL -c "
  UPDATE dispensary_crawl_jobs
  SET status = 'failed',
      error_message = 'Job timed out - worker stopped sending heartbeats',
      completed_at = NOW()
  WHERE status = 'running'
    AND (last_heartbeat_at < NOW() - INTERVAL '30 minutes' OR last_heartbeat_at IS NULL);
"
```
**Step 3: Requeue Jobs (Trigger Fresh Crawl)**
```bash
# Trigger product crawl schedule (typically ID 1)
curl -s -X POST http://localhost:3099/api/az/admin/schedules/1/trigger
# Trigger menu detection schedule (typically ID 2)
curl -s -X POST http://localhost:3099/api/az/admin/schedules/2/trigger
# Or crawl a specific dispensary
curl -s -X POST http://localhost:3099/api/az/admin/crawl/112
```
**Step 4: Restart Crawler Workers**
```bash
# Restart scraper-worker pods (clears any stuck processes)
kubectl rollout restart deployment/scraper-worker -n dispensary-scraper
# Watch rollout progress
kubectl rollout status deployment/scraper-worker -n dispensary-scraper
# Optionally restart main scraper pod too
kubectl rollout restart deployment/scraper -n dispensary-scraper
```
**Step 5: Monitor Recovery**
```bash
# Watch worker logs
kubectl logs -f deployment/scraper-worker -n dispensary-scraper --tail=50
# Check dashboard for product counts
curl -s http://localhost:3099/api/az/dashboard | jq '{totalStores, totalProducts, storesByType}'
# Verify jobs are processing
curl -s http://localhost:3099/api/az/monitor/active-jobs | jq .
```
**Quick One-Liner for Full Reset:**
```bash
# Reset stuck jobs and restart workers
kubectl exec -n dispensary-scraper deployment/scraper -- psql $DATABASE_URL -c "UPDATE dispensary_crawl_jobs SET status='failed', completed_at=NOW() WHERE status='running' AND (last_heartbeat_at < NOW() - INTERVAL '30 minutes' OR last_heartbeat_at IS NULL);" && kubectl rollout restart deployment/scraper-worker -n dispensary-scraper && kubectl rollout status deployment/scraper-worker -n dispensary-scraper
```
**Cleanup port-forwards when done:**
```bash
pkill -f "port-forward.*dispensary-scraper"
```
29) **Frontend Architecture - AVOID OVER-ENGINEERING**
**Key Principles:**
- **ONE BACKEND** serves ALL domains (cannaiq.co, findadispo.com, findagram.co)
- Do NOT create separate backend services for each domain
- The existing `dispensary-scraper` backend handles everything
**Frontend Build Differences:**
- `frontend/` uses **Vite** (outputs to `dist/`, uses `VITE_` env vars) → dispos.crawlsy.com (legacy)
- `cannaiq/` uses **Vite** (outputs to `dist/`, uses `VITE_` env vars) → cannaiq.co (NEW)
- `findadispo/` uses **Create React App** (outputs to `build/`, uses `REACT_APP_` env vars) → findadispo.com
- `findagram/` uses **Create React App** (outputs to `build/`, uses `REACT_APP_` env vars) → findagram.co
**CRA vs Vite Dockerfile Differences:**
```dockerfile
# Vite (frontend, cannaiq)
ENV VITE_API_URL=https://api.domain.com
RUN npm run build
COPY --from=builder /app/dist /usr/share/nginx/html
# CRA (findadispo, findagram)
ENV REACT_APP_API_URL=https://api.domain.com
RUN npm run build
COPY --from=builder /app/build /usr/share/nginx/html
```
**lucide-react Icon Gotchas:**
- Not all icons exist in older versions (e.g., `Cannabis` doesn't exist)
- Use `Leaf` as a substitute for cannabis-related icons
- When doing search/replace for icon names, be careful not to replace text content
- Example: "Cannabis-infused food" should NOT become "Leaf-infused food"
**Deployment Options:**
1. **Separate containers** (current): Each frontend in its own nginx container
2. **Single container** (better): One nginx with multi-domain config serving all frontends
**Single Container Multi-Domain Approach:**
```dockerfile
# Build all frontends
FROM node:20-slim AS builder-cannaiq
WORKDIR /app/cannaiq
COPY cannaiq/package*.json ./
RUN npm install
COPY cannaiq/ ./
RUN npm run build
FROM node:20-slim AS builder-findadispo
WORKDIR /app/findadispo
COPY findadispo/package*.json ./
RUN npm install
COPY findadispo/ ./
RUN npm run build
FROM node:20-slim AS builder-findagram
WORKDIR /app/findagram
COPY findagram/package*.json ./
RUN npm install
COPY findagram/ ./
RUN npm run build
# Production nginx with multi-domain routing
FROM nginx:alpine
COPY --from=builder-cannaiq /app/cannaiq/dist /var/www/cannaiq
COPY --from=builder-findadispo /app/findadispo/build /var/www/findadispo
COPY --from=builder-findagram /app/findagram/build /var/www/findagram
COPY nginx-multi-domain.conf /etc/nginx/conf.d/default.conf
```
**nginx-multi-domain.conf:**
```nginx
server {
  listen 80;
  server_name cannaiq.co www.cannaiq.co;
  root /var/www/cannaiq;
  location / { try_files $uri $uri/ /index.html; }
}
server {
  listen 80;
  server_name findadispo.com www.findadispo.com;
  root /var/www/findadispo;
  location / { try_files $uri $uri/ /index.html; }
}
server {
  listen 80;
  server_name findagram.co www.findagram.co;
  root /var/www/findagram;
  location / { try_files $uri $uri/ /index.html; }
}
```
**Common Mistakes to AVOID:**
- Creating a FastAPI/Express backend just for findagram or findadispo
- Creating separate Docker images per domain when one would work
- Replacing icon names with sed without checking for text content collisions
- Using `npm ci` in Dockerfiles when package-lock.json doesn't exist (use `npm install`)