Add CLAUDE guidelines for consolidated pipeline

2025-12-02 13:28:23 -07:00
parent 9219d8a77a
commit 04b5c3bd09
32 changed files with 4485 additions and 169 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,40 @@
+## Claude Guidelines for this Project
+
+1) **Use the consolidated DB everywhere**
+   - Preferred env: `CRAWLSY_DATABASE_URL` (fallback `DATABASE_URL`).
+   - Do NOT create dutchie tables in the legacy DB. Apply migrations 031/032/033 to the consolidated DB and restart.
+
+2) **Dispensary vs Store**
+   - Dutchie pipeline uses `dispensaries` (not legacy `stores`). For dutchie crawls, always work with dispensary ID.
+   - Ignore legacy fields such as `dutchie_plus_id`, and do not guess slugs. Use the record’s `menu_url` and `platform_dispensary_id`.
+
+3) **Menu detection and platform IDs**
+   - Set `menu_type` from `menu_url` detection; resolve `platform_dispensary_id` for `menu_type='dutchie'`.
+   - Admin should have “refresh detection” and “resolve ID” actions; schedule/crawl only when `menu_type='dutchie'` AND `platform_dispensary_id` is set.
+
+4) **Queries and mapping**
+   - The DB returns snake_case; code expects camelCase. Always alias/map:
+     - `platform_dispensary_id AS "platformDispensaryId"`
+     - Map via `mapDbRowToDispensary` when loading dispensaries (scheduler, crawler, admin crawl).
+   - Avoid `SELECT *`; explicitly select and/or map fields.
+
+5) **Scheduling**
+   - `/scraper-schedule` should accept filters/search (All vs AZ-only, name).
+   - “Run Now” and the scheduler must skip or warn if `menu_type!='dutchie'` or `platform_dispensary_id` is missing.
+   - Use `dispensary_crawl_status` view; show reason when not crawlable.
+
+6) **Crawling**
+   - Trigger dutchie crawls by dispensary ID (e.g., `/api/az/admin/crawl/:id` or `runDispensaryOrchestrator(id)`).
+   - Update existing products in place (matched by stable product ID), append snapshots for history (4-hour cadence), download images locally (`/images/...`), and store the local URLs.
+   - Use dutchie GraphQL pipeline only for `menu_type='dutchie'`.
+
+7) **Frontend**
+   - Forward-facing URLs: `/api/az`, `/az`, `/az-schedule`; no vendor names.
+   - `/scraper-schedule`: add filters/search, keep as master view for all schedules; reflect platform ID/menu_type status and controls (resolve ID, run now, enable/disable/delete).
+
+8) **No slug guessing**
+   - Do not guess slugs; use the DB record’s `menu_url` and ID. Resolve platform ID from the URL/cName; if set, crawl directly by ID.
+
+9) **Verify locally before pushing**
+   - Apply migrations, restart the backend, ensure the auth `users` table exists, run a dutchie crawl for a known dispensary (e.g., Deeply Rooted), then check `/api/az/dashboard`, `/api/az/stores/:id/products`, `/az`, and `/scraper-schedule`.
+
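Guidelines 3 to 5 in the file above (explicit column selection, snake_case to camelCase mapping, and the crawlability gate) can be sketched roughly as follows. This is a minimal illustration, not the project's code: `mapRowSketch` is a hypothetical stand-in for `mapDbRowToDispensary`, whose real signature is not shown in this commit, and the `id` and `name` columns, the `Dispensary` shape, and `crawlBlockReason` are assumptions made for the example.

```typescript
// Hypothetical row shape: only columns named in the guidelines, plus an
// assumed `id` and `name`. The real schema may differ.
interface DispensaryRow {
  id: number;
  name: string;
  menu_url: string | null;
  menu_type: string | null;
  platform_dispensary_id: string | null;
}

// Hypothetical camelCase shape that application code would consume.
interface Dispensary {
  id: number;
  name: string;
  menuUrl: string | null;
  menuType: string | null;
  platformDispensaryId: string | null;
}

// Explicit column list instead of SELECT * (guideline 4).
// `$1` is a plain positional placeholder, not a template interpolation.
const SELECT_DISPENSARY_SQL = `
  SELECT id, name, menu_url, menu_type, platform_dispensary_id
  FROM dispensaries
  WHERE id = $1`;

// Stand-in for mapDbRowToDispensary: converts one snake_case row to camelCase.
function mapRowSketch(row: DispensaryRow): Dispensary {
  return {
    id: row.id,
    name: row.name,
    menuUrl: row.menu_url,
    menuType: row.menu_type,
    platformDispensaryId: row.platform_dispensary_id,
  };
}

// Returns null when the dispensary is crawlable, otherwise a human-readable
// reason that a scheduler or "Run Now" action could display (guidelines 3, 5).
function crawlBlockReason(d: Dispensary): string | null {
  if (d.menuType !== "dutchie") {
    return `menu_type is ${d.menuType ?? "unset"}, not dutchie`;
  }
  if (!d.platformDispensaryId) {
    return "platform_dispensary_id is missing";
  }
  return null;
}
```

A scheduler following this sketch would call `crawlBlockReason` on each mapped record and skip, or surface the returned reason, whenever the result is not `null`. Returning a reason string instead of a boolean is one way to satisfy the "show reason when not crawlable" requirement without a second lookup.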