Add CLAUDE guidelines for consolidated pipeline

This commit is contained in:
Kelly
2025-12-02 13:28:23 -07:00
parent 9219d8a77a
commit 04b5c3bd09
32 changed files with 4485 additions and 169 deletions

40
CLAUDE.md Normal file
View File

@@ -0,0 +1,40 @@
## Claude Guidelines for this Project
1) **Use the consolidated DB everywhere**
- Preferred env: `CRAWLSY_DATABASE_URL` (fallback `DATABASE_URL`).
- Do NOT create dutchie tables in the legacy DB. Apply migrations 031/032/033 to the consolidated DB and restart.
2) **Dispensary vs Store**
- Dutchie pipeline uses `dispensaries` (not legacy `stores`). For dutchie crawls, always work with dispensary ID.
- Ignore legacy fields like `dutchie_plus_id` and slug guessing. Use the records `menu_url` and `platform_dispensary_id`.
3) **Menu detection and platform IDs**
- Set `menu_type` from `menu_url` detection; resolve `platform_dispensary_id` for `menu_type='dutchie'`.
- Admin should have “refresh detection” and “resolve ID” actions; schedule/crawl only when `menu_type='dutchie'` AND `platform_dispensary_id` is set.
4) **Queries and mapping**
- The DB returns snake_case; code expects camelCase. Always alias/map:
- `platform_dispensary_id AS "platformDispensaryId"`
- Map via `mapDbRowToDispensary` when loading dispensaries (scheduler, crawler, admin crawl).
- Avoid `SELECT *`; explicitly select and/or map fields.
5) **Scheduling**
- `/scraper-schedule` should accept filters/search (All vs AZ-only, name).
- “Run Now”/scheduler must skip or warn if `menu_type!='dutchie'` or `platform_dispensary_id` missing.
- Use `dispensary_crawl_status` view; show reason when not crawlable.
6) **Crawling**
- Trigger dutchie crawls by dispensary ID (e.g., `/api/az/admin/crawl/:id` or `runDispensaryOrchestrator(id)`).
- Update existing products (by stable product ID), append snapshots for history (every 4h cadence), download images locally (`/images/...`), store local URLs.
- Use dutchie GraphQL pipeline only for `menu_type='dutchie'`.
7) **Frontend**
- Forward-facing URLs: `/api/az`, `/az`, `/az-schedule`; no vendor names.
- `/scraper-schedule`: add filters/search, keep as master view for all schedules; reflect platform ID/menu_type status and controls (resolve ID, run now, enable/disable/delete).
8) **No slug guessing**
- Do not guess slugs; use the DB records `menu_url` and ID. Resolve platform ID from the URL/cName; if set, crawl directly by ID.
9) **Verify locally before pushing**
- Apply migrations, restart backend, ensure auth (`users` table) exists, run dutchie crawl for a known dispensary (e.g., Deeply Rooted), check `/api/az/dashboard`, `/api/az/stores/:id/products`, `/az`, `/scraper-schedule`.