feat: AZ dispensary harmonization with Dutchie source of truth

Major changes:
- Add harmonize-az-dispensaries.ts script to sync dispensaries with Dutchie API
- Add migration 057 for crawl_enabled and dutchie_verified fields
- Remove legacy dutchie-az module (replaced by platforms/dutchie)
- Clean up deprecated crawlers, scrapers, and orchestrator code
- Update location-discovery to not fall back to slug when ID is missing
- Add crawl-rotator service for proxy rotation
- Add types/index.ts for shared type definitions
- Add woodpecker-agent k8s manifest

Harmonization script:
- Queries ConsumerDispensaries API for all 32 AZ cities
- Matches dispensaries by platform_dispensary_id (not slug)
- Updates existing records with full Dutchie data
- Creates new records for unmatched Dutchie dispensaries
- Disables dispensaries not found in Dutchie
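
The matching steps above can be sketched roughly as follows. This is only an illustration, not the contents of harmonize-az-dispensaries.ts: the `DutchieDispensary`, `DbDispensary`, and `HarmonizePlan` shapes and the `planHarmonization` function are hypothetical names, and treating local rows without a platform ID as "not found in Dutchie" is an assumption.

```typescript
// Hypothetical shapes; the real script's types are not shown in this commit.
interface DutchieDispensary { id: string; name: string; city: string; }
interface DbDispensary { id: number; platform_dispensary_id: string | null; name: string; }

interface HarmonizePlan {
  updates: { dbId: number; source: DutchieDispensary }[]; // existing rows to refresh
  creates: DutchieDispensary[];                            // Dutchie records with no local row
  disables: number[];                                      // local row IDs absent from Dutchie
}

function planHarmonization(remote: DutchieDispensary[], local: DbDispensary[]): HarmonizePlan {
  // Index Dutchie results by platform ID, de-duplicating in case one
  // dispensary is returned by more than one city query.
  const remoteById = new Map<string, DutchieDispensary>();
  for (const d of remote) remoteById.set(d.id, d);

  const result: HarmonizePlan = { updates: [], creates: [], disables: [] };
  const matched = new Set<string>();

  for (const row of local) {
    // Match strictly on platform_dispensary_id, never on slug.
    const source = row.platform_dispensary_id
      ? remoteById.get(row.platform_dispensary_id)
      : undefined;
    if (source) {
      result.updates.push({ dbId: row.id, source });
      matched.add(source.id);
    } else {
      // Assumption: rows with no platform ID count as "not found in Dutchie".
      result.disables.push(row.id);
    }
  }

  // Anything Dutchie returned that matched no local row becomes a new record.
  remoteById.forEach((d, id) => {
    if (!matched.has(id)) result.creates.push(d);
  });
  return result;
}
```

Computing a plan first and applying it afterwards keeps the matching logic free of database calls, which makes a dry run straightforward.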

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Kelly
2025-12-08 10:19:49 -07:00
parent 948a732dd5
commit b7cfec0770
112 changed files with 3163 additions and 34694 deletions


@@ -193,6 +193,44 @@ CannaiQ has **TWO databases** with distinct purposes:
| `dutchie_menus` | **Canonical CannaiQ database** - All schema, migrations, and application data | READ/WRITE |
| `dutchie_legacy` | **Legacy read-only archive** - Historical data from old system | READ-ONLY |
### Store vs Dispensary Terminology
**"Store" and "Dispensary" are SYNONYMS in CannaiQ.**
| Term | Usage | DB Table |
|------|-------|----------|
| Store | API routes (`/api/stores`) | `dispensaries` |
| Dispensary | DB table, internal code | `dispensaries` |
- `/api/stores` and `/api/dispensaries` both query the `dispensaries` table
- The `stores` table is NOT in use - it's an empty legacy table
- Use these terms interchangeably in code and documentation
### Canonical vs Legacy Tables
**CANONICAL TABLES (USE THESE):**
| Table | Purpose | Row Count |
|-------|---------|-----------|
| `dispensaries` | Store/dispensary records | ~188+ rows |
| `dutchie_products` | Product catalog | ~37,000+ rows |
| `dutchie_product_snapshots` | Price/stock history | ~millions |
| `store_products` | Canonical product schema | ~37,000+ rows |
| `store_product_snapshots` | Canonical snapshot schema | growing |
**LEGACY TABLES (EMPTY - DO NOT USE):**
| Table | Status | Action |
|-------|--------|--------|
| `stores` | EMPTY (0 rows) | Use `dispensaries` instead |
| `products` | EMPTY (0 rows) | Use `dutchie_products` or `store_products` |
| `categories` | EMPTY (0 rows) | Categories stored in product records |
**Code must NEVER:**
- Query the `stores` table (use `dispensaries`)
- Query the `products` table (use `dutchie_products` or `store_products`)
- Query the `categories` table (categories are in product records)
**CRITICAL RULES:**
- **Migrations ONLY run on `dutchie_menus`** - NEVER on `dutchie_legacy`
- **Application code connects ONLY to `dutchie_menus`**
@@ -615,15 +653,28 @@ export default defineConfig({
### Detailed Rules
1) **Dispensary vs Store**
- The Dutchie pipeline uses `dispensaries` (not legacy `stores`). For Dutchie crawls, always work with the dispensary ID.
1) **Dispensary = Store (SAME THING)**
- "Dispensary" and "store" are synonyms in CannaiQ. Use interchangeably.
- **API endpoint**: `/api/stores` (NOT `/api/dispensaries`)
- **DB table**: `dispensaries`
- When you need to create/query stores via API, use `/api/stores`
- Use the record's `menu_url` and `platform_dispensary_id`.
2) **Menu detection and platform IDs**
2) **API Authentication**
- **Trusted Origins (no auth needed)**:
- IPs: `127.0.0.1`, `::1`, `::ffff:127.0.0.1`
- Origins: `https://cannaiq.co`, `https://findadispo.com`, `https://findagram.co`
- Also: `http://localhost:3010`, `http://localhost:8080`, `http://localhost:5173`
- Requests from trusted IPs/origins get automatic admin access (`role: 'internal'`)
- **Remote (non-trusted)**: Use Bearer token (JWT or API token). NO username/password auth.
- Never try to log in with username/password via API - use tokens only.
- See `src/auth/middleware.ts` for `TRUSTED_ORIGINS` and `TRUSTED_IPS` lists.
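
A minimal sketch of the decision these authentication rules describe, assuming the lists are plain string sets. The values are copied from the bullets above, but `classifyRequest` and `AuthDecision` are hypothetical names, and the real logic in `src/auth/middleware.ts` may differ.

```typescript
// Values copied from the rules above; representing them as Sets is an assumption.
const TRUSTED_IPS = new Set(["127.0.0.1", "::1", "::ffff:127.0.0.1"]);
const TRUSTED_ORIGINS = new Set([
  "https://cannaiq.co",
  "https://findadispo.com",
  "https://findagram.co",
  "http://localhost:3010",
  "http://localhost:8080",
  "http://localhost:5173",
]);

type AuthDecision =
  | { kind: "internal"; role: "internal" } // trusted IP or origin, no credentials needed
  | { kind: "token"; token: string }       // Bearer token present, still to be verified
  | { kind: "rejected" };                  // anything else, including Basic auth

// Hypothetical helper: classifies a request by client IP, Origin header,
// and Authorization header.
function classifyRequest(ip: string, origin?: string, authorization?: string): AuthDecision {
  if (TRUSTED_IPS.has(ip) || (origin !== undefined && TRUSTED_ORIGINS.has(origin))) {
    return { kind: "internal", role: "internal" };
  }
  // Only the Bearer scheme is accepted for remote callers.
  const match = /^Bearer\s+(\S+)$/i.exec(authorization ?? "");
  if (match) {
    return { kind: "token", token: match[1] };
  }
  return { kind: "rejected" };
}
```

The sketch only extracts the token; checking a JWT signature or looking up an API token would happen in a later step that is not shown here.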
3) **Menu detection and platform IDs**
- Set `menu_type` from `menu_url` detection; resolve `platform_dispensary_id` for `menu_type='dutchie'`.
- Admin should have "refresh detection" and "resolve ID" actions; schedule/crawl only when `menu_type='dutchie'` AND `platform_dispensary_id` is set.
3) **Queries and mapping**
4) **Queries and mapping**
- The DB returns snake_case; code expects camelCase. Always alias/map:
- `platform_dispensary_id AS "platformDispensaryId"`
- Map via `mapDbRowToDispensary` when loading dispensaries (scheduler, crawler, admin crawl).
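
The snake_case to camelCase mapping and the crawl condition from the menu detection rule could look roughly like this. It is a sketch under assumptions: `DispensaryRow`, `Dispensary`, `mapRowSketch`, and `isCrawlable` are hypothetical names limited to columns mentioned in this document, and `mapRowSketch` only stands in for the real `mapDbRowToDispensary`, whose implementation is not shown here.

```typescript
// Hypothetical row shape, restricted to columns named in this document.
interface DispensaryRow {
  id: number;
  menu_type: string | null;
  menu_url: string | null;
  platform_dispensary_id: string | null;
}

// Hypothetical camelCase shape the application code is assumed to expect.
interface Dispensary {
  id: number;
  menuType: string | null;
  menuUrl: string | null;
  platformDispensaryId: string | null;
}

// Illustrative stand-in for mapDbRowToDispensary.
function mapRowSketch(row: DispensaryRow): Dispensary {
  return {
    id: row.id,
    menuType: row.menu_type,
    menuUrl: row.menu_url,
    platformDispensaryId: row.platform_dispensary_id,
  };
}

// Crawl gate: schedule only when menu_type is 'dutchie' AND a platform ID is set.
function isCrawlable(d: Dispensary): boolean {
  return d.menuType === "dutchie" && !!d.platformDispensaryId;
}
```

Mapping in one place means a forgotten alias fails once, at the boundary, instead of surfacing as an `undefined` platform ID deep inside the scheduler or crawler.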