diff --git a/.gitignore b/.gitignore new file mode 100644 index 00000000..a1723d09 --- /dev/null +++ b/.gitignore @@ -0,0 +1,48 @@ +# Dependencies +node_modules/ + +# Build outputs (compiled JS, not source) +backend/dist/ +cannaiq/dist/ +findadispo/build/ +findagram/build/ +frontend/dist/ + +# Environment files (local secrets) +.env +.env.local +.env.*.local +backend/.env +backend/.env.local + +# Database dumps and backups (large files) +*.dump +*.sql.backup +backup_*.sql + +# IDE +.idea/ +.vscode/ +*.swp +*.swo + +# OS +.DS_Store +Thumbs.db + +# Logs +*.log +npm-debug.log* + +# Local storage (runtime data, not source) +backend/storage/ + +# Vite cache +**/node_modules/.vite/ + +# Test coverage +coverage/ + +# Temporary files +*.tmp +*.temp diff --git a/CLAUDE.md b/CLAUDE.md index 073e92ab..114470ce 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -30,7 +30,57 @@ CannaiQ is a **historical analytics system**. Data retention is **permanent by d - `local-storage.ts` must only: write files, create directories, read files - No `deleteImage`, `deleteProductImages`, or similar functions -### 2. DEPLOYMENT AUTHORIZATION REQUIRED +### 2. NO PROCESS KILLING — EVER + +**Claude must NEVER run process-killing commands:** +- No `pkill` +- No `kill -9` +- No `xargs kill` +- No `lsof | kill` +- No `killall` +- No `fuser -k` + +**Claude must NOT manage host processes.** Only user scripts manage the local environment. + +**Correct behavior:** +- If backend is running on port 3010 → say: "Backend already running" +- If backend is NOT running → say: "Please run `./setup-local.sh`" + +**Process management is done ONLY by user scripts:** +```bash +./setup-local.sh # Start local environment +./stop-local.sh # Stop local environment +``` + +### 3. NO MANUAL SERVER STARTUP — EVER + +**Claude must NEVER start the backend manually:** +- No `npx tsx src/index.ts` +- No `node dist/index.js` +- No `npm run dev` with custom env vars +- No `DATABASE_URL=... npx tsx ...` + +**Claude must NEVER set DATABASE_URL in shell commands:** +- DB connection uses `CANNAIQ_DB_*` env vars or `CANNAIQ_DB_URL` from the user's environment +- Never hardcode connection strings in bash commands +- Never override env vars to bypass the user's DB setup + +**If backend is not running:** +- Say: "Please run `./setup-local.sh`" +- Do NOT attempt to start it yourself + +**If a dependency is missing:** +- Add it to `package.json` +- Say: "Please run `cd backend && npm install`" +- Do NOT try to solve it by starting a custom dev server + +**The ONLY way to start local services:** +```bash +cd backend +./setup-local.sh +``` + +### 4. DEPLOYMENT AUTHORIZATION REQUIRED **NEVER deploy to production unless the user explicitly says:** > "CLAUDE — DEPLOYMENT IS NOW AUTHORIZED." @@ -41,42 +91,275 @@ Until then: - No port-forwarding to production - No connecting to Kubernetes clusters -### 3. LOCAL DEVELOPMENT BY DEFAULT +### 5. DATABASE CONNECTION ARCHITECTURE + +**Migration code is CLI-only. 
Runtime code must NOT import `src/db/migrate.ts`.** + +| Module | Purpose | Import From | +|--------|---------|-------------| +| `src/db/migrate.ts` | CLI migrations only | **NEVER import at runtime** | +| `src/db/pool.ts` | Runtime database pool | `import { pool } from '../db/pool'` | +| `src/dutchie-az/db/connection.ts` | Canonical connection helper | Alternative for runtime | + +**Runtime gets DB connections ONLY via:** +```typescript +import { pool } from '../db/pool'; +// or +import { getPool } from '../dutchie-az/db/connection'; +``` + +**To run migrations:** +```bash +cd backend +npx tsx src/db/migrate.ts +``` + +**Why this matters:** +- `migrate.ts` validates env vars strictly and throws at module load time +- Importing it at runtime causes startup crashes if env vars aren't perfect +- `pool.ts` uses lazy initialization - only validates when first query is made + +### 6. LOCAL DEVELOPMENT BY DEFAULT + +**Quick Start:** +```bash +./setup-local.sh +``` + +**Services (all started by setup-local.sh):** +| Service | URL | Purpose | +|---------|-----|---------| +| PostgreSQL | localhost:54320 | cannaiq-postgres container | +| Backend API | http://localhost:3010 | Express API server | +| CannaiQ Admin | http://localhost:8080/admin | B2B admin dashboard | +| FindADispo | http://localhost:3001 | Consumer dispensary finder | +| Findagram | http://localhost:3002 | Consumer delivery marketplace | **In local mode:** - Use `docker-compose.local.yml` (NO MinIO) - Use local filesystem storage at `./storage` -- Connect to local PostgreSQL at `localhost:54320` +- Connect to `cannaiq-postgres` at `localhost:54320` - Backend runs at `localhost:3010` +- All three frontends run on separate ports (8080, 3001, 3002) - NO remote connections, NO Kubernetes, NO MinIO **Environment:** +- All DB config is in `backend/.env` +- STORAGE_DRIVER=local +- STORAGE_BASE_PATH=./storage + +**Local Admin Bootstrap:** ```bash -STORAGE_DRIVER=local -STORAGE_BASE_PATH=./storage -DATABASE_URL=postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus -# MINIO_ENDPOINT is NOT set (forces local storage) +cd backend +npx tsx src/scripts/bootstrap-local-admin.ts ``` -### 4. MANDATORY LOCAL MODE FOR ALL CRAWLS AND TESTS +Creates/resets a deterministic local admin user: +| Field | Value | +|-------|-------| +| Email | `admin@local.test` | +| Password | `admin123` | +| Role | `superadmin` | -**Before running ANY of the following, CONFIRM local mode is active:** -- Crawler execution -- Orchestrator flows -- Sandbox tests -- Image scrape tests -- Module import tests +This is a LOCAL-DEV helper only. Never use these credentials in production. -**Pre-execution checklist:** -1. ✅ `./start-local.sh` or `docker-compose -f docker-compose.local.yml up` running -2. ✅ `STORAGE_DRIVER=local` -3. ✅ `STORAGE_BASE_PATH=./storage` -4. ✅ NO MinIO, NO S3 -5. ✅ NO port-forward -6. ✅ NO Kubernetes connection -7. 
✅ Storage writes go to `/storage/products/{brand}/{state}/{product_id}/` +**Manual startup (if not using setup-local.sh):** +```bash +# Terminal 1: Start PostgreSQL +docker-compose -f docker-compose.local.yml up -d -**If any condition is not met, DO NOT proceed with the crawl or test.** +# Terminal 2: Start Backend +cd backend && npm run dev + +# Terminal 3: Start Frontend +cd cannaiq && npm run dev:admin +``` + +**Stop services:** +```bash +./stop-local.sh +``` + +--- + +## DATABASE MODEL (CRITICAL) + +### Database Architecture + +CannaiQ has **TWO databases** with distinct purposes: + +| Database | Purpose | Access | +|----------|---------|--------| +| `dutchie_menus` | **Canonical CannaiQ database** - All schema, migrations, and application data | READ/WRITE | +| `dutchie_legacy` | **Legacy read-only archive** - Historical data from old system | READ-ONLY | + +**CRITICAL RULES:** +- **Migrations ONLY run on `dutchie_menus`** - NEVER on `dutchie_legacy` +- **Application code connects ONLY to `dutchie_menus`** +- **ETL scripts READ from `dutchie_legacy`, WRITE to `dutchie_menus`** +- `dutchie_legacy` is frozen - NO writes, NO schema changes, NO migrations + +### Environment Variables + +**CannaiQ Database (dutchie_menus) - PRIMARY:** +```bash +# All application/migration DB access uses these env vars: +CANNAIQ_DB_HOST=localhost # Database host +CANNAIQ_DB_PORT=54320 # Database port +CANNAIQ_DB_NAME=dutchie_menus # MUST be dutchie_menus +CANNAIQ_DB_USER=dutchie # Database user +CANNAIQ_DB_PASS= # Database password + +# OR use a full connection string: +CANNAIQ_DB_URL=postgresql://user:pass@host:port/dutchie_menus +``` + +**Legacy Database (dutchie_legacy) - ETL ONLY:** +```bash +# Only used by ETL scripts for reading legacy data: +LEGACY_DB_HOST=localhost +LEGACY_DB_PORT=54320 +LEGACY_DB_NAME=dutchie_legacy # READ-ONLY - never migrated +LEGACY_DB_USER=dutchie +LEGACY_DB_PASS= + +# OR use a full connection string: +LEGACY_DB_URL=postgresql://user:pass@host:port/dutchie_legacy +``` + +**Key Rules:** +- `CANNAIQ_DB_NAME` MUST be `dutchie_menus` for application/migrations +- `LEGACY_DB_NAME` is `dutchie_legacy` - READ-ONLY for ETL only +- ALL application code MUST use `CANNAIQ_DB_*` environment variables +- No hardcoded database names anywhere in the codebase +- `backend/.env` controls all database access for local development + +**State Modeling:** +- States (AZ, MI, CA, NV, etc.) 
are modeled via `states` table + `state_id` on dispensaries +- NO separate databases per state +- Use `state_code` or `state_id` columns for filtering + +### Migration and ETL Procedure + +**Step 1: Run schema migration (on dutchie_menus ONLY):** +```bash +cd backend +psql "postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" \ + -f migrations/041_cannaiq_canonical_schema.sql +``` + +**Step 2: Run ETL to copy legacy data:** +```bash +cd backend +npx tsx src/scripts/etl/042_legacy_import.ts +# Reads from dutchie_legacy, writes to dutchie_menus +``` + +### Database Access Rules + +**Claude MUST NOT:** +- Connect to any database besides the canonical CannaiQ database +- Use raw connection strings in shell commands +- Run `psql` commands directly +- Construct database URLs manually +- Create or rename databases automatically +- Run `npm run migrate` without explicit user authorization +- Patch schema at runtime (no ALTER TABLE from scripts) + +**All data access MUST go through:** +- LOCAL CannaiQ backend HTTP API endpoints +- Internal CannaiQ application code (using canonical connection pool) +- Ask user to run SQL manually if absolutely needed + +**Local service management:** +- User starts services via `./setup-local.sh` (ONLY the user runs this) +- If port 3010 responds, assume backend is running +- If port 3010 does NOT respond, tell user: "Backend is not running; please run `./setup-local.sh`" +- Claude may only access the app via HTTP: `http://localhost:3010` (API), `http://localhost:8080/admin` (UI) +- Never restart, kill, or manage local processes — that is the user's responsibility + +### Migrations + +**Rules:** +- Migrations may be WRITTEN but only the USER runs them after review +- Never execute migrations automatically +- Only additive migrations (no DROP/DELETE) +- Write schema-tolerant code that handles missing optional columns + +**If schema changes are needed:** +1. Generate a proper migration file in `backend/migrations/*.sql` +2. Show the migration to the user +3. Wait for explicit authorization before running +4. 
Never run migrations automatically - only the user runs them after review + +**Schema tolerance:** +- If a column is missing at runtime, prefer making the code tolerant (treat field as optional) instead of auto-creating the column +- Queries should gracefully handle missing columns by omitting them or using NULL defaults + +### Canonical Schema Migration (041/042) + +**Migration 041** (`backend/migrations/041_cannaiq_canonical_schema.sql`): +- Creates canonical CannaiQ tables: `states`, `chains`, `brands`, `store_products`, `store_product_snapshots`, `crawl_runs` +- Adds `state_id` and `chain_id` columns to `dispensaries` +- Adds status columns to `dispensary_crawler_profiles` +- SCHEMA ONLY - no data inserts from legacy tables + +**ETL Script 042** (`backend/src/scripts/etl/042_legacy_import.ts`): +- Copies data from `dutchie_products` → `store_products` +- Copies data from `dutchie_product_snapshots` → `store_product_snapshots` +- Extracts brands from product data into `brands` table +- Links dispensaries to chains and states +- INSERT-ONLY and IDEMPOTENT (uses ON CONFLICT DO NOTHING) +- Run manually: `cd backend && npx tsx src/scripts/etl/042_legacy_import.ts` + +**Tables touched by ETL:** +| Source Table | Target Table | +|--------------|--------------| +| `dutchie_products` | `store_products` | +| `dutchie_product_snapshots` | `store_product_snapshots` | +| (brand names extracted) | `brands` | +| (state codes mapped) | `dispensaries.state_id` | +| (chain names matched) | `dispensaries.chain_id` | + +**Legacy tables remain intact** - `dutchie_products` and `dutchie_product_snapshots` are not modified. + +**Migration 045** (`backend/migrations/045_add_image_columns.sql`): +- Adds `thumbnail_url` to `store_products` and `store_product_snapshots` +- `image_url` already exists from migration 041 +- ETL 042 populates `image_url` from legacy `primary_image_url` where present +- `thumbnail_url` is NULL for legacy data - future crawls can populate it + +### Deprecated Connection Module + +The custom connection module at `src/dutchie-az/db/connection` is **DEPRECATED**. + +**All code using `getClient` from this module must be refactored to:** +- Use the CannaiQ API endpoints instead +- Use the orchestrator through the API +- Use the canonical DB pool from the main application + +--- + +## FORBIDDEN ACTIONS + +1. **Deleting any data** (products, snapshots, images, logs, traces) +2. **Deploying without explicit authorization** +3. **Connecting to Kubernetes** without authorization +4. **Port-forwarding to production** without authorization +5. **Starting MinIO** in local development +6. **Using S3/MinIO SDKs** when `STORAGE_DRIVER=local` +7. **Automating cleanup** of any kind +8. **Dropping database tables or columns** +9. **Overwriting historical records** (always append snapshots) +10. **Runtime schema patching** (ALTER TABLE from scripts) +11. **Using `getClient` from deprecated connection module** +12. **Creating ad-hoc database connections** outside the canonical pool +13. **Auto-adding missing columns** at runtime +14. **Killing local processes** (`pkill`, `kill`, `kill -9`, etc.) +15. **Starting backend/frontend directly** with custom env vars +16. **Running `lsof -ti:PORT | xargs kill`** or similar process-killing commands +17. **Using hardcoded database names** in code or comments +18. **Creating or connecting to a second database** --- @@ -113,20 +396,6 @@ import { saveImage, getImageUrl } from '../utils/storage-adapter'; --- -## FORBIDDEN ACTIONS - -1. 
**Deleting any data** (products, snapshots, images, logs, traces) -2. **Deploying without explicit authorization** -3. **Connecting to Kubernetes** without authorization -4. **Port-forwarding to production** without authorization -5. **Starting MinIO** in local development -6. **Using S3/MinIO SDKs** when `STORAGE_DRIVER=local` -7. **Automating cleanup** of any kind -8. **Dropping database tables or columns** -9. **Overwriting historical records** (always append snapshots) - ---- - ## UI ANONYMIZATION RULES - No vendor names in forward-facing URLs: use `/api/az/...`, `/az`, `/az-schedule` @@ -146,24 +415,24 @@ import { saveImage, getImageUrl } from '../utils/storage-adapter'; ### Multi-Site Architecture (CRITICAL) -This project has **5 working locations** - always clarify which one before making changes: +This project has **4 active locations** (plus 1 deprecated) - always clarify which one before making changes: | Folder | Domain | Type | Purpose | |--------|--------|------|---------| | `backend/` | (shared) | Express API | Single backend serving all frontends | -| `frontend/` | dispos.crawlsy.com | React SPA (Vite) | Legacy admin dashboard (internal use) | -| `cannaiq/` | cannaiq.co | React SPA + PWA | NEW admin dashboard / B2B analytics | +| `frontend/` | (DEPRECATED) | React SPA (Vite) | DEPRECATED - was dispos.crawlsy.com, now removed | +| `cannaiq/` | cannaiq.co | React SPA + PWA | Admin dashboard / B2B analytics | | `findadispo/` | findadispo.com | React SPA + PWA | Consumer dispensary finder | | `findagram/` | findagram.co | React SPA + PWA | Consumer delivery marketplace | -**IMPORTANT: `frontend/` vs `cannaiq/` confusion:** -- `frontend/` = OLD/legacy dashboard design, deployed to `dispos.crawlsy.com` (internal admin) -- `cannaiq/` = NEW dashboard design, deployed to `cannaiq.co` (customer-facing B2B) -- These are DIFFERENT codebases - do NOT confuse them! +**NOTE: `frontend/` folder is DEPRECATED:** +- `frontend/` = OLD/legacy dashboard - NO LONGER DEPLOYED (removed from k8s) +- `cannaiq/` = Primary admin dashboard, deployed to `cannaiq.co` +- Do NOT use or modify `frontend/` folder - it will be archived/removed -**Before any frontend work, ASK: "Which site? cannaiq, findadispo, findagram, or legacy (frontend/)?"** +**Before any frontend work, ASK: "Which site? cannaiq, findadispo, or findagram?"** -All four frontends share: +All three active frontends share: - Same backend API (port 3010) - Same PostgreSQL database - Same Kubernetes deployment for backend @@ -277,314 +546,156 @@ export default defineConfig({ ### Core Rules Summary -- **DB**: Use the single consolidated DB (CRAWLSY_DATABASE_URL → DATABASE_URL); no dual pools; schema_migrations must exist; apply migrations 031/032/033. +- **DB**: Use the single CannaiQ database via `CANNAIQ_DB_*` env vars. No hardcoded names. - **Images**: No MinIO. Save to local /images/products//-.webp (and brands); preserve original URL; serve via backend static. -- **Dutchie GraphQL**: Endpoint https://dutchie.com/api-3/graphql. Variables must use productsFilter.dispensaryId (platform_dispensary_id). Mode A: Status="Active". Mode B: Status=null/activeOnly:false. No dispensaryFilter.cNameOrID. -- **cName/slug**: Derive cName from each store's menu_url (/embedded-menu/ or /dispensary/). No hardcoded defaults. Each location must have its own valid menu_url and platform_dispensary_id; do not reuse IDs across locations. If slug is invalid/missing, mark not crawlable and log; resolve ID before crawling. 
+- **Dutchie GraphQL**: Endpoint https://dutchie.com/api-3/graphql. Variables must use productsFilter.dispensaryId (platform_dispensary_id). Mode A: Status="Active". Mode B: Status=null/activeOnly:false. +- **cName/slug**: Derive cName from each store's menu_url (/embedded-menu/ or /dispensary/). No hardcoded defaults. - **Dual-mode always**: useBothModes:true to get pricing (Mode A) + full coverage (Mode B). - **Batch DB writes**: Chunk products/snapshots/missing (100–200) to avoid OOM. -- **OOS/missing**: Include inactive/OOS in Mode B. Union A+B, dedupe by external_product_id+dispensary_id. Insert snapshots with stock_status; if absent from both modes, insert missing_from_feed. Do not filter OOS by default. -- **API/Frontend**: Use /api/az/... endpoints (stores/products/brands/categories/summary/dashboard). Rebuild frontend with VITE_API_URL pointing to the backend. -- **Scheduling**: Crawl only menu_type='dutchie' AND platform_dispensary_id IS NOT NULL. 4-hour crawl with jitter; detection job to set menu_type and resolve platform IDs. -- **Monitor**: /scraper-monitor (and /az-schedule) should show active/recent jobs from job_run_logs/crawl_jobs, with auto-refresh. -- **No slug guessing**: Never use defaults like "AZ-Deeply-Rooted." Always derive per store from menu_url and resolve platform IDs per location. +- **OOS/missing**: Include inactive/OOS in Mode B. Union A+B, dedupe by external_product_id+dispensary_id. +- **API/Frontend**: Use /api/az/... endpoints (stores/products/brands/categories/summary/dashboard). +- **Scheduling**: Crawl only menu_type='dutchie' AND platform_dispensary_id IS NOT NULL. 4-hour crawl with jitter. +- **Monitor**: /scraper-monitor (and /az-schedule) should show active/recent jobs from job_run_logs/crawl_jobs. +- **No slug guessing**: Never use defaults. Always derive per store from menu_url and resolve platform IDs per location. --- ### Detailed Rules -1) **Use the consolidated DB everywhere** - - Preferred env: `CRAWLSY_DATABASE_URL` (fallback `DATABASE_URL`). - - Do NOT create dutchie tables in the legacy DB. Apply migrations 031/032/033 to the consolidated DB and restart. - -2) **Dispensary vs Store** +1) **Dispensary vs Store** - Dutchie pipeline uses `dispensaries` (not legacy `stores`). For dutchie crawls, always work with dispensary ID. - - Ignore legacy fields like `dutchie_plus_id` and slug guessing. Use the record's `menu_url` and `platform_dispensary_id`. + - Use the record's `menu_url` and `platform_dispensary_id`. -3) **Menu detection and platform IDs** +2) **Menu detection and platform IDs** - Set `menu_type` from `menu_url` detection; resolve `platform_dispensary_id` for `menu_type='dutchie'`. - Admin should have "refresh detection" and "resolve ID" actions; schedule/crawl only when `menu_type='dutchie'` AND `platform_dispensary_id` is set. -4) **Queries and mapping** +3) **Queries and mapping** - The DB returns snake_case; code expects camelCase. Always alias/map: - `platform_dispensary_id AS "platformDispensaryId"` - Map via `mapDbRowToDispensary` when loading dispensaries (scheduler, crawler, admin crawl). - Avoid `SELECT *`; explicitly select and/or map fields. -5) **Scheduling** +4) **Scheduling** - `/scraper-schedule` should accept filters/search (All vs AZ-only, name). - "Run Now"/scheduler must skip or warn if `menu_type!='dutchie'` or `platform_dispensary_id` missing. - Use `dispensary_crawl_status` view; show reason when not crawlable. 
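As a minimal illustration of rules 3–4 above (explicit column selection, snake_case → camelCase mapping, and only scheduling crawlable stores) — the exact field list and the real `mapDbRowToDispensary` signature live in the repo; this sketch assumes a trimmed-down shape:

```typescript
import { pool } from '../db/pool';

// Trimmed-down shape for illustration; the real Dispensary type has more fields.
interface Dispensary {
  id: number;
  name: string;
  menuType: string | null;
  menuUrl: string | null;
  platformDispensaryId: string | null;
}

// Mirrors the alias/map rule: the DB returns snake_case, code expects camelCase.
function mapDbRowToDispensary(row: Record<string, any>): Dispensary {
  return {
    id: row.id,
    name: row.name,
    menuType: row.menu_type ?? null,
    menuUrl: row.menu_url ?? null,
    platformDispensaryId: row.platform_dispensary_id ?? null,
  };
}

// Only dispensaries that are actually crawlable get scheduled (rule 4).
async function loadCrawlableDispensaries(): Promise<Dispensary[]> {
  const { rows } = await pool.query(
    `SELECT id, name, menu_type, menu_url, platform_dispensary_id
     FROM dispensaries
     WHERE menu_type = 'dutchie'
       AND platform_dispensary_id IS NOT NULL`
  );
  return rows.map(mapDbRowToDispensary);
}
```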
-6) **Crawling** - - Trigger dutchie crawls by dispensary ID (e.g., `/api/az/admin/crawl/:id` or `runDispensaryOrchestrator(id)`). +5) **Crawling** + - Trigger dutchie crawls by dispensary ID (e.g., `POST /api/admin/crawl/:id`). - Update existing products (by stable product ID), append snapshots for history (every 4h cadence), download images locally (`/images/...`), store local URLs. - Use dutchie GraphQL pipeline only for `menu_type='dutchie'`. -7) **Frontend** +6) **Frontend** - Forward-facing URLs: `/api/az`, `/az`, `/az-schedule`; no vendor names. - - `/scraper-schedule`: add filters/search, keep as master view for all schedules; reflect platform ID/menu_type status and controls (resolve ID, run now, enable/disable/delete). + - `/scraper-schedule`: add filters/search, keep as master view for all schedules; reflect platform ID/menu_type status and controls. -8) **No slug guessing** +7) **No slug guessing** - Do not guess slugs; use the DB record's `menu_url` and ID. Resolve platform ID from the URL/cName; if set, crawl directly by ID. -9) **Verify locally before pushing** - - Apply migrations, restart backend, ensure auth (`users` table) exists, run dutchie crawl for a known dispensary (e.g., Deeply Rooted), check `/api/az/dashboard`, `/api/az/stores/:id/products`, `/az`, `/scraper-schedule`. - -10) **Image storage (no MinIO)** +8) **Image storage (no MinIO)** - Save images to local filesystem only. Do not create or use MinIO in Docker. - Product images: `/images/products//-.webp` (+medium/+thumb). - Brand images: `/images/brands/-.webp`. - Store local URLs in DB fields (keep original URLs as fallback only). - Serve `/images` via backend static middleware. -11) **Dutchie GraphQL fetch rules** - - **Endpoint**: `https://dutchie.com/api-3/graphql` (NOT `api-gw.dutchie.com` which no longer exists). - - **Variables**: Use `productsFilter.dispensaryId` = `platform_dispensary_id` (MongoDB ObjectId, e.g., `6405ef617056e8014d79101b`). - - Do NOT use `dispensaryFilter.cNameOrID` - that's outdated. - - `cName` (e.g., `AZ-Deeply-Rooted`) is only for Referer/Origin headers and Puppeteer session bootstrapping. +9) **Dutchie GraphQL fetch rules** + - **Endpoint**: `https://dutchie.com/api-3/graphql` + - **Variables**: Use `productsFilter.dispensaryId` = `platform_dispensary_id` (MongoDB ObjectId). - **Mode A**: `Status: "Active"` - returns active products with pricing - **Mode B**: `Status: null` / `activeOnly: false` - returns all products including OOS/inactive - - **Example payload**: - ```json - { - "operationName": "FilteredProducts", - "variables": { - "productsFilter": { - "dispensaryId": "6405ef617056e8014d79101b", - "pricingType": "rec", - "Status": "Active" - } - }, - "extensions": { - "persistedQuery": { "version": 1, "sha256Hash": "" } - } - } - ``` - - **Headers** (server-side axios only): Chrome UA, `Origin: https://dutchie.com`, `Referer: https://dutchie.com/embedded-menu/`, `accept: application/json`, `content-type: application/json`. - - If local DNS can't resolve, run fetch from an environment that can (K8s pod/remote host), not from browser. - - Use server-side axios with embedded-menu headers; include CF/session cookie from Puppeteer if needed. + - **Headers** (server-side axios only): Chrome UA, `Origin: https://dutchie.com`, `Referer: https://dutchie.com/embedded-menu/`. 
-12) **Stop over-prep; run the crawl** - - To seed/refresh a store, run a one-off crawl by dispensary ID (example for Deeply Rooted): - ``` - DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" \ - npx tsx -e "const { crawlDispensaryProducts } = require('./src/dutchie-az/services/product-crawler'); const d={id:112,name:'Deeply Rooted',platform:'dutchie',platformDispensaryId:'6405ef617056e8014d79101b',menuType:'dutchie'}; crawlDispensaryProducts(d,'rec',{useBothModes:true}).then(r=>{console.log(r);process.exit(0);}).catch(e=>{console.error(e);process.exit(1);});" - ``` - If local DNS is blocked, run the same command inside the scraper pod via `kubectl exec ... -- bash -lc '...'`. - - After crawl, verify counts via `dutchie_products`, `dutchie_product_snapshots`, and `dispensaries.last_crawl_at`. Do not inspect the legacy `products` table for Dutchie. - -13) **Fetch troubleshooting** - - If 403 or empty data: log status + first GraphQL error; include cf_clearance/session cookie from Puppeteer; ensure headers match a real Chrome request; ensure variables use `productsFilter.dispensaryId`. - - If DNS fails locally, do NOT debug DNS—run the fetch from an environment that resolves (K8s/remote) or via Puppeteer-captured headers/cookies. No browser/CORS attempts. - -14) **Views and metrics** - - Keep v_brands/v_categories/v_brand_history based on `dutchie_products` and preserve brand_count metrics. Do not drop brand_count. - -15) **Batch DB writes to avoid OOM** +10) **Batch DB writes to avoid OOM** - Do NOT build one giant upsert/insert payload for products/snapshots/missing marks. - Chunk arrays (e.g., 100–200 items) and upsert/insert in a loop; drop references after each chunk. - - Apply to products, product snapshots, and any "mark missing" logic to keep memory low during crawls. -16) **Use dual-mode crawls by default** - - Always run with `useBothModes:true` to combine: - - Mode A (active feed with pricing/stock) - - Mode B (max coverage including OOS/inactive) +11) **Use dual-mode crawls by default** + - Always run with `useBothModes:true` to combine Mode A (pricing) + Mode B (full coverage). - Union/dedupe by product ID so you keep full coverage and pricing in one run. - - If you only run Mode B, prices will be null; dual-mode fills pricing while retaining OOS items. -17) **Capture OOS and missing items** - - GraphQL variables must include inactive/OOS (Status: All / activeOnly:false). Mode B already returns OOS/inactive; union with Mode A to keep pricing. - - After unioning Mode A/B, upsert products and insert snapshots with stock_status from the feed. If an existing product is absent from both Mode A and Mode B for the run, insert a snapshot with is_present_in_feed=false and stock_status='missing_from_feed'. - - Do not filter out OOS/missing in the API; only filter when the user requests (e.g., stockStatus=in_stock). Expose stock_status/in_stock from the latest snapshot (fallback to product). - - Verify with `/api/az/stores/:id/products?stockStatus=out_of_stock` and `?stockStatus=missing_from_feed`. +12) **Capture OOS and missing items** + - GraphQL variables must include inactive/OOS (Status: All / activeOnly:false). + - After unioning Mode A/B, upsert products and insert snapshots with stock_status from the feed. + - If an existing product is absent from both modes, insert a snapshot with is_present_in_feed=false and stock_status='missing_from_feed'. 
-18) **Menu discovery must crawl the website when menu_url is null** - - For dispensaries with no menu_url or unknown menu_type, crawl the dispensary.website (if present) to find provider links (dutchie, treez, jane, weedmaps, leafly, etc.). Follow “menu/order/shop” links up to a shallow depth with timeouts/rate limits. - - If a provider link is found, set menu_url, set menu_type, and store detection metadata; if dutchie, derive cName from menu_url and resolve platform_dispensary_id; store resolved_at and detection details. - - Do NOT mark a dispensary not_crawlable solely because menu_url is null; only mark not_crawlable if the website crawl fails to find a menu or returns 403/404/invalid. Log the reason in provider_detection_data and crawl_status_reason. - - Keep this as the menu discovery job (separate from product crawls); log successes/errors to job_run_logs. Only schedule product crawls for stores with menu_type='dutchie' AND platform_dispensary_id IS NOT NULL. +13) **Preserve all stock statuses (including unknown)** + - Do not filter or drop stock_status values in API/UI; pass through whatever is stored. + - Expected values: in_stock, out_of_stock, missing_from_feed, unknown. -19) **Preserve all stock statuses (including unknown)** - - Do not filter or drop stock_status values in API/UI; pass through whatever is stored on the latest snapshot/product. Expected values include: in_stock, out_of_stock (if exposed), missing_from_feed, unknown. Only apply filters when explicitly requested by the user. +14) **Never delete or overwrite historical data** + - Do not delete products/snapshots or overwrite historical records. + - Always append snapshots for changes (price/stock/qty), and mark missing_from_feed instead of removing records. -20) **Never delete or overwrite historical data** - - Do not delete products/snapshots or overwrite historical records. Always append snapshots for changes (price/stock/qty), and mark missing_from_feed instead of removing records. Historical data must remain intact for analytics. - -21) **Deployment via CI/CD only** - - Test locally, commit clean changes, and let CI/CD build and deploy to Kubernetes at code.cannabrands.app. Do NOT manually build/push images or tweak prod pods. Deploy backend first, smoke-test APIs, then frontend; roll back via CI/CD if needed. - -18) **Per-location cName and platform_dispensary_id resolution** - - For each dispensary, menu_url and cName must be valid for that exact location; no hardcoded defaults and no sharing platform_dispensary_id across locations. +15) **Per-location cName and platform_dispensary_id resolution** + - For each dispensary, menu_url and cName must be valid for that exact location. - Derive cName from menu_url per store: `/embedded-menu/` or `/dispensary/`. - Resolve platform_dispensary_id from that cName using GraphQL GetAddressBasedDispensaryData. - - If the slug is invalid/missing, mark the store not crawlable and log it; do not crawl with a mismatched cName/ID. Store the error in `provider_detection_data.resolution_error`. - - Before crawling, validate that the cName from menu_url matches the resolved platform ID; if mismatched, re-resolve before proceeding. + - If the slug is invalid/missing, mark the store not crawlable and log it. -19) **API endpoints (AZ pipeline)** - - Use /api/az/... 
endpoints: stores, products, brands, categories, summary, dashboard - - Rebuild frontend with VITE_API_URL pointing to the backend - - Dispensary Detail and analytics must use AZ endpoints +16) **API Route Semantics** -20) **Monitoring and logging** + **Route Groups:** + - `/api/admin/...` = Admin/operator actions (crawl triggers, health checks) + - `/api/az/...` = Arizona data slice (stores, products, metrics) + - `/api/v1/...` = Public API for external consumers (WordPress, etc.) + + **Crawl Trigger (CANONICAL):** + ``` + POST /api/admin/crawl/:dispensaryId + ``` + +17) **Monitoring and logging** - /scraper-monitor (and /az-schedule) should show active/recent jobs from job_run_logs/crawl_jobs - Auto-refresh every 30 seconds - System Logs page should show real log data, not just startup messages -21) **Dashboard Architecture - CRITICAL** - - **Frontend**: If you see old labels like "Active Proxies" or "Active Stores", it means the old dashboard bundle is being served. Rebuild the frontend with `VITE_API_URL` pointing to the correct backend and redeploy. Clear browser cache. Confirm new labels show up. - - **Backend**: `/api/dashboard/stats` MUST use the consolidated DB (same pool as dutchie-az module). Use the correct tables: `dutchie_products`, `dispensaries`, and views like `v_dashboard_stats`, `v_latest_snapshots`. Do NOT use a separate legacy connection. Do NOT query `az_products` (doesn't exist) or legacy `stores`/`products` tables. - - **DB Connectivity**: Use the proper DB host/role. Errors like `role "dutchie" does not exist` mean you're exec'ing into the wrong Postgres pod or using wrong credentials. Confirm the correct `DATABASE_URL` and test: `kubectl exec deployment/scraper -n dispensary-scraper -- psql $DATABASE_URL -c '\dt'` - - **After fixing**: Dashboard should show real data (e.g., 777 products) instead of zeros. Do NOT revert to legacy tables; point dashboard queries to the consolidated DB/views. - - **Checklist**: - 1. Rebuild/redeploy frontend with correct API URL, clear cache - 2. Fix `/api/dashboard/*` to use the consolidated DB pool and dutchie views/tables - 3. Test `/api/dashboard/stats` from the scraper pod; then reload the UI +18) **Dashboard Architecture** + - **Frontend**: Rebuild the frontend with `VITE_API_URL` pointing to the correct backend and redeploy. + - **Backend**: `/api/dashboard/stats` MUST use the canonical DB pool. Use the correct tables: `dutchie_products`, `dispensaries`, and views like `v_dashboard_stats`, `v_latest_snapshots`. -22) **Deployment (Gitea + Kubernetes)** +19) **Deployment (Gitea + Kubernetes)** - **Registry**: Gitea at `code.cannabrands.app/creationshop/dispensary-scraper` - **Build and push** (from backend directory): ```bash - # Login to Gitea container registry docker login code.cannabrands.app - - # Build the image cd backend docker build -t code.cannabrands.app/creationshop/dispensary-scraper:latest . 
- - # Push to registry docker push code.cannabrands.app/creationshop/dispensary-scraper:latest ``` - **Deploy to Kubernetes**: ```bash - # Restart deployments to pull new image kubectl rollout restart deployment/scraper -n dispensary-scraper kubectl rollout restart deployment/scraper-worker -n dispensary-scraper - - # Watch rollout status kubectl rollout status deployment/scraper -n dispensary-scraper - kubectl rollout status deployment/scraper-worker -n dispensary-scraper - ``` - - **Check pods**: - ```bash - kubectl get pods -n dispensary-scraper - kubectl logs -f deployment/scraper -n dispensary-scraper - kubectl logs -f deployment/scraper-worker -n dispensary-scraper ``` - K8s manifests are in `/k8s/` folder (scraper.yaml, scraper-worker.yaml, etc.) - - imagePullSecrets use `regcred` secret for Gitea registry auth -23) **Crawler Architecture** - - **Scraper pod (1 replica)**: Runs the Express API server + scheduler. The scheduler enqueues detection and crawl jobs to the database queue (`crawl_jobs` table). - - **Scraper-worker pods (5 replicas)**: Each worker runs `dist/dutchie-az/services/worker.js`, polling the job queue and processing jobs. - - **Job types processed by workers**: - - `menu_detection` / `menu_detection_single`: Detect menu provider type and resolve platform_dispensary_id from menu_url - - `dutchie_product_crawl`: Crawl products from Dutchie GraphQL API for dispensaries with valid platform IDs +20) **Crawler Architecture** + - **Scraper pod (1 replica)**: Runs the Express API server + scheduler. + - **Scraper-worker pods (5 replicas)**: Each worker runs `dist/dutchie-az/services/worker.js`, polling the job queue. + - **Job types**: `menu_detection`, `menu_detection_single`, `dutchie_product_crawl` - **Job schedules** (managed in `job_schedules` table): - - `dutchie_az_menu_detection`: Runs daily with 60-min jitter, detects menu type for dispensaries with unknown menu_type - - `dutchie_az_product_crawl`: Runs every 4 hours with 30-min jitter, crawls products from all detected Dutchie dispensaries - - **Trigger schedules manually**: `curl -X POST /api/az/admin/schedules/{id}/trigger` + - `dutchie_az_menu_detection`: Runs daily with 60-min jitter + - `dutchie_az_product_crawl`: Runs every 4 hours with 30-min jitter + - **Trigger schedules**: `curl -X POST /api/az/admin/schedules/{id}/trigger` - **Check schedule status**: `curl /api/az/admin/schedules` - - **Worker logs**: `kubectl logs -f deployment/scraper-worker -n dispensary-scraper` -24) **Crawler Maintenance Procedure (Check Jobs, Requeue, Restart)** - When crawlers are stuck or jobs aren't processing, follow this procedure: - - **Step 1: Check Job Status** - ```bash - # Port-forward to production - kubectl port-forward -n dispensary-scraper deployment/scraper 3099:3010 & - - # Check active/stuck jobs - curl -s http://localhost:3099/api/az/monitor/active-jobs | jq . 
- - # Check recent job history - curl -s "http://localhost:3099/api/az/monitor/jobs?limit=20" | jq '.jobs[] | {id, job_type, status, dispensary_id, started_at, products_found, duration_min: (.duration_ms/60000 | floor)}' - - # Check schedule status - curl -s http://localhost:3099/api/az/admin/schedules | jq '.schedules[] | {id, jobName, enabled, lastRunAt, lastStatus, nextRunAt}' - ``` - - **Step 2: Reset Stuck Jobs** - Jobs are considered stuck if they have `status='running'` but no heartbeat in >30 minutes: - ```bash - # Via API (if endpoint exists) - curl -s -X POST http://localhost:3099/api/az/admin/reset-stuck-jobs - - # Via direct DB (if API not available) - kubectl exec -n dispensary-scraper deployment/scraper -- psql $DATABASE_URL -c " - UPDATE dispensary_crawl_jobs - SET status = 'failed', - error_message = 'Job timed out - worker stopped sending heartbeats', - completed_at = NOW() - WHERE status = 'running' - AND (last_heartbeat_at < NOW() - INTERVAL '30 minutes' OR last_heartbeat_at IS NULL); - " - ``` - - **Step 3: Requeue Jobs (Trigger Fresh Crawl)** - ```bash - # Trigger product crawl schedule (typically ID 1) - curl -s -X POST http://localhost:3099/api/az/admin/schedules/1/trigger - - # Trigger menu detection schedule (typically ID 2) - curl -s -X POST http://localhost:3099/api/az/admin/schedules/2/trigger - - # Or crawl a specific dispensary - curl -s -X POST http://localhost:3099/api/az/admin/crawl/112 - ``` - - **Step 4: Restart Crawler Workers** - ```bash - # Restart scraper-worker pods (clears any stuck processes) - kubectl rollout restart deployment/scraper-worker -n dispensary-scraper - - # Watch rollout progress - kubectl rollout status deployment/scraper-worker -n dispensary-scraper - - # Optionally restart main scraper pod too - kubectl rollout restart deployment/scraper -n dispensary-scraper - ``` - - **Step 5: Monitor Recovery** - ```bash - # Watch worker logs - kubectl logs -f deployment/scraper-worker -n dispensary-scraper --tail=50 - - # Check dashboard for product counts - curl -s http://localhost:3099/api/az/dashboard | jq '{totalStores, totalProducts, storesByType}' - - # Verify jobs are processing - curl -s http://localhost:3099/api/az/monitor/active-jobs | jq . 
- ``` - - **Quick One-Liner for Full Reset:** - ```bash - # Reset stuck jobs and restart workers - kubectl exec -n dispensary-scraper deployment/scraper -- psql $DATABASE_URL -c "UPDATE dispensary_crawl_jobs SET status='failed', completed_at=NOW() WHERE status='running' AND (last_heartbeat_at < NOW() - INTERVAL '30 minutes' OR last_heartbeat_at IS NULL);" && kubectl rollout restart deployment/scraper-worker -n dispensary-scraper && kubectl rollout status deployment/scraper-worker -n dispensary-scraper - ``` - - **Cleanup port-forwards when done:** - ```bash - pkill -f "port-forward.*dispensary-scraper" - ``` - -25) **Frontend Architecture - AVOID OVER-ENGINEERING** +21) **Frontend Architecture - AVOID OVER-ENGINEERING** **Key Principles:** - **ONE BACKEND** serves ALL domains (cannaiq.co, findadispo.com, findagram.co) - Do NOT create separate backend services for each domain - - The existing `dispensary-scraper` backend handles everything **Frontend Build Differences:** - - `frontend/` uses **Vite** (outputs to `dist/`, uses `VITE_` env vars) → dispos.crawlsy.com (legacy) - - `cannaiq/` uses **Vite** (outputs to `dist/`, uses `VITE_` env vars) → cannaiq.co (NEW) + - `cannaiq/` uses **Vite** (outputs to `dist/`, uses `VITE_` env vars) → cannaiq.co - `findadispo/` uses **Create React App** (outputs to `build/`, uses `REACT_APP_` env vars) → findadispo.com - `findagram/` uses **Create React App** (outputs to `build/`, uses `REACT_APP_` env vars) → findagram.co **CRA vs Vite Dockerfile Differences:** ```dockerfile - # Vite (frontend, cannaiq) + # Vite (cannaiq) ENV VITE_API_URL=https://api.domain.com RUN npm run build COPY --from=builder /app/dist /usr/share/nginx/html @@ -595,74 +706,316 @@ export default defineConfig({ COPY --from=builder /app/build /usr/share/nginx/html ``` - **lucide-react Icon Gotchas:** - - Not all icons exist in older versions (e.g., `Cannabis` doesn't exist) - - Use `Leaf` as a substitute for cannabis-related icons - - When doing search/replace for icon names, be careful not to replace text content - - Example: "Cannabis-infused food" should NOT become "Leaf-infused food" - - **Deployment Options:** - 1. **Separate containers** (current): Each frontend in its own nginx container - 2. 
**Single container** (better): One nginx with multi-domain config serving all frontends - - **Single Container Multi-Domain Approach:** - ```dockerfile - # Build all frontends - FROM node:20-slim AS builder-cannaiq - WORKDIR /app/cannaiq - COPY cannaiq/package*.json ./ - RUN npm install - COPY cannaiq/ ./ - RUN npm run build - - FROM node:20-slim AS builder-findadispo - WORKDIR /app/findadispo - COPY findadispo/package*.json ./ - RUN npm install - COPY findadispo/ ./ - RUN npm run build - - FROM node:20-slim AS builder-findagram - WORKDIR /app/findagram - COPY findagram/package*.json ./ - RUN npm install - COPY findagram/ ./ - RUN npm run build - - # Production nginx with multi-domain routing - FROM nginx:alpine - COPY --from=builder-cannaiq /app/cannaiq/dist /var/www/cannaiq - COPY --from=builder-findadispo /app/findadispo/dist /var/www/findadispo - COPY --from=builder-findagram /app/findagram/build /var/www/findagram - COPY nginx-multi-domain.conf /etc/nginx/conf.d/default.conf - ``` - - **nginx-multi-domain.conf:** - ```nginx - server { - listen 80; - server_name cannaiq.co www.cannaiq.co; - root /var/www/cannaiq; - location / { try_files $uri $uri/ /index.html; } - } - - server { - listen 80; - server_name findadispo.com www.findadispo.com; - root /var/www/findadispo; - location / { try_files $uri $uri/ /index.html; } - } - - server { - listen 80; - server_name findagram.co www.findagram.co; - root /var/www/findagram; - location / { try_files $uri $uri/ /index.html; } - } - ``` - **Common Mistakes to AVOID:** - Creating a FastAPI/Express backend just for findagram or findadispo - Creating separate Docker images per domain when one would work - - Replacing icon names with sed without checking for text content collisions - Using `npm ci` in Dockerfiles when package-lock.json doesn't exist (use `npm install`) + +--- + +## Admin UI Integration (Dutchie Discovery System) + +The admin frontend includes a dedicated Discovery page located at: + + cannaiq/src/pages/Discovery.tsx + +This page is the operational interface that administrators use for +managing the Dutchie discovery pipeline. While it does not define API +features itself, it is the primary consumer of the Dutchie Discovery API. 
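To show how the page and the helpers fit together, a minimal sketch (parameter shapes and the review flow are assumptions; the actual helper signatures are listed below under "Frontend API Helper"):

```typescript
// Sketch only — see cannaiq/src/lib/api.ts for the real signatures.
import {
  getPlatformDiscoverySummary,
  getPlatformDiscoveryLocations,
  verifyLinkPlatformLocation,
  promotePlatformDiscoveryLocation,
} from '../lib/api';

const PLATFORM = 'dt'; // neutral slug for Dutchie

// Load the review queue: discovered locations that are not yet verified/rejected.
async function loadReviewQueue() {
  const summary = await getPlatformDiscoverySummary(PLATFORM);
  const locations = await getPlatformDiscoveryLocations(PLATFORM, { status: 'discovered' });
  return { summary, locations };
}

// Operator verifies against an existing dispensary, then (and only then) promotes.
async function linkAndPromote(locationId: number, dispensaryId: number, operator: string) {
  await verifyLinkPlatformLocation(PLATFORM, locationId, dispensaryId, operator);
  await promotePlatformDiscoveryLocation(PLATFORM, locationId);
}
```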
+ +### Responsibilities of the Discovery UI + +The UI enables administrators to: + +- View all discovered Dutchie locations +- Filter by status: + - discovered + - verified + - merged (linked to an existing dispensary) + - rejected +- Inspect individual location details (metadata, raw address, menu URL) +- Verify & create a new canonical dispensary +- Verify & link to an existing canonical dispensary +- Reject or unreject discovered locations +- Promote verified/merged locations into full crawlers via the orchestrator + +### API Endpoints Consumed by the Discovery UI + +The Discovery UI uses platform-agnostic routes with neutral slugs (see `docs/platform-slug-mapping.md`): + +**Platform Slug**: `dt` = Dutchie (trademark-safe URL) + +- `GET /api/discovery/platforms/dt/locations` +- `GET /api/discovery/platforms/dt/locations/:id` +- `POST /api/discovery/platforms/dt/locations/:id/verify-create` +- `POST /api/discovery/platforms/dt/locations/:id/verify-link` +- `POST /api/discovery/platforms/dt/locations/:id/reject` +- `POST /api/discovery/platforms/dt/locations/:id/unreject` +- `GET /api/discovery/platforms/dt/locations/:id/match-candidates` +- `GET /api/discovery/platforms/dt/cities` +- `GET /api/discovery/platforms/dt/summary` +- `POST /api/orchestrator/platforms/dt/promote/:id` + +These endpoints are defined in: +- `backend/src/dutchie-az/discovery/routes.ts` +- `backend/src/dutchie-az/discovery/promoteDiscoveryLocation.ts` + +### Frontend API Helper + +The file: + + cannaiq/src/lib/api.ts + +implements the client-side wrappers for calling these endpoints: + +- `getPlatformDiscoverySummary(platformSlug)` +- `getPlatformDiscoveryLocations(platformSlug, params)` +- `getPlatformDiscoveryLocation(platformSlug, id)` +- `verifyCreatePlatformLocation(platformSlug, id, verifiedBy)` +- `verifyLinkPlatformLocation(platformSlug, id, dispensaryId, verifiedBy)` +- `rejectPlatformLocation(platformSlug, id, reason, verifiedBy)` +- `unrejectPlatformLocation(platformSlug, id)` +- `getPlatformLocationMatchCandidates(platformSlug, id)` +- `getPlatformDiscoveryCities(platformSlug, params)` +- `promotePlatformDiscoveryLocation(platformSlug, id)` + +Where `platformSlug` is a neutral two-letter slug (e.g., `'dt'` for Dutchie). +These helpers must be kept synchronized with backend routes. + +### UI/Backend Contract + +The Discovery UI must always: +- Treat discovery data as **non-canonical** until verified. +- Not assume a discovery location is crawl-ready. +- Initiate promotion only after verification steps. +- Handle all statuses safely: discovered, verified, merged, rejected. + +The backend must always: +- Preserve discovery data even if rejected. +- Never automatically merge or promote a location. +- Allow idempotent verification and linking actions. +- Expose complete metadata to help operators make verification decisions. + +# Coordinate Capture (Platform Discovery) + +The DtLocationDiscoveryService captures geographic coordinates (latitude, longitude) whenever a platform's store payload provides them. + +## Behavior: + +- On INSERT: + - If the Dutchie API/GraphQL payload includes coordinates, they are saved into: + - dutchie_discovery_locations.latitude + - dutchie_discovery_locations.longitude + +- On UPDATE: + - Coordinates are only filled if the existing row has NULL values. + - Coordinates are never overwritten once set (prevents pollution if later payloads omit or degrade coordinate accuracy). 
+ +- Logging: + - When coordinates are detected and captured: + "Extracted coordinates for : , " + +- Summary Statistics: + - The discovery runner reports a count of: + - locations with coordinates + - locations without coordinates + +## Purpose: + +Coordinate capture enables: +- City/state validation (cross-checking submitted address vs lat/lng) +- Distance-based duplicate detection +- Location clustering for analytics +- Mapping/front-end visualization +- Future multi-platform reconciliation +- Improved dispensary matching during verify-link flow + +Coordinate capture is part of the discovery phase only. +Canonical `dispensaries` entries may later be enriched with verified coordinates during promotion. + +# CannaiQ — Analytics V2 Examples & API Structure Extension + +This section contains examples from `backend/docs/ANALYTICS_V2_EXAMPLES.md` and extends the Analytics V2 API definition to include: + +- response payload formats +- time window semantics +- rec/med segmentation usage +- SQL/TS pseudo-code examples +- endpoint expectations + +--- + +# Analytics V2: Supported Endpoints + +Base URL prefix: /api/analytics/v2 + +All endpoints accept `?window=7d|30d|90d` unless noted otherwise. + +## 1. Price Analytics + +### GET /api/analytics/v2/price/product/:storeProductId +Returns price history for a canonical store product. + +Example response: +{ + "storeProductId": 123, + "window": "30d", + "points": [ + { "date": "2025-02-01", "price": 32, "in_stock": true }, + { "date": "2025-02-02", "price": 30, "in_stock": true } + ] +} + +### GET /api/analytics/v2/price/rec-vs-med?categoryId=XYZ +Compares category pricing between recreational and medical-only states. + +Example response: +{ + "categoryId": "flower", + "rec": { "avg": 29.44, "median": 28.00, "states": ["CO", "WA", ...] }, + "med": { "avg": 33.10, "median": 31.00, "states": ["FL", "PA", ...] } +} + +--- + +## 2. Brand Analytics + +### GET /api/analytics/v2/brand/:name/penetration +Returns penetration across states. + +{ + "brand": "Wyld", + "window": "90d", + "penetration": [ + { "state": "AZ", "stores": 28 }, + { "state": "MI", "stores": 34 } + ] +} + +### GET /api/analytics/v2/brand/:name/rec-vs-med +Returns penetration split by rec vs med segmentation. + +--- + +## 3. Category Analytics + +### GET /api/analytics/v2/category/:name/growth +7d/30d/90d snapshot comparison: + +{ + "category": "vape", + "window": "30d", + "growth": { + "current_sku_count": 420, + "previous_sku_count": 380, + "delta": 40 + } +} + +### GET /api/analytics/v2/category/rec-vs-med +Category-level comparisons. + +--- + +## 4. Store Analytics + +### GET /api/analytics/v2/store/:storeId/changes +Product-level changes: + +{ + "storeId": 88, + "window": "30d", + "added": [...], + "removed": [...], + "price_changes": [...], + "restocks": [...], + "oos_events": [...] +} + +### GET /api/analytics/v2/store/:storeId/summary + +--- + +## 5. State Analytics + +### GET /api/analytics/v2/state/legal-breakdown +State rec/med/no-program segmentation summary. + +### GET /api/analytics/v2/state/rec-vs-med-pricing +State-level pricing comparison. + +### GET /api/analytics/v2/state/recreational +List rec-legal state codes. + +### GET /api/analytics/v2/state/medical-only +List med-only state codes. + +--- + +# Windowing Semantics + +Definition: window is applied to canonical snapshots. 
+Equivalent to: + +WHERE snapshot_at >= NOW() - INTERVAL '' + +--- + +# Rec/Med Segmentation Rules + +rec_states: + states.recreational_legal = TRUE + +med_only_states: + states.medical_legal = TRUE AND states.recreational_legal = FALSE + +no_program: + both flags FALSE or NULL + +Analytics must use this segmentation consistently. + +--- + +# Response Structure Requirements + +Every analytics v2 endpoint must: + +- include the window used +- include segmentation if relevant +- include state codes when state-level grouping is used +- return safe empty arrays if no data +- NEVER throw on missing data +- be versionable (v2 must not break previous analytics APIs) + +--- + +# Service Responsibilities Summary + +### PriceAnalyticsService +- compute time-series price trends +- compute average/median price by state +- compute rec-vs-med price comparisons + +### BrandPenetrationService +- compute presence across stores and states +- rec-vs-med brand footprint +- detect expansion / contraction + +### CategoryAnalyticsService +- compute SKU count changes +- category pricing +- rec-vs-med category dynamics + +### StoreAnalyticsService +- detect SKU additions/drops +- price changes +- restocks & OOS events + +### StateAnalyticsService +- legal breakdown +- coverage gaps +- rec-vs-med scoring + +--- + +# END Analytics V2 spec extension diff --git a/backend/.env.example b/backend/.env.example new file mode 100644 index 00000000..63913c71 --- /dev/null +++ b/backend/.env.example @@ -0,0 +1,50 @@ +# CannaiQ Backend Environment Configuration +# Copy this file to .env and fill in the values + +# Server +PORT=3010 +NODE_ENV=development + +# ============================================================================= +# CANNAIQ DATABASE (dutchie_menus) - PRIMARY DATABASE +# ============================================================================= +# This is where ALL schema migrations run and where canonical tables live. +# All CANNAIQ_DB_* variables are REQUIRED - no defaults. +# The application will fail to start if any are missing. + +CANNAIQ_DB_HOST=localhost +CANNAIQ_DB_PORT=54320 +CANNAIQ_DB_NAME=dutchie_menus # MUST be dutchie_menus - NOT dutchie_legacy +CANNAIQ_DB_USER=dutchie +CANNAIQ_DB_PASS= + +# Alternative: Use a full connection URL instead of individual vars +# If set, this takes priority over individual vars above +# CANNAIQ_DB_URL=postgresql://user:pass@host:port/dutchie_menus + +# ============================================================================= +# LEGACY DATABASE (dutchie_legacy) - READ-ONLY FOR ETL +# ============================================================================= +# Used ONLY by ETL scripts to read historical data. +# NEVER run migrations against this database. 
+# These are only needed when running 042_legacy_import.ts + +LEGACY_DB_HOST=localhost +LEGACY_DB_PORT=54320 +LEGACY_DB_NAME=dutchie_legacy # READ-ONLY - never migrated +LEGACY_DB_USER=dutchie +LEGACY_DB_PASS= + +# Alternative: Use a full connection URL instead of individual vars +# LEGACY_DB_URL=postgresql://user:pass@host:port/dutchie_legacy + +# ============================================================================= +# LOCAL STORAGE +# ============================================================================= +# Local image storage path (no MinIO) +LOCAL_IMAGES_PATH=./public/images + +# ============================================================================= +# AUTHENTICATION +# ============================================================================= +JWT_SECRET=your-secret-key-change-in-production diff --git a/backend/docker-compose.local.yml b/backend/docker-compose.local.yml new file mode 100644 index 00000000..48151d30 --- /dev/null +++ b/backend/docker-compose.local.yml @@ -0,0 +1,30 @@ +# CannaiQ Local Development Environment +# Run: docker-compose -f docker-compose.local.yml up -d +# +# Services: +# - cannaiq-postgres: PostgreSQL at localhost:54320 +# +# Note: Backend and frontend run outside Docker for faster dev iteration + +version: '3.8' + +services: + cannaiq-postgres: + image: postgres:15-alpine + container_name: cannaiq-postgres + environment: + POSTGRES_USER: cannaiq + POSTGRES_PASSWORD: cannaiq_local_pass + POSTGRES_DB: cannaiq + ports: + - "54320:5432" + volumes: + - cannaiq-postgres-data:/var/lib/postgresql/data + healthcheck: + test: ["CMD-SHELL", "pg_isready -U cannaiq"] + interval: 10s + timeout: 5s + retries: 5 + +volumes: + cannaiq-postgres-data: diff --git a/backend/docs/ANALYTICS_RUNBOOK.md b/backend/docs/ANALYTICS_RUNBOOK.md new file mode 100644 index 00000000..cf782b59 --- /dev/null +++ b/backend/docs/ANALYTICS_RUNBOOK.md @@ -0,0 +1,712 @@ +# CannaiQ Analytics Runbook + +Phase 3: Analytics Engine - Complete Implementation Guide + +## Overview + +The CannaiQ Analytics Engine provides real-time insights into cannabis market data across price trends, brand penetration, category performance, store changes, and competitive positioning. + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ API Layer │ +│ /api/az/analytics/* │ +└─────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Analytics Services │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │ +│ │PriceTrend │ │Penetration │ │CategoryAnalytics │ │ +│ │Service │ │Service │ │Service │ │ +│ └──────────────┘ └──────────────┘ └──────────────────────┘ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │ +│ │StoreChange │ │BrandOpportunity│ │AnalyticsCache │ │ +│ │Service │ │Service │ │(15-min TTL) │ │ +│ └──────────────┘ └──────────────┘ └──────────────────────┘ │ +└─────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Canonical Tables │ +│ store_products │ store_product_snapshots │ brands │ categories │ +│ dispensaries │ brand_snapshots │ category_snapshots │ +└─────────────────────────────────────────────────────────────────┘ +``` + +## Services + +### 1. PriceTrendService + +Provides time-series price analytics. 
+ +**Key Methods:** +| Method | Description | +|--------|-------------| +| `getProductPriceTrend(productId, storeId?, days)` | Price history for a product | +| `getBrandPriceTrend(brandName, filters)` | Average prices for a brand | +| `getCategoryPriceTrend(category, filters)` | Category-level price trends | +| `getPriceSummary(filters)` | 7d/30d/90d price averages | +| `detectPriceCompression(category, state?)` | Price war detection | +| `getGlobalPriceStats()` | Market-wide pricing overview | + +**Filters:** +```typescript +interface PriceFilters { + storeId?: number; + brandName?: string; + category?: string; + state?: string; + days?: number; // default: 30 +} +``` + +**Price Compression Detection:** +- Calculates standard deviation of prices within category +- Returns compression score 0-100 (higher = more compressed) +- Identifies brands converging toward mean price + +--- + +### 2. PenetrationService + +Tracks brand market presence across stores and states. + +**Key Methods:** +| Method | Description | +|--------|-------------| +| `getBrandPenetration(brandName, filters)` | Store count, SKU count, coverage | +| `getTopBrandsByPenetration(limit, filters)` | Leaderboard of dominant brands | +| `getPenetrationTrend(brandName, days)` | Historical penetration growth | +| `getShelfShareByCategory(brandName)` | % of shelf per category | +| `getBrandPresenceByState(brandName)` | Multi-state presence map | +| `getStoresCarryingBrand(brandName)` | List of stores carrying brand | +| `getPenetrationHeatmap(brandName?)` | Geographic distribution | + +**Penetration Calculation:** +``` +Penetration % = (Stores with Brand / Total Stores in Market) × 100 +``` + +--- + +### 3. CategoryAnalyticsService + +Analyzes category performance and trends. + +**Key Methods:** +| Method | Description | +|--------|-------------| +| `getCategorySummary(category?, filters)` | SKU count, avg price, stores | +| `getCategoryGrowth(days, filters)` | 7d/30d/90d growth rates | +| `getCategoryGrowthTrend(category, days)` | Time-series category growth | +| `getCategoryHeatmap(metric, periods)` | Visual heatmap data | +| `getTopMovers(limit, days)` | Fastest growing/declining categories | +| `getSubcategoryBreakdown(category)` | Drill-down into subcategories | + +**Time Windows:** +- 7 days: Short-term volatility +- 30 days: Monthly trends +- 90 days: Seasonal patterns + +--- + +### 4. StoreChangeService + +Tracks product adds/drops, brand changes, and price movements per store. + +**Key Methods:** +| Method | Description | +|--------|-------------| +| `getStoreChangeSummary(storeId)` | Overview of recent changes | +| `getStoreChangeEvents(storeId, filters)` | Event log (add, drop, price, OOS) | +| `getNewBrands(storeId, days)` | Brands added to store | +| `getLostBrands(storeId, days)` | Brands dropped from store | +| `getProductChanges(storeId, type, days)` | Filtered product changes | +| `getCategoryLeaderboard(category, limit)` | Top stores for category | +| `getMostActiveStores(days, limit)` | Stores with most changes | +| `compareStores(store1, store2)` | Side-by-side store comparison | + +**Event Types:** +- `added` - New product appeared +- `discontinued` - Product removed +- `price_drop` - Price decreased +- `price_increase` - Price increased +- `restocked` - OOS → In Stock +- `out_of_stock` - In Stock → OOS + +--- + +### 5. BrandOpportunityService + +Competitive intelligence and opportunity identification. 
+ +**Key Methods:** +| Method | Description | +|--------|-------------| +| `getBrandOpportunity(brandName)` | Full opportunity analysis | +| `getMarketPositionSummary(brandName)` | Market position vs competitors | +| `getAlerts(filters)` | Analytics-generated alerts | +| `markAlertsRead(alertIds)` | Mark alerts as read | + +**Opportunity Analysis Includes:** +- White space stores (potential targets) +- Competitive threats (brands gaining share) +- Pricing opportunities (underpriced vs market) +- Missing SKU recommendations + +--- + +### 6. AnalyticsCache + +In-memory caching with database fallback. + +**Configuration:** +```typescript +const cache = new AnalyticsCache(pool, { + defaultTtlMinutes: 15, +}); +``` + +**Usage Pattern:** +```typescript +const data = await cache.getOrCompute(cacheKey, async () => { + // Expensive query here + return result; +}); +``` + +**Cache Management:** +- `GET /api/az/analytics/cache/stats` - View cache stats +- `POST /api/az/analytics/cache/clear?pattern=price*` - Clear by pattern +- Auto-cleanup of expired entries every 5 minutes + +--- + +## API Endpoints Reference + +### Price Endpoints + +```bash +# Product price trend (last 30 days) +GET /api/az/analytics/price/product/12345?days=30 + +# Brand price trend with filters +GET /api/az/analytics/price/brand/Cookies?storeId=101&category=Flower&days=90 + +# Category median price +GET /api/az/analytics/price/category/Vaporizers?state=AZ + +# Price summary (7d/30d/90d) +GET /api/az/analytics/price/summary?brand=Stiiizy&state=AZ + +# Detect price wars +GET /api/az/analytics/price/compression/Flower?state=AZ + +# Global stats +GET /api/az/analytics/price/global +``` + +### Penetration Endpoints + +```bash +# Brand penetration +GET /api/az/analytics/penetration/brand/Cookies + +# Top brands leaderboard +GET /api/az/analytics/penetration/top?limit=20&state=AZ&category=Flower + +# Penetration trend +GET /api/az/analytics/penetration/trend/Cookies?days=90 + +# Shelf share by category +GET /api/az/analytics/penetration/shelf-share/Cookies + +# Multi-state presence +GET /api/az/analytics/penetration/by-state/Cookies + +# Stores carrying brand +GET /api/az/analytics/penetration/stores/Cookies + +# Heatmap data +GET /api/az/analytics/penetration/heatmap?brand=Cookies +``` + +### Category Endpoints + +```bash +# Category summary +GET /api/az/analytics/category/summary?category=Flower&state=AZ + +# Category growth (7d/30d/90d) +GET /api/az/analytics/category/growth?days=30&state=AZ + +# Category trend +GET /api/az/analytics/category/trend/Concentrates?days=90 + +# Heatmap +GET /api/az/analytics/category/heatmap?metric=growth&periods=12 + +# Top movers (growing/declining) +GET /api/az/analytics/category/top-movers?limit=5&days=30 + +# Subcategory breakdown +GET /api/az/analytics/category/Edibles/subcategories +``` + +### Store Endpoints + +```bash +# Store change summary +GET /api/az/analytics/store/101/summary + +# Event log +GET /api/az/analytics/store/101/events?type=price_drop&days=7&limit=50 + +# New brands +GET /api/az/analytics/store/101/brands/new?days=30 + +# Lost brands +GET /api/az/analytics/store/101/brands/lost?days=30 + +# Product changes by type +GET /api/az/analytics/store/101/products/changes?type=added&days=7 + +# Category leaderboard +GET /api/az/analytics/store/leaderboard/Flower?limit=20 + +# Most active stores +GET /api/az/analytics/store/most-active?days=7&limit=10 + +# Compare two stores +GET /api/az/analytics/store/compare?store1=101&store2=102 +``` + +### Brand Opportunity Endpoints + +```bash +# 
Full opportunity analysis +GET /api/az/analytics/brand/Cookies/opportunity + +# Market position summary +GET /api/az/analytics/brand/Cookies/position + +# Get alerts +GET /api/az/analytics/alerts?brand=Cookies&type=competitive&unreadOnly=true + +# Mark alerts read +POST /api/az/analytics/alerts/mark-read +Body: { "alertIds": [1, 2, 3] } +``` + +### Maintenance Endpoints + +```bash +# Capture daily snapshots (run by scheduler) +POST /api/az/analytics/snapshots/capture + +# Cache statistics +GET /api/az/analytics/cache/stats + +# Clear cache (admin) +POST /api/az/analytics/cache/clear?pattern=price* +``` + +--- + +## Incremental Computation + +Analytics are designed for real-time queries without full recomputation: + +### Snapshot Strategy + +1. **Raw Data**: `store_products` (current state) +2. **Historical**: `store_product_snapshots` (time-series) +3. **Aggregated**: `brand_snapshots`, `category_snapshots` (daily rollups) + +### Window Calculations + +```sql +-- 7-day window +WHERE crawled_at >= NOW() - INTERVAL '7 days' + +-- 30-day window +WHERE crawled_at >= NOW() - INTERVAL '30 days' + +-- 90-day window +WHERE crawled_at >= NOW() - INTERVAL '90 days' +``` + +### Materialized Views (Optional) + +For heavy queries, create materialized views: + +```sql +CREATE MATERIALIZED VIEW mv_brand_daily_metrics AS +SELECT + DATE(sps.captured_at) as date, + sp.brand_id, + COUNT(DISTINCT sp.dispensary_id) as store_count, + COUNT(*) as sku_count, + AVG(sp.price_rec) as avg_price +FROM store_product_snapshots sps +JOIN store_products sp ON sps.store_product_id = sp.id +WHERE sps.captured_at >= NOW() - INTERVAL '90 days' +GROUP BY DATE(sps.captured_at), sp.brand_id; + +-- Refresh daily +REFRESH MATERIALIZED VIEW CONCURRENTLY mv_brand_daily_metrics; +``` + +--- + +## Scheduled Jobs + +### Daily Snapshot Capture + +Trigger via cron or scheduler: + +```bash +curl -X POST http://localhost:3010/api/az/analytics/snapshots/capture +``` + +This calls: +- `capture_brand_snapshots()` - Captures brand metrics +- `capture_category_snapshots()` - Captures category metrics + +### Cache Cleanup + +Automatic cleanup every 5 minutes via in-memory timer. + +For manual cleanup: +```bash +curl -X POST http://localhost:3010/api/az/analytics/cache/clear +``` + +--- + +## Extending Analytics (Future Phases) + +### Phase 6: Intelligence Engine +- Automated alert generation +- Recommendation engine +- Price prediction + +### Phase 7: Orders Integration +- Sales velocity analytics +- Reorder predictions +- Inventory turnover + +### Phase 8: Advanced ML +- Demand forecasting +- Price elasticity modeling +- Customer segmentation + +--- + +## Troubleshooting + +### Common Issues + +**1. Slow queries** +- Check cache stats: `GET /api/az/analytics/cache/stats` +- Increase cache TTL if data doesn't need real-time freshness +- Add indexes on frequently filtered columns + +**2. Empty results** +- Verify data exists in source tables +- Check filter parameters (case-sensitive brand names) +- Verify state codes are valid + +**3. 
Stale data** +- Run snapshot capture: `POST /api/az/analytics/snapshots/capture` +- Clear cache: `POST /api/az/analytics/cache/clear` + +### Debugging + +Enable query logging: +```typescript +// In service constructor +this.debug = process.env.ANALYTICS_DEBUG === 'true'; +``` + +--- + +## Data Contracts + +### Price Trend Response +```typescript +interface PriceTrend { + productId?: number; + storeId?: number; + brandName?: string; + category?: string; + dataPoints: Array<{ + date: string; + minPrice: number | null; + maxPrice: number | null; + avgPrice: number | null; + wholesalePrice: number | null; + sampleSize: number; + }>; + summary: { + currentAvg: number | null; + previousAvg: number | null; + changePercent: number | null; + trend: 'up' | 'down' | 'stable'; + volatilityScore: number | null; + }; +} +``` + +### Brand Penetration Response +```typescript +interface BrandPenetration { + brandName: string; + totalStores: number; + storesWithBrand: number; + penetrationPercent: number; + skuCount: number; + avgPrice: number | null; + priceRange: { min: number; max: number } | null; + topCategories: Array<{ category: string; count: number }>; + stateBreakdown?: Array<{ state: string; storeCount: number }>; +} +``` + +### Category Growth Response +```typescript +interface CategoryGrowth { + category: string; + currentCount: number; + previousCount: number; + growthPercent: number; + growthTrend: 'up' | 'down' | 'stable'; + avgPrice: number | null; + priceChange: number | null; + topBrands: Array<{ brandName: string; count: number }>; +} +``` + +--- + +## Files Reference + +| File | Purpose | +|------|---------| +| `src/dutchie-az/services/analytics/price-trends.ts` | Price analytics | +| `src/dutchie-az/services/analytics/penetration.ts` | Brand penetration | +| `src/dutchie-az/services/analytics/category-analytics.ts` | Category metrics | +| `src/dutchie-az/services/analytics/store-changes.ts` | Store event tracking | +| `src/dutchie-az/services/analytics/brand-opportunity.ts` | Competitive intel | +| `src/dutchie-az/services/analytics/cache.ts` | Caching layer | +| `src/dutchie-az/services/analytics/index.ts` | Module exports | +| `src/dutchie-az/routes/analytics.ts` | API routes (680 LOC) | +| `src/multi-state/state-query-service.ts` | Cross-state analytics | + +--- + +--- + +## Analytics V2: Rec/Med State Segmentation + +Phase 3 Enhancement: Enhanced analytics with recreational vs medical-only state analysis. 
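+
+Several V2 responses include a `price_diff_percent` field. Judging from the example payloads below (rec avg 35.50 vs med avg 42.00 yields -15.48), it appears to express the recreational average relative to the medical-only average; a sketch of that assumed derivation:
+
+```typescript
+// Assumed formula, inferred from the example payloads (35.50 vs 42.00 -> -15.48,
+// 42.00 vs 48.00 -> -12.50). Not confirmed against the service source.
+function priceDiffPercent(recAvg: number, medAvg: number): number {
+  return Number((((recAvg - medAvg) / medAvg) * 100).toFixed(2));
+}
+
+priceDiffPercent(35.5, 42.0); // -15.48
+priceDiffPercent(42.0, 48.0); // -12.5
+```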
+ +### V2 API Endpoints + +All V2 endpoints are prefixed with `/api/analytics/v2` + +#### V2 Price Analytics + +```bash +# Price trends for a specific product +GET /api/analytics/v2/price/product/12345?window=30d + +# Price by category and state (with rec/med segmentation) +GET /api/analytics/v2/price/category/Flower?state=AZ + +# Price by brand and state +GET /api/analytics/v2/price/brand/Cookies?state=AZ + +# Most volatile products +GET /api/analytics/v2/price/volatile?window=30d&limit=50&state=AZ + +# Rec vs Med price comparison by category +GET /api/analytics/v2/price/rec-vs-med?category=Flower +``` + +#### V2 Brand Penetration + +```bash +# Brand penetration metrics with state breakdown +GET /api/analytics/v2/brand/Cookies/penetration?window=30d + +# Brand market position within categories +GET /api/analytics/v2/brand/Cookies/market-position?category=Flower&state=AZ + +# Brand presence in rec vs med-only states +GET /api/analytics/v2/brand/Cookies/rec-vs-med + +# Top brands by penetration +GET /api/analytics/v2/brand/top?limit=25&state=AZ + +# Brands expanding or contracting +GET /api/analytics/v2/brand/expansion-contraction?window=30d&limit=25 +``` + +#### V2 Category Analytics + +```bash +# Category growth metrics +GET /api/analytics/v2/category/Flower/growth?window=30d + +# Category growth trend over time +GET /api/analytics/v2/category/Flower/trend?window=30d + +# Top brands in category +GET /api/analytics/v2/category/Flower/top-brands?limit=25&state=AZ + +# All categories with metrics +GET /api/analytics/v2/category/all?state=AZ&limit=50 + +# Rec vs Med category comparison +GET /api/analytics/v2/category/rec-vs-med?category=Flower + +# Fastest growing categories +GET /api/analytics/v2/category/fastest-growing?window=30d&limit=25 +``` + +#### V2 Store Analytics + +```bash +# Store change summary +GET /api/analytics/v2/store/101/summary?window=30d + +# Product change events +GET /api/analytics/v2/store/101/events?window=7d&limit=100 + +# Store inventory composition +GET /api/analytics/v2/store/101/inventory + +# Store price positioning vs market +GET /api/analytics/v2/store/101/price-position + +# Most active stores by changes +GET /api/analytics/v2/store/most-active?window=7d&limit=25&state=AZ +``` + +#### V2 State Analytics + +```bash +# State market summary +GET /api/analytics/v2/state/AZ/summary + +# All states with coverage metrics +GET /api/analytics/v2/state/all + +# Legal state breakdown (rec, med-only, no program) +GET /api/analytics/v2/state/legal-breakdown + +# Rec vs Med pricing by category +GET /api/analytics/v2/state/rec-vs-med-pricing?category=Flower + +# States with coverage gaps +GET /api/analytics/v2/state/coverage-gaps + +# Cross-state pricing comparison +GET /api/analytics/v2/state/price-comparison +``` + +### V2 Services Architecture + +``` +src/services/analytics/ +├── index.ts # Exports all V2 services +├── types.ts # Shared type definitions +├── PriceAnalyticsService.ts # Price trends and volatility +├── BrandPenetrationService.ts # Brand market presence +├── CategoryAnalyticsService.ts # Category growth analysis +├── StoreAnalyticsService.ts # Store change tracking +└── StateAnalyticsService.ts # State-level analytics + +src/routes/analytics-v2.ts # V2 API route handlers +``` + +### Key V2 Features + +1. **Rec/Med State Segmentation**: All analytics can be filtered and compared by legal status +2. **State Coverage Gaps**: Identify legal states with missing or stale data +3. 
**Cross-State Pricing**: Compare prices across recreational and medical-only markets +4. **Brand Footprint Analysis**: Track brand presence in rec vs med states +5. **Category Comparison**: Compare category performance by legal status + +### V2 Migration Path + +1. Run migration 052 for state cannabis flags: + ```bash + psql "$DATABASE_URL" -f migrations/052_add_state_cannabis_flags.sql + ``` + +2. Run migration 053 for analytics indexes: + ```bash + psql "$DATABASE_URL" -f migrations/053_analytics_indexes.sql + ``` + +3. Restart backend to pick up new routes + +### V2 Response Examples + +**Rec vs Med Price Comparison:** +```json +{ + "category": "Flower", + "recreational": { + "state_count": 15, + "product_count": 12500, + "avg_price": 35.50, + "median_price": 32.00 + }, + "medical_only": { + "state_count": 8, + "product_count": 5200, + "avg_price": 42.00, + "median_price": 40.00 + }, + "price_diff_percent": -15.48 +} +``` + +**Legal State Breakdown:** +```json +{ + "recreational_states": { + "count": 24, + "dispensary_count": 850, + "product_count": 125000, + "states": [ + { "code": "CA", "name": "California", "dispensary_count": 250 }, + { "code": "CO", "name": "Colorado", "dispensary_count": 150 } + ] + }, + "medical_only_states": { + "count": 18, + "dispensary_count": 320, + "product_count": 45000, + "states": [ + { "code": "FL", "name": "Florida", "dispensary_count": 120 } + ] + }, + "no_program_states": { + "count": 9, + "states": [ + { "code": "ID", "name": "Idaho" } + ] + } +} +``` + +--- + +*Phase 3 Analytics Engine - Fully Implemented* +*V2 Rec/Med State Analytics - Added December 2024* diff --git a/backend/docs/ANALYTICS_V2_EXAMPLES.md b/backend/docs/ANALYTICS_V2_EXAMPLES.md new file mode 100644 index 00000000..97d834cc --- /dev/null +++ b/backend/docs/ANALYTICS_V2_EXAMPLES.md @@ -0,0 +1,594 @@ +# Analytics V2 API Examples + +## Overview + +All endpoints are prefixed with `/api/analytics/v2` + +### Filtering Options + +**Time Windows:** +- `?window=7d` - Last 7 days +- `?window=30d` - Last 30 days (default) +- `?window=90d` - Last 90 days + +**Legal Type Filtering:** +- `?legalType=recreational` - Recreational states only +- `?legalType=medical_only` - Medical-only states (not recreational) +- `?legalType=no_program` - States with no cannabis program + +--- + +## 1. Price Analytics + +### GET /price/product/:id + +Get price trends for a specific store product. + +**Request:** +```bash +GET /api/analytics/v2/price/product/12345?window=30d +``` + +**Response:** +```json +{ + "store_product_id": 12345, + "product_name": "Blue Dream 3.5g", + "brand_name": "Cookies", + "category": "Flower", + "dispensary_id": 101, + "dispensary_name": "Green Leaf Dispensary", + "state_code": "AZ", + "data_points": [ + { + "date": "2024-11-06", + "price_rec": 45.00, + "price_med": 40.00, + "price_rec_special": null, + "price_med_special": null, + "is_on_special": false + }, + { + "date": "2024-11-07", + "price_rec": 42.00, + "price_med": 38.00, + "price_rec_special": null, + "price_med_special": null, + "is_on_special": false + } + ], + "summary": { + "current_price": 42.00, + "min_price": 40.00, + "max_price": 48.00, + "avg_price": 43.50, + "price_change_count": 3, + "volatility_percent": 8.2 + } +} +``` + +### GET /price/rec-vs-med + +Get recreational vs medical-only price comparison by category. 
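+
+A minimal client sketch for this endpoint; the base URL assumes the local backend on port 3010, and the response interface is a partial sketch of the payload shown below:
+
+```typescript
+// Hypothetical client helper for the rec-vs-med price comparison.
+interface RecVsMedPrice {
+  category: string;
+  rec_avg: number;
+  rec_median: number;
+  med_avg: number;
+  med_median: number;
+}
+
+async function fetchRecVsMedPricing(category?: string): Promise<RecVsMedPrice[]> {
+  const url = new URL('/api/analytics/v2/price/rec-vs-med', 'http://localhost:3010');
+  if (category) url.searchParams.set('category', category);
+  const res = await fetch(url);
+  if (!res.ok) throw new Error(`Analytics V2 request failed: ${res.status}`);
+  return res.json() as Promise<RecVsMedPrice[]>;
+}
+```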
+ +**Request:** +```bash +GET /api/analytics/v2/price/rec-vs-med?category=Flower +``` + +**Response:** +```json +[ + { + "category": "Flower", + "rec_avg": 38.50, + "rec_median": 35.00, + "med_avg": 42.00, + "med_median": 40.00 + }, + { + "category": "Concentrates", + "rec_avg": 45.00, + "rec_median": 42.00, + "med_avg": 48.00, + "med_median": 45.00 + } +] +``` + +--- + +## 2. Brand Analytics + +### GET /brand/:name/penetration + +Get brand penetration metrics with state breakdown. + +**Request:** +```bash +GET /api/analytics/v2/brand/Cookies/penetration?window=30d +``` + +**Response:** +```json +{ + "brand_name": "Cookies", + "total_dispensaries": 125, + "total_skus": 450, + "avg_skus_per_dispensary": 3.6, + "states_present": ["AZ", "CA", "CO", "NV", "MI"], + "state_breakdown": [ + { + "state_code": "CA", + "state_name": "California", + "legal_type": "recreational", + "dispensary_count": 45, + "sku_count": 180, + "avg_skus_per_dispensary": 4.0, + "market_share_percent": 12.5 + }, + { + "state_code": "AZ", + "state_name": "Arizona", + "legal_type": "recreational", + "dispensary_count": 32, + "sku_count": 128, + "avg_skus_per_dispensary": 4.0, + "market_share_percent": 15.2 + } + ], + "penetration_trend": [ + { + "date": "2024-11-01", + "dispensary_count": 120, + "new_dispensaries": 0, + "dropped_dispensaries": 0 + }, + { + "date": "2024-11-08", + "dispensary_count": 123, + "new_dispensaries": 3, + "dropped_dispensaries": 0 + }, + { + "date": "2024-11-15", + "dispensary_count": 125, + "new_dispensaries": 2, + "dropped_dispensaries": 0 + } + ] +} +``` + +### GET /brand/:name/rec-vs-med + +Get brand presence in recreational vs medical-only states. + +**Request:** +```bash +GET /api/analytics/v2/brand/Cookies/rec-vs-med +``` + +**Response:** +```json +{ + "brand_name": "Cookies", + "rec_states_count": 4, + "rec_states": ["AZ", "CA", "CO", "NV"], + "rec_dispensary_count": 110, + "rec_avg_skus": 3.8, + "med_only_states_count": 2, + "med_only_states": ["FL", "OH"], + "med_only_dispensary_count": 15, + "med_only_avg_skus": 2.5 +} +``` + +--- + +## 3. Category Analytics + +### GET /category/:name/growth + +Get category growth metrics with state breakdown. + +**Request:** +```bash +GET /api/analytics/v2/category/Flower/growth?window=30d +``` + +**Response:** +```json +{ + "category": "Flower", + "current_sku_count": 5200, + "current_dispensary_count": 320, + "avg_price": 38.50, + "growth_data": [ + { + "date": "2024-11-01", + "sku_count": 4800, + "dispensary_count": 310, + "avg_price": 39.00 + }, + { + "date": "2024-11-15", + "sku_count": 5000, + "dispensary_count": 315, + "avg_price": 38.75 + }, + { + "date": "2024-12-01", + "sku_count": 5200, + "dispensary_count": 320, + "avg_price": 38.50 + } + ], + "state_breakdown": [ + { + "state_code": "CA", + "state_name": "California", + "legal_type": "recreational", + "sku_count": 2100, + "dispensary_count": 145, + "avg_price": 36.00 + }, + { + "state_code": "AZ", + "state_name": "Arizona", + "legal_type": "recreational", + "sku_count": 950, + "dispensary_count": 85, + "avg_price": 40.00 + } + ] +} +``` + +### GET /category/rec-vs-med + +Get category comparison between recreational and medical-only states. 
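+
+One way a portal might consume this payload is to rank categories by the rec/med price gap; a sketch against the response shape shown below:
+
+```typescript
+// Hypothetical ranking: most negative price_diff_percent first,
+// i.e. the categories where recreational markets are cheapest vs medical-only.
+interface CategoryRecVsMed {
+  category: string;
+  price_diff_percent: number;
+}
+
+function rankByRecDiscount(rows: CategoryRecVsMed[]): CategoryRecVsMed[] {
+  return [...rows].sort((a, b) => a.price_diff_percent - b.price_diff_percent);
+}
+```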
+ +**Request:** +```bash +GET /api/analytics/v2/category/rec-vs-med +``` + +**Response:** +```json +[ + { + "category": "Flower", + "recreational": { + "state_count": 15, + "dispensary_count": 650, + "sku_count": 12500, + "avg_price": 35.50, + "median_price": 32.00 + }, + "medical_only": { + "state_count": 8, + "dispensary_count": 220, + "sku_count": 4200, + "avg_price": 42.00, + "median_price": 40.00 + }, + "price_diff_percent": -15.48 + }, + { + "category": "Concentrates", + "recreational": { + "state_count": 15, + "dispensary_count": 600, + "sku_count": 8500, + "avg_price": 42.00, + "median_price": 40.00 + }, + "medical_only": { + "state_count": 8, + "dispensary_count": 200, + "sku_count": 3100, + "avg_price": 48.00, + "median_price": 45.00 + }, + "price_diff_percent": -12.50 + } +] +``` + +--- + +## 4. Store Analytics + +### GET /store/:id/summary + +Get change summary for a store over a time window. + +**Request:** +```bash +GET /api/analytics/v2/store/101/summary?window=30d +``` + +**Response:** +```json +{ + "dispensary_id": 101, + "dispensary_name": "Green Leaf Dispensary", + "state_code": "AZ", + "window": "30d", + "products_added": 45, + "products_dropped": 12, + "brands_added": ["Alien Labs", "Connected"], + "brands_dropped": ["House Brand"], + "price_changes": 156, + "avg_price_change_percent": 3.2, + "stock_in_events": 89, + "stock_out_events": 34, + "current_product_count": 512, + "current_in_stock_count": 478 +} +``` + +### GET /store/:id/events + +Get recent product change events for a store. + +**Request:** +```bash +GET /api/analytics/v2/store/101/events?window=7d&limit=50 +``` + +**Response:** +```json +[ + { + "store_product_id": 12345, + "product_name": "Blue Dream 3.5g", + "brand_name": "Cookies", + "category": "Flower", + "event_type": "price_change", + "event_date": "2024-12-05T14:30:00.000Z", + "old_value": "45.00", + "new_value": "42.00" + }, + { + "store_product_id": 12346, + "product_name": "OG Kush 1g", + "brand_name": "Alien Labs", + "category": "Flower", + "event_type": "added", + "event_date": "2024-12-04T10:00:00.000Z", + "old_value": null, + "new_value": null + }, + { + "store_product_id": 12300, + "product_name": "Sour Diesel Cart", + "brand_name": "Select", + "category": "Vaporizers", + "event_type": "stock_out", + "event_date": "2024-12-03T16:45:00.000Z", + "old_value": "true", + "new_value": "false" + } +] +``` + +--- + +## 5. State Analytics + +### GET /state/:code/summary + +Get market summary for a specific state with rec/med breakdown. 
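+
+A simple freshness check against the `coverage.last_crawl_at` field in the response below (the 24-hour threshold and the nullable typing are assumptions):
+
+```typescript
+// Hypothetical staleness check for state coverage data.
+interface StateSummary {
+  state_code: string;
+  coverage: { last_crawl_at: string | null };
+}
+
+function isCoverageStale(summary: StateSummary, maxAgeHours = 24): boolean {
+  if (!summary.coverage.last_crawl_at) return true;
+  const ageMs = Date.now() - new Date(summary.coverage.last_crawl_at).getTime();
+  return ageMs > maxAgeHours * 60 * 60 * 1000;
+}
+```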
+ +**Request:** +```bash +GET /api/analytics/v2/state/AZ/summary +``` + +**Response:** +```json +{ + "state_code": "AZ", + "state_name": "Arizona", + "legal_status": { + "recreational_legal": true, + "rec_year": 2020, + "medical_legal": true, + "med_year": 2010 + }, + "coverage": { + "dispensary_count": 145, + "product_count": 18500, + "brand_count": 320, + "category_count": 12, + "snapshot_count": 2450000, + "last_crawl_at": "2024-12-06T02:30:00.000Z" + }, + "pricing": { + "avg_price": 42.50, + "median_price": 38.00, + "min_price": 5.00, + "max_price": 250.00 + }, + "top_categories": [ + { "category": "Flower", "count": 5200 }, + { "category": "Concentrates", "count": 3800 }, + { "category": "Vaporizers", "count": 2950 }, + { "category": "Edibles", "count": 2400 }, + { "category": "Pre-Rolls", "count": 1850 } + ], + "top_brands": [ + { "brand": "Cookies", "count": 450 }, + { "brand": "Alien Labs", "count": 380 }, + { "brand": "Connected", "count": 320 }, + { "brand": "Stiiizy", "count": 290 }, + { "brand": "Raw Garden", "count": 275 } + ] +} +``` + +### GET /state/legal-breakdown + +Get breakdown by legal status (recreational, medical-only, no program). + +**Request:** +```bash +GET /api/analytics/v2/state/legal-breakdown +``` + +**Response:** +```json +{ + "recreational_states": { + "count": 24, + "dispensary_count": 850, + "product_count": 125000, + "snapshot_count": 15000000, + "states": [ + { "code": "CA", "name": "California", "dispensary_count": 250 }, + { "code": "CO", "name": "Colorado", "dispensary_count": 150 }, + { "code": "AZ", "name": "Arizona", "dispensary_count": 145 }, + { "code": "MI", "name": "Michigan", "dispensary_count": 120 } + ] + }, + "medical_only_states": { + "count": 18, + "dispensary_count": 320, + "product_count": 45000, + "snapshot_count": 5000000, + "states": [ + { "code": "FL", "name": "Florida", "dispensary_count": 120 }, + { "code": "OH", "name": "Ohio", "dispensary_count": 85 }, + { "code": "PA", "name": "Pennsylvania", "dispensary_count": 75 } + ] + }, + "no_program_states": { + "count": 9, + "states": [ + { "code": "ID", "name": "Idaho" }, + { "code": "WY", "name": "Wyoming" }, + { "code": "KS", "name": "Kansas" } + ] + } +} +``` + +### GET /state/recreational + +Get list of recreational state codes. + +**Request:** +```bash +GET /api/analytics/v2/state/recreational +``` + +**Response:** +```json +{ + "legal_type": "recreational", + "states": ["AK", "AZ", "CA", "CO", "CT", "DE", "IL", "MA", "MD", "ME", "MI", "MN", "MO", "MT", "NJ", "NM", "NV", "NY", "OH", "OR", "RI", "VA", "VT", "WA"], + "count": 24 +} +``` + +### GET /state/medical-only + +Get list of medical-only state codes (not recreational). + +**Request:** +```bash +GET /api/analytics/v2/state/medical-only +``` + +**Response:** +```json +{ + "legal_type": "medical_only", + "states": ["AR", "FL", "HI", "LA", "MS", "ND", "NH", "OK", "PA", "SD", "UT", "WV"], + "count": 12 +} +``` + +### GET /state/rec-vs-med-pricing + +Get rec vs med price comparison by category. + +**Request:** +```bash +GET /api/analytics/v2/state/rec-vs-med-pricing?category=Flower +``` + +**Response:** +```json +[ + { + "category": "Flower", + "recreational": { + "state_count": 15, + "product_count": 12500, + "avg_price": 35.50, + "median_price": 32.00 + }, + "medical_only": { + "state_count": 8, + "product_count": 5200, + "avg_price": 42.00, + "median_price": 40.00 + }, + "price_diff_percent": -15.48 + } +] +``` + +--- + +## How These Endpoints Support Portals + +### Brand Portal Use Cases + +1. 
**Track brand penetration**: Use `/brand/:name/penetration` to see how many stores carry the brand +2. **Compare rec vs med markets**: Use `/brand/:name/rec-vs-med` to understand footprint by legal status +3. **Identify expansion opportunities**: Use `/state/coverage-gaps` to find underserved markets +4. **Monitor pricing**: Use `/price/brand/:brand` to track pricing by state + +### Buyer Portal Use Cases + +1. **Compare stores**: Use `/store/:id/summary` to see activity levels +2. **Track price changes**: Use `/store/:id/events` to monitor competitor pricing +3. **Analyze categories**: Use `/category/:name/growth` to identify trending products +4. **State-level insights**: Use `/state/:code/summary` for market overview + +--- + +## Time Window Filtering + +All time-based endpoints support the `window` query parameter: + +| Value | Description | +|-------|-------------| +| `7d` | Last 7 days | +| `30d` | Last 30 days (default) | +| `90d` | Last 90 days | + +The window affects: +- `store_product_snapshots.captured_at` for historical data +- `store_products.first_seen_at` / `last_seen_at` for product lifecycle +- `crawl_runs.started_at` for crawl-based metrics + +--- + +## Rec/Med Segmentation + +All state-level endpoints automatically segment by: + +- **Recreational**: `states.recreational_legal = TRUE` +- **Medical-only**: `states.medical_legal = TRUE AND states.recreational_legal = FALSE` +- **No program**: Both flags are FALSE or NULL + +This segmentation appears in: +- `legal_type` field in responses +- State breakdown arrays +- Price comparison endpoints diff --git a/backend/migrations/037_dispensary_crawler_profiles.sql b/backend/migrations/037_dispensary_crawler_profiles.sql new file mode 100644 index 00000000..7377343d --- /dev/null +++ b/backend/migrations/037_dispensary_crawler_profiles.sql @@ -0,0 +1,90 @@ +-- Migration 037: Add per-store crawler profiles for Dutchie dispensaries +-- This enables per-store crawler configuration without changing shared logic +-- Phase 1: Schema only - no automatic behavior changes + +-- Create the crawler profiles table +CREATE TABLE IF NOT EXISTS dispensary_crawler_profiles ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE, + + -- Human readable name for this profile + profile_name VARCHAR(255) NOT NULL, + + -- High-level type, e.g. 'dutchie', 'treez', 'jane' + crawler_type VARCHAR(50) NOT NULL, + + -- Optional key for mapping to a per-store crawler module later, + -- e.g. 'curaleaf-dispensary-gilbert' + profile_key VARCHAR(255), + + -- Generic configuration bucket; will hold selectors, URLs, flags, etc. 
+ config JSONB NOT NULL DEFAULT '{}'::jsonb, + + -- Execution hints (safe defaults; can be overridden in config if needed) + timeout_ms INTEGER DEFAULT 30000, + download_images BOOLEAN DEFAULT TRUE, + track_stock BOOLEAN DEFAULT TRUE, + + version INTEGER DEFAULT 1, + enabled BOOLEAN DEFAULT TRUE, + + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Unique index on dispensary_id + profile_name +CREATE UNIQUE INDEX IF NOT EXISTS dispensary_crawler_profiles_unique_name + ON dispensary_crawler_profiles (dispensary_id, profile_name); + +-- Index for finding enabled profiles by type +CREATE INDEX IF NOT EXISTS idx_crawler_profiles_type_enabled + ON dispensary_crawler_profiles (crawler_type, enabled); + +-- Index for dispensary lookup +CREATE INDEX IF NOT EXISTS idx_crawler_profiles_dispensary + ON dispensary_crawler_profiles (dispensary_id); + +-- Add FK from dispensaries to active profile +DO $$ +BEGIN + IF NOT EXISTS (SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensaries' + AND column_name = 'active_crawler_profile_id') THEN + ALTER TABLE dispensaries + ADD COLUMN active_crawler_profile_id INTEGER NULL + REFERENCES dispensary_crawler_profiles(id) ON DELETE SET NULL; + END IF; +END $$; + +-- Create index on the FK for faster joins +CREATE INDEX IF NOT EXISTS idx_dispensaries_active_profile + ON dispensaries (active_crawler_profile_id) + WHERE active_crawler_profile_id IS NOT NULL; + +-- Create or replace trigger function for updated_at +CREATE OR REPLACE FUNCTION set_updated_at_timestamp() +RETURNS TRIGGER AS $$ +BEGIN + NEW.updated_at = NOW(); + RETURN NEW; +END; +$$ LANGUAGE plpgsql; + +-- Add trigger to keep updated_at fresh (drop first if exists to avoid duplicates) +DROP TRIGGER IF EXISTS dispensary_crawler_profiles_set_timestamp ON dispensary_crawler_profiles; +CREATE TRIGGER dispensary_crawler_profiles_set_timestamp +BEFORE UPDATE ON dispensary_crawler_profiles +FOR EACH ROW EXECUTE PROCEDURE set_updated_at_timestamp(); + +-- Add comments for documentation +COMMENT ON TABLE dispensary_crawler_profiles IS 'Per-store crawler configuration profiles. Each dispensary can have multiple profiles but only one active at a time.'; +COMMENT ON COLUMN dispensary_crawler_profiles.profile_name IS 'Human readable name for the profile, e.g. "Curaleaf Gilbert - Dutchie v1"'; +COMMENT ON COLUMN dispensary_crawler_profiles.crawler_type IS 'The crawler implementation type: dutchie, treez, jane, sandbox, custom'; +COMMENT ON COLUMN dispensary_crawler_profiles.profile_key IS 'Optional identifier for per-store crawler module mapping'; +COMMENT ON COLUMN dispensary_crawler_profiles.config IS 'JSONB configuration for the crawler. 
Schema depends on crawler_type.'; +COMMENT ON COLUMN dispensary_crawler_profiles.timeout_ms IS 'Request timeout in milliseconds (default 30000)'; +COMMENT ON COLUMN dispensary_crawler_profiles.download_images IS 'Whether to download product images locally'; +COMMENT ON COLUMN dispensary_crawler_profiles.track_stock IS 'Whether to track inventory/stock levels'; +COMMENT ON COLUMN dispensary_crawler_profiles.version IS 'Profile version number for A/B testing or upgrades'; +COMMENT ON COLUMN dispensary_crawler_profiles.enabled IS 'Whether this profile can be used (soft delete)'; +COMMENT ON COLUMN dispensaries.active_crawler_profile_id IS 'FK to the currently active crawler profile for this dispensary'; diff --git a/backend/migrations/038_profile_status_field.sql b/backend/migrations/038_profile_status_field.sql new file mode 100644 index 00000000..f150aaa4 --- /dev/null +++ b/backend/migrations/038_profile_status_field.sql @@ -0,0 +1,84 @@ +-- Migration: Add status field to dispensary_crawler_profiles +-- This adds a proper status column for crawler state machine +-- Status values: 'production', 'sandbox', 'needs_manual', 'disabled' + +-- Add status column with default 'production' for existing profiles +ALTER TABLE dispensary_crawler_profiles +ADD COLUMN IF NOT EXISTS status VARCHAR(50) DEFAULT 'production'; + +-- Add next_retry_at column for sandbox retry scheduling +ALTER TABLE dispensary_crawler_profiles +ADD COLUMN IF NOT EXISTS next_retry_at TIMESTAMPTZ; + +-- Add sandbox_attempt_count for quick lookup +ALTER TABLE dispensary_crawler_profiles +ADD COLUMN IF NOT EXISTS sandbox_attempt_count INTEGER DEFAULT 0; + +-- Add last_sandbox_at for tracking +ALTER TABLE dispensary_crawler_profiles +ADD COLUMN IF NOT EXISTS last_sandbox_at TIMESTAMPTZ; + +-- Create index for finding profiles by status +CREATE INDEX IF NOT EXISTS idx_crawler_profiles_status +ON dispensary_crawler_profiles(status) WHERE enabled = true; + +-- Create index for finding profiles needing retry +CREATE INDEX IF NOT EXISTS idx_crawler_profiles_next_retry +ON dispensary_crawler_profiles(next_retry_at) WHERE enabled = true AND status = 'sandbox'; + +-- Add comment explaining status values +COMMENT ON COLUMN dispensary_crawler_profiles.status IS +'Crawler status: production (ready for regular crawls), sandbox (discovery mode), needs_manual (max retries exceeded), disabled (turned off)'; + +-- Update existing profiles to have status based on config if present +UPDATE dispensary_crawler_profiles +SET status = COALESCE(config->>'status', 'production') +WHERE status IS NULL OR status = ''; + +-- Backfill sandbox_attempt_count from config +UPDATE dispensary_crawler_profiles +SET sandbox_attempt_count = COALESCE( + jsonb_array_length(config->'sandboxAttempts'), + 0 +) +WHERE config->'sandboxAttempts' IS NOT NULL; + +-- Backfill next_retry_at from config +UPDATE dispensary_crawler_profiles +SET next_retry_at = (config->>'nextRetryAt')::timestamptz +WHERE config->>'nextRetryAt' IS NOT NULL; + +-- Create view for crawler profile summary +CREATE OR REPLACE VIEW v_crawler_profile_summary AS +SELECT + dcp.id, + dcp.dispensary_id, + d.name AS dispensary_name, + d.city, + d.menu_type, + dcp.profile_name, + dcp.profile_key, + dcp.crawler_type, + dcp.status, + dcp.enabled, + dcp.sandbox_attempt_count, + dcp.next_retry_at, + dcp.last_sandbox_at, + dcp.created_at, + dcp.updated_at, + CASE + WHEN dcp.profile_key IS NOT NULL THEN 'per-store' + ELSE 'legacy' + END AS crawler_mode, + CASE + WHEN dcp.status = 'production' THEN 'Ready' + WHEN 
dcp.status = 'sandbox' AND dcp.next_retry_at <= NOW() THEN 'Retry Due' + WHEN dcp.status = 'sandbox' THEN 'Waiting' + WHEN dcp.status = 'needs_manual' THEN 'Needs Manual' + WHEN dcp.status = 'disabled' THEN 'Disabled' + ELSE 'Unknown' + END AS status_display +FROM dispensary_crawler_profiles dcp +JOIN dispensaries d ON d.id = dcp.dispensary_id +WHERE dcp.enabled = true +ORDER BY dcp.status, dcp.updated_at DESC; diff --git a/backend/migrations/039_crawl_orchestration_traces.sql b/backend/migrations/039_crawl_orchestration_traces.sql new file mode 100644 index 00000000..34040f0a --- /dev/null +++ b/backend/migrations/039_crawl_orchestration_traces.sql @@ -0,0 +1,73 @@ +-- Migration: Create crawl_orchestration_traces table +-- Purpose: Store detailed step-by-step traces for every crawl orchestration run +-- This enables full visibility into per-store crawler behavior + +CREATE TABLE IF NOT EXISTS crawl_orchestration_traces ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE, + run_id VARCHAR(255), -- UUID or job ID for this crawl run + profile_id INTEGER REFERENCES dispensary_crawler_profiles(id) ON DELETE SET NULL, + profile_key VARCHAR(255), -- e.g. "trulieve-scottsdale" + crawler_module VARCHAR(255), -- Full path to .ts file loaded + state_at_start VARCHAR(50), -- sandbox, production, legacy, disabled + state_at_end VARCHAR(50), -- sandbox, production, needs_manual, etc. + + -- The trace: ordered array of step objects + trace JSONB NOT NULL DEFAULT '[]'::jsonb, + + -- Summary metrics for quick querying + total_steps INTEGER DEFAULT 0, + duration_ms INTEGER, + success BOOLEAN, + error_message TEXT, + products_found INTEGER, + + -- Timestamps + started_at TIMESTAMPTZ DEFAULT NOW(), + completed_at TIMESTAMPTZ, + created_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Index for quick lookup by dispensary +CREATE INDEX IF NOT EXISTS idx_traces_dispensary_id +ON crawl_orchestration_traces(dispensary_id); + +-- Index for finding latest trace per dispensary +CREATE INDEX IF NOT EXISTS idx_traces_dispensary_created +ON crawl_orchestration_traces(dispensary_id, created_at DESC); + +-- Index for finding traces by run_id +CREATE INDEX IF NOT EXISTS idx_traces_run_id +ON crawl_orchestration_traces(run_id) WHERE run_id IS NOT NULL; + +-- Index for finding traces by profile +CREATE INDEX IF NOT EXISTS idx_traces_profile_id +ON crawl_orchestration_traces(profile_id) WHERE profile_id IS NOT NULL; + +-- Comment explaining trace structure +COMMENT ON COLUMN crawl_orchestration_traces.trace IS +'Ordered array of step objects. Each step has: +{ + "step": 1, + "action": "load_profile", + "description": "Loading crawler profile for dispensary", + "timestamp": 1701234567890, + "duration_ms": 45, + "input": { ... }, + "output": { ... 
}, + "what": "Description of what happened", + "why": "Reason this step was taken", + "where": "Code location / module", + "how": "Method or approach used", + "when": "ISO timestamp" +}'; + +-- View for easy access to latest traces +CREATE OR REPLACE VIEW v_latest_crawl_traces AS +SELECT DISTINCT ON (dispensary_id) + cot.*, + d.name AS dispensary_name, + d.city AS dispensary_city +FROM crawl_orchestration_traces cot +JOIN dispensaries d ON d.id = cot.dispensary_id +ORDER BY dispensary_id, cot.created_at DESC; diff --git a/backend/migrations/040_dispensary_dba_name.sql b/backend/migrations/040_dispensary_dba_name.sql new file mode 100644 index 00000000..f21b32da --- /dev/null +++ b/backend/migrations/040_dispensary_dba_name.sql @@ -0,0 +1,73 @@ +-- Migration 040: Add dba_name column to dispensaries table +-- DBA (Doing Business As) name - the name the dispensary operates under, +-- which may differ from the legal entity name +-- This migration is idempotent - safe to run multiple times + +-- Add dba_name column +DO $$ +BEGIN + IF NOT EXISTS (SELECT 1 FROM information_schema.columns WHERE table_name = 'dispensaries' AND column_name = 'dba_name') THEN + ALTER TABLE dispensaries ADD COLUMN dba_name TEXT DEFAULT NULL; + END IF; +END $$; + +-- Add company_name column (legal entity name) +DO $$ +BEGIN + IF NOT EXISTS (SELECT 1 FROM information_schema.columns WHERE table_name = 'dispensaries' AND column_name = 'company_name') THEN + ALTER TABLE dispensaries ADD COLUMN company_name TEXT DEFAULT NULL; + END IF; +END $$; + +-- Add azdhs_id for Arizona Department of Health Services license number +DO $$ +BEGIN + IF NOT EXISTS (SELECT 1 FROM information_schema.columns WHERE table_name = 'dispensaries' AND column_name = 'azdhs_id') THEN + ALTER TABLE dispensaries ADD COLUMN azdhs_id INTEGER DEFAULT NULL; + END IF; +END $$; + +-- Add phone column +DO $$ +BEGIN + IF NOT EXISTS (SELECT 1 FROM information_schema.columns WHERE table_name = 'dispensaries' AND column_name = 'phone') THEN + ALTER TABLE dispensaries ADD COLUMN phone TEXT DEFAULT NULL; + END IF; +END $$; + +-- Add email column +DO $$ +BEGIN + IF NOT EXISTS (SELECT 1 FROM information_schema.columns WHERE table_name = 'dispensaries' AND column_name = 'email') THEN + ALTER TABLE dispensaries ADD COLUMN email TEXT DEFAULT NULL; + END IF; +END $$; + +-- Add google_rating column +DO $$ +BEGIN + IF NOT EXISTS (SELECT 1 FROM information_schema.columns WHERE table_name = 'dispensaries' AND column_name = 'google_rating') THEN + ALTER TABLE dispensaries ADD COLUMN google_rating NUMERIC(2,1) DEFAULT NULL; + END IF; +END $$; + +-- Add google_review_count column +DO $$ +BEGIN + IF NOT EXISTS (SELECT 1 FROM information_schema.columns WHERE table_name = 'dispensaries' AND column_name = 'google_review_count') THEN + ALTER TABLE dispensaries ADD COLUMN google_review_count INTEGER DEFAULT NULL; + END IF; +END $$; + +-- Add comments for documentation +COMMENT ON COLUMN dispensaries.dba_name IS 'DBA (Doing Business As) name - the public-facing name the dispensary operates under'; +COMMENT ON COLUMN dispensaries.company_name IS 'Legal entity/company name that owns the dispensary'; +COMMENT ON COLUMN dispensaries.azdhs_id IS 'Arizona Department of Health Services license number'; +COMMENT ON COLUMN dispensaries.phone IS 'Contact phone number'; +COMMENT ON COLUMN dispensaries.email IS 'Contact email address'; +COMMENT ON COLUMN dispensaries.google_rating IS 'Google Maps rating (1.0 to 5.0)'; +COMMENT ON COLUMN dispensaries.google_review_count IS 'Number of Google 
reviews'; + +-- Create index for searching by dba_name +CREATE INDEX IF NOT EXISTS idx_dispensaries_dba_name ON dispensaries (dba_name); +CREATE INDEX IF NOT EXISTS idx_dispensaries_azdhs_id ON dispensaries (azdhs_id); diff --git a/backend/migrations/041_cannaiq_canonical_schema.sql b/backend/migrations/041_cannaiq_canonical_schema.sql new file mode 100644 index 00000000..6ac86be3 --- /dev/null +++ b/backend/migrations/041_cannaiq_canonical_schema.sql @@ -0,0 +1,376 @@ +-- Migration 041: CannaiQ Canonical Schema +-- +-- This migration adds the canonical CannaiQ schema tables and columns. +-- ALL CHANGES ARE ADDITIVE - NO DROPS, NO DELETES, NO TRUNCATES. +-- +-- Run with: psql $CANNAIQ_DB_URL -f migrations/041_cannaiq_canonical_schema.sql +-- +-- Tables created: +-- - states (new) +-- - chains (new) +-- - brands (new) +-- - store_products (new - normalized view of current menu) +-- - store_product_snapshots (new - historical crawl data) +-- - crawl_runs (new - replaces/supplements dispensary_crawl_jobs) +-- +-- Tables modified: +-- - dispensaries (add state_id, chain_id FKs) +-- - dispensary_crawler_profiles (add status, allow_autopromote, validated_at) +-- - crawl_orchestration_traces (add run_id FK) +-- + +-- ===================================================== +-- 1) STATES TABLE +-- ===================================================== +CREATE TABLE IF NOT EXISTS states ( + id SERIAL PRIMARY KEY, + code VARCHAR(2) NOT NULL UNIQUE, + name VARCHAR(100) NOT NULL, + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Insert known states +INSERT INTO states (code, name) VALUES + ('AZ', 'Arizona'), + ('CA', 'California'), + ('CO', 'Colorado'), + ('FL', 'Florida'), + ('IL', 'Illinois'), + ('MA', 'Massachusetts'), + ('MD', 'Maryland'), + ('MI', 'Michigan'), + ('MO', 'Missouri'), + ('NV', 'Nevada'), + ('NJ', 'New Jersey'), + ('NY', 'New York'), + ('OH', 'Ohio'), + ('OK', 'Oklahoma'), + ('OR', 'Oregon'), + ('PA', 'Pennsylvania'), + ('WA', 'Washington') +ON CONFLICT (code) DO NOTHING; + +COMMENT ON TABLE states IS 'US states where CannaiQ operates. 
Single source of truth for state codes.'; + +-- ===================================================== +-- 2) CHAINS TABLE (retail groups) +-- ===================================================== +CREATE TABLE IF NOT EXISTS chains ( + id SERIAL PRIMARY KEY, + name VARCHAR(255) NOT NULL, + slug VARCHAR(255) NOT NULL UNIQUE, + website_url TEXT, + logo_url TEXT, + description TEXT, + is_active BOOLEAN DEFAULT TRUE, + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_chains_slug ON chains(slug); +CREATE INDEX IF NOT EXISTS idx_chains_active ON chains(is_active) WHERE is_active = TRUE; + +COMMENT ON TABLE chains IS 'Retail chains/groups that own multiple dispensary locations (e.g., Curaleaf, Trulieve).'; + +-- ===================================================== +-- 3) BRANDS TABLE (canonical brand catalog) +-- ===================================================== +CREATE TABLE IF NOT EXISTS brands ( + id SERIAL PRIMARY KEY, + name VARCHAR(255) NOT NULL, + slug VARCHAR(255) NOT NULL UNIQUE, + external_id VARCHAR(100), -- Provider-specific brand ID + website_url TEXT, + instagram_handle VARCHAR(100), + logo_url TEXT, + description TEXT, + is_portfolio_brand BOOLEAN DEFAULT FALSE, -- TRUE if brand we represent + is_active BOOLEAN DEFAULT TRUE, + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_brands_slug ON brands(slug); +CREATE INDEX IF NOT EXISTS idx_brands_external_id ON brands(external_id) WHERE external_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_brands_portfolio ON brands(is_portfolio_brand) WHERE is_portfolio_brand = TRUE; + +COMMENT ON TABLE brands IS 'Canonical brand catalog. Brands may appear across multiple dispensaries.'; +COMMENT ON COLUMN brands.is_portfolio_brand IS 'TRUE if this is a brand we represent/manage (vs third-party brand)'; + +-- ===================================================== +-- 4) ADD state_id AND chain_id TO dispensaries +-- ===================================================== +ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS state_id INTEGER REFERENCES states(id); +ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS chain_id INTEGER REFERENCES chains(id); + +-- NOTE: state_id backfill is done by ETL script (042_legacy_import.ts), not this migration. + +CREATE INDEX IF NOT EXISTS idx_dispensaries_state_id ON dispensaries(state_id); +CREATE INDEX IF NOT EXISTS idx_dispensaries_chain_id ON dispensaries(chain_id) WHERE chain_id IS NOT NULL; + +COMMENT ON COLUMN dispensaries.state_id IS 'FK to states table. Canonical state reference.'; +COMMENT ON COLUMN dispensaries.chain_id IS 'FK to chains table. NULL if independent dispensary.'; + +-- ===================================================== +-- 5) STORE_PRODUCTS TABLE (current menu state) +-- ===================================================== +-- This is the normalized "what is currently on the menu" table. +-- It supplements dutchie_products with a provider-agnostic structure. + +CREATE TABLE IF NOT EXISTS store_products ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE, + product_id INTEGER REFERENCES products(id) ON DELETE SET NULL, -- Link to canonical product + brand_id INTEGER REFERENCES brands(id) ON DELETE SET NULL, -- Link to canonical brand + + -- Provider-specific identifiers + provider VARCHAR(50) NOT NULL DEFAULT 'dutchie', -- dutchie, treez, jane, etc. 
+ provider_product_id VARCHAR(100), -- Platform-specific product ID + provider_brand_id VARCHAR(100), -- Platform-specific brand ID + + -- Raw data from platform (not normalized) + name_raw VARCHAR(500) NOT NULL, + brand_name_raw VARCHAR(255), + category_raw VARCHAR(100), + subcategory_raw VARCHAR(100), + + -- Pricing + price_rec NUMERIC(10,2), + price_med NUMERIC(10,2), + price_rec_special NUMERIC(10,2), + price_med_special NUMERIC(10,2), + is_on_special BOOLEAN DEFAULT FALSE, + special_name TEXT, + discount_percent NUMERIC(5,2), + + -- Inventory + is_in_stock BOOLEAN DEFAULT TRUE, + stock_quantity INTEGER, + stock_status VARCHAR(50) DEFAULT 'in_stock', + + -- Potency + thc_percent NUMERIC(5,2), + cbd_percent NUMERIC(5,2), + + -- Images + image_url TEXT, + local_image_path TEXT, + + -- Timestamps + first_seen_at TIMESTAMPTZ DEFAULT NOW(), + last_seen_at TIMESTAMPTZ DEFAULT NOW(), + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW(), + + UNIQUE(dispensary_id, provider, provider_product_id) +); + +CREATE INDEX IF NOT EXISTS idx_store_products_dispensary ON store_products(dispensary_id); +CREATE INDEX IF NOT EXISTS idx_store_products_product ON store_products(product_id) WHERE product_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_store_products_brand ON store_products(brand_id) WHERE brand_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_store_products_provider ON store_products(provider); +CREATE INDEX IF NOT EXISTS idx_store_products_in_stock ON store_products(dispensary_id, is_in_stock); +CREATE INDEX IF NOT EXISTS idx_store_products_special ON store_products(dispensary_id, is_on_special) WHERE is_on_special = TRUE; +CREATE INDEX IF NOT EXISTS idx_store_products_last_seen ON store_products(last_seen_at DESC); + +COMMENT ON TABLE store_products IS 'Current state of products on each dispensary menu. Provider-agnostic.'; +COMMENT ON COLUMN store_products.product_id IS 'FK to canonical products table. NULL if not yet mapped.'; +COMMENT ON COLUMN store_products.brand_id IS 'FK to canonical brands table. NULL if not yet mapped.'; + +-- ===================================================== +-- 6) STORE_PRODUCT_SNAPSHOTS TABLE (historical data) +-- ===================================================== +-- This is the critical time-series table for analytics. +-- One row per product per crawl. 
+ +CREATE TABLE IF NOT EXISTS store_product_snapshots ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE, + store_product_id INTEGER REFERENCES store_products(id) ON DELETE SET NULL, + product_id INTEGER REFERENCES products(id) ON DELETE SET NULL, + + -- Provider info + provider VARCHAR(50) NOT NULL DEFAULT 'dutchie', + provider_product_id VARCHAR(100), + + -- Link to crawl run + crawl_run_id INTEGER, -- FK added after crawl_runs table created + + -- Capture timestamp + captured_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + + -- Raw data from platform + name_raw VARCHAR(500), + brand_name_raw VARCHAR(255), + category_raw VARCHAR(100), + subcategory_raw VARCHAR(100), + + -- Pricing at time of capture + price_rec NUMERIC(10,2), + price_med NUMERIC(10,2), + price_rec_special NUMERIC(10,2), + price_med_special NUMERIC(10,2), + is_on_special BOOLEAN DEFAULT FALSE, + discount_percent NUMERIC(5,2), + + -- Inventory at time of capture + is_in_stock BOOLEAN DEFAULT TRUE, + stock_quantity INTEGER, + stock_status VARCHAR(50) DEFAULT 'in_stock', + + -- Potency at time of capture + thc_percent NUMERIC(5,2), + cbd_percent NUMERIC(5,2), + + -- Image URL at time of capture + image_url TEXT, + + -- Full raw response for debugging + raw_data JSONB, + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_snapshots_dispensary_captured ON store_product_snapshots(dispensary_id, captured_at DESC); +CREATE INDEX IF NOT EXISTS idx_snapshots_product_captured ON store_product_snapshots(product_id, captured_at DESC) WHERE product_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_snapshots_store_product ON store_product_snapshots(store_product_id) WHERE store_product_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_snapshots_crawl_run ON store_product_snapshots(crawl_run_id) WHERE crawl_run_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_snapshots_captured_at ON store_product_snapshots(captured_at DESC); + +COMMENT ON TABLE store_product_snapshots IS 'Historical crawl data. One row per product per crawl. 
NEVER DELETE.'; +COMMENT ON COLUMN store_product_snapshots.captured_at IS 'When this snapshot was captured (crawl time).'; + +-- ===================================================== +-- 7) CRAWL_RUNS TABLE (job execution records) +-- ===================================================== +CREATE TABLE IF NOT EXISTS crawl_runs ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE, + + -- Provider + provider VARCHAR(50) NOT NULL DEFAULT 'dutchie', + + -- Execution times + started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + finished_at TIMESTAMPTZ, + duration_ms INTEGER, + + -- Status + status VARCHAR(20) NOT NULL DEFAULT 'running', -- running, success, failed, partial + error_message TEXT, + + -- Results + products_found INTEGER DEFAULT 0, + products_new INTEGER DEFAULT 0, + products_updated INTEGER DEFAULT 0, + snapshots_written INTEGER DEFAULT 0, + + -- Metadata + worker_id VARCHAR(100), + trigger_type VARCHAR(50) DEFAULT 'scheduled', -- scheduled, manual, api + metadata JSONB DEFAULT '{}', + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_crawl_runs_dispensary ON crawl_runs(dispensary_id); +CREATE INDEX IF NOT EXISTS idx_crawl_runs_status ON crawl_runs(status); +CREATE INDEX IF NOT EXISTS idx_crawl_runs_started ON crawl_runs(started_at DESC); +CREATE INDEX IF NOT EXISTS idx_crawl_runs_dispensary_started ON crawl_runs(dispensary_id, started_at DESC); + +COMMENT ON TABLE crawl_runs IS 'Each crawl execution. Links to snapshots and traces.'; + +-- Add FK from store_product_snapshots to crawl_runs +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM information_schema.table_constraints + WHERE constraint_name = 'store_product_snapshots_crawl_run_id_fkey' + ) THEN + ALTER TABLE store_product_snapshots + ADD CONSTRAINT store_product_snapshots_crawl_run_id_fkey + FOREIGN KEY (crawl_run_id) REFERENCES crawl_runs(id) ON DELETE SET NULL; + END IF; +END $$; + +-- ===================================================== +-- 8) UPDATE crawl_orchestration_traces +-- ===================================================== +-- Add run_id FK if not exists +ALTER TABLE crawl_orchestration_traces + ADD COLUMN IF NOT EXISTS crawl_run_id INTEGER REFERENCES crawl_runs(id) ON DELETE SET NULL; + +CREATE INDEX IF NOT EXISTS idx_traces_crawl_run + ON crawl_orchestration_traces(crawl_run_id) + WHERE crawl_run_id IS NOT NULL; + +-- ===================================================== +-- 9) UPDATE dispensary_crawler_profiles +-- ===================================================== +-- Add missing columns from canonical schema +ALTER TABLE dispensary_crawler_profiles + ADD COLUMN IF NOT EXISTS status VARCHAR(50) DEFAULT 'sandbox'; + +ALTER TABLE dispensary_crawler_profiles + ADD COLUMN IF NOT EXISTS allow_autopromote BOOLEAN DEFAULT FALSE; + +ALTER TABLE dispensary_crawler_profiles + ADD COLUMN IF NOT EXISTS validated_at TIMESTAMPTZ; + +CREATE INDEX IF NOT EXISTS idx_profiles_status + ON dispensary_crawler_profiles(status); + +COMMENT ON COLUMN dispensary_crawler_profiles.status IS 'Profile status: sandbox, production, needs_manual, disabled'; +COMMENT ON COLUMN dispensary_crawler_profiles.allow_autopromote IS 'Whether this profile can be auto-promoted from sandbox to production'; +COMMENT ON COLUMN dispensary_crawler_profiles.validated_at IS 'When this profile was last validated as working'; + +-- ===================================================== +-- 10) VIEWS FOR BACKWARD COMPATIBILITY +-- ===================================================== 
+ +-- View to get latest snapshot per store product +CREATE OR REPLACE VIEW v_latest_store_snapshots AS +SELECT DISTINCT ON (dispensary_id, provider_product_id) + sps.* +FROM store_product_snapshots sps +ORDER BY dispensary_id, provider_product_id, captured_at DESC; + +-- View to get crawl run summary per dispensary +CREATE OR REPLACE VIEW v_dispensary_crawl_summary AS +SELECT + d.id AS dispensary_id, + d.name AS dispensary_name, + d.city, + d.state, + COUNT(DISTINCT sp.id) AS current_product_count, + COUNT(DISTINCT sp.id) FILTER (WHERE sp.is_in_stock) AS in_stock_count, + COUNT(DISTINCT sp.id) FILTER (WHERE sp.is_on_special) AS on_special_count, + MAX(cr.finished_at) AS last_crawl_at, + (SELECT status FROM crawl_runs WHERE dispensary_id = d.id ORDER BY started_at DESC LIMIT 1) AS last_crawl_status +FROM dispensaries d +LEFT JOIN store_products sp ON sp.dispensary_id = d.id +LEFT JOIN crawl_runs cr ON cr.dispensary_id = d.id +GROUP BY d.id, d.name, d.city, d.state; + +-- ===================================================== +-- 11) COMMENTS +-- ===================================================== +COMMENT ON TABLE states IS 'Canonical list of US states. Use state_id FK in dispensaries.'; +COMMENT ON TABLE chains IS 'Retail chains (multi-location operators).'; +COMMENT ON TABLE brands IS 'Canonical brand catalog across all providers.'; +COMMENT ON TABLE store_products IS 'Current menu state per dispensary. Provider-agnostic.'; +COMMENT ON TABLE store_product_snapshots IS 'Historical price/stock data. One row per product per crawl.'; +COMMENT ON TABLE crawl_runs IS 'Crawl execution records. Links snapshots to runs.'; + +-- ===================================================== +-- MIGRATION COMPLETE +-- ===================================================== +-- +-- Next steps (manual - not in this migration): +-- 1. Populate chains table from known retail groups +-- 2. Populate brands table from existing dutchie_products.brand_name +-- 3. Migrate data from dutchie_products → store_products +-- 4. Migrate data from dutchie_product_snapshots → store_product_snapshots +-- 5. Link dispensaries.chain_id to chains where applicable +-- diff --git a/backend/migrations/043_add_states_table.sql b/backend/migrations/043_add_states_table.sql new file mode 100644 index 00000000..9821ec3f --- /dev/null +++ b/backend/migrations/043_add_states_table.sql @@ -0,0 +1,50 @@ +-- Migration 043: Add States Table +-- +-- Creates the states table if it does not exist. +-- Safe to run multiple times (idempotent). +-- +-- Run with: +-- CANNAIQ_DB_URL="postgresql://..." 
psql $CANNAIQ_DB_URL -f migrations/043_add_states_table.sql + +-- ===================================================== +-- 1) CREATE STATES TABLE +-- ===================================================== +CREATE TABLE IF NOT EXISTS states ( + id SERIAL PRIMARY KEY, + code TEXT NOT NULL UNIQUE, + name TEXT NOT NULL, + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +-- ===================================================== +-- 2) INSERT CORE US STATES +-- ===================================================== +INSERT INTO states (code, name) VALUES + ('AZ', 'Arizona'), + ('CA', 'California'), + ('CO', 'Colorado'), + ('FL', 'Florida'), + ('IL', 'Illinois'), + ('MA', 'Massachusetts'), + ('MD', 'Maryland'), + ('MI', 'Michigan'), + ('MO', 'Missouri'), + ('NV', 'Nevada'), + ('NJ', 'New Jersey'), + ('NY', 'New York'), + ('OH', 'Ohio'), + ('OK', 'Oklahoma'), + ('OR', 'Oregon'), + ('PA', 'Pennsylvania'), + ('WA', 'Washington') +ON CONFLICT (code) DO NOTHING; + +-- ===================================================== +-- 3) ADD INDEX +-- ===================================================== +CREATE INDEX IF NOT EXISTS idx_states_code ON states(code); + +-- ===================================================== +-- DONE +-- ===================================================== diff --git a/backend/migrations/044_add_provider_detection_data.sql b/backend/migrations/044_add_provider_detection_data.sql new file mode 100644 index 00000000..f3351eed --- /dev/null +++ b/backend/migrations/044_add_provider_detection_data.sql @@ -0,0 +1,45 @@ +-- Migration 044: Add provider_detection_data column to dispensaries +-- +-- This column stores detection metadata for menu provider discovery. +-- Used by menu-detection.ts and discovery.ts to track: +-- - Detected provider type +-- - Resolution attempts +-- - Error messages +-- - not_crawlable flag +-- +-- Run with: psql $CANNAIQ_DB_URL -f migrations/044_add_provider_detection_data.sql +-- +-- ALL CHANGES ARE ADDITIVE - NO DROPS, NO DELETES, NO TRUNCATES. + +-- Add provider_detection_data to dispensaries table +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensaries' AND column_name = 'provider_detection_data' + ) THEN + ALTER TABLE dispensaries + ADD COLUMN provider_detection_data JSONB DEFAULT NULL; + + RAISE NOTICE 'Added provider_detection_data column to dispensaries table'; + ELSE + RAISE NOTICE 'provider_detection_data column already exists on dispensaries table'; + END IF; +END; +$$ LANGUAGE plpgsql; + +-- Add index for querying by not_crawlable flag +CREATE INDEX IF NOT EXISTS idx_dispensaries_provider_detection_not_crawlable + ON dispensaries ((provider_detection_data->>'not_crawlable')) + WHERE provider_detection_data IS NOT NULL; + +-- Add index for querying by detected provider +CREATE INDEX IF NOT EXISTS idx_dispensaries_provider_detection_provider + ON dispensaries ((provider_detection_data->>'detected_provider')) + WHERE provider_detection_data IS NOT NULL; + +COMMENT ON COLUMN dispensaries.provider_detection_data IS 'JSONB metadata from menu provider detection. 
Keys: detected_provider, resolution_error, not_crawlable, detection_timestamp'; + +-- ===================================================== +-- MIGRATION COMPLETE +-- ===================================================== diff --git a/backend/migrations/045_add_image_columns.sql b/backend/migrations/045_add_image_columns.sql new file mode 100644 index 00000000..933f85c6 --- /dev/null +++ b/backend/migrations/045_add_image_columns.sql @@ -0,0 +1,27 @@ +-- Migration 045: Add thumbnail_url columns to canonical tables +-- +-- NOTE: image_url already exists in both tables from migration 041. +-- This migration adds thumbnail_url for cached thumbnail images. + +DO $$ +BEGIN + -- Add thumbnail_url to store_products if not exists + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'store_products' AND column_name = 'thumbnail_url' + ) THEN + ALTER TABLE store_products ADD COLUMN thumbnail_url TEXT NULL; + END IF; + + -- Add thumbnail_url to store_product_snapshots if not exists + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'store_product_snapshots' AND column_name = 'thumbnail_url' + ) THEN + ALTER TABLE store_product_snapshots ADD COLUMN thumbnail_url TEXT NULL; + END IF; +END; +$$ LANGUAGE plpgsql; + +COMMENT ON COLUMN store_products.thumbnail_url IS 'URL to cached thumbnail image'; +COMMENT ON COLUMN store_product_snapshots.thumbnail_url IS 'URL to cached thumbnail image at time of snapshot'; diff --git a/backend/migrations/046_crawler_reliability.sql b/backend/migrations/046_crawler_reliability.sql new file mode 100644 index 00000000..4e05cc59 --- /dev/null +++ b/backend/migrations/046_crawler_reliability.sql @@ -0,0 +1,351 @@ +-- Migration 046: Crawler Reliability & Stabilization +-- Phase 1: Add fields for error taxonomy, retry management, and self-healing + +-- ============================================================ +-- PART 1: Error Taxonomy - Standardized error codes +-- ============================================================ + +-- Create enum for standardized error codes +DO $$ +BEGIN + IF NOT EXISTS (SELECT 1 FROM pg_type WHERE typname = 'crawl_error_code') THEN + CREATE TYPE crawl_error_code AS ENUM ( + 'SUCCESS', + 'RATE_LIMITED', + 'BLOCKED_PROXY', + 'HTML_CHANGED', + 'TIMEOUT', + 'AUTH_FAILED', + 'NETWORK_ERROR', + 'PARSE_ERROR', + 'NO_PRODUCTS', + 'UNKNOWN_ERROR' + ); + END IF; +END; +$$ LANGUAGE plpgsql; + +-- ============================================================ +-- PART 2: Dispensary Crawl Configuration +-- ============================================================ + +-- Add crawl config columns to dispensaries +DO $$ +BEGIN + -- Crawl frequency (minutes between crawls) + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensaries' AND column_name = 'crawl_frequency_minutes' + ) THEN + ALTER TABLE dispensaries ADD COLUMN crawl_frequency_minutes INTEGER DEFAULT 240; + END IF; + + -- Max retries per crawl + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensaries' AND column_name = 'max_retries' + ) THEN + ALTER TABLE dispensaries ADD COLUMN max_retries INTEGER DEFAULT 3; + END IF; + + -- Current proxy ID + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensaries' AND column_name = 'current_proxy_id' + ) THEN + ALTER TABLE dispensaries ADD COLUMN current_proxy_id INTEGER NULL; + END IF; + + -- Current user agent + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 
'dispensaries' AND column_name = 'current_user_agent' + ) THEN + ALTER TABLE dispensaries ADD COLUMN current_user_agent TEXT NULL; + END IF; + + -- Next scheduled run + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensaries' AND column_name = 'next_crawl_at' + ) THEN + ALTER TABLE dispensaries ADD COLUMN next_crawl_at TIMESTAMPTZ NULL; + END IF; + + -- Last successful crawl + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensaries' AND column_name = 'last_success_at' + ) THEN + ALTER TABLE dispensaries ADD COLUMN last_success_at TIMESTAMPTZ NULL; + END IF; + + -- Last error code (using text for flexibility, validated in app) + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensaries' AND column_name = 'last_error_code' + ) THEN + ALTER TABLE dispensaries ADD COLUMN last_error_code TEXT NULL; + END IF; + + -- Crawl status: active, degraded, paused, failed + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensaries' AND column_name = 'crawl_status' + ) THEN + ALTER TABLE dispensaries ADD COLUMN crawl_status TEXT DEFAULT 'active'; + END IF; + + -- Backoff multiplier (increases with failures) + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensaries' AND column_name = 'backoff_multiplier' + ) THEN + ALTER TABLE dispensaries ADD COLUMN backoff_multiplier NUMERIC(4,2) DEFAULT 1.0; + END IF; + + -- Total attempt count (lifetime) + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensaries' AND column_name = 'total_attempts' + ) THEN + ALTER TABLE dispensaries ADD COLUMN total_attempts INTEGER DEFAULT 0; + END IF; + + -- Total success count (lifetime) + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensaries' AND column_name = 'total_successes' + ) THEN + ALTER TABLE dispensaries ADD COLUMN total_successes INTEGER DEFAULT 0; + END IF; +END; +$$ LANGUAGE plpgsql; + +-- ============================================================ +-- PART 3: Enhanced Job Tracking +-- ============================================================ + +-- Add columns to dispensary_crawl_jobs +DO $$ +BEGIN + -- Error code + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensary_crawl_jobs' AND column_name = 'error_code' + ) THEN + ALTER TABLE dispensary_crawl_jobs ADD COLUMN error_code TEXT NULL; + END IF; + + -- Proxy used for this job + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensary_crawl_jobs' AND column_name = 'proxy_used' + ) THEN + ALTER TABLE dispensary_crawl_jobs ADD COLUMN proxy_used TEXT NULL; + END IF; + + -- User agent used for this job + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensary_crawl_jobs' AND column_name = 'user_agent_used' + ) THEN + ALTER TABLE dispensary_crawl_jobs ADD COLUMN user_agent_used TEXT NULL; + END IF; + + -- Attempt number for this job + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensary_crawl_jobs' AND column_name = 'attempt_number' + ) THEN + ALTER TABLE dispensary_crawl_jobs ADD COLUMN attempt_number INTEGER DEFAULT 1; + END IF; + + -- Backoff delay applied (ms) + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensary_crawl_jobs' AND column_name = 'backoff_delay_ms' + ) THEN + ALTER TABLE dispensary_crawl_jobs ADD COLUMN 
backoff_delay_ms INTEGER DEFAULT 0; + END IF; + + -- HTTP status code received + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensary_crawl_jobs' AND column_name = 'http_status' + ) THEN + ALTER TABLE dispensary_crawl_jobs ADD COLUMN http_status INTEGER NULL; + END IF; + + -- Response time (ms) + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensary_crawl_jobs' AND column_name = 'response_time_ms' + ) THEN + ALTER TABLE dispensary_crawl_jobs ADD COLUMN response_time_ms INTEGER NULL; + END IF; +END; +$$ LANGUAGE plpgsql; + +-- ============================================================ +-- PART 4: Crawl History Table (for detailed tracking) +-- ============================================================ + +CREATE TABLE IF NOT EXISTS crawl_attempts ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id), + job_id INTEGER REFERENCES dispensary_crawl_jobs(id), + + -- Timing + started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + finished_at TIMESTAMPTZ, + duration_ms INTEGER, + + -- Result + error_code TEXT NOT NULL DEFAULT 'UNKNOWN_ERROR', + error_message TEXT, + http_status INTEGER, + + -- Context + attempt_number INTEGER NOT NULL DEFAULT 1, + proxy_used TEXT, + user_agent_used TEXT, + + -- Metrics + products_found INTEGER DEFAULT 0, + products_upserted INTEGER DEFAULT 0, + snapshots_created INTEGER DEFAULT 0, + + -- Metadata + metadata JSONB, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +-- Index for quick lookups +CREATE INDEX IF NOT EXISTS idx_crawl_attempts_dispensary_id ON crawl_attempts(dispensary_id); +CREATE INDEX IF NOT EXISTS idx_crawl_attempts_error_code ON crawl_attempts(error_code); +CREATE INDEX IF NOT EXISTS idx_crawl_attempts_started_at ON crawl_attempts(started_at DESC); + +-- ============================================================ +-- PART 5: Views for Monitoring +-- ============================================================ + +-- Drop existing view if exists +DROP VIEW IF EXISTS v_crawler_status; + +-- Crawler status view with all reliability fields +CREATE VIEW v_crawler_status AS +SELECT + d.id, + d.name, + d.slug, + d.menu_type, + d.platform_dispensary_id, + d.crawl_status, + d.consecutive_failures, + d.last_crawl_at, + d.last_success_at, + d.last_failure_at, + d.last_error_code, + d.next_crawl_at, + d.crawl_frequency_minutes, + d.max_retries, + d.current_proxy_id, + d.current_user_agent, + d.backoff_multiplier, + d.total_attempts, + d.total_successes, + d.product_count, + CASE + WHEN d.total_attempts > 0 + THEN ROUND(d.total_successes::NUMERIC / d.total_attempts * 100, 1) + ELSE 0 + END AS success_rate, + CASE + WHEN d.crawl_status = 'failed' THEN 'FAILED' + WHEN d.crawl_status = 'paused' THEN 'PAUSED' + WHEN d.crawl_status = 'degraded' THEN 'DEGRADED' + WHEN d.menu_type IS NULL OR d.menu_type = 'unknown' THEN 'NEEDS_DETECTION' + WHEN d.platform_dispensary_id IS NULL THEN 'NEEDS_PLATFORM_ID' + WHEN d.next_crawl_at IS NULL THEN 'NOT_SCHEDULED' + WHEN d.next_crawl_at <= NOW() THEN 'DUE' + ELSE 'SCHEDULED' + END AS schedule_status, + d.failed_at, + d.failure_notes +FROM dispensaries d +WHERE d.state = 'AZ'; + +-- Drop existing view if exists +DROP VIEW IF EXISTS v_crawl_error_summary; + +-- Error summary view +CREATE VIEW v_crawl_error_summary AS +SELECT + error_code, + COUNT(*) as total_occurrences, + COUNT(DISTINCT dispensary_id) as affected_stores, + MAX(started_at) as last_occurrence, + AVG(duration_ms)::INTEGER as avg_duration_ms +FROM 
crawl_attempts +WHERE started_at > NOW() - INTERVAL '7 days' +GROUP BY error_code +ORDER BY total_occurrences DESC; + +-- Drop existing view if exists +DROP VIEW IF EXISTS v_crawl_health; + +-- Overall crawl health view +CREATE VIEW v_crawl_health AS +SELECT + COUNT(*) FILTER (WHERE crawl_status = 'active') as active_crawlers, + COUNT(*) FILTER (WHERE crawl_status = 'degraded') as degraded_crawlers, + COUNT(*) FILTER (WHERE crawl_status = 'paused') as paused_crawlers, + COUNT(*) FILTER (WHERE crawl_status = 'failed') as failed_crawlers, + COUNT(*) FILTER (WHERE next_crawl_at <= NOW()) as due_now, + COUNT(*) FILTER (WHERE consecutive_failures > 0) as stores_with_failures, + AVG(consecutive_failures)::NUMERIC(4,2) as avg_consecutive_failures, + COUNT(*) FILTER (WHERE last_success_at > NOW() - INTERVAL '24 hours') as successful_last_24h +FROM dispensaries +WHERE state = 'AZ' AND menu_type = 'dutchie'; + +-- ============================================================ +-- PART 6: Constraint for minimum crawl gap +-- ============================================================ + +-- Function to check minimum crawl gap (2 minutes) +CREATE OR REPLACE FUNCTION check_minimum_crawl_gap() +RETURNS TRIGGER AS $$ +BEGIN + -- Only check for new pending jobs + IF NEW.status = 'pending' AND NEW.dispensary_id IS NOT NULL THEN + -- Check if there's a recent job for same dispensary + IF EXISTS ( + SELECT 1 FROM dispensary_crawl_jobs + WHERE dispensary_id = NEW.dispensary_id + AND id != NEW.id + AND status IN ('pending', 'running') + AND created_at > NOW() - INTERVAL '2 minutes' + ) THEN + RAISE EXCEPTION 'Minimum 2-minute gap required between crawls for same dispensary'; + END IF; + END IF; + RETURN NEW; +END; +$$ LANGUAGE plpgsql; + +-- Create trigger (drop first if exists) +DROP TRIGGER IF EXISTS enforce_minimum_crawl_gap ON dispensary_crawl_jobs; +CREATE TRIGGER enforce_minimum_crawl_gap + BEFORE INSERT ON dispensary_crawl_jobs + FOR EACH ROW + EXECUTE FUNCTION check_minimum_crawl_gap(); + +-- ============================================================ +-- PART 7: Comments +-- ============================================================ + +COMMENT ON TABLE crawl_attempts IS 'Detailed history of every crawl attempt for analytics and debugging'; +COMMENT ON VIEW v_crawler_status IS 'Current status of all crawlers with reliability metrics'; +COMMENT ON VIEW v_crawl_error_summary IS 'Summary of errors by type over last 7 days'; +COMMENT ON VIEW v_crawl_health IS 'Overall health metrics for the crawling system'; diff --git a/backend/migrations/046_raw_payloads_table.sql b/backend/migrations/046_raw_payloads_table.sql new file mode 100644 index 00000000..a0b2f799 --- /dev/null +++ b/backend/migrations/046_raw_payloads_table.sql @@ -0,0 +1,130 @@ +-- Migration 046: Raw Payloads Table +-- +-- Immutable event stream for raw crawler responses. +-- NEVER delete or overwrite historical payloads. +-- +-- Run with: +-- DATABASE_URL="postgresql://..." 
psql $DATABASE_URL -f migrations/046_raw_payloads_table.sql + +-- ===================================================== +-- 1) RAW_PAYLOADS TABLE +-- ===================================================== +CREATE TABLE IF NOT EXISTS raw_payloads ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + + -- Store reference + dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE, + + -- Crawl run reference (nullable for backfilled data) + crawl_run_id INTEGER REFERENCES crawl_runs(id) ON DELETE SET NULL, + + -- Platform identification + platform VARCHAR(50) NOT NULL DEFAULT 'dutchie', + + -- Versioning for schema evolution + payload_version INTEGER NOT NULL DEFAULT 1, + + -- The raw JSON response from the crawler (immutable) + raw_json JSONB NOT NULL, + + -- Metadata + product_count INTEGER, -- Number of products in payload + pricing_type VARCHAR(20), -- 'rec', 'med', or 'both' + crawl_mode VARCHAR(20), -- 'mode_a', 'mode_b', 'dual' + + -- Timestamps + fetched_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + + -- Hydration status + processed BOOLEAN NOT NULL DEFAULT FALSE, + normalized_at TIMESTAMPTZ, + hydration_error TEXT, + hydration_attempts INTEGER DEFAULT 0, + + -- Audit + created_at TIMESTAMPTZ DEFAULT NOW() +); + +-- ===================================================== +-- 2) INDEXES FOR EFFICIENT QUERYING +-- ===================================================== + +-- Primary lookup: unprocessed payloads in FIFO order +CREATE INDEX IF NOT EXISTS idx_raw_payloads_unprocessed + ON raw_payloads(fetched_at ASC) + WHERE processed = FALSE; + +-- Store-based lookups +CREATE INDEX IF NOT EXISTS idx_raw_payloads_dispensary + ON raw_payloads(dispensary_id, fetched_at DESC); + +-- Platform filtering +CREATE INDEX IF NOT EXISTS idx_raw_payloads_platform + ON raw_payloads(platform); + +-- Crawl run linkage +CREATE INDEX IF NOT EXISTS idx_raw_payloads_crawl_run + ON raw_payloads(crawl_run_id) + WHERE crawl_run_id IS NOT NULL; + +-- Error tracking +CREATE INDEX IF NOT EXISTS idx_raw_payloads_errors + ON raw_payloads(hydration_attempts, processed) + WHERE hydration_error IS NOT NULL; + +-- ===================================================== +-- 3) HYDRATION LOCKS TABLE (distributed locking) +-- ===================================================== +CREATE TABLE IF NOT EXISTS hydration_locks ( + id SERIAL PRIMARY KEY, + lock_name VARCHAR(100) NOT NULL UNIQUE, + worker_id VARCHAR(100) NOT NULL, + acquired_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + expires_at TIMESTAMPTZ NOT NULL, + heartbeat_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_hydration_locks_expires + ON hydration_locks(expires_at); + +-- ===================================================== +-- 4) HYDRATION_RUNS TABLE (audit trail) +-- ===================================================== +CREATE TABLE IF NOT EXISTS hydration_runs ( + id SERIAL PRIMARY KEY, + worker_id VARCHAR(100) NOT NULL, + started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + finished_at TIMESTAMPTZ, + status VARCHAR(20) NOT NULL DEFAULT 'running', -- running, completed, failed + + -- Metrics + payloads_processed INTEGER DEFAULT 0, + products_upserted INTEGER DEFAULT 0, + snapshots_created INTEGER DEFAULT 0, + brands_created INTEGER DEFAULT 0, + errors_count INTEGER DEFAULT 0, + + -- Error details + error_message TEXT, + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_hydration_runs_status + ON hydration_runs(status, started_at DESC); + +-- ===================================================== +-- 5) 
COMMENTS +-- ===================================================== +COMMENT ON TABLE raw_payloads IS 'Immutable event stream of raw crawler responses. NEVER DELETE.'; +COMMENT ON COLUMN raw_payloads.raw_json IS 'Complete raw JSON from GraphQL/API response. Immutable.'; +COMMENT ON COLUMN raw_payloads.payload_version IS 'Schema version for normalization compatibility.'; +COMMENT ON COLUMN raw_payloads.processed IS 'TRUE when payload has been hydrated to canonical tables.'; +COMMENT ON COLUMN raw_payloads.normalized_at IS 'When the payload was successfully hydrated.'; + +COMMENT ON TABLE hydration_locks IS 'Distributed locks for hydration workers to prevent double-processing.'; +COMMENT ON TABLE hydration_runs IS 'Audit trail of hydration job executions.'; + +-- ===================================================== +-- MIGRATION COMPLETE +-- ===================================================== diff --git a/backend/migrations/047_analytics_infrastructure.sql b/backend/migrations/047_analytics_infrastructure.sql new file mode 100644 index 00000000..47f5da00 --- /dev/null +++ b/backend/migrations/047_analytics_infrastructure.sql @@ -0,0 +1,473 @@ +-- Migration 047: Analytics Infrastructure +-- Phase 3: Analytics Dashboards for CannaiQ +-- Creates views, functions, and tables for price trends, brand penetration, category growth, etc. + +-- ============================================================ +-- ANALYTICS CACHE TABLE (for expensive query results) +-- ============================================================ +CREATE TABLE IF NOT EXISTS analytics_cache ( + id SERIAL PRIMARY KEY, + cache_key VARCHAR(255) NOT NULL UNIQUE, + cache_data JSONB NOT NULL, + computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + expires_at TIMESTAMPTZ NOT NULL, + query_time_ms INTEGER, + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_analytics_cache_key ON analytics_cache(cache_key); +CREATE INDEX IF NOT EXISTS idx_analytics_cache_expires ON analytics_cache(expires_at); + +-- ============================================================ +-- PRICE EXTRACTION HELPER FUNCTION +-- Extracts pricing from JSONB latest_raw_payload +-- ============================================================ +CREATE OR REPLACE FUNCTION extract_min_price(payload JSONB) +RETURNS NUMERIC AS $$ +DECLARE + prices JSONB; + min_val NUMERIC; +BEGIN + -- Try recPrices first (retail prices) + prices := payload->'recPrices'; + IF prices IS NOT NULL AND jsonb_array_length(prices) > 0 THEN + SELECT MIN(value::NUMERIC) INTO min_val FROM jsonb_array_elements_text(prices) AS value WHERE value ~ '^[0-9.]+$'; + IF min_val IS NOT NULL THEN RETURN min_val; END IF; + END IF; + + -- Try Prices array + prices := payload->'Prices'; + IF prices IS NOT NULL AND jsonb_array_length(prices) > 0 THEN + SELECT MIN(value::NUMERIC) INTO min_val FROM jsonb_array_elements_text(prices) AS value WHERE value ~ '^[0-9.]+$'; + IF min_val IS NOT NULL THEN RETURN min_val; END IF; + END IF; + + RETURN NULL; +END; +$$ LANGUAGE plpgsql IMMUTABLE; + +CREATE OR REPLACE FUNCTION extract_max_price(payload JSONB) +RETURNS NUMERIC AS $$ +DECLARE + prices JSONB; + max_val NUMERIC; +BEGIN + prices := payload->'recPrices'; + IF prices IS NOT NULL AND jsonb_array_length(prices) > 0 THEN + SELECT MAX(value::NUMERIC) INTO max_val FROM jsonb_array_elements_text(prices) AS value WHERE value ~ '^[0-9.]+$'; + IF max_val IS NOT NULL THEN RETURN max_val; END IF; + END IF; + + prices := payload->'Prices'; + IF prices IS NOT NULL AND jsonb_array_length(prices) > 0 THEN + 
SELECT MAX(value::NUMERIC) INTO max_val FROM jsonb_array_elements_text(prices) AS value WHERE value ~ '^[0-9.]+$'; + IF max_val IS NOT NULL THEN RETURN max_val; END IF; + END IF; + + RETURN NULL; +END; +$$ LANGUAGE plpgsql IMMUTABLE; + +CREATE OR REPLACE FUNCTION extract_wholesale_price(payload JSONB) +RETURNS NUMERIC AS $$ +DECLARE + prices JSONB; + min_val NUMERIC; +BEGIN + prices := payload->'wholesalePrices'; + IF prices IS NOT NULL AND jsonb_array_length(prices) > 0 THEN + SELECT MIN(value::NUMERIC) INTO min_val FROM jsonb_array_elements_text(prices) AS value WHERE value ~ '^[0-9.]+$'; + RETURN min_val; + END IF; + RETURN NULL; +END; +$$ LANGUAGE plpgsql IMMUTABLE; + +-- ============================================================ +-- VIEW: v_product_pricing +-- Flattened view of products with extracted pricing +-- ============================================================ +CREATE OR REPLACE VIEW v_product_pricing AS +SELECT + dp.id, + dp.dispensary_id, + dp.name, + dp.brand_name, + dp.brand_id, + dp.type as category, + dp.subcategory, + dp.strain_type, + dp.stock_status, + dp.status, + d.name as store_name, + d.city, + d.state, + extract_min_price(dp.latest_raw_payload) as min_price, + extract_max_price(dp.latest_raw_payload) as max_price, + extract_wholesale_price(dp.latest_raw_payload) as wholesale_price, + dp.thc, + dp.cbd, + dp.updated_at, + dp.created_at +FROM dutchie_products dp +JOIN dispensaries d ON dp.dispensary_id = d.id; + +-- ============================================================ +-- VIEW: v_brand_store_presence +-- Which brands are in which stores +-- ============================================================ +CREATE OR REPLACE VIEW v_brand_store_presence AS +SELECT + dp.brand_name, + dp.brand_id, + dp.dispensary_id, + d.name as store_name, + d.city, + d.state, + dp.type as category, + COUNT(*) as sku_count, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price, + SUM(CASE WHEN dp.stock_status = 'in_stock' THEN 1 ELSE 0 END) as in_stock_count, + MAX(dp.updated_at) as last_updated +FROM dutchie_products dp +JOIN dispensaries d ON dp.dispensary_id = d.id +WHERE dp.brand_name IS NOT NULL +GROUP BY dp.brand_name, dp.brand_id, dp.dispensary_id, d.name, d.city, d.state, dp.type; + +-- ============================================================ +-- VIEW: v_category_store_summary +-- Category breakdown per store +-- ============================================================ +CREATE OR REPLACE VIEW v_category_store_summary AS +SELECT + dp.dispensary_id, + d.name as store_name, + d.city, + d.state, + dp.type as category, + COUNT(*) as sku_count, + COUNT(DISTINCT dp.brand_name) as brand_count, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price, + MIN(extract_min_price(dp.latest_raw_payload)) as min_price, + MAX(extract_max_price(dp.latest_raw_payload)) as max_price, + SUM(CASE WHEN dp.stock_status = 'in_stock' THEN 1 ELSE 0 END) as in_stock_count +FROM dutchie_products dp +JOIN dispensaries d ON dp.dispensary_id = d.id +WHERE dp.type IS NOT NULL +GROUP BY dp.dispensary_id, d.name, d.city, d.state, dp.type; + +-- ============================================================ +-- VIEW: v_brand_summary +-- Global brand statistics +-- ============================================================ +CREATE OR REPLACE VIEW v_brand_summary AS +SELECT + dp.brand_name, + dp.brand_id, + COUNT(*) as total_skus, + COUNT(DISTINCT dp.dispensary_id) as store_count, + COUNT(DISTINCT dp.type) as category_count, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price, 
+ MIN(extract_min_price(dp.latest_raw_payload)) as min_price, + MAX(extract_max_price(dp.latest_raw_payload)) as max_price, + SUM(CASE WHEN dp.stock_status = 'in_stock' THEN 1 ELSE 0 END) as in_stock_skus, + ARRAY_AGG(DISTINCT dp.type) FILTER (WHERE dp.type IS NOT NULL) as categories, + MAX(dp.updated_at) as last_updated +FROM dutchie_products dp +WHERE dp.brand_name IS NOT NULL +GROUP BY dp.brand_name, dp.brand_id +ORDER BY total_skus DESC; + +-- ============================================================ +-- VIEW: v_category_summary +-- Global category statistics +-- ============================================================ +CREATE OR REPLACE VIEW v_category_summary AS +SELECT + dp.type as category, + COUNT(*) as total_skus, + COUNT(DISTINCT dp.brand_name) as brand_count, + COUNT(DISTINCT dp.dispensary_id) as store_count, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price, + MIN(extract_min_price(dp.latest_raw_payload)) as min_price, + MAX(extract_max_price(dp.latest_raw_payload)) as max_price, + SUM(CASE WHEN dp.stock_status = 'in_stock' THEN 1 ELSE 0 END) as in_stock_skus +FROM dutchie_products dp +WHERE dp.type IS NOT NULL +GROUP BY dp.type +ORDER BY total_skus DESC; + +-- ============================================================ +-- VIEW: v_store_summary +-- Store-level statistics +-- ============================================================ +CREATE OR REPLACE VIEW v_store_summary AS +SELECT + d.id as store_id, + d.name as store_name, + d.city, + d.state, + d.chain_id, + COUNT(dp.id) as total_skus, + COUNT(DISTINCT dp.brand_name) as brand_count, + COUNT(DISTINCT dp.type) as category_count, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price, + SUM(CASE WHEN dp.stock_status = 'in_stock' THEN 1 ELSE 0 END) as in_stock_skus, + d.last_crawl_at, + d.product_count +FROM dispensaries d +LEFT JOIN dutchie_products dp ON d.id = dp.dispensary_id +GROUP BY d.id, d.name, d.city, d.state, d.chain_id, d.last_crawl_at, d.product_count; + +-- ============================================================ +-- TABLE: brand_snapshots (for historical brand tracking) +-- ============================================================ +CREATE TABLE IF NOT EXISTS brand_snapshots ( + id SERIAL PRIMARY KEY, + brand_name VARCHAR(255) NOT NULL, + brand_id VARCHAR(255), + snapshot_date DATE NOT NULL, + store_count INTEGER NOT NULL DEFAULT 0, + total_skus INTEGER NOT NULL DEFAULT 0, + avg_price NUMERIC(10,2), + in_stock_skus INTEGER NOT NULL DEFAULT 0, + categories TEXT[], + created_at TIMESTAMPTZ DEFAULT NOW(), + UNIQUE(brand_name, snapshot_date) +); + +CREATE INDEX IF NOT EXISTS idx_brand_snapshots_brand ON brand_snapshots(brand_name); +CREATE INDEX IF NOT EXISTS idx_brand_snapshots_date ON brand_snapshots(snapshot_date); + +-- ============================================================ +-- TABLE: category_snapshots (for historical category tracking) +-- ============================================================ +CREATE TABLE IF NOT EXISTS category_snapshots ( + id SERIAL PRIMARY KEY, + category VARCHAR(255) NOT NULL, + snapshot_date DATE NOT NULL, + store_count INTEGER NOT NULL DEFAULT 0, + brand_count INTEGER NOT NULL DEFAULT 0, + total_skus INTEGER NOT NULL DEFAULT 0, + avg_price NUMERIC(10,2), + in_stock_skus INTEGER NOT NULL DEFAULT 0, + created_at TIMESTAMPTZ DEFAULT NOW(), + UNIQUE(category, snapshot_date) +); + +CREATE INDEX IF NOT EXISTS idx_category_snapshots_cat ON category_snapshots(category); +CREATE INDEX IF NOT EXISTS idx_category_snapshots_date ON 
category_snapshots(snapshot_date); + +-- ============================================================ +-- TABLE: store_change_events (for tracking store changes) +-- ============================================================ +CREATE TABLE IF NOT EXISTS store_change_events ( + id SERIAL PRIMARY KEY, + store_id INTEGER NOT NULL REFERENCES dispensaries(id), + event_type VARCHAR(50) NOT NULL, -- brand_added, brand_removed, product_added, product_removed, price_change, stock_change + event_date DATE NOT NULL, + brand_name VARCHAR(255), + product_id INTEGER, + product_name VARCHAR(500), + category VARCHAR(255), + old_value TEXT, + new_value TEXT, + metadata JSONB, + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_store_events_store ON store_change_events(store_id); +CREATE INDEX IF NOT EXISTS idx_store_events_type ON store_change_events(event_type); +CREATE INDEX IF NOT EXISTS idx_store_events_date ON store_change_events(event_date); +CREATE INDEX IF NOT EXISTS idx_store_events_brand ON store_change_events(brand_name); + +-- ============================================================ +-- TABLE: analytics_alerts +-- ============================================================ +CREATE TABLE IF NOT EXISTS analytics_alerts ( + id SERIAL PRIMARY KEY, + alert_type VARCHAR(50) NOT NULL, -- price_warning, brand_dropped, competitive_intrusion, restock_event + severity VARCHAR(20) NOT NULL DEFAULT 'info', -- info, warning, critical + title VARCHAR(255) NOT NULL, + description TEXT, + store_id INTEGER REFERENCES dispensaries(id), + brand_name VARCHAR(255), + product_id INTEGER, + category VARCHAR(255), + metadata JSONB, + is_read BOOLEAN DEFAULT FALSE, + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_analytics_alerts_type ON analytics_alerts(alert_type); +CREATE INDEX IF NOT EXISTS idx_analytics_alerts_read ON analytics_alerts(is_read); +CREATE INDEX IF NOT EXISTS idx_analytics_alerts_created ON analytics_alerts(created_at DESC); + +-- ============================================================ +-- FUNCTION: Capture daily brand snapshots +-- ============================================================ +CREATE OR REPLACE FUNCTION capture_brand_snapshots() +RETURNS INTEGER AS $$ +DECLARE + inserted_count INTEGER; +BEGIN + INSERT INTO brand_snapshots (brand_name, brand_id, snapshot_date, store_count, total_skus, avg_price, in_stock_skus, categories) + SELECT + brand_name, + brand_id, + CURRENT_DATE, + COUNT(DISTINCT dispensary_id), + COUNT(*), + AVG(extract_min_price(latest_raw_payload)), + SUM(CASE WHEN stock_status = 'in_stock' THEN 1 ELSE 0 END), + ARRAY_AGG(DISTINCT type) FILTER (WHERE type IS NOT NULL) + FROM dutchie_products + WHERE brand_name IS NOT NULL + GROUP BY brand_name, brand_id + ON CONFLICT (brand_name, snapshot_date) + DO UPDATE SET + store_count = EXCLUDED.store_count, + total_skus = EXCLUDED.total_skus, + avg_price = EXCLUDED.avg_price, + in_stock_skus = EXCLUDED.in_stock_skus, + categories = EXCLUDED.categories; + + GET DIAGNOSTICS inserted_count = ROW_COUNT; + RETURN inserted_count; +END; +$$ LANGUAGE plpgsql; + +-- ============================================================ +-- FUNCTION: Capture daily category snapshots +-- ============================================================ +CREATE OR REPLACE FUNCTION capture_category_snapshots() +RETURNS INTEGER AS $$ +DECLARE + inserted_count INTEGER; +BEGIN + INSERT INTO category_snapshots (category, snapshot_date, store_count, brand_count, total_skus, avg_price, in_stock_skus) + 
SELECT + type, + CURRENT_DATE, + COUNT(DISTINCT dispensary_id), + COUNT(DISTINCT brand_name), + COUNT(*), + AVG(extract_min_price(latest_raw_payload)), + SUM(CASE WHEN stock_status = 'in_stock' THEN 1 ELSE 0 END) + FROM dutchie_products + WHERE type IS NOT NULL + GROUP BY type + ON CONFLICT (category, snapshot_date) + DO UPDATE SET + store_count = EXCLUDED.store_count, + brand_count = EXCLUDED.brand_count, + total_skus = EXCLUDED.total_skus, + avg_price = EXCLUDED.avg_price, + in_stock_skus = EXCLUDED.in_stock_skus; + + GET DIAGNOSTICS inserted_count = ROW_COUNT; + RETURN inserted_count; +END; +$$ LANGUAGE plpgsql; + +-- ============================================================ +-- FUNCTION: Calculate price volatility for a product +-- ============================================================ +CREATE OR REPLACE FUNCTION calculate_price_volatility( + p_product_id INTEGER, + p_days INTEGER DEFAULT 30 +) +RETURNS NUMERIC AS $$ +DECLARE + std_dev NUMERIC; + avg_price NUMERIC; +BEGIN + -- Using dutchie_product_snapshots if available + SELECT + STDDEV(rec_min_price_cents / 100.0), + AVG(rec_min_price_cents / 100.0) + INTO std_dev, avg_price + FROM dutchie_product_snapshots + WHERE dutchie_product_id = p_product_id + AND crawled_at >= NOW() - (p_days || ' days')::INTERVAL + AND rec_min_price_cents IS NOT NULL; + + IF avg_price IS NULL OR avg_price = 0 THEN + RETURN NULL; + END IF; + + -- Return coefficient of variation (CV) + RETURN ROUND((std_dev / avg_price) * 100, 2); +END; +$$ LANGUAGE plpgsql; + +-- ============================================================ +-- FUNCTION: Get brand penetration stats +-- ============================================================ +CREATE OR REPLACE FUNCTION get_brand_penetration( + p_brand_name VARCHAR, + p_state VARCHAR DEFAULT NULL +) +RETURNS TABLE ( + total_stores BIGINT, + stores_carrying BIGINT, + penetration_pct NUMERIC, + total_skus BIGINT, + avg_skus_per_store NUMERIC, + shelf_share_pct NUMERIC +) AS $$ +BEGIN + RETURN QUERY + WITH store_counts AS ( + SELECT + COUNT(DISTINCT d.id) as total, + COUNT(DISTINCT CASE WHEN dp.brand_name = p_brand_name THEN dp.dispensary_id END) as carrying + FROM dispensaries d + LEFT JOIN dutchie_products dp ON d.id = dp.dispensary_id + WHERE (p_state IS NULL OR d.state = p_state) + ), + sku_counts AS ( + SELECT + COUNT(*) as brand_skus, + COUNT(DISTINCT dispensary_id) as stores_with_brand + FROM dutchie_products + WHERE brand_name = p_brand_name + ), + total_skus AS ( + SELECT COUNT(*) as total FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE (p_state IS NULL OR d.state = p_state) + ) + SELECT + sc.total, + sc.carrying, + ROUND((sc.carrying::NUMERIC / NULLIF(sc.total, 0)) * 100, 2), + skc.brand_skus, + ROUND(skc.brand_skus::NUMERIC / NULLIF(skc.stores_with_brand, 0), 2), + ROUND((skc.brand_skus::NUMERIC / NULLIF(ts.total, 0)) * 100, 2) + FROM store_counts sc, sku_counts skc, total_skus ts; +END; +$$ LANGUAGE plpgsql; + +-- ============================================================ +-- Initial snapshot capture (run manually if needed) +-- ============================================================ +-- Note: Run these after migration to capture initial snapshots: +-- SELECT capture_brand_snapshots(); +-- SELECT capture_category_snapshots(); + +-- ============================================================ +-- Grant permissions +-- ============================================================ +-- Views are accessible to all roles by default + +COMMENT ON VIEW v_product_pricing IS 
'Flattened product view with extracted pricing from JSONB'; +COMMENT ON VIEW v_brand_store_presence IS 'Brand presence across stores with SKU counts'; +COMMENT ON VIEW v_brand_summary IS 'Global brand statistics'; +COMMENT ON VIEW v_category_summary IS 'Global category statistics'; +COMMENT ON VIEW v_store_summary IS 'Store-level statistics'; +COMMENT ON TABLE analytics_cache IS 'Cache for expensive analytics queries'; +COMMENT ON TABLE brand_snapshots IS 'Historical daily snapshots of brand metrics'; +COMMENT ON TABLE category_snapshots IS 'Historical daily snapshots of category metrics'; +COMMENT ON TABLE store_change_events IS 'Log of brand/product changes at stores'; +COMMENT ON TABLE analytics_alerts IS 'Analytics-generated alerts and notifications'; diff --git a/backend/migrations/048_production_sync_monitoring.sql b/backend/migrations/048_production_sync_monitoring.sql new file mode 100644 index 00000000..a22ee10f --- /dev/null +++ b/backend/migrations/048_production_sync_monitoring.sql @@ -0,0 +1,598 @@ +-- Migration 048: Production Sync + Monitoring Infrastructure +-- Phase 5: Full Production Sync + Monitoring +-- +-- Creates: +-- 1. Sync orchestrator tables +-- 2. Dead-letter queue (DLQ) +-- 3. System metrics tracking +-- 4. Integrity check results +-- 5. Auto-fix audit log + +-- ============================================================ +-- SYNC ORCHESTRATOR TABLES +-- ============================================================ + +-- Orchestrator state and control +CREATE TABLE IF NOT EXISTS sync_orchestrator_state ( + id INTEGER PRIMARY KEY DEFAULT 1 CHECK (id = 1), -- Singleton row + status VARCHAR(20) NOT NULL DEFAULT 'SLEEPING', -- RUNNING, SLEEPING, LOCKED, PAUSED + current_worker_id VARCHAR(100), + last_heartbeat_at TIMESTAMPTZ, + last_run_started_at TIMESTAMPTZ, + last_run_completed_at TIMESTAMPTZ, + last_run_duration_ms INTEGER, + last_run_payloads_processed INTEGER DEFAULT 0, + last_run_errors INTEGER DEFAULT 0, + consecutive_failures INTEGER DEFAULT 0, + is_paused BOOLEAN DEFAULT FALSE, + pause_reason TEXT, + config JSONB DEFAULT '{ + "batchSize": 50, + "pollIntervalMs": 5000, + "maxRetries": 3, + "lockTimeoutMs": 300000, + "enableAnalyticsPrecompute": true, + "enableIntegrityChecks": true + }'::jsonb, + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Insert singleton row if not exists +INSERT INTO sync_orchestrator_state (id) VALUES (1) ON CONFLICT (id) DO NOTHING; + +-- Sync run history +CREATE TABLE IF NOT EXISTS sync_runs ( + id SERIAL PRIMARY KEY, + run_id UUID DEFAULT gen_random_uuid() UNIQUE NOT NULL, + worker_id VARCHAR(100) NOT NULL, + status VARCHAR(20) NOT NULL DEFAULT 'running', -- running, completed, failed, cancelled + started_at TIMESTAMPTZ DEFAULT NOW(), + finished_at TIMESTAMPTZ, + duration_ms INTEGER, + + -- Metrics + payloads_queued INTEGER DEFAULT 0, + payloads_processed INTEGER DEFAULT 0, + payloads_skipped INTEGER DEFAULT 0, + payloads_failed INTEGER DEFAULT 0, + payloads_dlq INTEGER DEFAULT 0, + + products_upserted INTEGER DEFAULT 0, + products_inserted INTEGER DEFAULT 0, + products_updated INTEGER DEFAULT 0, + products_discontinued INTEGER DEFAULT 0, + + snapshots_created INTEGER DEFAULT 0, + + -- Error tracking + errors JSONB DEFAULT '[]'::jsonb, + error_summary TEXT, + + -- Diff stats (before/after) + diff_stats JSONB DEFAULT '{}'::jsonb, + + -- Analytics precompute triggered + analytics_updated BOOLEAN DEFAULT FALSE, + analytics_duration_ms INTEGER, + + created_at TIMESTAMPTZ DEFAULT NOW() +); + 
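+-- Illustrative monitoring sketch (not part of this migration): a rollup of
+-- sync_runs outcomes for the last 24 hours, using only the columns defined
+-- above. The v_sync_status view created later in this file exposes a similar
+-- runs_24h aggregate.
+--
+--   SELECT status,
+--          COUNT(*)              AS runs,
+--          AVG(duration_ms)::int AS avg_duration_ms,
+--          SUM(payloads_failed)  AS payloads_failed
+--   FROM sync_runs
+--   WHERE started_at >= NOW() - INTERVAL '24 hours'
+--   GROUP BY status
+--   ORDER BY runs DESC;
+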
+CREATE INDEX IF NOT EXISTS idx_sync_runs_status ON sync_runs(status); +CREATE INDEX IF NOT EXISTS idx_sync_runs_started_at ON sync_runs(started_at DESC); +CREATE INDEX IF NOT EXISTS idx_sync_runs_run_id ON sync_runs(run_id); + +-- ============================================================ +-- DEAD-LETTER QUEUE (DLQ) +-- ============================================================ + +-- DLQ for failed payloads +CREATE TABLE IF NOT EXISTS raw_payloads_dlq ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + original_payload_id UUID NOT NULL, + dispensary_id INTEGER REFERENCES dispensaries(id), + state_code VARCHAR(2), + platform VARCHAR(50) DEFAULT 'dutchie', + + -- Original payload data (preserved) + raw_json JSONB NOT NULL, + product_count INTEGER, + pricing_type VARCHAR(10), + crawl_mode VARCHAR(20), + + -- DLQ metadata + moved_to_dlq_at TIMESTAMPTZ DEFAULT NOW(), + failure_count INTEGER DEFAULT 0, + + -- Error history (array of error objects) + error_history JSONB DEFAULT '[]'::jsonb, + last_error_type VARCHAR(50), + last_error_message TEXT, + last_error_at TIMESTAMPTZ, + + -- Retry tracking + retry_count INTEGER DEFAULT 0, + last_retry_at TIMESTAMPTZ, + next_retry_at TIMESTAMPTZ, + + -- Resolution + status VARCHAR(20) DEFAULT 'pending', -- pending, retrying, resolved, abandoned + resolved_at TIMESTAMPTZ, + resolved_by VARCHAR(100), + resolution_notes TEXT, + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_dlq_status ON raw_payloads_dlq(status); +CREATE INDEX IF NOT EXISTS idx_dlq_dispensary ON raw_payloads_dlq(dispensary_id); +CREATE INDEX IF NOT EXISTS idx_dlq_error_type ON raw_payloads_dlq(last_error_type); +CREATE INDEX IF NOT EXISTS idx_dlq_moved_at ON raw_payloads_dlq(moved_to_dlq_at DESC); + +-- ============================================================ +-- SYSTEM METRICS +-- ============================================================ + +-- System metrics time series +CREATE TABLE IF NOT EXISTS system_metrics ( + id SERIAL PRIMARY KEY, + metric_name VARCHAR(100) NOT NULL, + metric_value NUMERIC NOT NULL, + labels JSONB DEFAULT '{}', + recorded_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_metrics_name_time ON system_metrics(metric_name, recorded_at DESC); +CREATE INDEX IF NOT EXISTS idx_metrics_recorded_at ON system_metrics(recorded_at DESC); + +-- Metrics snapshot (current state, updated continuously) +CREATE TABLE IF NOT EXISTS system_metrics_current ( + metric_name VARCHAR(100) PRIMARY KEY, + metric_value NUMERIC NOT NULL, + labels JSONB DEFAULT '{}', + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Error buckets for classification +CREATE TABLE IF NOT EXISTS error_buckets ( + id SERIAL PRIMARY KEY, + error_type VARCHAR(50) NOT NULL, + error_message TEXT, + source_table VARCHAR(50), + source_id TEXT, + dispensary_id INTEGER, + state_code VARCHAR(2), + context JSONB DEFAULT '{}', + occurred_at TIMESTAMPTZ DEFAULT NOW(), + acknowledged BOOLEAN DEFAULT FALSE, + acknowledged_at TIMESTAMPTZ, + acknowledged_by VARCHAR(100) +); + +CREATE INDEX IF NOT EXISTS idx_error_buckets_type ON error_buckets(error_type); +CREATE INDEX IF NOT EXISTS idx_error_buckets_occurred ON error_buckets(occurred_at DESC); +CREATE INDEX IF NOT EXISTS idx_error_buckets_unacked ON error_buckets(acknowledged) WHERE acknowledged = FALSE; + +-- ============================================================ +-- INTEGRITY CHECK RESULTS +-- ============================================================ + +CREATE TABLE IF NOT EXISTS integrity_check_runs ( + id SERIAL 
PRIMARY KEY, + run_id UUID DEFAULT gen_random_uuid() UNIQUE NOT NULL, + check_type VARCHAR(50) NOT NULL, -- daily, on_demand, scheduled + triggered_by VARCHAR(100), + started_at TIMESTAMPTZ DEFAULT NOW(), + finished_at TIMESTAMPTZ, + status VARCHAR(20) DEFAULT 'running', -- running, completed, failed + + -- Results summary + total_checks INTEGER DEFAULT 0, + passed_checks INTEGER DEFAULT 0, + failed_checks INTEGER DEFAULT 0, + warning_checks INTEGER DEFAULT 0, + + -- Detailed results + results JSONB DEFAULT '[]'::jsonb, + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_integrity_runs_status ON integrity_check_runs(status); +CREATE INDEX IF NOT EXISTS idx_integrity_runs_started ON integrity_check_runs(started_at DESC); + +-- Individual integrity check results +CREATE TABLE IF NOT EXISTS integrity_check_results ( + id SERIAL PRIMARY KEY, + run_id UUID REFERENCES integrity_check_runs(run_id) ON DELETE CASCADE, + check_name VARCHAR(100) NOT NULL, + check_category VARCHAR(50) NOT NULL, + status VARCHAR(20) NOT NULL, -- passed, failed, warning, skipped + + -- Check details + expected_value TEXT, + actual_value TEXT, + difference TEXT, + affected_count INTEGER DEFAULT 0, + + -- Context + details JSONB DEFAULT '{}', + affected_ids JSONB DEFAULT '[]'::jsonb, + + -- Remediation + can_auto_fix BOOLEAN DEFAULT FALSE, + fix_routine VARCHAR(100), + + checked_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_integrity_results_run ON integrity_check_results(run_id); +CREATE INDEX IF NOT EXISTS idx_integrity_results_status ON integrity_check_results(status); + +-- ============================================================ +-- AUTO-FIX AUDIT LOG +-- ============================================================ + +CREATE TABLE IF NOT EXISTS auto_fix_runs ( + id SERIAL PRIMARY KEY, + run_id UUID DEFAULT gen_random_uuid() UNIQUE NOT NULL, + routine_name VARCHAR(100) NOT NULL, + triggered_by VARCHAR(100) NOT NULL, + trigger_type VARCHAR(20) NOT NULL, -- manual, auto, scheduled + + started_at TIMESTAMPTZ DEFAULT NOW(), + finished_at TIMESTAMPTZ, + status VARCHAR(20) DEFAULT 'running', -- running, completed, failed, rolled_back + + -- What was changed + rows_affected INTEGER DEFAULT 0, + changes JSONB DEFAULT '[]'::jsonb, + + -- Dry run support + is_dry_run BOOLEAN DEFAULT FALSE, + dry_run_preview JSONB, + + -- Error handling + error_message TEXT, + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_fix_runs_routine ON auto_fix_runs(routine_name); +CREATE INDEX IF NOT EXISTS idx_fix_runs_started ON auto_fix_runs(started_at DESC); + +-- ============================================================ +-- ALERTS TABLE +-- ============================================================ + +CREATE TABLE IF NOT EXISTS system_alerts ( + id SERIAL PRIMARY KEY, + alert_type VARCHAR(50) NOT NULL, + severity VARCHAR(20) NOT NULL, -- info, warning, error, critical + title VARCHAR(255) NOT NULL, + message TEXT, + source VARCHAR(100), + + -- Context + context JSONB DEFAULT '{}', + + -- State + status VARCHAR(20) DEFAULT 'active', -- active, acknowledged, resolved, muted + acknowledged_at TIMESTAMPTZ, + acknowledged_by VARCHAR(100), + resolved_at TIMESTAMPTZ, + resolved_by VARCHAR(100), + + -- Deduplication + fingerprint VARCHAR(64), -- Hash for dedup + occurrence_count INTEGER DEFAULT 1, + first_occurred_at TIMESTAMPTZ DEFAULT NOW(), + last_occurred_at TIMESTAMPTZ DEFAULT NOW(), + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS 
idx_alerts_status ON system_alerts(status); +CREATE INDEX IF NOT EXISTS idx_alerts_severity ON system_alerts(severity); +CREATE INDEX IF NOT EXISTS idx_alerts_type ON system_alerts(alert_type); +CREATE INDEX IF NOT EXISTS idx_alerts_fingerprint ON system_alerts(fingerprint); +CREATE INDEX IF NOT EXISTS idx_alerts_active ON system_alerts(status, created_at DESC) WHERE status = 'active'; + +-- ============================================================ +-- HELPER VIEWS +-- ============================================================ + +-- Current sync status view +CREATE OR REPLACE VIEW v_sync_status AS +SELECT + sos.status as orchestrator_status, + sos.current_worker_id, + sos.last_heartbeat_at, + sos.is_paused, + sos.pause_reason, + sos.consecutive_failures, + sos.last_run_started_at, + sos.last_run_completed_at, + sos.last_run_duration_ms, + sos.last_run_payloads_processed, + sos.last_run_errors, + sos.config, + (SELECT COUNT(*) FROM raw_payloads WHERE processed = FALSE) as unprocessed_payloads, + (SELECT COUNT(*) FROM raw_payloads_dlq WHERE status = 'pending') as dlq_pending, + (SELECT COUNT(*) FROM system_alerts WHERE status = 'active') as active_alerts, + ( + SELECT json_build_object( + 'total', COUNT(*), + 'completed', COUNT(*) FILTER (WHERE status = 'completed'), + 'failed', COUNT(*) FILTER (WHERE status = 'failed') + ) + FROM sync_runs + WHERE started_at >= NOW() - INTERVAL '24 hours' + ) as runs_24h +FROM sync_orchestrator_state sos +WHERE sos.id = 1; + +-- DLQ summary view +CREATE OR REPLACE VIEW v_dlq_summary AS +SELECT + status, + last_error_type, + COUNT(*) as count, + MIN(moved_to_dlq_at) as oldest, + MAX(moved_to_dlq_at) as newest +FROM raw_payloads_dlq +GROUP BY status, last_error_type +ORDER BY count DESC; + +-- Error bucket summary (last 24h) +CREATE OR REPLACE VIEW v_error_summary AS +SELECT + error_type, + COUNT(*) as count, + COUNT(*) FILTER (WHERE acknowledged = FALSE) as unacknowledged, + MIN(occurred_at) as first_occurred, + MAX(occurred_at) as last_occurred +FROM error_buckets +WHERE occurred_at >= NOW() - INTERVAL '24 hours' +GROUP BY error_type +ORDER BY count DESC; + +-- Metrics summary view +CREATE OR REPLACE VIEW v_metrics_summary AS +SELECT + metric_name, + metric_value, + labels, + updated_at, + NOW() - updated_at as age +FROM system_metrics_current +ORDER BY metric_name; + +-- ============================================================ +-- HELPER FUNCTIONS +-- ============================================================ + +-- Record a metric +CREATE OR REPLACE FUNCTION record_metric( + p_name VARCHAR(100), + p_value NUMERIC, + p_labels JSONB DEFAULT '{}' +) RETURNS VOID AS $$ +BEGIN + -- Insert into time series + INSERT INTO system_metrics (metric_name, metric_value, labels) + VALUES (p_name, p_value, p_labels); + + -- Upsert current value + INSERT INTO system_metrics_current (metric_name, metric_value, labels, updated_at) + VALUES (p_name, p_value, p_labels, NOW()) + ON CONFLICT (metric_name) DO UPDATE SET + metric_value = EXCLUDED.metric_value, + labels = EXCLUDED.labels, + updated_at = NOW(); +END; +$$ LANGUAGE plpgsql; + +-- Record an error +CREATE OR REPLACE FUNCTION record_error( + p_type VARCHAR(50), + p_message TEXT, + p_source_table VARCHAR(50) DEFAULT NULL, + p_source_id TEXT DEFAULT NULL, + p_dispensary_id INTEGER DEFAULT NULL, + p_context JSONB DEFAULT '{}' +) RETURNS INTEGER AS $$ +DECLARE + v_id INTEGER; +BEGIN + INSERT INTO error_buckets ( + error_type, error_message, source_table, source_id, + dispensary_id, context + ) + VALUES ( + p_type, 
p_message, p_source_table, p_source_id, + p_dispensary_id, p_context + ) + RETURNING id INTO v_id; + + -- Update error count metric + PERFORM record_metric( + 'error_count_' || p_type, + COALESCE((SELECT metric_value FROM system_metrics_current WHERE metric_name = 'error_count_' || p_type), 0) + 1 + ); + + RETURN v_id; +END; +$$ LANGUAGE plpgsql; + +-- Create or update alert (with deduplication) +CREATE OR REPLACE FUNCTION upsert_alert( + p_type VARCHAR(50), + p_severity VARCHAR(20), + p_title VARCHAR(255), + p_message TEXT DEFAULT NULL, + p_source VARCHAR(100) DEFAULT NULL, + p_context JSONB DEFAULT '{}' +) RETURNS INTEGER AS $$ +DECLARE + v_fingerprint VARCHAR(64); + v_id INTEGER; +BEGIN + -- Generate fingerprint for dedup + v_fingerprint := md5(p_type || p_title || COALESCE(p_source, '')); + + -- Try to find existing active alert + SELECT id INTO v_id + FROM system_alerts + WHERE fingerprint = v_fingerprint AND status = 'active'; + + IF v_id IS NOT NULL THEN + -- Update existing alert + UPDATE system_alerts + SET occurrence_count = occurrence_count + 1, + last_occurred_at = NOW(), + context = p_context + WHERE id = v_id; + ELSE + -- Create new alert + INSERT INTO system_alerts ( + alert_type, severity, title, message, source, context, fingerprint + ) + VALUES ( + p_type, p_severity, p_title, p_message, p_source, p_context, v_fingerprint + ) + RETURNING id INTO v_id; + END IF; + + RETURN v_id; +END; +$$ LANGUAGE plpgsql; + +-- Move payload to DLQ +CREATE OR REPLACE FUNCTION move_to_dlq( + p_payload_id UUID, + p_error_type VARCHAR(50), + p_error_message TEXT +) RETURNS UUID AS $$ +DECLARE + v_dlq_id UUID; + v_payload RECORD; +BEGIN + -- Get the original payload + SELECT * INTO v_payload + FROM raw_payloads + WHERE id = p_payload_id; + + IF v_payload IS NULL THEN + RAISE EXCEPTION 'Payload not found: %', p_payload_id; + END IF; + + -- Insert into DLQ + INSERT INTO raw_payloads_dlq ( + original_payload_id, dispensary_id, state_code, platform, + raw_json, product_count, pricing_type, crawl_mode, + failure_count, last_error_type, last_error_message, last_error_at, + error_history + ) + VALUES ( + p_payload_id, v_payload.dispensary_id, + (SELECT state FROM dispensaries WHERE id = v_payload.dispensary_id), + v_payload.platform, + v_payload.raw_json, v_payload.product_count, v_payload.pricing_type, v_payload.crawl_mode, + v_payload.hydration_attempts, + p_error_type, p_error_message, NOW(), + COALESCE(v_payload.hydration_error::jsonb, '[]'::jsonb) || jsonb_build_object( + 'type', p_error_type, + 'message', p_error_message, + 'at', NOW() + ) + ) + RETURNING id INTO v_dlq_id; + + -- Mark original as processed (moved to DLQ) + UPDATE raw_payloads + SET processed = TRUE, + hydration_error = 'Moved to DLQ: ' || p_error_message + WHERE id = p_payload_id; + + -- Record metric + PERFORM record_metric('payloads_dlq_total', + COALESCE((SELECT metric_value FROM system_metrics_current WHERE metric_name = 'payloads_dlq_total'), 0) + 1 + ); + + -- Create alert for DLQ + PERFORM upsert_alert( + 'DLQ_ARRIVAL', + 'warning', + 'Payload moved to Dead-Letter Queue', + p_error_message, + 'hydration', + jsonb_build_object('payload_id', p_payload_id, 'dlq_id', v_dlq_id, 'error_type', p_error_type) + ); + + RETURN v_dlq_id; +END; +$$ LANGUAGE plpgsql; + +-- Cleanup old metrics (keep 7 days of time series) +CREATE OR REPLACE FUNCTION cleanup_old_metrics() RETURNS INTEGER AS $$ +DECLARE + v_deleted INTEGER; +BEGIN + DELETE FROM system_metrics + WHERE recorded_at < NOW() - INTERVAL '7 days'; + + GET DIAGNOSTICS v_deleted 
= ROW_COUNT; + RETURN v_deleted; +END; +$$ LANGUAGE plpgsql; + +-- ============================================================ +-- ENSURE RAW_PAYLOADS HAS REQUIRED COLUMNS +-- ============================================================ + +-- Add state column to raw_payloads if not exists +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM information_schema.columns + WHERE table_name = 'raw_payloads' AND column_name = 'state_code' + ) THEN + ALTER TABLE raw_payloads ADD COLUMN state_code VARCHAR(2); + END IF; +END $$; + +-- ============================================================ +-- INITIAL METRICS +-- ============================================================ + +-- Initialize core metrics +INSERT INTO system_metrics_current (metric_name, metric_value, labels) +VALUES + ('payloads_unprocessed', 0, '{}'), + ('payloads_processed_today', 0, '{}'), + ('hydration_errors', 0, '{}'), + ('hydration_success_rate', 100, '{}'), + ('canonical_rows_inserted', 0, '{}'), + ('canonical_rows_updated', 0, '{}'), + ('canonical_rows_discontinued', 0, '{}'), + ('snapshot_volume', 0, '{}'), + ('ingestion_latency_avg_ms', 0, '{}'), + ('payloads_dlq_total', 0, '{}') +ON CONFLICT (metric_name) DO NOTHING; + +-- ============================================================ +-- COMMENTS +-- ============================================================ + +COMMENT ON TABLE sync_orchestrator_state IS 'Singleton table tracking orchestrator status and config'; +COMMENT ON TABLE sync_runs IS 'History of sync runs with metrics'; +COMMENT ON TABLE raw_payloads_dlq IS 'Dead-letter queue for failed payloads'; +COMMENT ON TABLE system_metrics IS 'Time-series metrics storage'; +COMMENT ON TABLE system_metrics_current IS 'Current metric values (fast lookup)'; +COMMENT ON TABLE error_buckets IS 'Classified errors for monitoring'; +COMMENT ON TABLE integrity_check_runs IS 'Integrity check execution history'; +COMMENT ON TABLE integrity_check_results IS 'Individual check results'; +COMMENT ON TABLE auto_fix_runs IS 'Audit log for auto-fix routines'; +COMMENT ON TABLE system_alerts IS 'System alerts with deduplication'; diff --git a/backend/migrations/050_cannaiq_canonical_v2.sql b/backend/migrations/050_cannaiq_canonical_v2.sql new file mode 100644 index 00000000..00b97efb --- /dev/null +++ b/backend/migrations/050_cannaiq_canonical_v2.sql @@ -0,0 +1,750 @@ +-- ============================================================================ +-- Migration 050: CannaiQ Canonical Schema v2 +-- ============================================================================ +-- +-- Purpose: Add canonical tables for multi-state analytics, pricing engine, +-- promotions, intelligence, and brand/buyer portals. +-- +-- RULES: +-- - STRICTLY ADDITIVE (no DROP, DELETE, TRUNCATE, or ALTER column type) +-- - All new tables use IF NOT EXISTS +-- - All new columns use ADD COLUMN IF NOT EXISTS +-- - All indexes use IF NOT EXISTS +-- - Compatible with existing dutchie_products, dispensaries, etc. +-- +-- Run with: +-- psql $CANNAIQ_DB_URL -f migrations/050_cannaiq_canonical_v2.sql +-- +-- ============================================================================ + + +-- ============================================================================ +-- SECTION 1: STATES TABLE +-- ============================================================================ +-- Reference table for US states. Already may exist from 041/043. +-- This is idempotent. 
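+-- Note (descriptive only): if `states` was already created by 041/043, the
+-- CREATE TABLE IF NOT EXISTS below leaves that earlier definition in place
+-- (e.g. `code TEXT` from 043 rather than VARCHAR(2)); only the seed upsert
+-- and indexes take effect. An illustrative check of the deployed shape:
+--
+--   SELECT column_name, data_type
+--   FROM information_schema.columns
+--   WHERE table_name = 'states' AND column_name = 'code';
+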
+ +CREATE TABLE IF NOT EXISTS states ( + id SERIAL PRIMARY KEY, + code VARCHAR(2) NOT NULL UNIQUE, + name VARCHAR(100) NOT NULL, + timezone VARCHAR(50) DEFAULT 'America/Phoenix', + is_active BOOLEAN DEFAULT TRUE, + crawl_enabled BOOLEAN DEFAULT TRUE, + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Insert states if not present +INSERT INTO states (code, name, timezone) VALUES + ('AZ', 'Arizona', 'America/Phoenix'), + ('CA', 'California', 'America/Los_Angeles'), + ('CO', 'Colorado', 'America/Denver'), + ('FL', 'Florida', 'America/New_York'), + ('IL', 'Illinois', 'America/Chicago'), + ('MA', 'Massachusetts', 'America/New_York'), + ('MD', 'Maryland', 'America/New_York'), + ('MI', 'Michigan', 'America/Detroit'), + ('MO', 'Missouri', 'America/Chicago'), + ('NV', 'Nevada', 'America/Los_Angeles'), + ('NJ', 'New Jersey', 'America/New_York'), + ('NY', 'New York', 'America/New_York'), + ('OH', 'Ohio', 'America/New_York'), + ('OK', 'Oklahoma', 'America/Chicago'), + ('OR', 'Oregon', 'America/Los_Angeles'), + ('PA', 'Pennsylvania', 'America/New_York'), + ('WA', 'Washington', 'America/Los_Angeles') +ON CONFLICT (code) DO UPDATE SET + timezone = EXCLUDED.timezone, + updated_at = NOW(); + +CREATE INDEX IF NOT EXISTS idx_states_code ON states(code); +CREATE INDEX IF NOT EXISTS idx_states_active ON states(is_active) WHERE is_active = TRUE; + +COMMENT ON TABLE states IS 'US states where CannaiQ operates. Single source of truth for state configuration.'; + + +-- ============================================================================ +-- SECTION 2: CHAINS TABLE (Retail Groups) +-- ============================================================================ +-- Chains are multi-location operators like Curaleaf, Trulieve, Harvest, etc. + +CREATE TABLE IF NOT EXISTS chains ( + id SERIAL PRIMARY KEY, + name VARCHAR(255) NOT NULL, + slug VARCHAR(255) NOT NULL UNIQUE, + + -- Branding + website_url TEXT, + logo_url TEXT, + description TEXT, + + -- Business info + headquarters_city VARCHAR(100), + headquarters_state_id INTEGER REFERENCES states(id), + founded_year INTEGER, + + -- Status + is_active BOOLEAN DEFAULT TRUE, + is_public BOOLEAN DEFAULT FALSE, -- Publicly traded? + stock_ticker VARCHAR(10), + + -- Metadata + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_chains_slug ON chains(slug); +CREATE INDEX IF NOT EXISTS idx_chains_active ON chains(is_active) WHERE is_active = TRUE; + +COMMENT ON TABLE chains IS 'Retail chains/groups that own multiple dispensary locations.'; + + +-- ============================================================================ +-- SECTION 3: CANONICAL BRANDS TABLE +-- ============================================================================ +-- This is the master brand catalog across all providers and states. +-- Distinct from the per-store `brands` table which tracks store-level brand presence. 
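+-- Illustrative backfill sketch (manual, not executed by this migration): one
+-- way canonical_brands (defined below) could be seeded from the existing
+-- dutchie_products.brand_name values, deduplicated by slug. Platform IDs and
+-- states_available would be filled in a later pass.
+--
+--   INSERT INTO canonical_brands (name, slug)
+--   SELECT DISTINCT
+--          brand_name,
+--          lower(regexp_replace(brand_name, '[^a-zA-Z0-9]+', '-', 'g'))
+--   FROM dutchie_products
+--   WHERE brand_name IS NOT NULL
+--   ON CONFLICT (slug) DO NOTHING;
+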
+ +CREATE TABLE IF NOT EXISTS canonical_brands ( + id SERIAL PRIMARY KEY, + name VARCHAR(255) NOT NULL, + slug VARCHAR(255) NOT NULL UNIQUE, + + -- External IDs from various platforms + dutchie_brand_id VARCHAR(100), + jane_brand_id VARCHAR(100), + treez_brand_id VARCHAR(100), + weedmaps_brand_id VARCHAR(100), + + -- Branding + logo_url TEXT, + local_logo_path TEXT, -- Local storage path + website_url TEXT, + instagram_handle VARCHAR(100), + description TEXT, + + -- Classification + is_portfolio_brand BOOLEAN DEFAULT FALSE, -- TRUE if brand we represent + is_house_brand BOOLEAN DEFAULT FALSE, -- TRUE if dispensary house brand + parent_company VARCHAR(255), -- Parent company name if subsidiary + + -- State presence + states_available TEXT[], -- Array of state codes where brand is present + + -- Status + is_active BOOLEAN DEFAULT TRUE, + is_verified BOOLEAN DEFAULT FALSE, -- Manually verified brand info + verified_at TIMESTAMPTZ, + + -- Metadata + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_canonical_brands_slug ON canonical_brands(slug); +CREATE INDEX IF NOT EXISTS idx_canonical_brands_dutchie ON canonical_brands(dutchie_brand_id) WHERE dutchie_brand_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_canonical_brands_portfolio ON canonical_brands(is_portfolio_brand) WHERE is_portfolio_brand = TRUE; +CREATE INDEX IF NOT EXISTS idx_canonical_brands_states ON canonical_brands USING GIN(states_available); + +COMMENT ON TABLE canonical_brands IS 'Canonical brand catalog across all providers. Master brand reference.'; +COMMENT ON COLUMN canonical_brands.is_portfolio_brand IS 'TRUE if this is a brand CannaiQ represents/manages.'; + + +-- ============================================================================ +-- SECTION 4: CRAWL_RUNS TABLE +-- ============================================================================ +-- One record per crawl execution. Links to snapshots. + +CREATE TABLE IF NOT EXISTS crawl_runs ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE, + state_id INTEGER REFERENCES states(id), + + -- Provider info + provider VARCHAR(50) NOT NULL DEFAULT 'dutchie', + + -- Timing + started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + finished_at TIMESTAMPTZ, + duration_ms INTEGER, + + -- Status + status VARCHAR(20) NOT NULL DEFAULT 'running', -- running, success, failed, partial + error_code VARCHAR(50), + error_message TEXT, + http_status INTEGER, + + -- Results + products_found INTEGER DEFAULT 0, + products_new INTEGER DEFAULT 0, + products_updated INTEGER DEFAULT 0, + products_missing INTEGER DEFAULT 0, -- Products gone from feed + snapshots_written INTEGER DEFAULT 0, + + -- Infrastructure + worker_id VARCHAR(100), + worker_hostname VARCHAR(100), + proxy_used TEXT, + trigger_type VARCHAR(50) DEFAULT 'scheduled', -- scheduled, manual, api + + -- Metadata + metadata JSONB DEFAULT '{}', + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_crawl_runs_dispensary ON crawl_runs(dispensary_id); +CREATE INDEX IF NOT EXISTS idx_crawl_runs_state ON crawl_runs(state_id) WHERE state_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_crawl_runs_status ON crawl_runs(status); +CREATE INDEX IF NOT EXISTS idx_crawl_runs_started ON crawl_runs(started_at DESC); +CREATE INDEX IF NOT EXISTS idx_crawl_runs_dispensary_started ON crawl_runs(dispensary_id, started_at DESC); + +COMMENT ON TABLE crawl_runs IS 'Each crawl execution. 
Links to snapshots and traces.'; + + +-- ============================================================================ +-- SECTION 5: STORE_PRODUCTS TABLE (Current Menu State) +-- ============================================================================ +-- Canonical representation of what's currently on the menu. +-- Provider-agnostic structure for analytics. + +CREATE TABLE IF NOT EXISTS store_products ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE, + state_id INTEGER REFERENCES states(id), + + -- Links to canonical entities + canonical_brand_id INTEGER REFERENCES canonical_brands(id) ON DELETE SET NULL, + category_id INTEGER REFERENCES categories(id) ON DELETE SET NULL, + + -- Provider-specific identifiers + provider VARCHAR(50) NOT NULL DEFAULT 'dutchie', + provider_product_id VARCHAR(100) NOT NULL, -- Platform product ID + provider_brand_id VARCHAR(100), -- Platform brand ID + enterprise_product_id VARCHAR(100), -- Cross-store product ID + + -- Raw data from platform (not normalized) + name VARCHAR(500) NOT NULL, + brand_name VARCHAR(255), + category VARCHAR(100), + subcategory VARCHAR(100), + strain_type VARCHAR(50), + description TEXT, + + -- Pricing (current) + price_rec NUMERIC(10,2), + price_med NUMERIC(10,2), + price_rec_special NUMERIC(10,2), + price_med_special NUMERIC(10,2), + is_on_special BOOLEAN DEFAULT FALSE, + special_name TEXT, + discount_percent NUMERIC(5,2), + price_unit VARCHAR(20) DEFAULT 'each', -- gram, ounce, each, mg + + -- Inventory + is_in_stock BOOLEAN DEFAULT TRUE, + stock_quantity INTEGER, + stock_status VARCHAR(50) DEFAULT 'in_stock', -- in_stock, out_of_stock, low_stock, missing_from_feed + + -- Potency + thc_percent NUMERIC(5,2), + cbd_percent NUMERIC(5,2), + thc_mg NUMERIC(10,2), + cbd_mg NUMERIC(10,2), + + -- Weight/Size + weight_value NUMERIC(10,2), + weight_unit VARCHAR(20), -- g, oz, mg + + -- Images + image_url TEXT, + local_image_path TEXT, + thumbnail_url TEXT, + + -- Flags + is_featured BOOLEAN DEFAULT FALSE, + medical_only BOOLEAN DEFAULT FALSE, + rec_only BOOLEAN DEFAULT FALSE, + + -- Menu position (for tracking prominence) + menu_position INTEGER, + + -- Timestamps + first_seen_at TIMESTAMPTZ DEFAULT NOW(), + last_seen_at TIMESTAMPTZ DEFAULT NOW(), + last_price_change_at TIMESTAMPTZ, + last_stock_change_at TIMESTAMPTZ, + + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW(), + + UNIQUE(dispensary_id, provider, provider_product_id) +); + +CREATE INDEX IF NOT EXISTS idx_store_products_dispensary ON store_products(dispensary_id); +CREATE INDEX IF NOT EXISTS idx_store_products_state ON store_products(state_id) WHERE state_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_store_products_brand ON store_products(canonical_brand_id) WHERE canonical_brand_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_store_products_category ON store_products(category) WHERE category IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_store_products_in_stock ON store_products(dispensary_id, is_in_stock); +CREATE INDEX IF NOT EXISTS idx_store_products_special ON store_products(dispensary_id, is_on_special) WHERE is_on_special = TRUE; +CREATE INDEX IF NOT EXISTS idx_store_products_last_seen ON store_products(last_seen_at DESC); +CREATE INDEX IF NOT EXISTS idx_store_products_provider ON store_products(provider); +CREATE INDEX IF NOT EXISTS idx_store_products_enterprise ON store_products(enterprise_product_id) WHERE enterprise_product_id IS NOT NULL; + +COMMENT ON TABLE store_products IS 
'Current state of products on each dispensary menu. Provider-agnostic.'; + + +-- ============================================================================ +-- SECTION 6: STORE_PRODUCT_SNAPSHOTS TABLE (Historical Data) +-- ============================================================================ +-- Time-series data for analytics. One row per product per crawl. +-- CRITICAL: NEVER DELETE from this table. + +CREATE TABLE IF NOT EXISTS store_product_snapshots ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE, + store_product_id INTEGER REFERENCES store_products(id) ON DELETE SET NULL, + state_id INTEGER REFERENCES states(id), + + -- Provider info + provider VARCHAR(50) NOT NULL DEFAULT 'dutchie', + provider_product_id VARCHAR(100), + + -- Link to crawl run + crawl_run_id INTEGER REFERENCES crawl_runs(id) ON DELETE SET NULL, + + -- Capture timestamp + captured_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + + -- Raw data from platform + name VARCHAR(500), + brand_name VARCHAR(255), + category VARCHAR(100), + subcategory VARCHAR(100), + + -- Pricing at time of capture + price_rec NUMERIC(10,2), + price_med NUMERIC(10,2), + price_rec_special NUMERIC(10,2), + price_med_special NUMERIC(10,2), + is_on_special BOOLEAN DEFAULT FALSE, + discount_percent NUMERIC(5,2), + + -- Inventory at time of capture + is_in_stock BOOLEAN DEFAULT TRUE, + stock_quantity INTEGER, + stock_status VARCHAR(50) DEFAULT 'in_stock', + is_present_in_feed BOOLEAN DEFAULT TRUE, -- FALSE = missing from feed + + -- Potency at time of capture + thc_percent NUMERIC(5,2), + cbd_percent NUMERIC(5,2), + + -- Menu position (for tracking prominence changes) + menu_position INTEGER, + + -- Image URL at time of capture + image_url TEXT, + + -- Full raw response for debugging + raw_data JSONB, + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Partitioning-ready indexes (for future table partitioning by month) +CREATE INDEX IF NOT EXISTS idx_snapshots_dispensary_captured ON store_product_snapshots(dispensary_id, captured_at DESC); +CREATE INDEX IF NOT EXISTS idx_snapshots_state_captured ON store_product_snapshots(state_id, captured_at DESC) WHERE state_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_snapshots_product_captured ON store_product_snapshots(store_product_id, captured_at DESC) WHERE store_product_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_snapshots_crawl_run ON store_product_snapshots(crawl_run_id) WHERE crawl_run_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_snapshots_captured_at ON store_product_snapshots(captured_at DESC); +CREATE INDEX IF NOT EXISTS idx_snapshots_brand ON store_product_snapshots(brand_name) WHERE brand_name IS NOT NULL; + +COMMENT ON TABLE store_product_snapshots IS 'Historical crawl data. One row per product per crawl. NEVER DELETE.'; + + +-- ============================================================================ +-- SECTION 7: ADD state_id AND chain_id TO DISPENSARIES +-- ============================================================================ +-- Link dispensaries to states and chains tables. 
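+-- Example (illustrative): after the backfill below runs, any dispensaries
+-- whose state text did not match a states.code row can be spot-checked with:
+--   SELECT id, name, state
+--   FROM dispensaries
+--   WHERE state_id IS NULL;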
+ +ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS state_id INTEGER REFERENCES states(id); +ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS chain_id INTEGER REFERENCES chains(id); + +CREATE INDEX IF NOT EXISTS idx_dispensaries_state_id ON dispensaries(state_id); +CREATE INDEX IF NOT EXISTS idx_dispensaries_chain_id ON dispensaries(chain_id) WHERE chain_id IS NOT NULL; + +-- Backfill state_id from existing state column +UPDATE dispensaries d +SET state_id = s.id +FROM states s +WHERE d.state = s.code + AND d.state_id IS NULL; + +COMMENT ON COLUMN dispensaries.state_id IS 'FK to states table. Canonical state reference.'; +COMMENT ON COLUMN dispensaries.chain_id IS 'FK to chains table. NULL if independent dispensary.'; + + +-- ============================================================================ +-- SECTION 8: BRAND PENETRATION TABLE +-- ============================================================================ +-- Pre-computed brand presence across stores for analytics dashboards. + +CREATE TABLE IF NOT EXISTS brand_penetration ( + id SERIAL PRIMARY KEY, + canonical_brand_id INTEGER NOT NULL REFERENCES canonical_brands(id) ON DELETE CASCADE, + state_id INTEGER NOT NULL REFERENCES states(id) ON DELETE CASCADE, + + -- Metrics + stores_carrying INTEGER DEFAULT 0, + stores_total INTEGER DEFAULT 0, + penetration_pct NUMERIC(5,2) DEFAULT 0, + + -- Product breakdown + products_count INTEGER DEFAULT 0, + products_in_stock INTEGER DEFAULT 0, + products_on_special INTEGER DEFAULT 0, + + -- Pricing + avg_price NUMERIC(10,2), + min_price NUMERIC(10,2), + max_price NUMERIC(10,2), + + -- Time range + calculated_at TIMESTAMPTZ DEFAULT NOW(), + period_start TIMESTAMPTZ, + period_end TIMESTAMPTZ, + + UNIQUE(canonical_brand_id, state_id, calculated_at) +); + +CREATE INDEX IF NOT EXISTS idx_brand_penetration_brand ON brand_penetration(canonical_brand_id); +CREATE INDEX IF NOT EXISTS idx_brand_penetration_state ON brand_penetration(state_id); +CREATE INDEX IF NOT EXISTS idx_brand_penetration_calculated ON brand_penetration(calculated_at DESC); + +COMMENT ON TABLE brand_penetration IS 'Pre-computed brand penetration metrics by state.'; + + +-- ============================================================================ +-- SECTION 9: PRICE_ALERTS TABLE +-- ============================================================================ +-- Track significant price changes for intelligence/alerts. 
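+-- Example (illustrative only): an alerting worker might drain this table with
+-- a query along these lines; the LIMIT is a placeholder batch size:
+--   SELECT id, alert_type, product_name, old_price, new_price, change_percent
+--   FROM price_alerts
+--   WHERE is_processed = FALSE
+--   ORDER BY created_at
+--   LIMIT 100;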
+ +CREATE TABLE IF NOT EXISTS price_alerts ( + id SERIAL PRIMARY KEY, + store_product_id INTEGER REFERENCES store_products(id) ON DELETE CASCADE, + dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE, + state_id INTEGER REFERENCES states(id), + + -- What changed + alert_type VARCHAR(50) NOT NULL, -- price_drop, price_increase, new_special, special_ended + + -- Values + old_price NUMERIC(10,2), + new_price NUMERIC(10,2), + change_amount NUMERIC(10,2), + change_percent NUMERIC(5,2), + + -- Context + product_name VARCHAR(500), + brand_name VARCHAR(255), + category VARCHAR(100), + + -- Status + is_processed BOOLEAN DEFAULT FALSE, + processed_at TIMESTAMPTZ, + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_price_alerts_dispensary ON price_alerts(dispensary_id); +CREATE INDEX IF NOT EXISTS idx_price_alerts_state ON price_alerts(state_id) WHERE state_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_price_alerts_type ON price_alerts(alert_type); +CREATE INDEX IF NOT EXISTS idx_price_alerts_unprocessed ON price_alerts(is_processed) WHERE is_processed = FALSE; +CREATE INDEX IF NOT EXISTS idx_price_alerts_created ON price_alerts(created_at DESC); + +COMMENT ON TABLE price_alerts IS 'Significant price changes for intelligence/alerting.'; + + +-- ============================================================================ +-- SECTION 10: RAW_PAYLOADS TABLE +-- ============================================================================ +-- Store raw API responses for replay/debugging. Separate from snapshots. + +CREATE TABLE IF NOT EXISTS raw_payloads ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE, + crawl_run_id INTEGER REFERENCES crawl_runs(id) ON DELETE SET NULL, + + -- Payload info + provider VARCHAR(50) NOT NULL DEFAULT 'dutchie', + payload_type VARCHAR(50) NOT NULL DEFAULT 'products', -- products, brands, specials + + -- The raw data + payload JSONB NOT NULL, + payload_size_bytes INTEGER, + + -- Deduplication + payload_hash VARCHAR(64), -- SHA256 for deduplication + + -- Processing status + is_processed BOOLEAN DEFAULT FALSE, + processed_at TIMESTAMPTZ, + + captured_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_raw_payloads_dispensary ON raw_payloads(dispensary_id, captured_at DESC); +CREATE INDEX IF NOT EXISTS idx_raw_payloads_crawl_run ON raw_payloads(crawl_run_id) WHERE crawl_run_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_raw_payloads_unprocessed ON raw_payloads(is_processed) WHERE is_processed = FALSE; +CREATE INDEX IF NOT EXISTS idx_raw_payloads_hash ON raw_payloads(payload_hash) WHERE payload_hash IS NOT NULL; + +COMMENT ON TABLE raw_payloads IS 'Raw API responses for replay/debugging. Enables re-hydration.'; + + +-- ============================================================================ +-- SECTION 11: ANALYTICS CACHE TABLES +-- ============================================================================ +-- Pre-computed analytics for dashboard performance. 
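+-- Example (illustrative): a dashboard widget reading the daily store rollup
+-- defined below might use a query like this; the dispensary_id is a placeholder:
+--   SELECT date, total_products, in_stock_products, avg_price
+--   FROM analytics_store_daily
+--   WHERE dispensary_id = 1
+--   ORDER BY date DESC
+--   LIMIT 30;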
+ +-- Daily store metrics +CREATE TABLE IF NOT EXISTS analytics_store_daily ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE, + state_id INTEGER REFERENCES states(id), + date DATE NOT NULL, + + -- Product counts + total_products INTEGER DEFAULT 0, + in_stock_products INTEGER DEFAULT 0, + out_of_stock_products INTEGER DEFAULT 0, + on_special_products INTEGER DEFAULT 0, + + -- Brand/category diversity + unique_brands INTEGER DEFAULT 0, + unique_categories INTEGER DEFAULT 0, + + -- Pricing + avg_price NUMERIC(10,2), + median_price NUMERIC(10,2), + + -- Crawl health + crawl_count INTEGER DEFAULT 0, + successful_crawls INTEGER DEFAULT 0, + + created_at TIMESTAMPTZ DEFAULT NOW(), + + UNIQUE(dispensary_id, date) +); + +CREATE INDEX IF NOT EXISTS idx_analytics_store_daily_dispensary ON analytics_store_daily(dispensary_id, date DESC); +CREATE INDEX IF NOT EXISTS idx_analytics_store_daily_state ON analytics_store_daily(state_id, date DESC) WHERE state_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_analytics_store_daily_date ON analytics_store_daily(date DESC); + + +-- Daily brand metrics +CREATE TABLE IF NOT EXISTS analytics_brand_daily ( + id SERIAL PRIMARY KEY, + canonical_brand_id INTEGER NOT NULL REFERENCES canonical_brands(id) ON DELETE CASCADE, + state_id INTEGER REFERENCES states(id), + date DATE NOT NULL, + + -- Presence + stores_carrying INTEGER DEFAULT 0, + products_count INTEGER DEFAULT 0, + + -- Stock + in_stock_count INTEGER DEFAULT 0, + out_of_stock_count INTEGER DEFAULT 0, + + -- Pricing + avg_price NUMERIC(10,2), + min_price NUMERIC(10,2), + max_price NUMERIC(10,2), + on_special_count INTEGER DEFAULT 0, + + created_at TIMESTAMPTZ DEFAULT NOW(), + + UNIQUE(canonical_brand_id, state_id, date) +); + +CREATE INDEX IF NOT EXISTS idx_analytics_brand_daily_brand ON analytics_brand_daily(canonical_brand_id, date DESC); +CREATE INDEX IF NOT EXISTS idx_analytics_brand_daily_state ON analytics_brand_daily(state_id, date DESC) WHERE state_id IS NOT NULL; + + +-- ============================================================================ +-- SECTION 12: VIEWS FOR COMPATIBILITY +-- ============================================================================ + +-- View: Latest snapshot per store product +CREATE OR REPLACE VIEW v_latest_store_snapshots AS +SELECT DISTINCT ON (dispensary_id, provider_product_id) + sps.* +FROM store_product_snapshots sps +ORDER BY dispensary_id, provider_product_id, captured_at DESC; + +-- View: Crawl run summary per dispensary +CREATE OR REPLACE VIEW v_dispensary_crawl_summary AS +SELECT + d.id AS dispensary_id, + COALESCE(d.dba_name, d.name) AS dispensary_name, + d.city, + d.state, + d.state_id, + s.name AS state_name, + COUNT(DISTINCT sp.id) AS current_product_count, + COUNT(DISTINCT sp.id) FILTER (WHERE sp.is_in_stock) AS in_stock_count, + COUNT(DISTINCT sp.id) FILTER (WHERE sp.is_on_special) AS on_special_count, + MAX(cr.finished_at) AS last_crawl_at, + (SELECT status FROM crawl_runs WHERE dispensary_id = d.id ORDER BY started_at DESC LIMIT 1) AS last_crawl_status +FROM dispensaries d +LEFT JOIN states s ON s.id = d.state_id +LEFT JOIN store_products sp ON sp.dispensary_id = d.id +LEFT JOIN crawl_runs cr ON cr.dispensary_id = d.id +GROUP BY d.id, d.dba_name, d.name, d.city, d.state, d.state_id, s.name; + +-- View: Brand presence across stores +CREATE OR REPLACE VIEW v_brand_store_presence AS +SELECT + cb.id AS brand_id, + cb.name AS brand_name, + cb.slug AS brand_slug, + s.id AS state_id, + s.code AS 
state_code, + COUNT(DISTINCT sp.dispensary_id) AS store_count, + COUNT(sp.id) AS product_count, + COUNT(sp.id) FILTER (WHERE sp.is_in_stock) AS in_stock_count, + AVG(sp.price_rec) AS avg_price, + MIN(sp.price_rec) AS min_price, + MAX(sp.price_rec) AS max_price +FROM canonical_brands cb +JOIN store_products sp ON sp.canonical_brand_id = cb.id +LEFT JOIN states s ON s.id = sp.state_id +GROUP BY cb.id, cb.name, cb.slug, s.id, s.code; + + +-- ============================================================================ +-- SECTION 13: ADD FK FROM store_product_snapshots TO crawl_runs +-- ============================================================================ + +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM information_schema.table_constraints + WHERE constraint_name = 'store_product_snapshots_crawl_run_id_fkey' + ) THEN + ALTER TABLE store_product_snapshots + ADD CONSTRAINT store_product_snapshots_crawl_run_id_fkey + FOREIGN KEY (crawl_run_id) REFERENCES crawl_runs(id) ON DELETE SET NULL; + END IF; +END $$; + + +-- ============================================================================ +-- SECTION 14: ADD crawl_run_id TO crawl_orchestration_traces +-- ============================================================================ + +ALTER TABLE crawl_orchestration_traces + ADD COLUMN IF NOT EXISTS crawl_run_id INTEGER REFERENCES crawl_runs(id) ON DELETE SET NULL; + +CREATE INDEX IF NOT EXISTS idx_traces_crawl_run + ON crawl_orchestration_traces(crawl_run_id) + WHERE crawl_run_id IS NOT NULL; + + +-- ============================================================================ +-- SECTION 15: UPDATE dispensary_crawler_profiles +-- ============================================================================ +-- Add status columns for profile lifecycle. + +ALTER TABLE dispensary_crawler_profiles + ADD COLUMN IF NOT EXISTS status VARCHAR(50) DEFAULT 'sandbox'; + +ALTER TABLE dispensary_crawler_profiles + ADD COLUMN IF NOT EXISTS allow_autopromote BOOLEAN DEFAULT FALSE; + +ALTER TABLE dispensary_crawler_profiles + ADD COLUMN IF NOT EXISTS validated_at TIMESTAMPTZ; + +CREATE INDEX IF NOT EXISTS idx_profiles_status + ON dispensary_crawler_profiles(status); + +COMMENT ON COLUMN dispensary_crawler_profiles.status IS 'Profile status: sandbox, production, needs_manual, disabled'; + + +-- ============================================================================ +-- SECTION 16: UPDATE dispensary_crawl_jobs WITH ADDITIONAL COLUMNS +-- ============================================================================ +-- Add columns needed for enhanced job tracking. 
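+-- Example (illustrative sketch, assuming the table has an integer id primary
+-- key): the claim/lock columns added below support an atomic job-claim pattern
+-- such as the following; the worker name and lock duration are placeholders:
+--   UPDATE dispensary_crawl_jobs
+--   SET status = 'running', claimed_by = 'worker-1', claimed_at = NOW(),
+--       locked_until = NOW() + INTERVAL '15 minutes'
+--   WHERE id = (
+--     SELECT id FROM dispensary_crawl_jobs
+--     WHERE status = 'pending'
+--     ORDER BY id
+--     LIMIT 1
+--     FOR UPDATE SKIP LOCKED
+--   )
+--   RETURNING id;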
+ +ALTER TABLE dispensary_crawl_jobs + ADD COLUMN IF NOT EXISTS worker_id VARCHAR(100); + +ALTER TABLE dispensary_crawl_jobs + ADD COLUMN IF NOT EXISTS worker_hostname VARCHAR(100); + +ALTER TABLE dispensary_crawl_jobs + ADD COLUMN IF NOT EXISTS claimed_by VARCHAR(100); + +ALTER TABLE dispensary_crawl_jobs + ADD COLUMN IF NOT EXISTS claimed_at TIMESTAMPTZ; + +ALTER TABLE dispensary_crawl_jobs + ADD COLUMN IF NOT EXISTS locked_until TIMESTAMPTZ; + +ALTER TABLE dispensary_crawl_jobs + ADD COLUMN IF NOT EXISTS last_heartbeat_at TIMESTAMPTZ; + +ALTER TABLE dispensary_crawl_jobs + ADD COLUMN IF NOT EXISTS max_retries INTEGER DEFAULT 3; + +ALTER TABLE dispensary_crawl_jobs + ADD COLUMN IF NOT EXISTS products_upserted INTEGER DEFAULT 0; + +ALTER TABLE dispensary_crawl_jobs + ADD COLUMN IF NOT EXISTS snapshots_created INTEGER DEFAULT 0; + +ALTER TABLE dispensary_crawl_jobs + ADD COLUMN IF NOT EXISTS current_page INTEGER DEFAULT 0; + +ALTER TABLE dispensary_crawl_jobs + ADD COLUMN IF NOT EXISTS total_pages INTEGER; + +CREATE INDEX IF NOT EXISTS idx_crawl_jobs_status_pending ON dispensary_crawl_jobs(status) WHERE status = 'pending'; +CREATE INDEX IF NOT EXISTS idx_crawl_jobs_claimed_by ON dispensary_crawl_jobs(claimed_by) WHERE claimed_by IS NOT NULL; + + +-- ============================================================================ +-- SECTION 17: QUEUE MONITORING VIEWS +-- ============================================================================ + +CREATE OR REPLACE VIEW v_queue_stats AS +SELECT + (SELECT COUNT(*) FROM dispensary_crawl_jobs WHERE status = 'pending') AS pending_jobs, + (SELECT COUNT(*) FROM dispensary_crawl_jobs WHERE status = 'running') AS running_jobs, + (SELECT COUNT(*) FROM dispensary_crawl_jobs WHERE status = 'completed' AND completed_at > NOW() - INTERVAL '1 hour') AS completed_1h, + (SELECT COUNT(*) FROM dispensary_crawl_jobs WHERE status = 'failed' AND completed_at > NOW() - INTERVAL '1 hour') AS failed_1h, + (SELECT COUNT(DISTINCT worker_id) FROM dispensary_crawl_jobs WHERE status = 'running' AND worker_id IS NOT NULL) AS active_workers, + (SELECT AVG(EXTRACT(EPOCH FROM (completed_at - started_at))) FROM dispensary_crawl_jobs WHERE status = 'completed' AND completed_at > NOW() - INTERVAL '1 hour') AS avg_duration_seconds; + +CREATE OR REPLACE VIEW v_active_workers AS +SELECT + worker_id, + worker_hostname, + COUNT(*) AS current_jobs, + SUM(products_found) AS total_products_found, + SUM(products_upserted) AS total_products_upserted, + SUM(snapshots_created) AS total_snapshots, + MIN(claimed_at) AS first_claimed_at, + MAX(last_heartbeat_at) AS last_heartbeat +FROM dispensary_crawl_jobs +WHERE status = 'running' AND worker_id IS NOT NULL +GROUP BY worker_id, worker_hostname; + + +-- ============================================================================ +-- DONE +-- ============================================================================ + +SELECT 'Migration 050 completed successfully. Canonical schema v2 is ready.' 
AS status; diff --git a/backend/migrations/051_cannaiq_canonical_safe_bootstrap.sql b/backend/migrations/051_cannaiq_canonical_safe_bootstrap.sql new file mode 100644 index 00000000..31142975 --- /dev/null +++ b/backend/migrations/051_cannaiq_canonical_safe_bootstrap.sql @@ -0,0 +1,642 @@ +-- ============================================================================ +-- Migration 051: CannaiQ Canonical Schema - Safe Bootstrap +-- ============================================================================ +-- +-- Purpose: Create the canonical CannaiQ schema tables from scratch. +-- This migration is FULLY IDEMPOTENT and safe to run multiple times. +-- +-- SAFETY RULES FOLLOWED: +-- 1. ALL tables use CREATE TABLE IF NOT EXISTS +-- 2. ALL columns use ALTER TABLE ADD COLUMN IF NOT EXISTS +-- 3. ALL indexes use CREATE INDEX IF NOT EXISTS +-- 4. NO DROP, DELETE, TRUNCATE, or destructive operations +-- 5. NO assumptions about existing data or column existence +-- 6. NO dependencies on migrations 041, 043, or 050 +-- 7. Compatible with dutchie_menus database as it exists today +-- 8. Safe handling of pre-existing states table with missing columns +-- +-- Tables Created: +-- - states (US state reference table) +-- - chains (retail chain/group table) +-- - crawl_runs (crawl execution records) +-- - store_products (current menu state) +-- - store_product_snapshots (historical price/stock data) +-- +-- Columns Added: +-- - dispensaries.state_id (FK to states) +-- - dispensaries.chain_id (FK to chains) +-- +-- Run with: +-- psql "postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" \ +-- -f migrations/051_cannaiq_canonical_safe_bootstrap.sql +-- +-- ============================================================================ + + +-- ============================================================================ +-- SECTION 1: STATES TABLE +-- ============================================================================ +-- Reference table for US states where CannaiQ operates. +-- This section handles the case where the table exists but is missing columns. 
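+-- Example (illustrative): after this section runs, the resulting column set
+-- can be confirmed with:
+--   SELECT column_name, data_type
+--   FROM information_schema.columns
+--   WHERE table_name = 'states'
+--   ORDER BY ordinal_position;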
+ +-- First, create the table if it doesn't exist (minimal definition) +CREATE TABLE IF NOT EXISTS states ( + id SERIAL PRIMARY KEY, + code VARCHAR(2) NOT NULL, + name VARCHAR(100) NOT NULL, + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Now safely add any missing columns (each is independent, won't fail if exists) +ALTER TABLE states ADD COLUMN IF NOT EXISTS timezone TEXT; +ALTER TABLE states ADD COLUMN IF NOT EXISTS is_active BOOLEAN DEFAULT TRUE; +ALTER TABLE states ADD COLUMN IF NOT EXISTS crawl_enabled BOOLEAN DEFAULT TRUE; + +-- Add unique constraint on code if not exists +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'states_code_key' AND conrelid = 'states'::regclass + ) THEN + -- Check if there's already a unique constraint with a different name + IF NOT EXISTS ( + SELECT 1 FROM pg_indexes + WHERE tablename = 'states' AND indexdef LIKE '%UNIQUE%code%' + ) THEN + ALTER TABLE states ADD CONSTRAINT states_code_key UNIQUE (code); + END IF; + END IF; +EXCEPTION + WHEN duplicate_object THEN + NULL; -- Constraint already exists + WHEN OTHERS THEN + NULL; -- Handle any other errors gracefully +END $$; + +-- Set default timezone values for existing rows that have NULL +UPDATE states SET timezone = 'America/Phoenix' WHERE timezone IS NULL AND code = 'AZ'; +UPDATE states SET timezone = 'America/Los_Angeles' WHERE timezone IS NULL AND code IN ('CA', 'NV', 'OR', 'WA'); +UPDATE states SET timezone = 'America/Denver' WHERE timezone IS NULL AND code = 'CO'; +UPDATE states SET timezone = 'America/New_York' WHERE timezone IS NULL AND code IN ('FL', 'MA', 'MD', 'NJ', 'NY', 'OH', 'PA'); +UPDATE states SET timezone = 'America/Chicago' WHERE timezone IS NULL AND code IN ('IL', 'MO', 'OK'); +UPDATE states SET timezone = 'America/Detroit' WHERE timezone IS NULL AND code = 'MI'; + +-- Set default is_active for existing rows +UPDATE states SET is_active = TRUE WHERE is_active IS NULL; +UPDATE states SET crawl_enabled = TRUE WHERE crawl_enabled IS NULL; + +-- Insert known states (idempotent - ON CONFLICT DO UPDATE to fill missing values) +INSERT INTO states (code, name, timezone, is_active, crawl_enabled) VALUES + ('AZ', 'Arizona', 'America/Phoenix', TRUE, TRUE), + ('CA', 'California', 'America/Los_Angeles', TRUE, TRUE), + ('CO', 'Colorado', 'America/Denver', TRUE, TRUE), + ('FL', 'Florida', 'America/New_York', TRUE, TRUE), + ('IL', 'Illinois', 'America/Chicago', TRUE, TRUE), + ('MA', 'Massachusetts', 'America/New_York', TRUE, TRUE), + ('MD', 'Maryland', 'America/New_York', TRUE, TRUE), + ('MI', 'Michigan', 'America/Detroit', TRUE, TRUE), + ('MO', 'Missouri', 'America/Chicago', TRUE, TRUE), + ('NV', 'Nevada', 'America/Los_Angeles', TRUE, TRUE), + ('NJ', 'New Jersey', 'America/New_York', TRUE, TRUE), + ('NY', 'New York', 'America/New_York', TRUE, TRUE), + ('OH', 'Ohio', 'America/New_York', TRUE, TRUE), + ('OK', 'Oklahoma', 'America/Chicago', TRUE, TRUE), + ('OR', 'Oregon', 'America/Los_Angeles', TRUE, TRUE), + ('PA', 'Pennsylvania', 'America/New_York', TRUE, TRUE), + ('WA', 'Washington', 'America/Los_Angeles', TRUE, TRUE) +ON CONFLICT (code) DO UPDATE SET + timezone = COALESCE(states.timezone, EXCLUDED.timezone), + is_active = COALESCE(states.is_active, EXCLUDED.is_active), + crawl_enabled = COALESCE(states.crawl_enabled, EXCLUDED.crawl_enabled), + updated_at = NOW(); + +CREATE INDEX IF NOT EXISTS idx_states_code ON states(code); +CREATE INDEX IF NOT EXISTS idx_states_active ON states(is_active) WHERE is_active = TRUE; + +COMMENT ON 
TABLE states IS 'US states where CannaiQ operates. Single source of truth for state configuration.'; + + +-- ============================================================================ +-- SECTION 2: CHAINS TABLE +-- ============================================================================ +-- Retail chains/groups that own multiple dispensary locations. +-- Examples: Curaleaf, Trulieve, Harvest, Columbia Care + +CREATE TABLE IF NOT EXISTS chains ( + id SERIAL PRIMARY KEY, + name VARCHAR(255) NOT NULL, + slug VARCHAR(255) NOT NULL, + website_url TEXT, + logo_url TEXT, + description TEXT, + headquarters_city VARCHAR(100), + headquarters_state_id INTEGER, + founded_year INTEGER, + is_active BOOLEAN DEFAULT TRUE, + is_public BOOLEAN DEFAULT FALSE, + stock_ticker VARCHAR(10), + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Add unique constraint on slug if not exists +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'chains_slug_key' AND conrelid = 'chains'::regclass + ) THEN + ALTER TABLE chains ADD CONSTRAINT chains_slug_key UNIQUE (slug); + END IF; +EXCEPTION + WHEN duplicate_object THEN + NULL; + WHEN OTHERS THEN + NULL; +END $$; + +-- Add FK to states if not exists +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'chains_headquarters_state_id_fkey' + ) THEN + ALTER TABLE chains + ADD CONSTRAINT chains_headquarters_state_id_fkey + FOREIGN KEY (headquarters_state_id) REFERENCES states(id) ON DELETE SET NULL; + END IF; +EXCEPTION + WHEN duplicate_object THEN + NULL; + WHEN OTHERS THEN + NULL; +END $$; + +CREATE INDEX IF NOT EXISTS idx_chains_slug ON chains(slug); +CREATE INDEX IF NOT EXISTS idx_chains_active ON chains(is_active) WHERE is_active = TRUE; + +COMMENT ON TABLE chains IS 'Retail chains/groups that own multiple dispensary locations.'; + + +-- ============================================================================ +-- SECTION 3: ADD state_id AND chain_id TO DISPENSARIES +-- ============================================================================ +-- Link existing dispensaries table to states and chains. + +ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS state_id INTEGER; +ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS chain_id INTEGER; + +-- Add FK constraints if not exist +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'dispensaries_state_id_fkey' + ) THEN + ALTER TABLE dispensaries + ADD CONSTRAINT dispensaries_state_id_fkey + FOREIGN KEY (state_id) REFERENCES states(id) ON DELETE SET NULL; + END IF; +EXCEPTION + WHEN duplicate_object THEN + NULL; + WHEN OTHERS THEN + NULL; +END $$; + +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'dispensaries_chain_id_fkey' + ) THEN + ALTER TABLE dispensaries + ADD CONSTRAINT dispensaries_chain_id_fkey + FOREIGN KEY (chain_id) REFERENCES chains(id) ON DELETE SET NULL; + END IF; +EXCEPTION + WHEN duplicate_object THEN + NULL; + WHEN OTHERS THEN + NULL; +END $$; + +CREATE INDEX IF NOT EXISTS idx_dispensaries_state_id ON dispensaries(state_id); +CREATE INDEX IF NOT EXISTS idx_dispensaries_chain_id ON dispensaries(chain_id) WHERE chain_id IS NOT NULL; + +-- Backfill state_id from existing state column (safe - only updates NULL values) +UPDATE dispensaries d +SET state_id = s.id +FROM states s +WHERE d.state = s.code + AND d.state_id IS NULL; + +COMMENT ON COLUMN dispensaries.state_id IS 'FK to states table. 
Canonical state reference.'; +COMMENT ON COLUMN dispensaries.chain_id IS 'FK to chains table. NULL if independent dispensary.'; + + +-- ============================================================================ +-- SECTION 4: CRAWL_RUNS TABLE +-- ============================================================================ +-- One record per crawl execution. Links to snapshots. + +CREATE TABLE IF NOT EXISTS crawl_runs ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER NOT NULL, + state_id INTEGER, + + -- Provider info + provider VARCHAR(50) NOT NULL DEFAULT 'dutchie', + + -- Timing + started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + finished_at TIMESTAMPTZ, + duration_ms INTEGER, + + -- Status + status VARCHAR(20) NOT NULL DEFAULT 'running', + error_code VARCHAR(50), + error_message TEXT, + http_status INTEGER, + + -- Results + products_found INTEGER DEFAULT 0, + products_new INTEGER DEFAULT 0, + products_updated INTEGER DEFAULT 0, + products_missing INTEGER DEFAULT 0, + snapshots_written INTEGER DEFAULT 0, + + -- Infrastructure + worker_id VARCHAR(100), + worker_hostname VARCHAR(100), + proxy_used TEXT, + trigger_type VARCHAR(50) DEFAULT 'scheduled', + + -- Metadata + metadata JSONB DEFAULT '{}', + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Add FK constraints if not exist +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'crawl_runs_dispensary_id_fkey' + ) THEN + ALTER TABLE crawl_runs + ADD CONSTRAINT crawl_runs_dispensary_id_fkey + FOREIGN KEY (dispensary_id) REFERENCES dispensaries(id) ON DELETE CASCADE; + END IF; +EXCEPTION + WHEN duplicate_object THEN + NULL; + WHEN OTHERS THEN + NULL; +END $$; + +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'crawl_runs_state_id_fkey' + ) THEN + ALTER TABLE crawl_runs + ADD CONSTRAINT crawl_runs_state_id_fkey + FOREIGN KEY (state_id) REFERENCES states(id) ON DELETE SET NULL; + END IF; +EXCEPTION + WHEN duplicate_object THEN + NULL; + WHEN OTHERS THEN + NULL; +END $$; + +CREATE INDEX IF NOT EXISTS idx_crawl_runs_dispensary ON crawl_runs(dispensary_id); +CREATE INDEX IF NOT EXISTS idx_crawl_runs_state ON crawl_runs(state_id) WHERE state_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_crawl_runs_status ON crawl_runs(status); +CREATE INDEX IF NOT EXISTS idx_crawl_runs_started ON crawl_runs(started_at DESC); +CREATE INDEX IF NOT EXISTS idx_crawl_runs_dispensary_started ON crawl_runs(dispensary_id, started_at DESC); + +COMMENT ON TABLE crawl_runs IS 'Each crawl execution. Links to snapshots and traces.'; + + +-- ============================================================================ +-- SECTION 5: STORE_PRODUCTS TABLE +-- ============================================================================ +-- Current state of products on each dispensary menu. +-- Provider-agnostic structure for analytics. 
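+-- Example (illustrative sketch, not the actual crawler code): crawls are
+-- expected to upsert against the (dispensary_id, provider, provider_product_id)
+-- key defined below; all literal values here are placeholders:
+--   INSERT INTO store_products (dispensary_id, provider, provider_product_id, name, price_rec, is_in_stock)
+--   VALUES (1, 'dutchie', 'prod_123', 'Example Product 1g', 25.00, TRUE)
+--   ON CONFLICT (dispensary_id, provider, provider_product_id) DO UPDATE SET
+--     name = EXCLUDED.name,
+--     price_rec = EXCLUDED.price_rec,
+--     is_in_stock = EXCLUDED.is_in_stock,
+--     last_seen_at = NOW(),
+--     updated_at = NOW();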
+ +CREATE TABLE IF NOT EXISTS store_products ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER NOT NULL, + state_id INTEGER, + + -- Provider-specific identifiers + provider VARCHAR(50) NOT NULL DEFAULT 'dutchie', + provider_product_id VARCHAR(100) NOT NULL, + provider_brand_id VARCHAR(100), + enterprise_product_id VARCHAR(100), + + -- Raw data from platform (not normalized) + name VARCHAR(500) NOT NULL, + brand_name VARCHAR(255), + category VARCHAR(100), + subcategory VARCHAR(100), + strain_type VARCHAR(50), + description TEXT, + + -- Pricing (current) + price_rec NUMERIC(10,2), + price_med NUMERIC(10,2), + price_rec_special NUMERIC(10,2), + price_med_special NUMERIC(10,2), + is_on_special BOOLEAN DEFAULT FALSE, + special_name TEXT, + discount_percent NUMERIC(5,2), + price_unit VARCHAR(20) DEFAULT 'each', + + -- Inventory + is_in_stock BOOLEAN DEFAULT TRUE, + stock_quantity INTEGER, + stock_status VARCHAR(50) DEFAULT 'in_stock', + + -- Potency + thc_percent NUMERIC(5,2), + cbd_percent NUMERIC(5,2), + thc_mg NUMERIC(10,2), + cbd_mg NUMERIC(10,2), + + -- Weight/Size + weight_value NUMERIC(10,2), + weight_unit VARCHAR(20), + + -- Images + image_url TEXT, + local_image_path TEXT, + thumbnail_url TEXT, + + -- Flags + is_featured BOOLEAN DEFAULT FALSE, + medical_only BOOLEAN DEFAULT FALSE, + rec_only BOOLEAN DEFAULT FALSE, + + -- Menu position (for tracking prominence) + menu_position INTEGER, + + -- Timestamps + first_seen_at TIMESTAMPTZ DEFAULT NOW(), + last_seen_at TIMESTAMPTZ DEFAULT NOW(), + last_price_change_at TIMESTAMPTZ, + last_stock_change_at TIMESTAMPTZ, + + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Add unique constraint if not exists +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'store_products_dispensary_provider_product_key' + ) THEN + ALTER TABLE store_products + ADD CONSTRAINT store_products_dispensary_provider_product_key + UNIQUE (dispensary_id, provider, provider_product_id); + END IF; +EXCEPTION + WHEN duplicate_object THEN + NULL; + WHEN OTHERS THEN + NULL; +END $$; + +-- Add FK constraints if not exist +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'store_products_dispensary_id_fkey' + ) THEN + ALTER TABLE store_products + ADD CONSTRAINT store_products_dispensary_id_fkey + FOREIGN KEY (dispensary_id) REFERENCES dispensaries(id) ON DELETE CASCADE; + END IF; +EXCEPTION + WHEN duplicate_object THEN + NULL; + WHEN OTHERS THEN + NULL; +END $$; + +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'store_products_state_id_fkey' + ) THEN + ALTER TABLE store_products + ADD CONSTRAINT store_products_state_id_fkey + FOREIGN KEY (state_id) REFERENCES states(id) ON DELETE SET NULL; + END IF; +EXCEPTION + WHEN duplicate_object THEN + NULL; + WHEN OTHERS THEN + NULL; +END $$; + +CREATE INDEX IF NOT EXISTS idx_store_products_dispensary ON store_products(dispensary_id); +CREATE INDEX IF NOT EXISTS idx_store_products_state ON store_products(state_id) WHERE state_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_store_products_category ON store_products(category) WHERE category IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_store_products_brand_name ON store_products(brand_name) WHERE brand_name IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_store_products_in_stock ON store_products(dispensary_id, is_in_stock); +CREATE INDEX IF NOT EXISTS idx_store_products_special ON store_products(dispensary_id, is_on_special) WHERE is_on_special = TRUE; +CREATE INDEX IF 
NOT EXISTS idx_store_products_last_seen ON store_products(last_seen_at DESC); +CREATE INDEX IF NOT EXISTS idx_store_products_provider ON store_products(provider); +CREATE INDEX IF NOT EXISTS idx_store_products_enterprise ON store_products(enterprise_product_id) WHERE enterprise_product_id IS NOT NULL; + +COMMENT ON TABLE store_products IS 'Current state of products on each dispensary menu. Provider-agnostic.'; + + +-- ============================================================================ +-- SECTION 6: STORE_PRODUCT_SNAPSHOTS TABLE +-- ============================================================================ +-- Historical price/stock data. One row per product per crawl. +-- CRITICAL: NEVER DELETE from this table. + +CREATE TABLE IF NOT EXISTS store_product_snapshots ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER NOT NULL, + store_product_id INTEGER, + state_id INTEGER, + + -- Provider info + provider VARCHAR(50) NOT NULL DEFAULT 'dutchie', + provider_product_id VARCHAR(100), + + -- Link to crawl run + crawl_run_id INTEGER, + + -- Capture timestamp + captured_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + + -- Raw data from platform + name VARCHAR(500), + brand_name VARCHAR(255), + category VARCHAR(100), + subcategory VARCHAR(100), + + -- Pricing at time of capture + price_rec NUMERIC(10,2), + price_med NUMERIC(10,2), + price_rec_special NUMERIC(10,2), + price_med_special NUMERIC(10,2), + is_on_special BOOLEAN DEFAULT FALSE, + discount_percent NUMERIC(5,2), + + -- Inventory at time of capture + is_in_stock BOOLEAN DEFAULT TRUE, + stock_quantity INTEGER, + stock_status VARCHAR(50) DEFAULT 'in_stock', + is_present_in_feed BOOLEAN DEFAULT TRUE, + + -- Potency at time of capture + thc_percent NUMERIC(5,2), + cbd_percent NUMERIC(5,2), + + -- Menu position (for tracking prominence changes) + menu_position INTEGER, + + -- Image URL at time of capture + image_url TEXT, + + -- Full raw response for debugging + raw_data JSONB, + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Add FK constraints if not exist +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'store_product_snapshots_dispensary_id_fkey' + ) THEN + ALTER TABLE store_product_snapshots + ADD CONSTRAINT store_product_snapshots_dispensary_id_fkey + FOREIGN KEY (dispensary_id) REFERENCES dispensaries(id) ON DELETE CASCADE; + END IF; +EXCEPTION + WHEN duplicate_object THEN + NULL; + WHEN OTHERS THEN + NULL; +END $$; + +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'store_product_snapshots_store_product_id_fkey' + ) THEN + ALTER TABLE store_product_snapshots + ADD CONSTRAINT store_product_snapshots_store_product_id_fkey + FOREIGN KEY (store_product_id) REFERENCES store_products(id) ON DELETE SET NULL; + END IF; +EXCEPTION + WHEN duplicate_object THEN + NULL; + WHEN OTHERS THEN + NULL; +END $$; + +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'store_product_snapshots_state_id_fkey' + ) THEN + ALTER TABLE store_product_snapshots + ADD CONSTRAINT store_product_snapshots_state_id_fkey + FOREIGN KEY (state_id) REFERENCES states(id) ON DELETE SET NULL; + END IF; +EXCEPTION + WHEN duplicate_object THEN + NULL; + WHEN OTHERS THEN + NULL; +END $$; + +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'store_product_snapshots_crawl_run_id_fkey' + ) THEN + ALTER TABLE store_product_snapshots + ADD CONSTRAINT store_product_snapshots_crawl_run_id_fkey + FOREIGN KEY (crawl_run_id) REFERENCES crawl_runs(id) ON DELETE SET NULL; + 
END IF; +EXCEPTION + WHEN duplicate_object THEN + NULL; + WHEN OTHERS THEN + NULL; +END $$; + +-- Indexes optimized for analytics queries +CREATE INDEX IF NOT EXISTS idx_snapshots_dispensary_captured ON store_product_snapshots(dispensary_id, captured_at DESC); +CREATE INDEX IF NOT EXISTS idx_snapshots_state_captured ON store_product_snapshots(state_id, captured_at DESC) WHERE state_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_snapshots_product_captured ON store_product_snapshots(store_product_id, captured_at DESC) WHERE store_product_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_snapshots_crawl_run ON store_product_snapshots(crawl_run_id) WHERE crawl_run_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_snapshots_captured_at ON store_product_snapshots(captured_at DESC); +CREATE INDEX IF NOT EXISTS idx_snapshots_brand ON store_product_snapshots(brand_name) WHERE brand_name IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_snapshots_provider_product ON store_product_snapshots(provider_product_id) WHERE provider_product_id IS NOT NULL; + +COMMENT ON TABLE store_product_snapshots IS 'Historical crawl data. One row per product per crawl. NEVER DELETE.'; + + +-- ============================================================================ +-- SECTION 7: VIEWS FOR BACKWARD COMPATIBILITY +-- ============================================================================ + +-- View: Latest snapshot per store product +CREATE OR REPLACE VIEW v_latest_store_snapshots AS +SELECT DISTINCT ON (dispensary_id, provider_product_id) + sps.* +FROM store_product_snapshots sps +ORDER BY dispensary_id, provider_product_id, captured_at DESC; + +-- View: Crawl run summary per dispensary +CREATE OR REPLACE VIEW v_dispensary_crawl_summary AS +SELECT + d.id AS dispensary_id, + COALESCE(d.dba_name, d.name) AS dispensary_name, + d.city, + d.state, + d.state_id, + s.name AS state_name, + COUNT(DISTINCT sp.id) AS current_product_count, + COUNT(DISTINCT sp.id) FILTER (WHERE sp.is_in_stock) AS in_stock_count, + COUNT(DISTINCT sp.id) FILTER (WHERE sp.is_on_special) AS on_special_count, + MAX(cr.finished_at) AS last_crawl_at, + (SELECT status FROM crawl_runs WHERE dispensary_id = d.id ORDER BY started_at DESC LIMIT 1) AS last_crawl_status +FROM dispensaries d +LEFT JOIN states s ON s.id = d.state_id +LEFT JOIN store_products sp ON sp.dispensary_id = d.id +LEFT JOIN crawl_runs cr ON cr.dispensary_id = d.id +GROUP BY d.id, d.dba_name, d.name, d.city, d.state, d.state_id, s.name; + + +-- ============================================================================ +-- MIGRATION 051 COMPLETE +-- ============================================================================ + +SELECT 'Migration 051 completed successfully. Canonical schema is ready.' 
AS status; diff --git a/backend/migrations/051_create_mv_state_metrics.sql b/backend/migrations/051_create_mv_state_metrics.sql new file mode 100644 index 00000000..c942aa5a --- /dev/null +++ b/backend/migrations/051_create_mv_state_metrics.sql @@ -0,0 +1,98 @@ +-- Migration 051: Create materialized view for state metrics +-- Used by Analytics V2 state endpoints for fast aggregated queries +-- Canonical tables: states, dispensaries, store_products, store_product_snapshots, brands + +-- Drop existing view if it exists (for clean recreation) +DROP MATERIALIZED VIEW IF EXISTS mv_state_metrics; + +-- Create materialized view with comprehensive state metrics +-- Schema verified via information_schema on 2025-12-06 +-- Real columns used: +-- states: id, code, name, recreational_legal, medical_legal, rec_year, med_year +-- dispensaries: id, state_id (NO is_active column) +-- store_products: id, dispensary_id, brand_id, category_raw, price_rec, price_med, is_in_stock +-- store_product_snapshots: id, store_product_id, captured_at +-- brands: id (joined via sp.brand_id) + +CREATE MATERIALIZED VIEW mv_state_metrics AS +SELECT + s.id AS state_id, + s.code AS state, + s.name AS state_name, + COALESCE(s.recreational_legal, FALSE) AS recreational_legal, + COALESCE(s.medical_legal, FALSE) AS medical_legal, + s.rec_year, + s.med_year, + + -- Dispensary metrics + COUNT(DISTINCT d.id) AS dispensary_count, + + -- Product metrics + COUNT(DISTINCT sp.id) AS total_products, + COUNT(DISTINCT sp.id) FILTER (WHERE sp.is_in_stock = TRUE) AS in_stock_products, + COUNT(DISTINCT sp.id) FILTER (WHERE sp.is_in_stock = FALSE) AS out_of_stock_products, + + -- Brand metrics (using brand_id FK, not brand_name) + COUNT(DISTINCT sp.brand_id) FILTER (WHERE sp.brand_id IS NOT NULL) AS unique_brands, + + -- Category metrics (using category_raw, not category) + COUNT(DISTINCT sp.category_raw) FILTER (WHERE sp.category_raw IS NOT NULL) AS unique_categories, + + -- Pricing metrics (recreational) + AVG(sp.price_rec) FILTER (WHERE sp.price_rec IS NOT NULL AND sp.is_in_stock = TRUE) AS avg_price_rec, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_rec) + FILTER (WHERE sp.price_rec IS NOT NULL AND sp.is_in_stock = TRUE) AS median_price_rec, + MIN(sp.price_rec) FILTER (WHERE sp.price_rec IS NOT NULL AND sp.is_in_stock = TRUE) AS min_price_rec, + MAX(sp.price_rec) FILTER (WHERE sp.price_rec IS NOT NULL AND sp.is_in_stock = TRUE) AS max_price_rec, + + -- Pricing metrics (medical) + AVG(sp.price_med) FILTER (WHERE sp.price_med IS NOT NULL AND sp.is_in_stock = TRUE) AS avg_price_med, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_med) + FILTER (WHERE sp.price_med IS NOT NULL AND sp.is_in_stock = TRUE) AS median_price_med, + + -- Snapshot/crawl metrics + COUNT(sps.id) AS total_snapshots, + MAX(sps.captured_at) AS last_crawl_at, + MIN(sps.captured_at) AS first_crawl_at, + + -- Data freshness + CASE + WHEN MAX(sps.captured_at) > NOW() - INTERVAL '24 hours' THEN 'fresh' + WHEN MAX(sps.captured_at) > NOW() - INTERVAL '7 days' THEN 'recent' + WHEN MAX(sps.captured_at) IS NOT NULL THEN 'stale' + ELSE 'no_data' + END AS data_freshness, + + -- Metadata + NOW() AS refreshed_at + +FROM states s +LEFT JOIN dispensaries d ON d.state_id = s.id +LEFT JOIN store_products sp ON sp.dispensary_id = d.id +LEFT JOIN store_product_snapshots sps ON sps.store_product_id = sp.id +GROUP BY s.id, s.code, s.name, s.recreational_legal, s.medical_legal, s.rec_year, s.med_year; + +-- Create unique index on state code for fast lookups +CREATE UNIQUE INDEX IF 
NOT EXISTS mv_state_metrics_state_idx + ON mv_state_metrics (state); + +-- Create index on state_id for joins +CREATE INDEX IF NOT EXISTS mv_state_metrics_state_id_idx + ON mv_state_metrics (state_id); + +-- Create index for legal status filtering +CREATE INDEX IF NOT EXISTS mv_state_metrics_legal_idx + ON mv_state_metrics (recreational_legal, medical_legal); + +-- Create index for data freshness queries +CREATE INDEX IF NOT EXISTS mv_state_metrics_freshness_idx + ON mv_state_metrics (data_freshness); + +-- Comment on the view +COMMENT ON MATERIALIZED VIEW mv_state_metrics IS + 'Aggregated state-level metrics for Analytics V2 endpoints. Refresh periodically with: REFRESH MATERIALIZED VIEW CONCURRENTLY mv_state_metrics;'; + +-- Record migration +INSERT INTO schema_migrations (version, name, applied_at) +VALUES ('051', 'create_mv_state_metrics', NOW()) +ON CONFLICT (version) DO NOTHING; diff --git a/backend/migrations/052_add_provider_data_columns.sql b/backend/migrations/052_add_provider_data_columns.sql new file mode 100644 index 00000000..1b8f617e --- /dev/null +++ b/backend/migrations/052_add_provider_data_columns.sql @@ -0,0 +1,96 @@ +-- Migration 052: Add provider_data JSONB and frequently-queried columns +-- +-- Adds hybrid storage for legacy data: +-- 1. provider_data JSONB on both tables for all extra fields +-- 2. Specific columns for frequently-queried fields + +-- ============================================================================ +-- store_products: Add provider_data and queryable columns +-- ============================================================================ + +-- JSONB for all extra provider-specific data +ALTER TABLE store_products + ADD COLUMN IF NOT EXISTS provider_data JSONB; + +-- Frequently-queried columns +ALTER TABLE store_products + ADD COLUMN IF NOT EXISTS strain_type TEXT; + +ALTER TABLE store_products + ADD COLUMN IF NOT EXISTS medical_only BOOLEAN DEFAULT FALSE; + +ALTER TABLE store_products + ADD COLUMN IF NOT EXISTS rec_only BOOLEAN DEFAULT FALSE; + +ALTER TABLE store_products + ADD COLUMN IF NOT EXISTS brand_logo_url TEXT; + +ALTER TABLE store_products + ADD COLUMN IF NOT EXISTS platform_dispensary_id TEXT; + +-- Index for strain_type queries +CREATE INDEX IF NOT EXISTS idx_store_products_strain_type + ON store_products(strain_type) + WHERE strain_type IS NOT NULL; + +-- Index for medical/rec filtering +CREATE INDEX IF NOT EXISTS idx_store_products_medical_rec + ON store_products(medical_only, rec_only); + +-- GIN index for provider_data JSONB queries +CREATE INDEX IF NOT EXISTS idx_store_products_provider_data + ON store_products USING GIN (provider_data); + +-- ============================================================================ +-- store_product_snapshots: Add provider_data and queryable columns +-- ============================================================================ + +-- JSONB for all extra provider-specific data +ALTER TABLE store_product_snapshots + ADD COLUMN IF NOT EXISTS provider_data JSONB; + +-- Frequently-queried columns +ALTER TABLE store_product_snapshots + ADD COLUMN IF NOT EXISTS featured BOOLEAN DEFAULT FALSE; + +ALTER TABLE store_product_snapshots + ADD COLUMN IF NOT EXISTS is_below_threshold BOOLEAN DEFAULT FALSE; + +ALTER TABLE store_product_snapshots + ADD COLUMN IF NOT EXISTS is_below_kiosk_threshold BOOLEAN DEFAULT FALSE; + +-- Index for featured products +CREATE INDEX IF NOT EXISTS idx_snapshots_featured + ON store_product_snapshots(dispensary_id, featured) + WHERE featured = TRUE; + +-- Index for low 
stock alerts +CREATE INDEX IF NOT EXISTS idx_snapshots_below_threshold + ON store_product_snapshots(dispensary_id, is_below_threshold) + WHERE is_below_threshold = TRUE; + +-- GIN index for provider_data JSONB queries +CREATE INDEX IF NOT EXISTS idx_snapshots_provider_data + ON store_product_snapshots USING GIN (provider_data); + +-- ============================================================================ +-- Comments for documentation +-- ============================================================================ + +COMMENT ON COLUMN store_products.provider_data IS + 'JSONB blob containing all provider-specific fields not in canonical columns (effects, terpenes, cannabinoids_v2, etc.)'; + +COMMENT ON COLUMN store_products.strain_type IS + 'Cannabis strain type: Indica, Sativa, Hybrid, Indica-Hybrid, Sativa-Hybrid'; + +COMMENT ON COLUMN store_products.platform_dispensary_id IS + 'Provider platform dispensary ID (e.g., Dutchie MongoDB ObjectId)'; + +COMMENT ON COLUMN store_product_snapshots.provider_data IS + 'JSONB blob containing all provider-specific snapshot fields (options, kiosk data, etc.)'; + +COMMENT ON COLUMN store_product_snapshots.featured IS + 'Whether product was featured/highlighted at capture time'; + +COMMENT ON COLUMN store_product_snapshots.is_below_threshold IS + 'Whether product was below inventory threshold at capture time'; diff --git a/backend/migrations/052_add_state_cannabis_flags.sql b/backend/migrations/052_add_state_cannabis_flags.sql new file mode 100644 index 00000000..33209ab1 --- /dev/null +++ b/backend/migrations/052_add_state_cannabis_flags.sql @@ -0,0 +1,127 @@ +-- ============================================================================ +-- Migration 052: Add Cannabis Legalization Flags to States +-- ============================================================================ +-- +-- Purpose: Add recreational/medical cannabis legalization status and years +-- to the existing states table, then seed all 50 states + DC. +-- +-- SAFETY RULES: +-- - Uses ADD COLUMN IF NOT EXISTS (idempotent) +-- - Uses INSERT ... 
ON CONFLICT (code) DO UPDATE (idempotent) +-- - NO DROP, DELETE, TRUNCATE, or destructive operations +-- - Safe to run multiple times +-- +-- Run with: +-- psql "$DATABASE_URL" -f migrations/052_add_state_cannabis_flags.sql +-- +-- ============================================================================ + + +-- ============================================================================ +-- SECTION 1: Add cannabis legalization columns +-- ============================================================================ + +ALTER TABLE states ADD COLUMN IF NOT EXISTS recreational_legal BOOLEAN; +ALTER TABLE states ADD COLUMN IF NOT EXISTS rec_year INTEGER; +ALTER TABLE states ADD COLUMN IF NOT EXISTS medical_legal BOOLEAN; +ALTER TABLE states ADD COLUMN IF NOT EXISTS med_year INTEGER; + +COMMENT ON COLUMN states.recreational_legal IS 'Whether recreational cannabis is legal in this state'; +COMMENT ON COLUMN states.rec_year IS 'Year recreational cannabis was legalized (NULL if not legal)'; +COMMENT ON COLUMN states.medical_legal IS 'Whether medical cannabis is legal in this state'; +COMMENT ON COLUMN states.med_year IS 'Year medical cannabis was legalized (NULL if not legal)'; + + +-- ============================================================================ +-- SECTION 2: Seed all 50 states + DC with cannabis legalization data +-- ============================================================================ +-- Data sourced from state legalization records as of 2024 +-- States ordered by medical legalization year, then alphabetically + +INSERT INTO states (code, name, timezone, recreational_legal, rec_year, medical_legal, med_year) +VALUES + -- Recreational + Medical States (ordered by rec year) + ('WA', 'Washington', 'America/Los_Angeles', TRUE, 2012, TRUE, 1998), + ('CO', 'Colorado', 'America/Denver', TRUE, 2012, TRUE, 2000), + ('AK', 'Alaska', 'America/Anchorage', TRUE, 2014, TRUE, 1998), + ('OR', 'Oregon', 'America/Los_Angeles', TRUE, 2014, TRUE, 1998), + ('DC', 'District of Columbia', 'America/New_York', TRUE, 2015, TRUE, 2011), + ('CA', 'California', 'America/Los_Angeles', TRUE, 2016, TRUE, 1996), + ('NV', 'Nevada', 'America/Los_Angeles', TRUE, 2016, TRUE, 1998), + ('ME', 'Maine', 'America/New_York', TRUE, 2016, TRUE, 1999), + ('MA', 'Massachusetts', 'America/New_York', TRUE, 2016, TRUE, 2012), + ('MI', 'Michigan', 'America/Detroit', TRUE, 2018, TRUE, 2008), + ('IL', 'Illinois', 'America/Chicago', TRUE, 2019, TRUE, 2013), + ('AZ', 'Arizona', 'America/Phoenix', TRUE, 2020, TRUE, 2010), + ('MT', 'Montana', 'America/Denver', TRUE, 2020, TRUE, 2004), + ('NJ', 'New Jersey', 'America/New_York', TRUE, 2020, TRUE, 2010), + ('VT', 'Vermont', 'America/New_York', TRUE, 2020, TRUE, 2004), + ('CT', 'Connecticut', 'America/New_York', TRUE, 2021, TRUE, 2012), + ('NM', 'New Mexico', 'America/Denver', TRUE, 2021, TRUE, 2007), + ('NY', 'New York', 'America/New_York', TRUE, 2021, TRUE, 2014), + ('VA', 'Virginia', 'America/New_York', TRUE, 2021, TRUE, 2020), + ('MD', 'Maryland', 'America/New_York', TRUE, 2022, TRUE, 2013), + ('MO', 'Missouri', 'America/Chicago', TRUE, 2022, TRUE, 2018), + ('RI', 'Rhode Island', 'America/New_York', TRUE, 2022, TRUE, 2006), + ('DE', 'Delaware', 'America/New_York', TRUE, 2023, TRUE, 2011), + ('MN', 'Minnesota', 'America/Chicago', TRUE, 2023, TRUE, 2014), + ('OH', 'Ohio', 'America/New_York', TRUE, 2023, TRUE, 2016), + + -- Medical Only States (no recreational) + ('HI', 'Hawaii', 'Pacific/Honolulu', FALSE, NULL, TRUE, 2000), + ('NH', 'New Hampshire', 'America/New_York', FALSE, 
NULL, TRUE, 2013), + ('GA', 'Georgia', 'America/New_York', FALSE, NULL, TRUE, 2015), + ('LA', 'Louisiana', 'America/Chicago', FALSE, NULL, TRUE, 2015), + ('TX', 'Texas', 'America/Chicago', FALSE, NULL, TRUE, 2015), + ('AR', 'Arkansas', 'America/Chicago', FALSE, NULL, TRUE, 2016), + ('FL', 'Florida', 'America/New_York', FALSE, NULL, TRUE, 2016), + ('ND', 'North Dakota', 'America/Chicago', FALSE, NULL, TRUE, 2016), + ('PA', 'Pennsylvania', 'America/New_York', FALSE, NULL, TRUE, 2016), + ('IA', 'Iowa', 'America/Chicago', FALSE, NULL, TRUE, 2017), + ('WV', 'West Virginia', 'America/New_York', FALSE, NULL, TRUE, 2017), + ('OK', 'Oklahoma', 'America/Chicago', FALSE, NULL, TRUE, 2018), + ('UT', 'Utah', 'America/Denver', FALSE, NULL, TRUE, 2018), + ('SD', 'South Dakota', 'America/Chicago', FALSE, NULL, TRUE, 2020), + ('AL', 'Alabama', 'America/Chicago', FALSE, NULL, TRUE, 2021), + ('MS', 'Mississippi', 'America/Chicago', FALSE, NULL, TRUE, 2022), + ('KY', 'Kentucky', 'America/New_York', FALSE, NULL, TRUE, 2023), + ('NE', 'Nebraska', 'America/Chicago', FALSE, NULL, TRUE, 2024), + + -- No Cannabis Programs (neither rec nor medical) + ('ID', 'Idaho', 'America/Boise', FALSE, NULL, FALSE, NULL), + ('IN', 'Indiana', 'America/Indiana/Indianapolis', FALSE, NULL, FALSE, NULL), + ('KS', 'Kansas', 'America/Chicago', FALSE, NULL, FALSE, NULL), + ('NC', 'North Carolina', 'America/New_York', FALSE, NULL, FALSE, NULL), + ('SC', 'South Carolina', 'America/New_York', FALSE, NULL, FALSE, NULL), + ('TN', 'Tennessee', 'America/Chicago', FALSE, NULL, FALSE, NULL), + ('WI', 'Wisconsin', 'America/Chicago', FALSE, NULL, FALSE, NULL), + ('WY', 'Wyoming', 'America/Denver', FALSE, NULL, FALSE, NULL) + +ON CONFLICT (code) DO UPDATE SET + name = EXCLUDED.name, + timezone = COALESCE(states.timezone, EXCLUDED.timezone), + recreational_legal = EXCLUDED.recreational_legal, + rec_year = EXCLUDED.rec_year, + medical_legal = EXCLUDED.medical_legal, + med_year = EXCLUDED.med_year, + updated_at = NOW(); + + +-- ============================================================================ +-- SECTION 3: Add indexes for common queries +-- ============================================================================ + +CREATE INDEX IF NOT EXISTS idx_states_recreational ON states(recreational_legal) WHERE recreational_legal = TRUE; +CREATE INDEX IF NOT EXISTS idx_states_medical ON states(medical_legal) WHERE medical_legal = TRUE; + + +-- ============================================================================ +-- SECTION 4: Verification query (informational only) +-- ============================================================================ + +SELECT + 'Migration 052 completed successfully.' 
AS status, + (SELECT COUNT(*) FROM states WHERE recreational_legal = TRUE) AS rec_states, + (SELECT COUNT(*) FROM states WHERE medical_legal = TRUE AND recreational_legal = FALSE) AS med_only_states, + (SELECT COUNT(*) FROM states WHERE medical_legal = FALSE OR medical_legal IS NULL) AS no_program_states, + (SELECT COUNT(*) FROM states) AS total_states; diff --git a/backend/migrations/052_hydration_schema_alignment.sql b/backend/migrations/052_hydration_schema_alignment.sql new file mode 100644 index 00000000..e7615a28 --- /dev/null +++ b/backend/migrations/052_hydration_schema_alignment.sql @@ -0,0 +1,249 @@ +-- ============================================================================ +-- Migration 052: Hydration Schema Alignment +-- ============================================================================ +-- +-- Purpose: Add columns to canonical tables needed for hydration from +-- dutchie_products and dutchie_product_snapshots. +-- +-- This migration ensures store_products and store_product_snapshots can +-- receive all data from the legacy dutchie_* tables. +-- +-- SAFETY RULES: +-- - ALL columns use ADD COLUMN IF NOT EXISTS +-- - NO DROP, DELETE, TRUNCATE, or destructive operations +-- - Fully idempotent - safe to run multiple times +-- +-- Run with: +-- psql "postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" \ +-- -f migrations/052_hydration_schema_alignment.sql +-- +-- ============================================================================ + + +-- ============================================================================ +-- SECTION 1: store_products - Additional columns from dutchie_products +-- ============================================================================ + +-- Brand ID from Dutchie GraphQL (brandId field) +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS provider_brand_id VARCHAR(100); + +-- Legacy dutchie_products.id for cross-reference during migration +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS legacy_dutchie_product_id INTEGER; + +-- THC/CBD content as text (from dutchie_products.thc_content/cbd_content) +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS thc_content_text VARCHAR(50); +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS cbd_content_text VARCHAR(50); + +-- Full cannabinoid data +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS cannabinoids JSONB; + +-- Effects array +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS effects TEXT[]; + +-- Type (Flower, Edible, etc.) 
- maps to category in legacy +-- Already have category VARCHAR(100), but type may differ +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS product_type VARCHAR(100); + +-- Additional images array +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS additional_images TEXT[]; + +-- Local image paths (from 032 migration) +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS local_image_url TEXT; +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS local_image_thumb_url TEXT; +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS local_image_medium_url TEXT; +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS original_image_url TEXT; + +-- Status from Dutchie (Active/Inactive) +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS platform_status VARCHAR(20); + +-- Threshold flags +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS is_below_threshold BOOLEAN DEFAULT FALSE; +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS is_below_kiosk_threshold BOOLEAN DEFAULT FALSE; + +-- cName / slug from Dutchie +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS c_name VARCHAR(255); + +-- Coming soon flag +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS is_coming_soon BOOLEAN DEFAULT FALSE; + +-- Provider column already exists, ensure we have provider_dispensary_id +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS provider_dispensary_id VARCHAR(100); + +-- Enterprise product ID (cross-store product linking) +-- Already exists from migration 051 + +-- Total quantity available (from POSMetaData.children) +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS total_quantity_available INTEGER; +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS total_kiosk_quantity_available INTEGER; + +-- Weight +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS weight VARCHAR(50); + +-- Options array (size/weight options) +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS options TEXT[]; + +-- Measurements +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS measurements JSONB; + +-- Raw data from last crawl +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS raw_data JSONB; + +-- Source timestamps from Dutchie +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS source_created_at TIMESTAMPTZ; +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS source_updated_at TIMESTAMPTZ; + + +-- ============================================================================ +-- SECTION 2: store_product_snapshots - Additional columns for hydration +-- ============================================================================ + +-- Legacy dutchie_product_snapshot.id for cross-reference +ALTER TABLE store_product_snapshots ADD COLUMN IF NOT EXISTS legacy_snapshot_id INTEGER; + +-- Legacy dutchie_product_id reference +ALTER TABLE store_product_snapshots ADD COLUMN IF NOT EXISTS legacy_dutchie_product_id INTEGER; + +-- Options JSONB from dutchie_product_snapshots +ALTER TABLE store_product_snapshots ADD COLUMN IF NOT EXISTS options JSONB; + +-- Provider dispensary ID +ALTER TABLE store_product_snapshots ADD COLUMN IF NOT EXISTS provider_dispensary_id VARCHAR(100); + +-- Inventory details +ALTER TABLE store_product_snapshots ADD COLUMN IF NOT EXISTS total_quantity_available INTEGER; +ALTER TABLE store_product_snapshots ADD COLUMN IF NOT EXISTS total_kiosk_quantity_available INTEGER; + +-- Platform status at time of snapshot +ALTER TABLE store_product_snapshots ADD COLUMN IF NOT EXISTS platform_status VARCHAR(20); + +-- Threshold flags at time of snapshot +ALTER TABLE store_product_snapshots ADD COLUMN IF NOT EXISTS 
is_below_threshold BOOLEAN DEFAULT FALSE; +ALTER TABLE store_product_snapshots ADD COLUMN IF NOT EXISTS is_below_kiosk_threshold BOOLEAN DEFAULT FALSE; + +-- Special data +ALTER TABLE store_product_snapshots ADD COLUMN IF NOT EXISTS special_data JSONB; +ALTER TABLE store_product_snapshots ADD COLUMN IF NOT EXISTS special_name TEXT; + +-- Pricing mode (rec/med) +ALTER TABLE store_product_snapshots ADD COLUMN IF NOT EXISTS pricing_type VARCHAR(10); + +-- Crawl mode (mode_a/mode_b) +ALTER TABLE store_product_snapshots ADD COLUMN IF NOT EXISTS crawl_mode VARCHAR(20); + + +-- ============================================================================ +-- SECTION 3: crawl_runs - Additional columns for hydration +-- ============================================================================ + +-- Legacy job ID references +ALTER TABLE crawl_runs ADD COLUMN IF NOT EXISTS legacy_dispensary_crawl_job_id INTEGER; +ALTER TABLE crawl_runs ADD COLUMN IF NOT EXISTS legacy_job_run_log_id INTEGER; + +-- Schedule reference +ALTER TABLE crawl_runs ADD COLUMN IF NOT EXISTS schedule_id INTEGER; + +-- Job type +ALTER TABLE crawl_runs ADD COLUMN IF NOT EXISTS job_type VARCHAR(50); + +-- Brands found count +ALTER TABLE crawl_runs ADD COLUMN IF NOT EXISTS brands_found INTEGER DEFAULT 0; + +-- Retry count +ALTER TABLE crawl_runs ADD COLUMN IF NOT EXISTS retry_count INTEGER DEFAULT 0; + + +-- ============================================================================ +-- SECTION 4: INDEXES for hydration queries +-- ============================================================================ + +-- Index on legacy IDs for migration lookups +CREATE INDEX IF NOT EXISTS idx_store_products_legacy_id + ON store_products(legacy_dutchie_product_id) + WHERE legacy_dutchie_product_id IS NOT NULL; + +CREATE INDEX IF NOT EXISTS idx_snapshots_legacy_id + ON store_product_snapshots(legacy_snapshot_id) + WHERE legacy_snapshot_id IS NOT NULL; + +CREATE INDEX IF NOT EXISTS idx_snapshots_legacy_product_id + ON store_product_snapshots(legacy_dutchie_product_id) + WHERE legacy_dutchie_product_id IS NOT NULL; + +CREATE INDEX IF NOT EXISTS idx_crawl_runs_legacy_job_id + ON crawl_runs(legacy_dispensary_crawl_job_id) + WHERE legacy_dispensary_crawl_job_id IS NOT NULL; + +-- Index on provider_product_id for upserts +CREATE INDEX IF NOT EXISTS idx_store_products_provider_id + ON store_products(provider_product_id); + +-- Composite index for canonical key lookup +CREATE INDEX IF NOT EXISTS idx_store_products_canonical_key + ON store_products(dispensary_id, provider, provider_product_id); + + +-- ============================================================================ +-- SECTION 5: Unique constraint for idempotent hydration +-- ============================================================================ + +-- Ensure unique snapshots per product per crawl +-- This prevents duplicate snapshots during re-runs +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'store_product_snapshots_unique_per_crawl' + ) THEN + -- Can't add unique constraint on nullable columns directly, + -- so we use a partial unique index instead + CREATE UNIQUE INDEX IF NOT EXISTS idx_snapshots_unique_per_crawl + ON store_product_snapshots(store_product_id, crawl_run_id) + WHERE store_product_id IS NOT NULL AND crawl_run_id IS NOT NULL; + END IF; +EXCEPTION + WHEN duplicate_object THEN NULL; + WHEN OTHERS THEN NULL; +END $$; + + +-- ============================================================================ +-- SECTION 6: View for 
hydration status monitoring +-- ============================================================================ + +CREATE OR REPLACE VIEW v_hydration_status AS +SELECT + 'dutchie_products' AS source_table, + (SELECT COUNT(*) FROM dutchie_products) AS source_count, + (SELECT COUNT(*) FROM store_products WHERE legacy_dutchie_product_id IS NOT NULL) AS hydrated_count, + ROUND( + 100.0 * (SELECT COUNT(*) FROM store_products WHERE legacy_dutchie_product_id IS NOT NULL) / + NULLIF((SELECT COUNT(*) FROM dutchie_products), 0), + 2 + ) AS hydration_pct +UNION ALL +SELECT + 'dutchie_product_snapshots' AS source_table, + (SELECT COUNT(*) FROM dutchie_product_snapshots) AS source_count, + (SELECT COUNT(*) FROM store_product_snapshots WHERE legacy_snapshot_id IS NOT NULL) AS hydrated_count, + ROUND( + 100.0 * (SELECT COUNT(*) FROM store_product_snapshots WHERE legacy_snapshot_id IS NOT NULL) / + NULLIF((SELECT COUNT(*) FROM dutchie_product_snapshots), 0), + 2 + ) AS hydration_pct +UNION ALL +SELECT + 'dispensary_crawl_jobs' AS source_table, + (SELECT COUNT(*) FROM dispensary_crawl_jobs WHERE status = 'completed') AS source_count, + (SELECT COUNT(*) FROM crawl_runs WHERE legacy_dispensary_crawl_job_id IS NOT NULL) AS hydrated_count, + ROUND( + 100.0 * (SELECT COUNT(*) FROM crawl_runs WHERE legacy_dispensary_crawl_job_id IS NOT NULL) / + NULLIF((SELECT COUNT(*) FROM dispensary_crawl_jobs WHERE status = 'completed'), 0), + 2 + ) AS hydration_pct; + + +-- ============================================================================ +-- DONE +-- ============================================================================ + +SELECT 'Migration 052 completed successfully. Hydration schema aligned.' AS status; diff --git a/backend/migrations/053_analytics_indexes.sql b/backend/migrations/053_analytics_indexes.sql new file mode 100644 index 00000000..eef6d3d0 --- /dev/null +++ b/backend/migrations/053_analytics_indexes.sql @@ -0,0 +1,157 @@ +-- ============================================================================ +-- Migration 053: Analytics Engine Indexes +-- ============================================================================ +-- +-- Purpose: Add indexes optimized for analytics queries on canonical tables. +-- These indexes support price trends, brand penetration, category +-- growth, and state-level analytics. 
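+--
+-- Illustrative usage (NOT executed by this migration): the kind of query the
+-- brand-penetration indexes in SECTION 3 are intended to serve. The $1 state
+-- filter is a hypothetical parameter; column names match the index
+-- definitions below.
+--
+--   SELECT brand_name,
+--          COUNT(DISTINCT dispensary_id) AS stores_carrying_brand
+--   FROM store_products
+--   WHERE brand_name IS NOT NULL
+--     AND state_id = $1
+--   GROUP BY brand_name
+--   ORDER BY stores_carrying_brand DESC;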
+-- +-- SAFETY RULES: +-- - Uses CREATE INDEX IF NOT EXISTS (idempotent) +-- - Uses ADD COLUMN IF NOT EXISTS for helper columns +-- - NO DROP, DELETE, TRUNCATE, or destructive operations +-- - Safe to run multiple times +-- +-- Run with: +-- psql "$DATABASE_URL" -f migrations/053_analytics_indexes.sql +-- +-- ============================================================================ + + +-- ============================================================================ +-- SECTION 1: Helper columns for analytics (if missing) +-- ============================================================================ + +-- Ensure store_products has brand_id for faster brand analytics joins +-- (brand_name exists, but a normalized brand_id helps) +ALTER TABLE store_products ADD COLUMN IF NOT EXISTS brand_id INTEGER; + +-- Ensure snapshots have category for time-series category analytics +ALTER TABLE store_product_snapshots ADD COLUMN IF NOT EXISTS category VARCHAR(100); + + +-- ============================================================================ +-- SECTION 2: Price Analytics Indexes +-- ============================================================================ + +-- Price trends by store_product over time +CREATE INDEX IF NOT EXISTS idx_snapshots_product_price_time + ON store_product_snapshots(store_product_id, captured_at DESC, price_rec, price_med) + WHERE store_product_id IS NOT NULL; + +-- Price by category over time (for category price trends) +CREATE INDEX IF NOT EXISTS idx_snapshots_category_price_time + ON store_product_snapshots(category, captured_at DESC, price_rec) + WHERE category IS NOT NULL; + +-- Price changes detection (for volatility analysis) +CREATE INDEX IF NOT EXISTS idx_products_price_change + ON store_products(last_price_change_at DESC) + WHERE last_price_change_at IS NOT NULL; + + +-- ============================================================================ +-- SECTION 3: Brand Penetration Indexes +-- ============================================================================ + +-- Brand by dispensary (for penetration counts) +CREATE INDEX IF NOT EXISTS idx_products_brand_dispensary + ON store_products(brand_name, dispensary_id) + WHERE brand_name IS NOT NULL; + +-- Brand by state (for state-level brand analytics) +CREATE INDEX IF NOT EXISTS idx_products_brand_state + ON store_products(brand_name, state_id) + WHERE brand_name IS NOT NULL AND state_id IS NOT NULL; + +-- Brand first/last seen (for penetration trends) +CREATE INDEX IF NOT EXISTS idx_products_brand_first_seen + ON store_products(brand_name, first_seen_at) + WHERE brand_name IS NOT NULL; + + +-- ============================================================================ +-- SECTION 4: Category Analytics Indexes +-- ============================================================================ + +-- Category by state (for state-level category analytics) +CREATE INDEX IF NOT EXISTS idx_products_category_state + ON store_products(category, state_id) + WHERE category IS NOT NULL; + +-- Category by dispensary +CREATE INDEX IF NOT EXISTS idx_products_category_dispensary + ON store_products(category, dispensary_id) + WHERE category IS NOT NULL; + +-- Category first seen (for growth tracking) +CREATE INDEX IF NOT EXISTS idx_products_category_first_seen + ON store_products(category, first_seen_at) + WHERE category IS NOT NULL; + + +-- ============================================================================ +-- SECTION 5: Store Analytics Indexes +-- 
============================================================================ + +-- Products added/removed by dispensary +CREATE INDEX IF NOT EXISTS idx_products_dispensary_first_seen + ON store_products(dispensary_id, first_seen_at DESC); + +CREATE INDEX IF NOT EXISTS idx_products_dispensary_last_seen + ON store_products(dispensary_id, last_seen_at DESC); + +-- Stock status changes +CREATE INDEX IF NOT EXISTS idx_products_stock_change + ON store_products(dispensary_id, last_stock_change_at DESC) + WHERE last_stock_change_at IS NOT NULL; + + +-- ============================================================================ +-- SECTION 6: State Analytics Indexes +-- ============================================================================ + +-- Dispensary count by state +CREATE INDEX IF NOT EXISTS idx_dispensaries_state_active + ON dispensaries(state_id) + WHERE state_id IS NOT NULL; + +-- Products by state +CREATE INDEX IF NOT EXISTS idx_products_state_active + ON store_products(state_id, is_in_stock) + WHERE state_id IS NOT NULL; + +-- Snapshots by state for time-series +CREATE INDEX IF NOT EXISTS idx_snapshots_state_time + ON store_product_snapshots(state_id, captured_at DESC) + WHERE state_id IS NOT NULL; + + +-- ============================================================================ +-- SECTION 7: Composite indexes for common analytics queries +-- ============================================================================ + +-- Brand + Category + State (for market share calculations) +CREATE INDEX IF NOT EXISTS idx_products_brand_category_state + ON store_products(brand_name, category, state_id) + WHERE brand_name IS NOT NULL AND category IS NOT NULL; + +-- Dispensary + Category + Brand (for store-level brand analysis) +CREATE INDEX IF NOT EXISTS idx_products_disp_cat_brand + ON store_products(dispensary_id, category, brand_name) + WHERE category IS NOT NULL; + +-- Special pricing by category (for promo analysis) +CREATE INDEX IF NOT EXISTS idx_products_special_category + ON store_products(category, is_on_special) + WHERE is_on_special = TRUE; + + +-- ============================================================================ +-- SECTION 8: Verification +-- ============================================================================ + +SELECT + 'Migration 053 completed successfully.' AS status, + (SELECT COUNT(*) FROM pg_indexes WHERE indexname LIKE 'idx_products_%') AS product_indexes, + (SELECT COUNT(*) FROM pg_indexes WHERE indexname LIKE 'idx_snapshots_%') AS snapshot_indexes; diff --git a/backend/migrations/053_dutchie_discovery_schema.sql b/backend/migrations/053_dutchie_discovery_schema.sql new file mode 100644 index 00000000..969a7941 --- /dev/null +++ b/backend/migrations/053_dutchie_discovery_schema.sql @@ -0,0 +1,346 @@ +-- ============================================================================ +-- Migration 053: Dutchie Discovery Schema +-- ============================================================================ +-- +-- Purpose: Create tables for Dutchie store discovery workflow. +-- Stores are discovered and held in staging tables until verified, +-- then promoted to the canonical dispensaries table. 
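+--
+-- Illustrative promotion step (performed later by application code or an
+-- operator, NOT by this migration; $1-$3 are hypothetical parameters). Shown
+-- only to clarify the staging -> canonical lifecycle:
+--
+--   UPDATE dutchie_discovery_locations
+--   SET status        = 'verified',
+--       dispensary_id = $1,   -- id of the matching canonical dispensaries row
+--       verified_at   = NOW(),
+--       verified_by   = $2    -- operator identifier
+--   WHERE id = $3;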
+-- +-- Tables Created: +-- - dutchie_discovery_cities: City pages from Dutchie +-- - dutchie_discovery_locations: Individual store locations +-- +-- SAFETY RULES: +-- - ALL tables use CREATE TABLE IF NOT EXISTS +-- - NO DROP, DELETE, TRUNCATE, or destructive operations +-- - Does NOT touch canonical dispensaries table +-- - Fully idempotent - safe to run multiple times +-- +-- Run with: +-- psql "postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" \ +-- -f migrations/053_dutchie_discovery_schema.sql +-- +-- ============================================================================ + + +-- ============================================================================ +-- SECTION 1: DUTCHIE_DISCOVERY_CITIES +-- ============================================================================ +-- Stores Dutchie city pages for systematic crawling. +-- Each city can contain multiple dispensary locations. + +CREATE TABLE IF NOT EXISTS dutchie_discovery_cities ( + id BIGSERIAL PRIMARY KEY, + + -- Platform identification (future-proof for other platforms) + platform TEXT NOT NULL DEFAULT 'dutchie', + + -- City identification + city_name TEXT NOT NULL, + city_slug TEXT NOT NULL, + state_code TEXT, -- 'AZ', 'CA', 'ON', etc. + country_code TEXT NOT NULL DEFAULT 'US', + + -- Crawl management + last_crawled_at TIMESTAMPTZ, + crawl_enabled BOOLEAN NOT NULL DEFAULT TRUE, + location_count INTEGER, -- Number of locations found in this city + + -- Metadata + notes TEXT, + metadata JSONB, + + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +-- Add unique constraint if not exists +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'dutchie_discovery_cities_unique' + ) THEN + ALTER TABLE dutchie_discovery_cities + ADD CONSTRAINT dutchie_discovery_cities_unique + UNIQUE (platform, country_code, state_code, city_slug); + END IF; +EXCEPTION + WHEN duplicate_object THEN NULL; + WHEN OTHERS THEN NULL; +END $$; + +-- Indexes +CREATE INDEX IF NOT EXISTS idx_discovery_cities_platform + ON dutchie_discovery_cities(platform); + +CREATE INDEX IF NOT EXISTS idx_discovery_cities_state + ON dutchie_discovery_cities(country_code, state_code); + +CREATE INDEX IF NOT EXISTS idx_discovery_cities_crawl_enabled + ON dutchie_discovery_cities(crawl_enabled) + WHERE crawl_enabled = TRUE; + +CREATE INDEX IF NOT EXISTS idx_discovery_cities_last_crawled + ON dutchie_discovery_cities(last_crawled_at); + +COMMENT ON TABLE dutchie_discovery_cities IS 'City pages from Dutchie for systematic store discovery.'; + + +-- ============================================================================ +-- SECTION 2: DUTCHIE_DISCOVERY_LOCATIONS +-- ============================================================================ +-- Individual store locations discovered from Dutchie. +-- These are NOT promoted to canonical dispensaries until verified. + +CREATE TABLE IF NOT EXISTS dutchie_discovery_locations ( + id BIGSERIAL PRIMARY KEY, + + -- Platform identification + platform TEXT NOT NULL DEFAULT 'dutchie', + platform_location_id TEXT NOT NULL, -- Dutchie's internal Location ID + platform_slug TEXT NOT NULL, -- URL slug for the store + platform_menu_url TEXT NOT NULL, -- Full menu URL + + -- Store name + name TEXT NOT NULL, + + -- Address components + raw_address TEXT, + address_line1 TEXT, + address_line2 TEXT, + city TEXT, + state_code TEXT, -- 'AZ', 'CA', 'ON', etc. 
+ postal_code TEXT, + country_code TEXT, -- 'US' or 'CA' + + -- Coordinates + latitude DOUBLE PRECISION, + longitude DOUBLE PRECISION, + timezone TEXT, + + -- Discovery status + status TEXT NOT NULL DEFAULT 'discovered', + -- discovered: Just found, not yet verified + -- verified: Verified and promoted to canonical dispensaries + -- rejected: Manually rejected (e.g., duplicate, test store) + -- merged: Linked to existing canonical dispensary + + -- Link to canonical dispensaries (only after verification) + dispensary_id INTEGER, + + -- Reference to discovery city + discovery_city_id BIGINT, + + -- Raw data from Dutchie + metadata JSONB, + notes TEXT, + + -- Store capabilities (from Dutchie) + offers_delivery BOOLEAN, + offers_pickup BOOLEAN, + is_recreational BOOLEAN, + is_medical BOOLEAN, + + -- Tracking + first_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + last_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + last_checked_at TIMESTAMPTZ, + verified_at TIMESTAMPTZ, + verified_by TEXT, -- User who verified + + active BOOLEAN NOT NULL DEFAULT TRUE, + + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +-- Add unique constraints if not exist +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'dutchie_discovery_locations_platform_id_unique' + ) THEN + ALTER TABLE dutchie_discovery_locations + ADD CONSTRAINT dutchie_discovery_locations_platform_id_unique + UNIQUE (platform, platform_location_id); + END IF; +EXCEPTION + WHEN duplicate_object THEN NULL; + WHEN OTHERS THEN NULL; +END $$; + +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'dutchie_discovery_locations_slug_unique' + ) THEN + ALTER TABLE dutchie_discovery_locations + ADD CONSTRAINT dutchie_discovery_locations_slug_unique + UNIQUE (platform, platform_slug, country_code, state_code, city); + END IF; +EXCEPTION + WHEN duplicate_object THEN NULL; + WHEN OTHERS THEN NULL; +END $$; + +-- Add FK to dispensaries if not exists (allows NULL) +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'dutchie_discovery_locations_dispensary_fk' + ) THEN + ALTER TABLE dutchie_discovery_locations + ADD CONSTRAINT dutchie_discovery_locations_dispensary_fk + FOREIGN KEY (dispensary_id) REFERENCES dispensaries(id) ON DELETE SET NULL; + END IF; +EXCEPTION + WHEN duplicate_object THEN NULL; + WHEN OTHERS THEN NULL; +END $$; + +-- Add FK to discovery cities if not exists +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_constraint + WHERE conname = 'dutchie_discovery_locations_city_fk' + ) THEN + ALTER TABLE dutchie_discovery_locations + ADD CONSTRAINT dutchie_discovery_locations_city_fk + FOREIGN KEY (discovery_city_id) REFERENCES dutchie_discovery_cities(id) ON DELETE SET NULL; + END IF; +EXCEPTION + WHEN duplicate_object THEN NULL; + WHEN OTHERS THEN NULL; +END $$; + +-- Indexes +CREATE INDEX IF NOT EXISTS idx_discovery_locations_platform + ON dutchie_discovery_locations(platform); + +CREATE INDEX IF NOT EXISTS idx_discovery_locations_status + ON dutchie_discovery_locations(status); + +CREATE INDEX IF NOT EXISTS idx_discovery_locations_state + ON dutchie_discovery_locations(country_code, state_code); + +CREATE INDEX IF NOT EXISTS idx_discovery_locations_city + ON dutchie_discovery_locations(city, state_code); + +CREATE INDEX IF NOT EXISTS idx_discovery_locations_dispensary + ON dutchie_discovery_locations(dispensary_id) + WHERE dispensary_id IS NOT NULL; + +CREATE INDEX IF NOT EXISTS idx_discovery_locations_discovered + 
ON dutchie_discovery_locations(status, first_seen_at DESC) + WHERE status = 'discovered'; + +CREATE INDEX IF NOT EXISTS idx_discovery_locations_active + ON dutchie_discovery_locations(active) + WHERE active = TRUE; + +CREATE INDEX IF NOT EXISTS idx_discovery_locations_coords + ON dutchie_discovery_locations(latitude, longitude) + WHERE latitude IS NOT NULL AND longitude IS NOT NULL; + +COMMENT ON TABLE dutchie_discovery_locations IS 'Discovered store locations from Dutchie. Held in staging until verified.'; + + +-- ============================================================================ +-- SECTION 3: ADD CANADIAN PROVINCES TO STATES TABLE +-- ============================================================================ +-- Support for Canadian provinces (Ontario, BC, Alberta, etc.) + +INSERT INTO states (code, name, timezone, is_active, crawl_enabled) VALUES + ('AB', 'Alberta', 'America/Edmonton', TRUE, TRUE), + ('BC', 'British Columbia', 'America/Vancouver', TRUE, TRUE), + ('MB', 'Manitoba', 'America/Winnipeg', TRUE, TRUE), + ('NB', 'New Brunswick', 'America/Moncton', TRUE, TRUE), + ('NL', 'Newfoundland and Labrador', 'America/St_Johns', TRUE, TRUE), + ('NS', 'Nova Scotia', 'America/Halifax', TRUE, TRUE), + ('NT', 'Northwest Territories', 'America/Yellowknife', TRUE, TRUE), + ('NU', 'Nunavut', 'America/Iqaluit', TRUE, TRUE), + ('ON', 'Ontario', 'America/Toronto', TRUE, TRUE), + ('PE', 'Prince Edward Island', 'America/Halifax', TRUE, TRUE), + ('QC', 'Quebec', 'America/Montreal', TRUE, TRUE), + ('SK', 'Saskatchewan', 'America/Regina', TRUE, TRUE), + ('YT', 'Yukon', 'America/Whitehorse', TRUE, TRUE) +ON CONFLICT (code) DO UPDATE SET + name = EXCLUDED.name, + timezone = COALESCE(states.timezone, EXCLUDED.timezone), + updated_at = NOW(); + + +-- ============================================================================ +-- SECTION 4: VIEWS FOR DISCOVERY MONITORING +-- ============================================================================ + +-- View: Discovery status summary +CREATE OR REPLACE VIEW v_discovery_status AS +SELECT + platform, + country_code, + state_code, + status, + COUNT(*) AS location_count, + COUNT(*) FILTER (WHERE dispensary_id IS NOT NULL) AS linked_count, + MIN(first_seen_at) AS earliest_discovery, + MAX(last_seen_at) AS latest_activity +FROM dutchie_discovery_locations +GROUP BY platform, country_code, state_code, status +ORDER BY country_code, state_code, status; + +-- View: Unverified discoveries awaiting action +CREATE OR REPLACE VIEW v_discovery_pending AS +SELECT + dl.id, + dl.platform, + dl.name, + dl.city, + dl.state_code, + dl.country_code, + dl.platform_menu_url, + dl.first_seen_at, + dl.last_seen_at, + dl.offers_delivery, + dl.offers_pickup, + dl.is_recreational, + dl.is_medical, + dc.city_name AS discovery_city_name +FROM dutchie_discovery_locations dl +LEFT JOIN dutchie_discovery_cities dc ON dc.id = dl.discovery_city_id +WHERE dl.status = 'discovered' + AND dl.active = TRUE +ORDER BY dl.state_code, dl.city, dl.name; + +-- View: City crawl status +CREATE OR REPLACE VIEW v_discovery_cities_status AS +SELECT + dc.id, + dc.platform, + dc.city_name, + dc.state_code, + dc.country_code, + dc.crawl_enabled, + dc.last_crawled_at, + dc.location_count, + COUNT(dl.id) AS actual_locations, + COUNT(dl.id) FILTER (WHERE dl.status = 'discovered') AS pending_count, + COUNT(dl.id) FILTER (WHERE dl.status = 'verified') AS verified_count, + COUNT(dl.id) FILTER (WHERE dl.status = 'rejected') AS rejected_count +FROM dutchie_discovery_cities dc +LEFT JOIN 
dutchie_discovery_locations dl ON dl.discovery_city_id = dc.id +GROUP BY dc.id, dc.platform, dc.city_name, dc.state_code, dc.country_code, + dc.crawl_enabled, dc.last_crawled_at, dc.location_count +ORDER BY dc.country_code, dc.state_code, dc.city_name; + + +-- ============================================================================ +-- DONE +-- ============================================================================ + +SELECT 'Migration 053 completed successfully. Discovery schema created.' AS status; diff --git a/backend/migrations/054_worker_metadata.sql b/backend/migrations/054_worker_metadata.sql new file mode 100644 index 00000000..336fa15f --- /dev/null +++ b/backend/migrations/054_worker_metadata.sql @@ -0,0 +1,49 @@ +-- Migration 054: Worker Metadata for Named Workforce +-- Adds worker_name and worker_role to job tables for displaying friendly worker identities + +-- Add worker metadata columns to job_schedules +ALTER TABLE job_schedules + ADD COLUMN IF NOT EXISTS worker_name VARCHAR(50), + ADD COLUMN IF NOT EXISTS worker_role VARCHAR(100); + +COMMENT ON COLUMN job_schedules.worker_name IS 'Friendly name for the worker (e.g., Alice, Henry, Bella, Oscar)'; +COMMENT ON COLUMN job_schedules.worker_role IS 'Description of worker role (e.g., Store Discovery Worker, GraphQL Product Sync)'; + +-- Add worker metadata columns to job_run_logs +ALTER TABLE job_run_logs + ADD COLUMN IF NOT EXISTS worker_name VARCHAR(50), + ADD COLUMN IF NOT EXISTS run_role VARCHAR(100); + +COMMENT ON COLUMN job_run_logs.worker_name IS 'Name of the worker that executed this run (copied from schedule)'; +COMMENT ON COLUMN job_run_logs.run_role IS 'Role description for this specific run'; + +-- Add worker_name to dispensary_crawl_jobs (for tracking which named worker enqueued it) +ALTER TABLE dispensary_crawl_jobs + ADD COLUMN IF NOT EXISTS enqueued_by_worker VARCHAR(50); + +COMMENT ON COLUMN dispensary_crawl_jobs.enqueued_by_worker IS 'Name of the worker that enqueued this job'; + +-- Update existing schedules with worker names +UPDATE job_schedules SET + worker_name = 'Bella', + worker_role = 'GraphQL Product Sync' +WHERE job_name = 'dutchie_az_product_crawl' AND worker_name IS NULL; + +UPDATE job_schedules SET + worker_name = 'Henry', + worker_role = 'Entry Point Finder' +WHERE job_name = 'dutchie_az_menu_detection' AND worker_name IS NULL; + +UPDATE job_schedules SET + worker_name = 'Alice', + worker_role = 'Store Discovery' +WHERE job_name = 'dutchie_store_discovery' AND worker_name IS NULL; + +UPDATE job_schedules SET + worker_name = 'Oscar', + worker_role = 'Analytics Refresh' +WHERE job_name = 'analytics_refresh' AND worker_name IS NULL; + +-- Create index for worker name lookups +CREATE INDEX IF NOT EXISTS idx_job_run_logs_worker_name ON job_run_logs(worker_name); +CREATE INDEX IF NOT EXISTS idx_dispensary_crawl_jobs_enqueued_by ON dispensary_crawl_jobs(enqueued_by_worker); diff --git a/backend/migrations/055_workforce_enhancements.sql b/backend/migrations/055_workforce_enhancements.sql new file mode 100644 index 00000000..65b10a38 --- /dev/null +++ b/backend/migrations/055_workforce_enhancements.sql @@ -0,0 +1,123 @@ +-- Migration 055: Workforce System Enhancements +-- Adds visibility tracking, slug change tracking, and scope support for workers + +-- ============================================================ +-- 1. 
VISIBILITY TRACKING FOR BELLA (Product Sync) +-- ============================================================ + +-- Add visibility tracking to dutchie_products +ALTER TABLE dutchie_products + ADD COLUMN IF NOT EXISTS visibility_lost BOOLEAN DEFAULT FALSE, + ADD COLUMN IF NOT EXISTS visibility_lost_at TIMESTAMPTZ, + ADD COLUMN IF NOT EXISTS visibility_restored_at TIMESTAMPTZ; + +COMMENT ON COLUMN dutchie_products.visibility_lost IS 'True if product disappeared from GraphQL results'; +COMMENT ON COLUMN dutchie_products.visibility_lost_at IS 'When product was last marked as visibility lost'; +COMMENT ON COLUMN dutchie_products.visibility_restored_at IS 'When product reappeared after being lost'; + +-- Index for visibility queries +CREATE INDEX IF NOT EXISTS idx_dutchie_products_visibility_lost + ON dutchie_products(dispensary_id, visibility_lost) + WHERE visibility_lost = TRUE; + +-- ============================================================ +-- 2. SLUG CHANGE TRACKING FOR ALICE (Store Discovery) +-- ============================================================ + +-- Add slug change and retirement tracking to discovery locations +ALTER TABLE dutchie_discovery_locations + ADD COLUMN IF NOT EXISTS slug_changed_at TIMESTAMPTZ, + ADD COLUMN IF NOT EXISTS previous_slug VARCHAR(255), + ADD COLUMN IF NOT EXISTS retired_at TIMESTAMPTZ, + ADD COLUMN IF NOT EXISTS retirement_reason VARCHAR(100); + +COMMENT ON COLUMN dutchie_discovery_locations.slug_changed_at IS 'When the platform slug was last changed'; +COMMENT ON COLUMN dutchie_discovery_locations.previous_slug IS 'Previous slug before the last change'; +COMMENT ON COLUMN dutchie_discovery_locations.retired_at IS 'When store was marked as retired/removed'; +COMMENT ON COLUMN dutchie_discovery_locations.retirement_reason IS 'Reason for retirement (removed_from_source, closed, etc.)'; + +-- Index for finding retired stores +CREATE INDEX IF NOT EXISTS idx_dutchie_discovery_locations_retired + ON dutchie_discovery_locations(retired_at) + WHERE retired_at IS NOT NULL; + +-- ============================================================ +-- 3. ID RESOLUTION TRACKING FOR HENRY (Entry Point Finder) +-- ============================================================ + +-- Add resolution tracking to dispensaries +ALTER TABLE dispensaries + ADD COLUMN IF NOT EXISTS last_id_resolution_at TIMESTAMPTZ, + ADD COLUMN IF NOT EXISTS id_resolution_attempts INT DEFAULT 0, + ADD COLUMN IF NOT EXISTS id_resolution_error TEXT; + +COMMENT ON COLUMN dispensaries.last_id_resolution_at IS 'When platform_dispensary_id was last resolved/attempted'; +COMMENT ON COLUMN dispensaries.id_resolution_attempts IS 'Number of resolution attempts'; +COMMENT ON COLUMN dispensaries.id_resolution_error IS 'Last error message from resolution attempt'; + +-- Index for finding stores needing resolution +CREATE INDEX IF NOT EXISTS idx_dispensaries_needs_resolution + ON dispensaries(state, menu_type) + WHERE platform_dispensary_id IS NULL AND menu_type = 'dutchie'; + +-- ============================================================ +-- 4. 
ENHANCED CITIES TABLE FOR ALICE +-- ============================================================ + +-- Add tracking columns to cities table +ALTER TABLE dutchie_discovery_cities + ADD COLUMN IF NOT EXISTS state_name VARCHAR(100), + ADD COLUMN IF NOT EXISTS discovered_at TIMESTAMPTZ DEFAULT NOW(), + ADD COLUMN IF NOT EXISTS last_verified_at TIMESTAMPTZ, + ADD COLUMN IF NOT EXISTS store_count_reported INT, + ADD COLUMN IF NOT EXISTS store_count_actual INT; + +COMMENT ON COLUMN dutchie_discovery_cities.state_name IS 'Full state name from source'; +COMMENT ON COLUMN dutchie_discovery_cities.discovered_at IS 'When city was first discovered'; +COMMENT ON COLUMN dutchie_discovery_cities.last_verified_at IS 'When city was last verified to exist'; +COMMENT ON COLUMN dutchie_discovery_cities.store_count_reported IS 'Store count reported by source'; +COMMENT ON COLUMN dutchie_discovery_cities.store_count_actual IS 'Actual store count from discovery'; + +-- ============================================================ +-- 5. UPDATE WORKER ROLES (Standardize naming) +-- ============================================================ + +-- Update existing workers to use standardized role names +UPDATE job_schedules SET worker_role = 'store_discovery' + WHERE worker_name = 'Alice' AND worker_role = 'Store Discovery'; + +UPDATE job_schedules SET worker_role = 'entry_point_finder' + WHERE worker_name = 'Henry' AND worker_role = 'Entry Point Finder'; + +UPDATE job_schedules SET worker_role = 'product_sync' + WHERE worker_name = 'Bella' AND worker_role = 'GraphQL Product Sync'; + +UPDATE job_schedules SET worker_role = 'analytics_refresh' + WHERE worker_name = 'Oscar' AND worker_role = 'Analytics Refresh'; + +-- ============================================================ +-- 6. VISIBILITY EVENTS IN SNAPSHOTS (JSONB approach) +-- ============================================================ + +-- Add visibility_events array to product snapshots metadata +-- This will store: [{event_type, timestamp, worker_name}] +-- No schema change needed - we use existing metadata JSONB column + +-- ============================================================ +-- 7. INDEXES FOR WORKER QUERIES +-- ============================================================ + +-- Index for finding recently added stores (for Henry) +CREATE INDEX IF NOT EXISTS idx_dutchie_discovery_locations_created + ON dutchie_discovery_locations(created_at DESC) + WHERE active = TRUE; + +-- Index for scope-based queries (by state) +CREATE INDEX IF NOT EXISTS idx_dispensaries_state_menu + ON dispensaries(state, menu_type) + WHERE menu_type IS NOT NULL; + +-- Record migration +INSERT INTO schema_migrations (version, name, applied_at) +VALUES (55, '055_workforce_enhancements', NOW()) +ON CONFLICT (version) DO NOTHING; diff --git a/backend/migrations/056_fix_worker_and_run_logs.sql b/backend/migrations/056_fix_worker_and_run_logs.sql new file mode 100644 index 00000000..2b09de2d --- /dev/null +++ b/backend/migrations/056_fix_worker_and_run_logs.sql @@ -0,0 +1,110 @@ +-- Migration 056: Fix Worker Metadata and Job Run Logs +-- +-- This migration safely ensures all expected schema exists for: +-- 1. job_schedules - worker_name, worker_role columns +-- 2. job_run_logs - entire table creation if missing +-- +-- Uses IF NOT EXISTS / ADD COLUMN IF NOT EXISTS for idempotency. +-- Safe to run on databases that already have some or all of these changes. + +-- ============================================================ +-- 1. 
ADD MISSING COLUMNS TO job_schedules +-- ============================================================ + +ALTER TABLE job_schedules + ADD COLUMN IF NOT EXISTS worker_name VARCHAR(50), + ADD COLUMN IF NOT EXISTS worker_role VARCHAR(100); + +COMMENT ON COLUMN job_schedules.worker_name IS 'Friendly name for the worker (e.g., Alice, Henry, Bella, Oscar)'; +COMMENT ON COLUMN job_schedules.worker_role IS 'Description of worker role (e.g., store_discovery, product_sync)'; + +-- ============================================================ +-- 2. CREATE job_run_logs TABLE IF NOT EXISTS +-- ============================================================ + +CREATE TABLE IF NOT EXISTS job_run_logs ( + id SERIAL PRIMARY KEY, + schedule_id INTEGER NOT NULL REFERENCES job_schedules(id) ON DELETE CASCADE, + job_name VARCHAR(100) NOT NULL, + status VARCHAR(20) NOT NULL, -- 'pending', 'running', 'success', 'error', 'partial' + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + duration_ms INTEGER, + error_message TEXT, + + -- Results summary + items_processed INTEGER DEFAULT 0, + items_succeeded INTEGER DEFAULT 0, + items_failed INTEGER DEFAULT 0, + + -- Worker metadata (from scheduler.ts createRunLog function) + worker_name VARCHAR(50), + run_role VARCHAR(100), + + -- Additional run details + metadata JSONB, + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Create indexes if they don't exist +CREATE INDEX IF NOT EXISTS idx_job_run_logs_schedule ON job_run_logs(schedule_id); +CREATE INDEX IF NOT EXISTS idx_job_run_logs_job_name ON job_run_logs(job_name); +CREATE INDEX IF NOT EXISTS idx_job_run_logs_status ON job_run_logs(status); +CREATE INDEX IF NOT EXISTS idx_job_run_logs_created ON job_run_logs(created_at); +CREATE INDEX IF NOT EXISTS idx_job_run_logs_worker_name ON job_run_logs(worker_name); + +-- ============================================================ +-- 3. ADD enqueued_by_worker TO dispensary_crawl_jobs IF EXISTS +-- ============================================================ + +DO $$ +BEGIN + -- Only add column if dispensary_crawl_jobs table exists + IF EXISTS (SELECT FROM information_schema.tables WHERE table_name = 'dispensary_crawl_jobs') THEN + ALTER TABLE dispensary_crawl_jobs + ADD COLUMN IF NOT EXISTS enqueued_by_worker VARCHAR(50); + + COMMENT ON COLUMN dispensary_crawl_jobs.enqueued_by_worker IS 'Name of the worker that enqueued this job'; + + CREATE INDEX IF NOT EXISTS idx_dispensary_crawl_jobs_enqueued_by + ON dispensary_crawl_jobs(enqueued_by_worker); + END IF; +END $$; + +-- ============================================================ +-- 4. SEED DEFAULT WORKER NAMES FOR EXISTING SCHEDULES +-- ============================================================ + +UPDATE job_schedules SET + worker_name = 'Bella', + worker_role = 'product_sync' +WHERE job_name = 'dutchie_az_product_crawl' AND worker_name IS NULL; + +UPDATE job_schedules SET + worker_name = 'Henry', + worker_role = 'entry_point_finder' +WHERE job_name = 'dutchie_az_menu_detection' AND worker_name IS NULL; + +UPDATE job_schedules SET + worker_name = 'Alice', + worker_role = 'store_discovery' +WHERE job_name = 'dutchie_store_discovery' AND worker_name IS NULL; + +UPDATE job_schedules SET + worker_name = 'Oscar', + worker_role = 'analytics_refresh' +WHERE job_name = 'analytics_refresh' AND worker_name IS NULL; + +-- ============================================================ +-- 5. 
RECORD MIGRATION (if schema_migrations table exists) +-- ============================================================ + +DO $$ +BEGIN + IF EXISTS (SELECT FROM information_schema.tables WHERE table_name = 'schema_migrations') THEN + INSERT INTO schema_migrations (version, name, applied_at) + VALUES (56, '056_fix_worker_and_run_logs', NOW()) + ON CONFLICT (version) DO NOTHING; + END IF; +END $$; diff --git a/backend/package.json b/backend/package.json index d2eeba55..23e85ad8 100755 --- a/backend/package.json +++ b/backend/package.json @@ -10,11 +10,18 @@ "migrate": "tsx src/db/migrate.ts", "seed": "tsx src/db/seed.ts", "migrate:az": "tsx src/dutchie-az/db/migrate.ts", - "health:az": "tsx -e \"import { healthCheck } from './src/dutchie-az/db/connection'; (async()=>{ const ok=await healthCheck(); console.log(ok?'AZ DB healthy':'AZ DB NOT reachable'); process.exit(ok?0:1); })();\"" + "health:az": "tsx -e \"import { healthCheck } from './src/dutchie-az/db/connection'; (async()=>{ const ok=await healthCheck(); console.log(ok?'AZ DB healthy':'AZ DB NOT reachable'); process.exit(ok?0:1); })();\"", + "system:smoke-test": "tsx src/scripts/system-smoke-test.ts", + "discovery:dt:cities:auto": "tsx src/dutchie-az/discovery/discovery-dt-cities-auto.ts", + "discovery:dt:cities:manual": "tsx src/dutchie-az/discovery/discovery-dt-cities-manual-seed.ts", + "discovery:dt:locations": "tsx src/dutchie-az/discovery/discovery-dt-locations-from-cities.ts", + "backfill:legacy:canonical": "tsx src/scripts/backfill-legacy-to-canonical.ts", + "seed:dt:cities:bulk": "tsx src/scripts/seed-dt-cities-bulk.ts" }, "dependencies": { "axios": "^1.6.2", "bcrypt": "^5.1.1", + "cheerio": "^1.1.2", "cors": "^2.8.5", "dotenv": "^16.3.1", "express": "^4.18.2", diff --git a/backend/setup-local.sh b/backend/setup-local.sh new file mode 100755 index 00000000..7fc5ae38 --- /dev/null +++ b/backend/setup-local.sh @@ -0,0 +1,224 @@ +#!/bin/bash +# CannaiQ Local Development Setup (Idempotent) +# +# This script starts the complete local development environment: +# - PostgreSQL (cannaiq-postgres) on port 54320 +# - Backend API on port 3010 +# - CannaiQ Admin UI on port 8080 +# - FindADispo Consumer UI on port 3001 +# - Findagram Consumer UI on port 3002 +# +# Usage: ./setup-local.sh +# +# URLs: +# Admin: http://localhost:8080/admin +# FindADispo: http://localhost:3001 +# Findagram: http://localhost:3002 +# Backend: http://localhost:3010 +# +# Idempotent: Safe to run multiple times. Already-running services are left alone. + +set -e + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +echo -e "${BLUE}================================${NC}" +echo -e "${BLUE} CannaiQ Local Dev Setup${NC}" +echo -e "${BLUE}================================${NC}" +echo "" + +# Check for required tools +command -v docker >/dev/null 2>&1 || { echo -e "${RED}Error: docker is required but not installed.${NC}" >&2; exit 1; } +command -v npm >/dev/null 2>&1 || { echo -e "${RED}Error: npm is required but not installed.${NC}" >&2; exit 1; } + +# Get the script directory +SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" +ROOT_DIR="$SCRIPT_DIR/.." 
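+# All steps below run from the backend directory (SCRIPT_DIR); the frontends
+# are resolved relative to the repo root (ROOT_DIR) and we cd back afterwards.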
+cd "$SCRIPT_DIR" + +# Step 1: PostgreSQL +PG_RUNNING=$(docker ps --filter "name=cannaiq-postgres" --filter "status=running" -q) +if [ -n "$PG_RUNNING" ]; then + echo -e "${GREEN}[1/6] PostgreSQL already running (cannaiq-postgres)${NC}" +else + echo -e "${YELLOW}[1/6] Starting PostgreSQL (cannaiq-postgres)...${NC}" + docker compose -f docker-compose.local.yml up -d cannaiq-postgres + + # Wait for PostgreSQL to be ready + echo -e "${YELLOW} Waiting for PostgreSQL to be ready...${NC}" + until docker exec cannaiq-postgres pg_isready -U cannaiq >/dev/null 2>&1; do + sleep 1 + done + echo -e "${GREEN} PostgreSQL ready on port 54320${NC}" +fi + +# Step 2: Create storage directories (always safe to run) +mkdir -p storage/images/products +mkdir -p storage/images/brands +mkdir -p public/images + +# Step 3: Backend +if lsof -i:3010 >/dev/null 2>&1; then + echo -e "${GREEN}[2/6] Backend already running on port 3010${NC}" +else + echo -e "${YELLOW}[2/6] Starting Backend API...${NC}" + + # Install dependencies if needed + if [ ! -d "node_modules" ]; then + echo -e "${YELLOW} Installing backend dependencies...${NC}" + npm install + fi + + # Set environment for local mode + export STORAGE_DRIVER=local + export STORAGE_BASE_PATH=./storage + export PORT=3010 + + # Start backend in background + npm run dev > /tmp/cannaiq-backend.log 2>&1 & + BACKEND_PID=$! + echo $BACKEND_PID > /tmp/cannaiq-backend.pid + echo -e "${GREEN} Backend starting (PID: $BACKEND_PID)${NC}" + + # Wait briefly for backend to start + sleep 3 +fi + +# Step 4: CannaiQ Admin UI +if lsof -i:8080 >/dev/null 2>&1; then + echo -e "${GREEN}[3/6] CannaiQ Admin already running on port 8080${NC}" +else + echo -e "${YELLOW}[3/6] Starting CannaiQ Admin UI...${NC}" + + cd "$ROOT_DIR/cannaiq" + + # Install dependencies if needed + if [ ! -d "node_modules" ]; then + echo -e "${YELLOW} Installing cannaiq dependencies...${NC}" + npm install + fi + + # Start frontend in background + npm run dev:admin > /tmp/cannaiq-frontend.log 2>&1 & + FRONTEND_PID=$! + echo $FRONTEND_PID > /tmp/cannaiq-frontend.pid + echo -e "${GREEN} CannaiQ Admin starting (PID: $FRONTEND_PID)${NC}" + + cd "$SCRIPT_DIR" +fi + +# Step 5: FindADispo Consumer UI +if lsof -i:3001 >/dev/null 2>&1; then + echo -e "${GREEN}[4/6] FindADispo already running on port 3001${NC}" +else + echo -e "${YELLOW}[4/6] Starting FindADispo Consumer UI...${NC}" + + cd "$ROOT_DIR/findadispo/frontend" + + # Install dependencies if needed + if [ ! -d "node_modules" ]; then + echo -e "${YELLOW} Installing findadispo dependencies...${NC}" + npm install + fi + + # Start in background on port 3001 + PORT=3001 npm run dev > /tmp/findadispo-frontend.log 2>&1 & + FINDADISPO_PID=$! + echo $FINDADISPO_PID > /tmp/findadispo-frontend.pid + echo -e "${GREEN} FindADispo starting (PID: $FINDADISPO_PID)${NC}" + + cd "$SCRIPT_DIR" +fi + +# Step 6: Findagram Consumer UI +if lsof -i:3002 >/dev/null 2>&1; then + echo -e "${GREEN}[5/6] Findagram already running on port 3002${NC}" +else + echo -e "${YELLOW}[5/6] Starting Findagram Consumer UI...${NC}" + + cd "$ROOT_DIR/findagram/frontend" + + # Install dependencies if needed + if [ ! -d "node_modules" ]; then + echo -e "${YELLOW} Installing findagram dependencies...${NC}" + npm install + fi + + # Start in background on port 3002 + PORT=3002 npm run dev > /tmp/findagram-frontend.log 2>&1 & + FINDAGRAM_PID=$! 
+ echo $FINDAGRAM_PID > /tmp/findagram-frontend.pid + echo -e "${GREEN} Findagram starting (PID: $FINDAGRAM_PID)${NC}" + + cd "$SCRIPT_DIR" +fi + +# Step 7: Health checks for newly started services +echo "" +echo -e "${YELLOW}[6/6] Checking service health...${NC}" + +# Check backend if it was just started +if ! lsof -i:3010 >/dev/null 2>&1; then + for i in {1..15}; do + if curl -s http://localhost:3010/health > /dev/null 2>&1; then + break + fi + sleep 1 + done +fi + +if curl -s http://localhost:3010/health > /dev/null 2>&1; then + echo -e "${GREEN} Backend API: OK (port 3010)${NC}" +else + echo -e "${YELLOW} Backend API: Starting (check: tail -f /tmp/cannaiq-backend.log)${NC}" +fi + +# Check CannaiQ Admin +if curl -s http://localhost:8080 > /dev/null 2>&1; then + echo -e "${GREEN} CannaiQ Admin: OK (port 8080)${NC}" +else + echo -e "${YELLOW} CannaiQ Admin: Starting (check: tail -f /tmp/cannaiq-frontend.log)${NC}" +fi + +# Check FindADispo +sleep 2 +if curl -s http://localhost:3001 > /dev/null 2>&1; then + echo -e "${GREEN} FindADispo: OK (port 3001)${NC}" +else + echo -e "${YELLOW} FindADispo: Starting (check: tail -f /tmp/findadispo-frontend.log)${NC}" +fi + +# Check Findagram +if curl -s http://localhost:3002 > /dev/null 2>&1; then + echo -e "${GREEN} Findagram: OK (port 3002)${NC}" +else + echo -e "${YELLOW} Findagram: Starting (check: tail -f /tmp/findagram-frontend.log)${NC}" +fi + +# Print final status +echo "" +echo -e "${BLUE}================================${NC}" +echo -e "${GREEN} Local Environment Ready${NC}" +echo -e "${BLUE}================================${NC}" +echo "" +echo -e " ${BLUE}Services:${NC}" +echo -e " Postgres: localhost:54320" +echo -e " Backend API: http://localhost:3010" +echo "" +echo -e " ${BLUE}Frontends:${NC}" +echo -e " CannaiQ Admin: http://localhost:8080/admin" +echo -e " FindADispo: http://localhost:3001" +echo -e " Findagram: http://localhost:3002" +echo "" +echo -e "${YELLOW}To stop services:${NC} ./stop-local.sh" +echo -e "${YELLOW}View logs:${NC}" +echo " Backend: tail -f /tmp/cannaiq-backend.log" +echo " CannaiQ: tail -f /tmp/cannaiq-frontend.log" +echo " FindADispo: tail -f /tmp/findadispo-frontend.log" +echo " Findagram: tail -f /tmp/findagram-frontend.log" +echo "" diff --git a/backend/src/auth/middleware.ts b/backend/src/auth/middleware.ts index 8148a4c7..a0bc9a64 100755 --- a/backend/src/auth/middleware.ts +++ b/backend/src/auth/middleware.ts @@ -1,7 +1,7 @@ import { Request, Response, NextFunction } from 'express'; import jwt from 'jsonwebtoken'; import bcrypt from 'bcrypt'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; const JWT_SECRET = process.env.JWT_SECRET || 'change_this_in_production'; diff --git a/backend/src/canonical-hydration/RUNBOOK.md b/backend/src/canonical-hydration/RUNBOOK.md new file mode 100644 index 00000000..4f7d7edd --- /dev/null +++ b/backend/src/canonical-hydration/RUNBOOK.md @@ -0,0 +1,204 @@ +# Canonical Hydration Pipeline - Runbook + +## Overview + +The Canonical Hydration Pipeline transforms data from the `dutchie_*` source tables into the provider-agnostic canonical tables (`store_products`, `store_product_snapshots`, `crawl_runs`). 
This enables: + +- Unified analytics across multiple data providers +- Historical price/inventory tracking +- Provider-agnostic API endpoints + +## Architecture + +``` +Source Tables (read-only): + dutchie_products → StoreProductNormalizer → store_products + dutchie_product_snapshots → SnapshotWriter → store_product_snapshots + dispensary_crawl_jobs → CrawlRunRecorder → crawl_runs + +Orchestration: + CanonicalHydrationService coordinates all transformations +``` + +## Table Mappings + +### dutchie_products → store_products + +| Source Column | Target Column | Notes | +|---------------|---------------|-------| +| dispensary_id | dispensary_id | Direct mapping | +| external_product_id | provider_product_id | Canonical key | +| platform | provider | 'dutchie' | +| name | name_raw | Raw product name | +| brand_name | brand_name_raw | Raw brand name | +| type/subcategory | category_raw | Category info | +| price_rec (JSONB) | price_rec (DECIMAL) | Extracted from JSONB | +| price_med (JSONB) | price_med (DECIMAL) | Extracted from JSONB | +| thc | thc_percent | Parsed percentage | +| cbd | cbd_percent | Parsed percentage | +| stock_status | is_in_stock | Boolean conversion | +| total_quantity_available | stock_quantity | Direct mapping | +| primary_image_url | image_url | Direct mapping | +| created_at | first_seen_at | First seen timestamp | +| updated_at | last_seen_at | Last seen timestamp | + +### Canonical Keys + +- **store_products**: `(dispensary_id, provider, provider_product_id)` +- **store_product_snapshots**: `(store_product_id, crawl_run_id)` +- **crawl_runs**: `(source_job_type, source_job_id)` + +## CLI Commands + +### Check Hydration Status + +```bash +# Overall status +DATABASE_URL="..." npx tsx src/canonical-hydration/cli/backfill.ts --status + +# Single dispensary status +DATABASE_URL="..." npx tsx src/canonical-hydration/cli/backfill.ts --status --dispensary-id 112 +``` + +### Products-Only Hydration + +Use when source data has products but no historical snapshots/job records. + +```bash +# Dry run (see what would be done) +DATABASE_URL="..." npx tsx src/canonical-hydration/cli/products-only.ts --dry-run + +# Hydrate single dispensary +DATABASE_URL="..." npx tsx src/canonical-hydration/cli/products-only.ts --dispensary-id 112 + +# Hydrate all dispensaries +DATABASE_URL="..." npx tsx src/canonical-hydration/cli/products-only.ts +``` + +### Backfill Hydration + +Use when source data has historical job records in `dispensary_crawl_jobs`. + +```bash +# Dry run +DATABASE_URL="..." npx tsx src/canonical-hydration/cli/backfill.ts --dry-run + +# Backfill with date range +DATABASE_URL="..." npx tsx src/canonical-hydration/cli/backfill.ts --start-date 2024-01-01 --end-date 2024-12-31 + +# Backfill single dispensary +DATABASE_URL="..." npx tsx src/canonical-hydration/cli/backfill.ts --dispensary-id 112 +``` + +### Incremental Hydration + +Use for ongoing hydration of new data. + +```bash +# Single run +DATABASE_URL="..." npx tsx src/canonical-hydration/cli/incremental.ts + +# Continuous loop (runs every 60 seconds) +DATABASE_URL="..." npx tsx src/canonical-hydration/cli/incremental.ts --loop + +# Continuous loop with custom interval +DATABASE_URL="..." npx tsx src/canonical-hydration/cli/incremental.ts --loop --interval 300 +``` + +## Migration + +Apply the schema migration before first use: + +```bash +# Apply migration 050 +DATABASE_URL="..." 
psql -f src/migrations/050_canonical_hydration_schema.sql +``` + +This migration adds: +- `source_job_type` and `source_job_id` columns to `crawl_runs` +- Unique index on `crawl_runs (source_job_type, source_job_id)` +- Unique index on `store_product_snapshots (store_product_id, crawl_run_id)` +- Performance indexes for hydration queries + +## Idempotency + +All hydration operations are idempotent: + +- **crawl_runs**: ON CONFLICT updates existing records +- **store_products**: ON CONFLICT updates mutable fields +- **store_product_snapshots**: ON CONFLICT DO NOTHING + +Re-running hydration is safe and will not create duplicates. + +## Monitoring + +### Check Canonical Data + +```sql +-- Count canonical records +SELECT + (SELECT COUNT(*) FROM crawl_runs WHERE provider = 'dutchie') as crawl_runs, + (SELECT COUNT(*) FROM store_products WHERE provider = 'dutchie') as products, + (SELECT COUNT(*) FROM store_product_snapshots) as snapshots; + +-- Products by dispensary +SELECT dispensary_id, COUNT(*) as products +FROM store_products +WHERE provider = 'dutchie' +GROUP BY dispensary_id +ORDER BY products DESC; + +-- Recent crawl runs +SELECT id, dispensary_id, started_at, products_found, snapshots_written +FROM crawl_runs +ORDER BY started_at DESC +LIMIT 10; +``` + +### Verify Hydration Completeness + +```sql +-- Compare source vs canonical product counts +SELECT + dp.dispensary_id, + COUNT(DISTINCT dp.id) as source_products, + COUNT(DISTINCT sp.id) as canonical_products +FROM dutchie_products dp +LEFT JOIN store_products sp + ON sp.dispensary_id = dp.dispensary_id + AND sp.provider = 'dutchie' + AND sp.provider_product_id = dp.external_product_id +GROUP BY dp.dispensary_id +ORDER BY dp.dispensary_id; +``` + +## Troubleshooting + +### "invalid input syntax for type integer" + +This usually means a type mismatch between source and target columns. The most common case is `brand_id` - the source has UUID strings but the target expects integers. The normalizer sets `brand_id = null` to handle this. + +### "could not determine data type of parameter $1" + +This indicates a batch insert issue with parameter indexing. Ensure each batch has its own parameter indexing starting from $1. + +### Empty Snapshots + +If `snapshotsWritten` is 0 but products were upserted: +1. Check if snapshots already exist for the crawl run (ON CONFLICT DO NOTHING) +2. Verify store_products exist with the correct dispensary_id and provider + +## Performance + +Typical performance metrics: +- ~1000 products/second for upsert +- ~2000 snapshots/second for insert +- 39 dispensaries with 37K products: ~17 seconds + +For large backfills, use `--batch-size` to control memory usage. + +## Known Limitations + +1. **brand_id not mapped**: Source brand_id is UUID, target expects integer. Currently set to null. +2. **No historical snapshots**: If source has no `dutchie_product_snapshots`, use products-only mode which creates initial snapshots from current product state. +3. **Source jobs empty**: If `dispensary_crawl_jobs` is empty, use products-only mode. 
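+
+## Programmatic Usage
+
+The CLIs above are thin wrappers around `CanonicalHydrationService`, which is re-exported from `src/canonical-hydration/index.ts` for use in other backend code. A minimal sketch of calling it directly (the wrapper function name and import path are illustrative, not part of the module):
+
+```typescript
+import { Pool } from 'pg';
+import { CanonicalHydrationService } from './src/canonical-hydration';
+
+// Illustrative helper: run one incremental hydration pass and return the result.
+async function runIncrementalHydration(dispensaryId?: number) {
+  // Pool setup mirrors the CLI scripts; adjust to your environment's DB config.
+  const pool = new Pool({ connectionString: process.env.DATABASE_URL });
+  const service = new CanonicalHydrationService({
+    pool,
+    logger: (msg) => console.log(`[hydration] ${msg}`),
+  });
+
+  try {
+    // Incremental mode only processes jobs not yet recorded in crawl_runs.
+    const result = await service.hydrate({ mode: 'incremental', dispensaryId });
+    console.log(
+      `runs=${result.crawlRunsCreated} products=${result.productsUpserted} ` +
+      `snapshots=${result.snapshotsWritten} errors=${result.errors.length}`
+    );
+    return result;
+  } finally {
+    await pool.end();
+  }
+}
+```
+
+For historical data, pass `{ mode: 'backfill', startDate, endDate }` instead, or call `hydrateProductsOnly()` when no job history exists, mirroring the CLI modes documented above.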
diff --git a/backend/src/canonical-hydration/cli/backfill.ts b/backend/src/canonical-hydration/cli/backfill.ts new file mode 100644 index 00000000..5cb8b4de --- /dev/null +++ b/backend/src/canonical-hydration/cli/backfill.ts @@ -0,0 +1,170 @@ +#!/usr/bin/env npx tsx +/** + * Backfill CLI - Historical data hydration + * + * Usage: + * npx tsx src/canonical-hydration/cli/backfill.ts [options] + * + * Options: + * --dispensary-id Hydrate only a specific dispensary + * --start-date Start date for backfill (ISO format) + * --end-date End date for backfill (ISO format) + * --batch-size Number of jobs to process per batch (default: 50) + * --dry-run Show what would be done without making changes + * --status Show hydration status and exit + * + * Examples: + * npx tsx src/canonical-hydration/cli/backfill.ts --status + * npx tsx src/canonical-hydration/cli/backfill.ts --dispensary-id 112 + * npx tsx src/canonical-hydration/cli/backfill.ts --start-date 2024-01-01 --end-date 2024-12-31 + * npx tsx src/canonical-hydration/cli/backfill.ts --dry-run + */ + +import { Pool } from 'pg'; +import { CanonicalHydrationService } from '../hydration-service'; +import { HydrationOptions } from '../types'; + +async function main() { + const args = process.argv.slice(2); + + // Parse command line arguments + const options: HydrationOptions = { + mode: 'backfill', + }; + let showStatus = false; + + for (let i = 0; i < args.length; i++) { + const arg = args[i]; + switch (arg) { + case '--dispensary-id': + options.dispensaryId = parseInt(args[++i]); + break; + case '--start-date': + options.startDate = new Date(args[++i]); + break; + case '--end-date': + options.endDate = new Date(args[++i]); + break; + case '--batch-size': + options.batchSize = parseInt(args[++i]); + break; + case '--dry-run': + options.dryRun = true; + break; + case '--status': + showStatus = true; + break; + case '--help': + console.log(` +Backfill CLI - Historical data hydration + +Usage: + npx tsx src/canonical-hydration/cli/backfill.ts [options] + +Options: + --dispensary-id Hydrate only a specific dispensary + --start-date Start date for backfill (ISO format) + --end-date End date for backfill (ISO format) + --batch-size Number of jobs to process per batch (default: 50) + --dry-run Show what would be done without making changes + --status Show hydration status and exit + +Examples: + npx tsx src/canonical-hydration/cli/backfill.ts --status + npx tsx src/canonical-hydration/cli/backfill.ts --dispensary-id 112 + npx tsx src/canonical-hydration/cli/backfill.ts --start-date 2024-01-01 --end-date 2024-12-31 + npx tsx src/canonical-hydration/cli/backfill.ts --dry-run + `); + process.exit(0); + } + } + + // Connect to database + const pool = new Pool({ + connectionString: process.env.DATABASE_URL, + }); + + const service = new CanonicalHydrationService({ + pool, + logger: (msg) => console.log(`[${new Date().toISOString()}] ${msg}`), + }); + + try { + if (showStatus) { + // Show status and exit + if (options.dispensaryId) { + const status = await service.getHydrationStatus(options.dispensaryId); + console.log(`\nHydration Status for Dispensary ${options.dispensaryId}:`); + console.log('═'.repeat(50)); + console.log(` Source Jobs (completed): ${status.sourceJobs}`); + console.log(` Hydrated Jobs: ${status.hydratedJobs}`); + console.log(` Unhydrated Jobs: ${status.unhydratedJobs}`); + console.log(''); + console.log(` Source Products: ${status.sourceProducts}`); + console.log(` Store Products: ${status.storeProducts}`); + console.log(''); + console.log(` 
Source Snapshots: ${status.sourceSnapshots}`); + console.log(` Store Snapshots: ${status.storeSnapshots}`); + } else { + const status = await service.getOverallStatus(); + console.log('\nOverall Hydration Status:'); + console.log('═'.repeat(50)); + console.log(` Dispensaries with Data: ${status.dispensariesWithData}`); + console.log(''); + console.log(` Source Jobs (completed): ${status.totalSourceJobs}`); + console.log(` Hydrated Jobs: ${status.totalHydratedJobs}`); + console.log(` Unhydrated Jobs: ${status.totalSourceJobs - status.totalHydratedJobs}`); + console.log(''); + console.log(` Source Products: ${status.totalSourceProducts}`); + console.log(` Store Products: ${status.totalStoreProducts}`); + console.log(''); + console.log(` Source Snapshots: ${status.totalSourceSnapshots}`); + console.log(` Store Snapshots: ${status.totalStoreSnapshots}`); + } + process.exit(0); + } + + // Run backfill + console.log('\n' + '═'.repeat(60)); + console.log(' CANONICAL HYDRATION - BACKFILL MODE'); + console.log('═'.repeat(60)); + console.log(` Dispensary ID: ${options.dispensaryId || 'ALL'}`); + console.log(` Start Date: ${options.startDate?.toISOString() || 'N/A'}`); + console.log(` End Date: ${options.endDate?.toISOString() || 'N/A'}`); + console.log(` Batch Size: ${options.batchSize || 50}`); + console.log(` Dry Run: ${options.dryRun ? 'YES' : 'NO'}`); + console.log('═'.repeat(60) + '\n'); + + const result = await service.hydrate(options); + + console.log('\n' + '═'.repeat(60)); + console.log(' HYDRATION COMPLETE'); + console.log('═'.repeat(60)); + console.log(` Crawl Runs Created: ${result.crawlRunsCreated}`); + console.log(` Crawl Runs Skipped: ${result.crawlRunsSkipped}`); + console.log(` Products Upserted: ${result.productsUpserted}`); + console.log(` Snapshots Written: ${result.snapshotsWritten}`); + console.log(` Duration: ${result.durationMs}ms`); + console.log(` Errors: ${result.errors.length}`); + + if (result.errors.length > 0) { + console.log('\nErrors:'); + for (const error of result.errors.slice(0, 10)) { + console.log(` - ${error}`); + } + if (result.errors.length > 10) { + console.log(` ... and ${result.errors.length - 10} more`); + } + } + console.log('═'.repeat(60) + '\n'); + + process.exit(result.errors.length > 0 ? 
1 : 0); + } catch (error: any) { + console.error('Fatal error:', error.message); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/canonical-hydration/cli/incremental.ts b/backend/src/canonical-hydration/cli/incremental.ts new file mode 100644 index 00000000..11eb1149 --- /dev/null +++ b/backend/src/canonical-hydration/cli/incremental.ts @@ -0,0 +1,142 @@ +#!/usr/bin/env npx tsx +/** + * Incremental CLI - Ongoing data hydration + * + * Usage: + * npx tsx src/canonical-hydration/cli/incremental.ts [options] + * + * Options: + * --dispensary-id Hydrate only a specific dispensary + * --batch-size Number of jobs to process per batch (default: 100) + * --loop Run continuously in a loop + * --interval Interval between loops (default: 60) + * --dry-run Show what would be done without making changes + * + * Examples: + * npx tsx src/canonical-hydration/cli/incremental.ts + * npx tsx src/canonical-hydration/cli/incremental.ts --dispensary-id 112 + * npx tsx src/canonical-hydration/cli/incremental.ts --loop --interval 300 + * npx tsx src/canonical-hydration/cli/incremental.ts --dry-run + */ + +import { Pool } from 'pg'; +import { CanonicalHydrationService } from '../hydration-service'; +import { HydrationOptions } from '../types'; + +async function main() { + const args = process.argv.slice(2); + + // Parse command line arguments + const options: HydrationOptions = { + mode: 'incremental', + }; + let loop = false; + let intervalSeconds = 60; + + for (let i = 0; i < args.length; i++) { + const arg = args[i]; + switch (arg) { + case '--dispensary-id': + options.dispensaryId = parseInt(args[++i]); + break; + case '--batch-size': + options.batchSize = parseInt(args[++i]); + break; + case '--loop': + loop = true; + break; + case '--interval': + intervalSeconds = parseInt(args[++i]); + break; + case '--dry-run': + options.dryRun = true; + break; + case '--help': + console.log(` +Incremental CLI - Ongoing data hydration + +Usage: + npx tsx src/canonical-hydration/cli/incremental.ts [options] + +Options: + --dispensary-id Hydrate only a specific dispensary + --batch-size Number of jobs to process per batch (default: 100) + --loop Run continuously in a loop + --interval Interval between loops (default: 60) + --dry-run Show what would be done without making changes + +Examples: + npx tsx src/canonical-hydration/cli/incremental.ts + npx tsx src/canonical-hydration/cli/incremental.ts --dispensary-id 112 + npx tsx src/canonical-hydration/cli/incremental.ts --loop --interval 300 + npx tsx src/canonical-hydration/cli/incremental.ts --dry-run + `); + process.exit(0); + } + } + + // Connect to database + const pool = new Pool({ + connectionString: process.env.DATABASE_URL, + }); + + const service = new CanonicalHydrationService({ + pool, + logger: (msg) => console.log(`[${new Date().toISOString()}] ${msg}`), + }); + + const log = (msg: string) => console.log(`[${new Date().toISOString()}] ${msg}`); + + // Graceful shutdown + let running = true; + process.on('SIGINT', () => { + log('Received SIGINT, shutting down...'); + running = false; + }); + process.on('SIGTERM', () => { + log('Received SIGTERM, shutting down...'); + running = false; + }); + + try { + console.log('\n' + '═'.repeat(60)); + console.log(' CANONICAL HYDRATION - INCREMENTAL MODE'); + console.log('═'.repeat(60)); + console.log(` Dispensary ID: ${options.dispensaryId || 'ALL'}`); + console.log(` Batch Size: ${options.batchSize || 100}`); + console.log(` Loop Mode: ${loop ? 
'YES' : 'NO'}`); + if (loop) { + console.log(` Interval: ${intervalSeconds}s`); + } + console.log(` Dry Run: ${options.dryRun ? 'YES' : 'NO'}`); + console.log('═'.repeat(60) + '\n'); + + do { + const result = await service.hydrate(options); + + log(`Hydration complete: ${result.crawlRunsCreated} runs, ${result.productsUpserted} products, ${result.snapshotsWritten} snapshots (${result.durationMs}ms)`); + + if (result.errors.length > 0) { + log(`Errors: ${result.errors.length}`); + for (const error of result.errors.slice(0, 5)) { + log(` - ${error}`); + } + } + + if (loop && running) { + log(`Sleeping for ${intervalSeconds}s...`); + await new Promise(resolve => setTimeout(resolve, intervalSeconds * 1000)); + } + } while (loop && running); + + log('Incremental hydration completed'); + process.exit(0); + } catch (error: any) { + console.error('Fatal error:', error.message); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/canonical-hydration/cli/products-only.ts b/backend/src/canonical-hydration/cli/products-only.ts new file mode 100644 index 00000000..13c42f63 --- /dev/null +++ b/backend/src/canonical-hydration/cli/products-only.ts @@ -0,0 +1,113 @@ +#!/usr/bin/env npx tsx +/** + * Products-Only Hydration CLI + * + * Used when there are no historical job records - creates synthetic crawl runs + * from current product data. + * + * Usage: + * npx tsx src/canonical-hydration/cli/products-only.ts [options] + * + * Options: + * --dispensary-id Hydrate only a specific dispensary + * --dry-run Show what would be done without making changes + * + * Examples: + * npx tsx src/canonical-hydration/cli/products-only.ts + * npx tsx src/canonical-hydration/cli/products-only.ts --dispensary-id 112 + * npx tsx src/canonical-hydration/cli/products-only.ts --dry-run + */ + +import { Pool } from 'pg'; +import { CanonicalHydrationService } from '../hydration-service'; + +async function main() { + const args = process.argv.slice(2); + + // Parse command line arguments + let dispensaryId: number | undefined; + let dryRun = false; + + for (let i = 0; i < args.length; i++) { + const arg = args[i]; + switch (arg) { + case '--dispensary-id': + dispensaryId = parseInt(args[++i]); + break; + case '--dry-run': + dryRun = true; + break; + case '--help': + console.log(` +Products-Only Hydration CLI + +Used when there are no historical job records - creates synthetic crawl runs +from current product data. + +Usage: + npx tsx src/canonical-hydration/cli/products-only.ts [options] + +Options: + --dispensary-id Hydrate only a specific dispensary + --dry-run Show what would be done without making changes + +Examples: + npx tsx src/canonical-hydration/cli/products-only.ts + npx tsx src/canonical-hydration/cli/products-only.ts --dispensary-id 112 + npx tsx src/canonical-hydration/cli/products-only.ts --dry-run + `); + process.exit(0); + } + } + + // Connect to database + const pool = new Pool({ + connectionString: process.env.DATABASE_URL, + }); + + const service = new CanonicalHydrationService({ + pool, + logger: (msg) => console.log(`[${new Date().toISOString()}] ${msg}`), + }); + + try { + console.log('\n' + '═'.repeat(60)); + console.log(' CANONICAL HYDRATION - PRODUCTS-ONLY MODE'); + console.log('═'.repeat(60)); + console.log(` Dispensary ID: ${dispensaryId || 'ALL'}`); + console.log(` Dry Run: ${dryRun ? 
'YES' : 'NO'}`); + console.log('═'.repeat(60) + '\n'); + + const result = await service.hydrateProductsOnly({ dispensaryId, dryRun }); + + console.log('\n' + '═'.repeat(60)); + console.log(' HYDRATION COMPLETE'); + console.log('═'.repeat(60)); + console.log(` Crawl Runs Created: ${result.crawlRunsCreated}`); + console.log(` Crawl Runs Skipped: ${result.crawlRunsSkipped}`); + console.log(` Products Upserted: ${result.productsUpserted}`); + console.log(` Snapshots Written: ${result.snapshotsWritten}`); + console.log(` Duration: ${result.durationMs}ms`); + console.log(` Errors: ${result.errors.length}`); + + if (result.errors.length > 0) { + console.log('\nErrors:'); + for (const error of result.errors.slice(0, 10)) { + console.log(` - ${error}`); + } + if (result.errors.length > 10) { + console.log(` ... and ${result.errors.length - 10} more`); + } + } + console.log('═'.repeat(60) + '\n'); + + process.exit(result.errors.length > 0 ? 1 : 0); + } catch (error: any) { + console.error('Fatal error:', error.message); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/canonical-hydration/crawl-run-recorder.ts b/backend/src/canonical-hydration/crawl-run-recorder.ts new file mode 100644 index 00000000..395e74b1 --- /dev/null +++ b/backend/src/canonical-hydration/crawl-run-recorder.ts @@ -0,0 +1,226 @@ +/** + * CrawlRunRecorder + * Records crawl runs from source job tables (dispensary_crawl_jobs) to canonical crawl_runs table + */ + +import { Pool, PoolClient } from 'pg'; +import { SourceJob, CrawlRun, ServiceContext, SourceJobType } from './types'; + +export class CrawlRunRecorder { + private pool: Pool; + private log: (message: string) => void; + + constructor(ctx: ServiceContext) { + this.pool = ctx.pool; + this.log = ctx.logger || console.log; + } + + /** + * Record a single crawl run from a source job + * Uses ON CONFLICT to ensure idempotency + */ + async recordCrawlRun( + sourceJob: SourceJob, + sourceJobType: SourceJobType = 'dispensary_crawl_jobs' + ): Promise { + // Skip jobs that aren't completed successfully + if (sourceJob.status !== 'completed') { + return null; + } + + const crawlRun: Partial = { + dispensary_id: sourceJob.dispensary_id, + provider: 'dutchie', // Source is always dutchie for now + started_at: sourceJob.started_at || new Date(), + finished_at: sourceJob.completed_at, + duration_ms: sourceJob.duration_ms, + status: this.mapStatus(sourceJob.status), + error_message: sourceJob.error_message, + products_found: sourceJob.products_found, + products_new: sourceJob.products_new, + products_updated: sourceJob.products_updated, + snapshots_written: null, // Will be updated after snapshot insertion + worker_id: null, + trigger_type: sourceJob.job_type === 'dutchie_product_crawl' ? 
'scheduled' : 'manual', + metadata: { sourceJobType, originalJobType: sourceJob.job_type }, + source_job_type: sourceJobType, + source_job_id: sourceJob.id, + }; + + const result = await this.pool.query( + `INSERT INTO crawl_runs ( + dispensary_id, provider, started_at, finished_at, duration_ms, + status, error_message, products_found, products_new, products_updated, + snapshots_written, worker_id, trigger_type, metadata, + source_job_type, source_job_id + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16) + ON CONFLICT (source_job_type, source_job_id) WHERE source_job_id IS NOT NULL + DO UPDATE SET + finished_at = EXCLUDED.finished_at, + duration_ms = EXCLUDED.duration_ms, + status = EXCLUDED.status, + error_message = EXCLUDED.error_message, + products_found = EXCLUDED.products_found, + products_new = EXCLUDED.products_new, + products_updated = EXCLUDED.products_updated + RETURNING id`, + [ + crawlRun.dispensary_id, + crawlRun.provider, + crawlRun.started_at, + crawlRun.finished_at, + crawlRun.duration_ms, + crawlRun.status, + crawlRun.error_message, + crawlRun.products_found, + crawlRun.products_new, + crawlRun.products_updated, + crawlRun.snapshots_written, + crawlRun.worker_id, + crawlRun.trigger_type, + JSON.stringify(crawlRun.metadata), + crawlRun.source_job_type, + crawlRun.source_job_id, + ] + ); + + return result.rows[0]?.id || null; + } + + /** + * Record multiple crawl runs in a batch + */ + async recordCrawlRunsBatch( + sourceJobs: SourceJob[], + sourceJobType: SourceJobType = 'dispensary_crawl_jobs' + ): Promise<{ created: number; skipped: number; crawlRunIds: Map }> { + let created = 0; + let skipped = 0; + const crawlRunIds = new Map(); // sourceJobId -> crawlRunId + + for (const job of sourceJobs) { + const crawlRunId = await this.recordCrawlRun(job, sourceJobType); + if (crawlRunId) { + created++; + crawlRunIds.set(job.id, crawlRunId); + } else { + skipped++; + } + } + + return { created, skipped, crawlRunIds }; + } + + /** + * Update snapshots_written count for a crawl run + */ + async updateSnapshotsWritten(crawlRunId: number, snapshotsWritten: number): Promise { + await this.pool.query( + 'UPDATE crawl_runs SET snapshots_written = $1 WHERE id = $2', + [snapshotsWritten, crawlRunId] + ); + } + + /** + * Get crawl run ID by source job + */ + async getCrawlRunIdBySourceJob( + sourceJobType: SourceJobType, + sourceJobId: number + ): Promise { + const result = await this.pool.query( + 'SELECT id FROM crawl_runs WHERE source_job_type = $1 AND source_job_id = $2', + [sourceJobType, sourceJobId] + ); + return result.rows[0]?.id || null; + } + + /** + * Get unhydrated source jobs (jobs not yet recorded in crawl_runs) + */ + async getUnhydratedJobs( + dispensaryId?: number, + startDate?: Date, + limit: number = 100 + ): Promise { + let query = ` + SELECT j.* + FROM dispensary_crawl_jobs j + LEFT JOIN crawl_runs cr ON cr.source_job_type = 'dispensary_crawl_jobs' AND cr.source_job_id = j.id + WHERE cr.id IS NULL + AND j.status = 'completed' + AND j.job_type = 'dutchie_product_crawl' + `; + const params: any[] = []; + let paramIndex = 1; + + if (dispensaryId) { + query += ` AND j.dispensary_id = $${paramIndex++}`; + params.push(dispensaryId); + } + + if (startDate) { + query += ` AND j.completed_at >= $${paramIndex++}`; + params.push(startDate); + } + + query += ` ORDER BY j.completed_at ASC LIMIT $${paramIndex}`; + params.push(limit); + + const result = await this.pool.query(query, params); + return result.rows; + } + + /** + * Get all source jobs for 
backfill (within date range) + */ + async getSourceJobsForBackfill( + startDate?: Date, + endDate?: Date, + dispensaryId?: number, + limit: number = 1000 + ): Promise { + let query = ` + SELECT * + FROM dispensary_crawl_jobs + WHERE status = 'completed' + AND job_type = 'dutchie_product_crawl' + `; + const params: any[] = []; + let paramIndex = 1; + + if (startDate) { + query += ` AND completed_at >= $${paramIndex++}`; + params.push(startDate); + } + + if (endDate) { + query += ` AND completed_at <= $${paramIndex++}`; + params.push(endDate); + } + + if (dispensaryId) { + query += ` AND dispensary_id = $${paramIndex++}`; + params.push(dispensaryId); + } + + query += ` ORDER BY completed_at ASC LIMIT $${paramIndex}`; + params.push(limit); + + const result = await this.pool.query(query, params); + return result.rows; + } + + private mapStatus(sourceStatus: string): string { + switch (sourceStatus) { + case 'completed': + return 'success'; + case 'failed': + return 'failed'; + case 'running': + return 'running'; + default: + return sourceStatus; + } + } +} diff --git a/backend/src/canonical-hydration/hydration-service.ts b/backend/src/canonical-hydration/hydration-service.ts new file mode 100644 index 00000000..9921818f --- /dev/null +++ b/backend/src/canonical-hydration/hydration-service.ts @@ -0,0 +1,560 @@ +/** + * CanonicalHydrationService + * Orchestrates the full hydration pipeline from dutchie_* to canonical tables + */ + +import { Pool } from 'pg'; +import { CrawlRunRecorder } from './crawl-run-recorder'; +import { StoreProductNormalizer } from './store-product-normalizer'; +import { SnapshotWriter } from './snapshot-writer'; +import { HydrationOptions, HydrationResult, ServiceContext, SourceJob } from './types'; + +export class CanonicalHydrationService { + private pool: Pool; + private log: (message: string) => void; + private crawlRunRecorder: CrawlRunRecorder; + private productNormalizer: StoreProductNormalizer; + private snapshotWriter: SnapshotWriter; + + constructor(ctx: ServiceContext) { + this.pool = ctx.pool; + this.log = ctx.logger || console.log; + this.crawlRunRecorder = new CrawlRunRecorder(ctx); + this.productNormalizer = new StoreProductNormalizer(ctx); + this.snapshotWriter = new SnapshotWriter(ctx); + } + + /** + * Run the full hydration pipeline + * Supports both backfill (historical) and incremental (ongoing) modes + */ + async hydrate(options: HydrationOptions): Promise { + const startTime = Date.now(); + const result: HydrationResult = { + crawlRunsCreated: 0, + crawlRunsSkipped: 0, + productsUpserted: 0, + snapshotsWritten: 0, + errors: [], + durationMs: 0, + }; + + this.log(`Starting hydration in ${options.mode} mode`); + + try { + if (options.mode === 'backfill') { + await this.runBackfill(options, result); + } else { + await this.runIncremental(options, result); + } + } catch (err: any) { + result.errors.push(`Fatal error: ${err.message}`); + this.log(`Hydration failed: ${err.message}`); + } + + result.durationMs = Date.now() - startTime; + this.log(`Hydration completed in ${result.durationMs}ms: ${JSON.stringify({ + crawlRunsCreated: result.crawlRunsCreated, + crawlRunsSkipped: result.crawlRunsSkipped, + productsUpserted: result.productsUpserted, + snapshotsWritten: result.snapshotsWritten, + errors: result.errors.length, + })}`); + + return result; + } + + /** + * Backfill mode: Process historical data from source tables + */ + private async runBackfill(options: HydrationOptions, result: HydrationResult): Promise { + const batchSize = options.batchSize || 
50; + + // Get source jobs to process + const sourceJobs = await this.crawlRunRecorder.getSourceJobsForBackfill( + options.startDate, + options.endDate, + options.dispensaryId, + 1000 // Max jobs to process + ); + + this.log(`Found ${sourceJobs.length} source jobs to backfill`); + + // Group jobs by dispensary for efficient processing + const jobsByDispensary = this.groupJobsByDispensary(sourceJobs); + + for (const [dispensaryId, jobs] of jobsByDispensary) { + this.log(`Processing dispensary ${dispensaryId} (${jobs.length} jobs)`); + + try { + // Step 1: Upsert products for this dispensary + if (!options.dryRun) { + const productResult = await this.productNormalizer.upsertProductsForDispensary(dispensaryId); + result.productsUpserted += productResult.upserted; + if (productResult.errors.length > 0) { + result.errors.push(...productResult.errors.map(e => `Dispensary ${dispensaryId}: ${e}`)); + } + } + + // Get store_product_id map for snapshot writing + const storeProductIdMap = await this.productNormalizer.getStoreProductIdMap(dispensaryId); + + // Step 2: Record crawl runs and write snapshots for each job + for (const job of jobs) { + try { + await this.processJob(job, storeProductIdMap, result, options.dryRun); + } catch (err: any) { + result.errors.push(`Job ${job.id}: ${err.message}`); + } + } + } catch (err: any) { + result.errors.push(`Dispensary ${dispensaryId}: ${err.message}`); + } + } + } + + /** + * Incremental mode: Process only unhydrated jobs + */ + private async runIncremental(options: HydrationOptions, result: HydrationResult): Promise { + const limit = options.batchSize || 100; + + // Get unhydrated jobs + const unhydratedJobs = await this.crawlRunRecorder.getUnhydratedJobs( + options.dispensaryId, + options.startDate, + limit + ); + + this.log(`Found ${unhydratedJobs.length} unhydrated jobs`); + + // Group by dispensary + const jobsByDispensary = this.groupJobsByDispensary(unhydratedJobs); + + for (const [dispensaryId, jobs] of jobsByDispensary) { + this.log(`Processing dispensary ${dispensaryId} (${jobs.length} jobs)`); + + try { + // Step 1: Upsert products + if (!options.dryRun) { + const productResult = await this.productNormalizer.upsertProductsForDispensary(dispensaryId); + result.productsUpserted += productResult.upserted; + if (productResult.errors.length > 0) { + result.errors.push(...productResult.errors.map(e => `Dispensary ${dispensaryId}: ${e}`)); + } + } + + // Get store_product_id map + const storeProductIdMap = await this.productNormalizer.getStoreProductIdMap(dispensaryId); + + // Step 2: Process each job + for (const job of jobs) { + try { + await this.processJob(job, storeProductIdMap, result, options.dryRun); + } catch (err: any) { + result.errors.push(`Job ${job.id}: ${err.message}`); + } + } + } catch (err: any) { + result.errors.push(`Dispensary ${dispensaryId}: ${err.message}`); + } + } + } + + /** + * Process a single job: record crawl run and write snapshots + */ + private async processJob( + job: SourceJob, + storeProductIdMap: Map, + result: HydrationResult, + dryRun?: boolean + ): Promise { + // Step 1: Record the crawl run + let crawlRunId: number | null = null; + + if (!dryRun) { + crawlRunId = await this.crawlRunRecorder.recordCrawlRun(job); + if (crawlRunId) { + result.crawlRunsCreated++; + } else { + result.crawlRunsSkipped++; + return; // Skip snapshot writing if crawl run wasn't created + } + } else { + // In dry run, check if it would be created + const existingId = await this.crawlRunRecorder.getCrawlRunIdBySourceJob( + 
'dispensary_crawl_jobs', + job.id + ); + if (existingId) { + result.crawlRunsSkipped++; + return; + } + result.crawlRunsCreated++; + return; // Skip snapshot writing in dry run + } + + // Step 2: Write snapshots for this crawl run + if (crawlRunId && job.completed_at) { + const snapshotResult = await this.snapshotWriter.writeSnapshotsForCrawlRun( + crawlRunId, + job.dispensary_id, + storeProductIdMap, + job.completed_at + ); + + result.snapshotsWritten += snapshotResult.written; + if (snapshotResult.errors.length > 0) { + result.errors.push(...snapshotResult.errors); + } + + // Update crawl_run with snapshots_written count + await this.crawlRunRecorder.updateSnapshotsWritten(crawlRunId, snapshotResult.written); + } + } + + /** + * Hydrate a single dispensary (convenience method) + */ + async hydrateDispensary( + dispensaryId: number, + mode: 'backfill' | 'incremental' = 'incremental' + ): Promise { + return this.hydrate({ + mode, + dispensaryId, + }); + } + + /** + * Get hydration status for a dispensary + */ + async getHydrationStatus(dispensaryId: number): Promise<{ + sourceJobs: number; + hydratedJobs: number; + unhydratedJobs: number; + sourceProducts: number; + storeProducts: number; + sourceSnapshots: number; + storeSnapshots: number; + }> { + const [sourceJobs, hydratedJobs, sourceProducts, storeProducts, sourceSnapshots, storeSnapshots] = + await Promise.all([ + this.pool.query( + `SELECT COUNT(*) FROM dispensary_crawl_jobs + WHERE dispensary_id = $1 AND status = 'completed' AND job_type = 'dutchie_product_crawl'`, + [dispensaryId] + ), + this.pool.query( + `SELECT COUNT(*) FROM crawl_runs + WHERE dispensary_id = $1 AND source_job_type = 'dispensary_crawl_jobs'`, + [dispensaryId] + ), + this.pool.query( + `SELECT COUNT(*) FROM dutchie_products WHERE dispensary_id = $1`, + [dispensaryId] + ), + this.pool.query( + `SELECT COUNT(*) FROM store_products WHERE dispensary_id = $1 AND provider = 'dutchie'`, + [dispensaryId] + ), + this.pool.query( + `SELECT COUNT(*) FROM dutchie_product_snapshots WHERE dispensary_id = $1`, + [dispensaryId] + ), + this.pool.query( + `SELECT COUNT(*) FROM store_product_snapshots WHERE dispensary_id = $1`, + [dispensaryId] + ), + ]); + + const sourceJobCount = parseInt(sourceJobs.rows[0].count); + const hydratedJobCount = parseInt(hydratedJobs.rows[0].count); + + return { + sourceJobs: sourceJobCount, + hydratedJobs: hydratedJobCount, + unhydratedJobs: sourceJobCount - hydratedJobCount, + sourceProducts: parseInt(sourceProducts.rows[0].count), + storeProducts: parseInt(storeProducts.rows[0].count), + sourceSnapshots: parseInt(sourceSnapshots.rows[0].count), + storeSnapshots: parseInt(storeSnapshots.rows[0].count), + }; + } + + /** + * Get overall hydration status + */ + async getOverallStatus(): Promise<{ + totalSourceJobs: number; + totalHydratedJobs: number; + totalSourceProducts: number; + totalStoreProducts: number; + totalSourceSnapshots: number; + totalStoreSnapshots: number; + dispensariesWithData: number; + }> { + const [sourceJobs, hydratedJobs, sourceProducts, storeProducts, sourceSnapshots, storeSnapshots, dispensaries] = + await Promise.all([ + this.pool.query( + `SELECT COUNT(*) FROM dispensary_crawl_jobs + WHERE status = 'completed' AND job_type = 'dutchie_product_crawl'` + ), + this.pool.query( + `SELECT COUNT(*) FROM crawl_runs WHERE source_job_type = 'dispensary_crawl_jobs'` + ), + this.pool.query(`SELECT COUNT(*) FROM dutchie_products`), + this.pool.query(`SELECT COUNT(*) FROM store_products WHERE provider = 'dutchie'`), + 
this.pool.query(`SELECT COUNT(*) FROM dutchie_product_snapshots`), + this.pool.query(`SELECT COUNT(*) FROM store_product_snapshots`), + this.pool.query( + `SELECT COUNT(DISTINCT dispensary_id) FROM dutchie_products` + ), + ]); + + return { + totalSourceJobs: parseInt(sourceJobs.rows[0].count), + totalHydratedJobs: parseInt(hydratedJobs.rows[0].count), + totalSourceProducts: parseInt(sourceProducts.rows[0].count), + totalStoreProducts: parseInt(storeProducts.rows[0].count), + totalSourceSnapshots: parseInt(sourceSnapshots.rows[0].count), + totalStoreSnapshots: parseInt(storeSnapshots.rows[0].count), + dispensariesWithData: parseInt(dispensaries.rows[0].count), + }; + } + + /** + * Group jobs by dispensary ID + */ + private groupJobsByDispensary(jobs: SourceJob[]): Map { + const map = new Map(); + for (const job of jobs) { + const list = map.get(job.dispensary_id) || []; + list.push(job); + map.set(job.dispensary_id, list); + } + return map; + } + + /** + * Products-only hydration mode + * Used when there are no historical job records - creates synthetic crawl runs + * from current product data + */ + async hydrateProductsOnly(options: { + dispensaryId?: number; + dryRun?: boolean; + } = {}): Promise { + const startTime = Date.now(); + const result: HydrationResult = { + crawlRunsCreated: 0, + crawlRunsSkipped: 0, + productsUpserted: 0, + snapshotsWritten: 0, + errors: [], + durationMs: 0, + }; + + this.log('Starting products-only hydration mode'); + + try { + // Get all dispensaries with products + let dispensaryIds: number[]; + if (options.dispensaryId) { + dispensaryIds = [options.dispensaryId]; + } else { + const dispResult = await this.pool.query( + 'SELECT DISTINCT dispensary_id FROM dutchie_products ORDER BY dispensary_id' + ); + dispensaryIds = dispResult.rows.map(r => r.dispensary_id); + } + + this.log(`Processing ${dispensaryIds.length} dispensaries`); + + for (const dispensaryId of dispensaryIds) { + try { + await this.hydrateDispensaryProductsOnly(dispensaryId, result, options.dryRun); + } catch (err: any) { + result.errors.push(`Dispensary ${dispensaryId}: ${err.message}`); + } + } + } catch (err: any) { + result.errors.push(`Fatal error: ${err.message}`); + } + + result.durationMs = Date.now() - startTime; + this.log(`Products-only hydration completed in ${result.durationMs}ms: ${JSON.stringify({ + crawlRunsCreated: result.crawlRunsCreated, + productsUpserted: result.productsUpserted, + snapshotsWritten: result.snapshotsWritten, + errors: result.errors.length, + })}`); + + return result; + } + + /** + * Hydrate a single dispensary in products-only mode + */ + private async hydrateDispensaryProductsOnly( + dispensaryId: number, + result: HydrationResult, + dryRun?: boolean + ): Promise { + // Get product count and timestamps for this dispensary + const statsResult = await this.pool.query( + `SELECT COUNT(*) as cnt, MIN(created_at) as min_date, MAX(updated_at) as max_date + FROM dutchie_products WHERE dispensary_id = $1`, + [dispensaryId] + ); + const stats = statsResult.rows[0]; + const productCount = parseInt(stats.cnt); + + if (productCount === 0) { + this.log(`Dispensary ${dispensaryId}: No products, skipping`); + return; + } + + this.log(`Dispensary ${dispensaryId}: ${productCount} products`); + + // Step 1: Create synthetic crawl run + let crawlRunId: number | null = null; + const now = new Date(); + + if (!dryRun) { + // Check if we already have a synthetic run for this dispensary + const existingRun = await this.pool.query( + `SELECT id FROM crawl_runs + WHERE 
dispensary_id = $1 + AND source_job_type = 'products_only_hydration' + LIMIT 1`, + [dispensaryId] + ); + + if (existingRun.rows.length > 0) { + crawlRunId = existingRun.rows[0].id; + this.log(`Dispensary ${dispensaryId}: Using existing synthetic crawl run ${crawlRunId}`); + result.crawlRunsSkipped++; + } else { + // Create new synthetic crawl run + const insertResult = await this.pool.query( + `INSERT INTO crawl_runs ( + dispensary_id, provider, started_at, finished_at, duration_ms, + status, products_found, trigger_type, metadata, + source_job_type, source_job_id + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) + RETURNING id`, + [ + dispensaryId, + 'dutchie', + stats.min_date || now, + stats.max_date || now, + 0, + 'success', + productCount, + 'hydration', + JSON.stringify({ mode: 'products_only', hydratedAt: now.toISOString() }), + 'products_only_hydration', + dispensaryId, // Use dispensary_id as synthetic job_id + ] + ); + crawlRunId = insertResult.rows[0].id; + result.crawlRunsCreated++; + this.log(`Dispensary ${dispensaryId}: Created synthetic crawl run ${crawlRunId}`); + } + + // Step 2: Upsert products + const productResult = await this.productNormalizer.upsertProductsForDispensary(dispensaryId); + result.productsUpserted += productResult.upserted; + if (productResult.errors.length > 0) { + result.errors.push(...productResult.errors.map(e => `Dispensary ${dispensaryId}: ${e}`)); + } + + // Step 3: Create initial snapshots from current product state + const snapshotsWritten = await this.createInitialSnapshots(dispensaryId, crawlRunId); + result.snapshotsWritten += snapshotsWritten; + + // Update crawl run with snapshot count + await this.pool.query( + 'UPDATE crawl_runs SET snapshots_written = $1 WHERE id = $2', + [snapshotsWritten, crawlRunId] + ); + } else { + // Dry run - just count what would be done + result.crawlRunsCreated++; + result.productsUpserted += productCount; + result.snapshotsWritten += productCount; + } + } + + /** + * Create initial snapshots from current product state + */ + private async createInitialSnapshots( + dispensaryId: number, + crawlRunId: number + ): Promise { + // Get all store products for this dispensary + const products = await this.pool.query( + `SELECT sp.id, sp.price_rec, sp.price_med, sp.is_on_special, sp.is_in_stock, + sp.stock_quantity, sp.thc_percent, sp.cbd_percent + FROM store_products sp + WHERE sp.dispensary_id = $1 AND sp.provider = 'dutchie'`, + [dispensaryId] + ); + + if (products.rows.length === 0) return 0; + + const now = new Date(); + const batchSize = 100; + let totalInserted = 0; + + // Process in batches + for (let i = 0; i < products.rows.length; i += batchSize) { + const batch = products.rows.slice(i, i + batchSize); + const values: any[] = []; + const placeholders: string[] = []; + let paramIndex = 1; + + for (const product of batch) { + values.push( + dispensaryId, + product.id, + crawlRunId, + now, + product.price_rec, + product.price_med, + product.is_on_special || false, + product.is_in_stock || false, + product.stock_quantity, + product.thc_percent, + product.cbd_percent, + JSON.stringify({ source: 'initial_hydration' }) + ); + + const rowPlaceholders = []; + for (let j = 0; j < 12; j++) { + rowPlaceholders.push(`$${paramIndex++}`); + } + placeholders.push(`(${rowPlaceholders.join(', ')}, NOW())`); + } + + const query = ` + INSERT INTO store_product_snapshots ( + dispensary_id, store_product_id, crawl_run_id, captured_at, + price_rec, price_med, is_on_special, is_in_stock, stock_quantity, + thc_percent, 
cbd_percent, raw_data, created_at + ) VALUES ${placeholders.join(', ')} + ON CONFLICT (store_product_id, crawl_run_id) + WHERE store_product_id IS NOT NULL AND crawl_run_id IS NOT NULL + DO NOTHING + `; + + const result = await this.pool.query(query, values); + totalInserted += result.rowCount || 0; + } + + return totalInserted; + } +} diff --git a/backend/src/canonical-hydration/index.ts b/backend/src/canonical-hydration/index.ts new file mode 100644 index 00000000..6a8ed123 --- /dev/null +++ b/backend/src/canonical-hydration/index.ts @@ -0,0 +1,13 @@ +/** + * Canonical Hydration Module + * Phase 2: Hydration Pipeline from dutchie_* to store_products/store_product_snapshots/crawl_runs + */ + +// Types +export * from './types'; + +// Services +export { CrawlRunRecorder } from './crawl-run-recorder'; +export { StoreProductNormalizer } from './store-product-normalizer'; +export { SnapshotWriter } from './snapshot-writer'; +export { CanonicalHydrationService } from './hydration-service'; diff --git a/backend/src/canonical-hydration/snapshot-writer.ts b/backend/src/canonical-hydration/snapshot-writer.ts new file mode 100644 index 00000000..b5b6f3bb --- /dev/null +++ b/backend/src/canonical-hydration/snapshot-writer.ts @@ -0,0 +1,303 @@ +/** + * SnapshotWriter + * Inserts store_product_snapshots from dutchie_product_snapshots source table + */ + +import { Pool } from 'pg'; +import { SourceSnapshot, StoreProductSnapshot, ServiceContext } from './types'; + +export class SnapshotWriter { + private pool: Pool; + private log: (message: string) => void; + private batchSize: number; + + constructor(ctx: ServiceContext, batchSize: number = 100) { + this.pool = ctx.pool; + this.log = ctx.logger || console.log; + this.batchSize = batchSize; + } + + /** + * Write snapshots for a crawl run + * Reads from dutchie_product_snapshots and inserts to store_product_snapshots + */ + async writeSnapshotsForCrawlRun( + crawlRunId: number, + dispensaryId: number, + storeProductIdMap: Map, + crawledAt: Date + ): Promise<{ written: number; skipped: number; errors: string[] }> { + const errors: string[] = []; + let written = 0; + let skipped = 0; + + // Get source snapshots for this dispensary at this crawl time + const sourceSnapshots = await this.getSourceSnapshots(dispensaryId, crawledAt); + this.log(`Found ${sourceSnapshots.length} source snapshots for dispensary ${dispensaryId} at ${crawledAt.toISOString()}`); + + // Process in batches + for (let i = 0; i < sourceSnapshots.length; i += this.batchSize) { + const batch = sourceSnapshots.slice(i, i + this.batchSize); + try { + const { batchWritten, batchSkipped } = await this.writeBatch( + batch, + crawlRunId, + storeProductIdMap + ); + written += batchWritten; + skipped += batchSkipped; + } catch (err: any) { + errors.push(`Batch ${i / this.batchSize}: ${err.message}`); + } + } + + return { written, skipped, errors }; + } + + /** + * Write a single snapshot + */ + async writeSnapshot( + source: SourceSnapshot, + crawlRunId: number, + storeProductId: number + ): Promise { + const normalized = this.normalizeSnapshot(source, crawlRunId, storeProductId); + + const result = await this.pool.query( + `INSERT INTO store_product_snapshots ( + dispensary_id, store_product_id, crawl_run_id, captured_at, + price_rec, price_med, is_on_special, is_in_stock, stock_quantity, + thc_percent, cbd_percent, raw_data, created_at + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, NOW()) + ON CONFLICT (store_product_id, crawl_run_id) + WHERE store_product_id IS NOT NULL AND 
crawl_run_id IS NOT NULL + DO UPDATE SET + price_rec = EXCLUDED.price_rec, + price_med = EXCLUDED.price_med, + is_on_special = EXCLUDED.is_on_special, + is_in_stock = EXCLUDED.is_in_stock, + stock_quantity = EXCLUDED.stock_quantity, + thc_percent = EXCLUDED.thc_percent, + cbd_percent = EXCLUDED.cbd_percent, + raw_data = EXCLUDED.raw_data + RETURNING id`, + [ + normalized.dispensary_id, + normalized.store_product_id, + normalized.crawl_run_id, + normalized.captured_at, + normalized.price_rec, + normalized.price_med, + normalized.is_on_special, + normalized.is_in_stock, + normalized.stock_quantity, + normalized.thc_percent, + normalized.cbd_percent, + JSON.stringify(normalized.raw_data), + ] + ); + + return result.rows[0]?.id || null; + } + + /** + * Write a batch of snapshots + */ + async writeBatch( + sourceSnapshots: SourceSnapshot[], + crawlRunId: number, + storeProductIdMap: Map + ): Promise<{ batchWritten: number; batchSkipped: number }> { + if (sourceSnapshots.length === 0) return { batchWritten: 0, batchSkipped: 0 }; + + const values: any[] = []; + const placeholders: string[] = []; + let paramIndex = 1; + let skipped = 0; + + for (const source of sourceSnapshots) { + // Look up store_product_id + const storeProductId = storeProductIdMap.get(source.external_product_id); + if (!storeProductId) { + skipped++; + continue; + } + + const normalized = this.normalizeSnapshot(source, crawlRunId, storeProductId); + + values.push( + normalized.dispensary_id, + normalized.store_product_id, + normalized.crawl_run_id, + normalized.captured_at, + normalized.price_rec, + normalized.price_med, + normalized.is_on_special, + normalized.is_in_stock, + normalized.stock_quantity, + normalized.thc_percent, + normalized.cbd_percent, + JSON.stringify(normalized.raw_data) + ); + + const rowPlaceholders = []; + for (let j = 0; j < 12; j++) { + rowPlaceholders.push(`$${paramIndex++}`); + } + placeholders.push(`(${rowPlaceholders.join(', ')}, NOW())`); + } + + if (placeholders.length === 0) { + return { batchWritten: 0, batchSkipped: skipped }; + } + + const query = ` + INSERT INTO store_product_snapshots ( + dispensary_id, store_product_id, crawl_run_id, captured_at, + price_rec, price_med, is_on_special, is_in_stock, stock_quantity, + thc_percent, cbd_percent, raw_data, created_at + ) VALUES ${placeholders.join(', ')} + ON CONFLICT (store_product_id, crawl_run_id) + WHERE store_product_id IS NOT NULL AND crawl_run_id IS NOT NULL + DO UPDATE SET + price_rec = EXCLUDED.price_rec, + price_med = EXCLUDED.price_med, + is_on_special = EXCLUDED.is_on_special, + is_in_stock = EXCLUDED.is_in_stock, + stock_quantity = EXCLUDED.stock_quantity, + thc_percent = EXCLUDED.thc_percent, + cbd_percent = EXCLUDED.cbd_percent, + raw_data = EXCLUDED.raw_data + `; + + const result = await this.pool.query(query, values); + return { batchWritten: result.rowCount || 0, batchSkipped: skipped }; + } + + /** + * Get source snapshots from dutchie_product_snapshots for a specific crawl time + * Groups snapshots by crawled_at time (within a 5-minute window) + */ + async getSourceSnapshots( + dispensaryId: number, + crawledAt: Date + ): Promise { + // Find snapshots within 5 minutes of the target time + const windowMinutes = 5; + const result = await this.pool.query( + `SELECT * FROM dutchie_product_snapshots + WHERE dispensary_id = $1 + AND crawled_at >= $2 - INTERVAL '${windowMinutes} minutes' + AND crawled_at <= $2 + INTERVAL '${windowMinutes} minutes' + ORDER BY crawled_at ASC`, + [dispensaryId, crawledAt] + ); + return result.rows; + } 
+ + /** + * Get distinct crawl times from dutchie_product_snapshots for a dispensary + * Used for backfill to identify each crawl run + */ + async getDistinctCrawlTimes( + dispensaryId: number, + startDate?: Date, + endDate?: Date + ): Promise { + let query = ` + SELECT DISTINCT date_trunc('minute', crawled_at) as crawl_time + FROM dutchie_product_snapshots + WHERE dispensary_id = $1 + `; + const params: any[] = [dispensaryId]; + let paramIndex = 2; + + if (startDate) { + query += ` AND crawled_at >= $${paramIndex++}`; + params.push(startDate); + } + + if (endDate) { + query += ` AND crawled_at <= $${paramIndex++}`; + params.push(endDate); + } + + query += ' ORDER BY crawl_time ASC'; + + const result = await this.pool.query(query, params); + return result.rows.map(row => new Date(row.crawl_time)); + } + + /** + * Check if snapshots already exist for a crawl run + */ + async snapshotsExistForCrawlRun(crawlRunId: number): Promise { + const result = await this.pool.query( + 'SELECT 1 FROM store_product_snapshots WHERE crawl_run_id = $1 LIMIT 1', + [crawlRunId] + ); + return result.rows.length > 0; + } + + /** + * Normalize a source snapshot to store_product_snapshot format + */ + private normalizeSnapshot( + source: SourceSnapshot, + crawlRunId: number, + storeProductId: number + ): StoreProductSnapshot { + // Convert cents to dollars + const priceRec = source.rec_min_price_cents !== null + ? source.rec_min_price_cents / 100 + : null; + const priceMed = source.med_min_price_cents !== null + ? source.med_min_price_cents / 100 + : null; + + // Determine stock status + const isInStock = this.isSnapshotInStock(source.stock_status, source.total_quantity_available); + + return { + dispensary_id: source.dispensary_id, + store_product_id: storeProductId, + crawl_run_id: crawlRunId, + captured_at: source.crawled_at, + price_rec: priceRec, + price_med: priceMed, + is_on_special: false, // Source doesn't have special flag + is_in_stock: isInStock, + stock_quantity: source.total_quantity_available, + thc_percent: null, // Not in snapshot, would need to join with product + cbd_percent: null, // Not in snapshot, would need to join with product + raw_data: { + source_id: source.id, + status: source.status, + rec_min_price_cents: source.rec_min_price_cents, + rec_max_price_cents: source.rec_max_price_cents, + med_min_price_cents: source.med_min_price_cents, + med_max_price_cents: source.med_max_price_cents, + }, + }; + } + + /** + * Determine if snapshot is in stock + */ + private isSnapshotInStock(stockStatus: string | null, quantity: number | null): boolean { + if (quantity !== null && quantity > 0) return true; + + if (stockStatus) { + const status = stockStatus.toLowerCase(); + if (status === 'in_stock' || status === 'instock' || status === 'available') { + return true; + } + if (status === 'out_of_stock' || status === 'outofstock' || status === 'unavailable') { + return false; + } + } + + return false; + } +} diff --git a/backend/src/canonical-hydration/store-product-normalizer.ts b/backend/src/canonical-hydration/store-product-normalizer.ts new file mode 100644 index 00000000..983a8ece --- /dev/null +++ b/backend/src/canonical-hydration/store-product-normalizer.ts @@ -0,0 +1,322 @@ +/** + * StoreProductNormalizer + * Upserts store_products from dutchie_products source table + */ + +import { Pool } from 'pg'; +import { SourceProduct, StoreProduct, ServiceContext } from './types'; + +export class StoreProductNormalizer { + private pool: Pool; + private log: (message: string) => void; + private batchSize: 
number; + + constructor(ctx: ServiceContext, batchSize: number = 100) { + this.pool = ctx.pool; + this.log = ctx.logger || console.log; + this.batchSize = batchSize; + } + + /** + * Upsert products for a specific dispensary + * Reads from dutchie_products and upserts to store_products + */ + async upsertProductsForDispensary(dispensaryId: number): Promise<{ upserted: number; errors: string[] }> { + const errors: string[] = []; + let upserted = 0; + + // Get all products for this dispensary from source + const sourceProducts = await this.getSourceProducts(dispensaryId); + this.log(`Found ${sourceProducts.length} source products for dispensary ${dispensaryId}`); + + // Process in batches to avoid memory issues + for (let i = 0; i < sourceProducts.length; i += this.batchSize) { + const batch = sourceProducts.slice(i, i + this.batchSize); + try { + const batchUpserted = await this.upsertBatch(batch); + upserted += batchUpserted; + } catch (err: any) { + errors.push(`Batch ${i / this.batchSize}: ${err.message}`); + } + } + + return { upserted, errors }; + } + + /** + * Upsert a single product + */ + async upsertProduct(source: SourceProduct): Promise { + const normalized = this.normalizeProduct(source); + + const result = await this.pool.query( + `INSERT INTO store_products ( + dispensary_id, brand_id, provider, provider_product_id, + name_raw, brand_name_raw, category_raw, + price_rec, price_med, is_on_special, is_in_stock, stock_quantity, + thc_percent, cbd_percent, image_url, + first_seen_at, last_seen_at, created_at, updated_at + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, NOW(), NOW()) + ON CONFLICT (dispensary_id, provider, provider_product_id) + DO UPDATE SET + name_raw = EXCLUDED.name_raw, + brand_name_raw = EXCLUDED.brand_name_raw, + category_raw = EXCLUDED.category_raw, + price_rec = EXCLUDED.price_rec, + price_med = EXCLUDED.price_med, + is_on_special = EXCLUDED.is_on_special, + is_in_stock = EXCLUDED.is_in_stock, + stock_quantity = EXCLUDED.stock_quantity, + thc_percent = EXCLUDED.thc_percent, + cbd_percent = EXCLUDED.cbd_percent, + image_url = COALESCE(EXCLUDED.image_url, store_products.image_url), + last_seen_at = EXCLUDED.last_seen_at, + updated_at = NOW() + RETURNING id`, + [ + normalized.dispensary_id, + normalized.brand_id, + normalized.provider, + normalized.provider_product_id, + normalized.name_raw, + normalized.brand_name_raw, + normalized.category_raw, + normalized.price_rec, + normalized.price_med, + normalized.is_on_special, + normalized.is_in_stock, + normalized.stock_quantity, + normalized.thc_percent, + normalized.cbd_percent, + normalized.image_url, + normalized.first_seen_at, + normalized.last_seen_at, + ] + ); + + return result.rows[0]?.id || null; + } + + /** + * Upsert a batch of products + */ + async upsertBatch(sourceProducts: SourceProduct[]): Promise { + if (sourceProducts.length === 0) return 0; + + // Build multi-row INSERT with ON CONFLICT + const values: any[] = []; + const placeholders: string[] = []; + let paramIndex = 1; + + for (const source of sourceProducts) { + const normalized = this.normalizeProduct(source); + values.push( + normalized.dispensary_id, + normalized.brand_id, + normalized.provider, + normalized.provider_product_id, + normalized.name_raw, + normalized.brand_name_raw, + normalized.category_raw, + normalized.price_rec, + normalized.price_med, + normalized.is_on_special, + normalized.is_in_stock, + normalized.stock_quantity, + normalized.thc_percent, + normalized.cbd_percent, + 
normalized.image_url, + normalized.first_seen_at, + normalized.last_seen_at + ); + + const rowPlaceholders = []; + for (let j = 0; j < 17; j++) { + rowPlaceholders.push(`$${paramIndex++}`); + } + placeholders.push(`(${rowPlaceholders.join(', ')}, NOW(), NOW())`); + } + + const query = ` + INSERT INTO store_products ( + dispensary_id, brand_id, provider, provider_product_id, + name_raw, brand_name_raw, category_raw, + price_rec, price_med, is_on_special, is_in_stock, stock_quantity, + thc_percent, cbd_percent, image_url, + first_seen_at, last_seen_at, created_at, updated_at + ) VALUES ${placeholders.join(', ')} + ON CONFLICT (dispensary_id, provider, provider_product_id) + DO UPDATE SET + name_raw = EXCLUDED.name_raw, + brand_name_raw = EXCLUDED.brand_name_raw, + category_raw = EXCLUDED.category_raw, + price_rec = EXCLUDED.price_rec, + price_med = EXCLUDED.price_med, + is_on_special = EXCLUDED.is_on_special, + is_in_stock = EXCLUDED.is_in_stock, + stock_quantity = EXCLUDED.stock_quantity, + thc_percent = EXCLUDED.thc_percent, + cbd_percent = EXCLUDED.cbd_percent, + image_url = COALESCE(EXCLUDED.image_url, store_products.image_url), + last_seen_at = EXCLUDED.last_seen_at, + updated_at = NOW() + `; + + const result = await this.pool.query(query, values); + return result.rowCount || 0; + } + + /** + * Get store_product ID by canonical key + */ + async getStoreProductId( + dispensaryId: number, + provider: string, + providerProductId: string + ): Promise { + const result = await this.pool.query( + 'SELECT id FROM store_products WHERE dispensary_id = $1 AND provider = $2 AND provider_product_id = $3', + [dispensaryId, provider, providerProductId] + ); + return result.rows[0]?.id || null; + } + + /** + * Get all store_product IDs for a dispensary (for snapshot writing) + */ + async getStoreProductIdMap(dispensaryId: number): Promise> { + const result = await this.pool.query( + 'SELECT id, provider_product_id FROM store_products WHERE dispensary_id = $1', + [dispensaryId] + ); + + const map = new Map(); + for (const row of result.rows) { + map.set(row.provider_product_id, row.id); + } + return map; + } + + /** + * Get source products from dutchie_products + */ + private async getSourceProducts(dispensaryId: number): Promise { + const result = await this.pool.query( + `SELECT * FROM dutchie_products WHERE dispensary_id = $1`, + [dispensaryId] + ); + return result.rows; + } + + /** + * Normalize a source product to store_product format + */ + private normalizeProduct(source: SourceProduct): StoreProduct { + // Extract price from JSONB if present + const priceRec = this.extractPrice(source.price_rec); + const priceMed = this.extractPrice(source.price_med); + + // Parse THC/CBD percentages + const thcPercent = this.parsePercentage(source.thc); + const cbdPercent = this.parsePercentage(source.cbd); + + // Determine stock status + const isInStock = this.isProductInStock(source.stock_status, source.total_quantity_available); + + return { + dispensary_id: source.dispensary_id, + brand_id: null, // Source has UUID strings, target expects integer - set to null for now + provider: source.platform || 'dutchie', + provider_product_id: source.external_product_id, + name_raw: source.name, + brand_name_raw: source.brand_name, + category_raw: source.type || source.subcategory, + price_rec: priceRec, + price_med: priceMed, + is_on_special: false, // Dutchie doesn't have a direct special flag, would need to check specials table + is_in_stock: isInStock, + stock_quantity: source.total_quantity_available, + 
thc_percent: thcPercent, + cbd_percent: cbdPercent, + image_url: source.primary_image_url, + first_seen_at: source.created_at, + last_seen_at: source.updated_at, + }; + } + + /** + * Extract price from JSONB price field + * Handles formats like: {min: 10, max: 20}, {value: 15}, or just a number + */ + private extractPrice(priceData: any): number | null { + if (priceData === null || priceData === undefined) return null; + + // If it's already a number + if (typeof priceData === 'number') return priceData; + + // If it's a string that looks like a number + if (typeof priceData === 'string') { + const parsed = parseFloat(priceData); + return isNaN(parsed) ? null : parsed; + } + + // If it's an object with price data + if (typeof priceData === 'object') { + // Try common price formats + if (priceData.min !== undefined && priceData.min !== null) { + return typeof priceData.min === 'number' ? priceData.min : parseFloat(priceData.min); + } + if (priceData.value !== undefined && priceData.value !== null) { + return typeof priceData.value === 'number' ? priceData.value : parseFloat(priceData.value); + } + if (priceData.price !== undefined && priceData.price !== null) { + return typeof priceData.price === 'number' ? priceData.price : parseFloat(priceData.price); + } + // Check for array of variants + if (Array.isArray(priceData) && priceData.length > 0) { + const firstVariant = priceData[0]; + if (firstVariant.price !== undefined) { + return typeof firstVariant.price === 'number' ? firstVariant.price : parseFloat(firstVariant.price); + } + } + } + + return null; + } + + /** + * Parse percentage string to number + * Handles formats like: "25.5%", "25.5", "25.5 %", etc. + */ + private parsePercentage(value: string | null | undefined): number | null { + if (value === null || value === undefined) return null; + + // Remove percentage sign and whitespace + const cleaned = value.toString().replace(/%/g, '').trim(); + + const parsed = parseFloat(cleaned); + return isNaN(parsed) ? 
null : parsed; + } + + /** + * Determine if product is in stock based on status and quantity + */ + private isProductInStock(stockStatus: string | null, quantity: number | null): boolean { + // Check quantity first + if (quantity !== null && quantity > 0) return true; + + // Check status string + if (stockStatus) { + const status = stockStatus.toLowerCase(); + if (status === 'in_stock' || status === 'instock' || status === 'available') { + return true; + } + if (status === 'out_of_stock' || status === 'outofstock' || status === 'unavailable') { + return false; + } + } + + // Default to false if unknown + return false; + } +} diff --git a/backend/src/canonical-hydration/types.ts b/backend/src/canonical-hydration/types.ts new file mode 100644 index 00000000..b0fbb180 --- /dev/null +++ b/backend/src/canonical-hydration/types.ts @@ -0,0 +1,150 @@ +/** + * Canonical Hydration Types + * Phase 2: Hydration Pipeline from dutchie_* to store_products/store_product_snapshots/crawl_runs + */ + +import { Pool } from 'pg'; + +// Source job types for hydration +export type SourceJobType = 'dispensary_crawl_jobs' | 'crawl_jobs' | 'job_run_logs'; + +// Source job record (from dispensary_crawl_jobs) +export interface SourceJob { + id: number; + dispensary_id: number; + job_type: string; + status: string; + started_at: Date | null; + completed_at: Date | null; + duration_ms: number | null; + products_found: number | null; + products_new: number | null; + products_updated: number | null; + error_message: string | null; +} + +// Source product record (from dutchie_products) +export interface SourceProduct { + id: number; + dispensary_id: number; + platform: string; + external_product_id: string; + name: string; + brand_name: string | null; + brand_id: number | null; + type: string | null; + subcategory: string | null; + strain_type: string | null; + thc: string | null; + cbd: string | null; + price_rec: any; // JSONB + price_med: any; // JSONB + stock_status: string | null; + total_quantity_available: number | null; + primary_image_url: string | null; + created_at: Date; + updated_at: Date; +} + +// Source snapshot record (from dutchie_product_snapshots) +export interface SourceSnapshot { + id: number; + dutchie_product_id: number; + dispensary_id: number; + external_product_id: string; + status: string | null; + rec_min_price_cents: number | null; + rec_max_price_cents: number | null; + med_min_price_cents: number | null; + med_max_price_cents: number | null; + stock_status: string | null; + total_quantity_available: number | null; + crawled_at: Date; + created_at: Date; +} + +// Crawl run record for canonical table +export interface CrawlRun { + id?: number; + dispensary_id: number; + provider: string; + started_at: Date; + finished_at: Date | null; + duration_ms: number | null; + status: string; + error_message: string | null; + products_found: number | null; + products_new: number | null; + products_updated: number | null; + snapshots_written: number | null; + worker_id: string | null; + trigger_type: string | null; + metadata: any; + source_job_type: SourceJobType; + source_job_id: number; +} + +// Store product record for canonical table +export interface StoreProduct { + id?: number; + dispensary_id: number; + brand_id: number | null; + provider: string; + provider_product_id: string; + name_raw: string; + brand_name_raw: string | null; + category_raw: string | null; + price_rec: number | null; + price_med: number | null; + is_on_special: boolean; + is_in_stock: boolean; + stock_quantity: number | null; 
+ thc_percent: number | null; + cbd_percent: number | null; + image_url: string | null; + first_seen_at: Date; + last_seen_at: Date; +} + +// Store product snapshot record for canonical table +export interface StoreProductSnapshot { + id?: number; + dispensary_id: number; + store_product_id: number; + crawl_run_id: number; + captured_at: Date; + price_rec: number | null; + price_med: number | null; + is_on_special: boolean; + is_in_stock: boolean; + stock_quantity: number | null; + thc_percent: number | null; + cbd_percent: number | null; + raw_data: any; +} + +// Hydration options +export interface HydrationOptions { + mode: 'backfill' | 'incremental'; + dispensaryId?: number; + startDate?: Date; + endDate?: Date; + batchSize?: number; + dryRun?: boolean; +} + +// Hydration result +export interface HydrationResult { + crawlRunsCreated: number; + crawlRunsSkipped: number; + productsUpserted: number; + snapshotsWritten: number; + errors: string[]; + durationMs: number; +} + +// Service context +export interface ServiceContext { + pool: Pool; + logger?: (message: string) => void; +} diff --git a/backend/src/crawlers/base/base-dutchie.ts b/backend/src/crawlers/base/base-dutchie.ts new file mode 100644 index 00000000..f612fae7 --- /dev/null +++ b/backend/src/crawlers/base/base-dutchie.ts @@ -0,0 +1,657 @@ +/** + * Base Dutchie Crawler Template + * + * This is the base template for all Dutchie store crawlers. + * Per-store crawlers extend this by overriding specific methods. + * + * Exports: + * - crawlProducts(dispensary, options) - Main crawl entry point + * - detectStructure(page) - Detect page structure for sandbox mode + * - extractProducts(document) - Extract product data + * - extractImages(document) - Extract product images + * - extractStock(document) - Extract stock status + * - extractPagination(document) - Extract pagination info + */ + +import { + crawlDispensaryProducts as baseCrawlDispensaryProducts, + CrawlResult, +} from '../../dutchie-az/services/product-crawler'; +import { Dispensary, CrawlerProfileOptions } from '../../dutchie-az/types'; + +// Re-export CrawlResult for convenience +export { CrawlResult }; + +// ============================================================ +// TYPES +// ============================================================ + +/** + * Options passed to the per-store crawler + */ +export interface StoreCrawlOptions { + pricingType?: 'rec' | 'med'; + useBothModes?: boolean; + downloadImages?: boolean; + trackStock?: boolean; + timeoutMs?: number; + config?: Record; +} + +/** + * Progress callback for reporting crawl progress + */ +export interface CrawlProgressCallback { + phase: 'fetching' | 'processing' | 'saving' | 'images' | 'complete'; + current: number; + total: number; + message?: string; +} + +/** + * Structure detection result for sandbox mode + */ +export interface StructureDetectionResult { + success: boolean; + menuType: 'dutchie' | 'treez' | 'jane' | 'unknown'; + iframeUrl?: string; + graphqlEndpoint?: string; + dispensaryId?: string; + selectors: { + productContainer?: string; + productName?: string; + productPrice?: string; + productImage?: string; + productCategory?: string; + pagination?: string; + loadMore?: string; + }; + pagination: { + type: 'scroll' | 'click' | 'graphql' | 'none'; + hasMore?: boolean; + pageSize?: number; + }; + errors: string[]; + metadata: Record; +} + +/** + * Product extraction result + */ +export interface ExtractedProduct { + externalId: string; + name: string; + brand?: string; + category?: string; + 
subcategory?: string; + price?: number; + priceRec?: number; + priceMed?: number; + weight?: string; + thcContent?: string; + cbdContent?: string; + description?: string; + imageUrl?: string; + stockStatus?: 'in_stock' | 'out_of_stock' | 'low_stock' | 'unknown'; + quantity?: number; + raw?: Record; +} + +/** + * Image extraction result + */ +export interface ExtractedImage { + productId: string; + imageUrl: string; + isPrimary: boolean; + position: number; +} + +/** + * Stock extraction result + */ +export interface ExtractedStock { + productId: string; + status: 'in_stock' | 'out_of_stock' | 'low_stock' | 'unknown'; + quantity?: number; + lastChecked: Date; +} + +/** + * Pagination extraction result + */ +export interface ExtractedPagination { + hasNextPage: boolean; + currentPage?: number; + totalPages?: number; + totalProducts?: number; + nextCursor?: string; + loadMoreSelector?: string; +} + +/** + * Hook points that per-store crawlers can override + */ +export interface DutchieCrawlerHooks { + /** + * Called before fetching products + * Can be used to set up custom headers, cookies, etc. + */ + beforeFetch?: (dispensary: Dispensary) => Promise; + + /** + * Called after fetching products, before processing + * Can be used to filter or transform raw products + */ + afterFetch?: (products: any[], dispensary: Dispensary) => Promise; + + /** + * Called after all processing is complete + * Can be used for cleanup or post-processing + */ + afterComplete?: (result: CrawlResult, dispensary: Dispensary) => Promise; + + /** + * Custom selector resolver for iframe detection + */ + resolveIframe?: (page: any) => Promise; + + /** + * Custom product container selector + */ + getProductContainerSelector?: () => string; + + /** + * Custom product extraction from container element + */ + extractProductFromElement?: (element: any) => Promise; +} + +/** + * Selectors configuration for per-store overrides + */ +export interface DutchieSelectors { + iframe?: string; + productContainer?: string; + productName?: string; + productPrice?: string; + productPriceRec?: string; + productPriceMed?: string; + productImage?: string; + productCategory?: string; + productBrand?: string; + productWeight?: string; + productThc?: string; + productCbd?: string; + productDescription?: string; + productStock?: string; + loadMore?: string; + pagination?: string; +} + +// ============================================================ +// DEFAULT SELECTORS +// ============================================================ + +export const DEFAULT_DUTCHIE_SELECTORS: DutchieSelectors = { + iframe: 'iframe[src*="dutchie.com"]', + productContainer: '[data-testid="product-card"], .product-card, [class*="ProductCard"]', + productName: '[data-testid="product-title"], .product-title, [class*="ProductTitle"]', + productPrice: '[data-testid="product-price"], .product-price, [class*="ProductPrice"]', + productImage: 'img[src*="dutchie"], img[src*="product"], .product-image img', + productCategory: '[data-testid="category-name"], .category-name', + productBrand: '[data-testid="brand-name"], .brand-name, [class*="BrandName"]', + loadMore: 'button[data-testid="load-more"], .load-more-button', + pagination: '.pagination, [class*="Pagination"]', +}; + +// ============================================================ +// BASE CRAWLER CLASS +// ============================================================ + +/** + * BaseDutchieCrawler - Base class for all Dutchie store crawlers + * + * Per-store crawlers extend this class and override methods as 
needed. + * The default implementation delegates to the existing shared Dutchie logic. + */ +export class BaseDutchieCrawler { + protected dispensary: Dispensary; + protected options: StoreCrawlOptions; + protected hooks: DutchieCrawlerHooks; + protected selectors: DutchieSelectors; + + constructor( + dispensary: Dispensary, + options: StoreCrawlOptions = {}, + hooks: DutchieCrawlerHooks = {}, + selectors: DutchieSelectors = {} + ) { + this.dispensary = dispensary; + this.options = { + pricingType: 'rec', + useBothModes: true, + downloadImages: true, + trackStock: true, + timeoutMs: 30000, + ...options, + }; + this.hooks = hooks; + this.selectors = { ...DEFAULT_DUTCHIE_SELECTORS, ...selectors }; + } + + /** + * Main entry point - crawl products for this dispensary + * Override this in per-store crawlers to customize behavior + */ + async crawlProducts(): Promise { + // Call beforeFetch hook if defined + if (this.hooks.beforeFetch) { + await this.hooks.beforeFetch(this.dispensary); + } + + // Use the existing shared Dutchie crawl logic + const result = await baseCrawlDispensaryProducts( + this.dispensary, + this.options.pricingType || 'rec', + { + useBothModes: this.options.useBothModes, + downloadImages: this.options.downloadImages, + } + ); + + // Call afterComplete hook if defined + if (this.hooks.afterComplete) { + await this.hooks.afterComplete(result, this.dispensary); + } + + return result; + } + + /** + * Detect page structure for sandbox discovery mode + * Override in per-store crawlers if needed + * + * @param page - Puppeteer page object or HTML string + * @returns Structure detection result + */ + async detectStructure(page: any): Promise { + const result: StructureDetectionResult = { + success: false, + menuType: 'unknown', + selectors: {}, + pagination: { type: 'none' }, + errors: [], + metadata: {}, + }; + + try { + // Default implementation: check for Dutchie iframe + if (typeof page === 'string') { + // HTML string mode + if (page.includes('dutchie.com')) { + result.menuType = 'dutchie'; + result.success = true; + } + } else if (page && typeof page.evaluate === 'function') { + // Puppeteer page mode + const detection = await page.evaluate((selectorConfig: DutchieSelectors) => { + const iframe = document.querySelector(selectorConfig.iframe || '') as HTMLIFrameElement; + const iframeUrl = iframe?.src || null; + + // Check for product containers + const containers = document.querySelectorAll(selectorConfig.productContainer || ''); + + return { + hasIframe: !!iframe, + iframeUrl, + productCount: containers.length, + isDutchie: !!iframeUrl?.includes('dutchie.com'), + }; + }, this.selectors); + + if (detection.isDutchie) { + result.menuType = 'dutchie'; + result.iframeUrl = detection.iframeUrl; + result.success = true; + } + + result.metadata = detection; + } + + // Set default selectors for Dutchie + if (result.menuType === 'dutchie') { + result.selectors = { + productContainer: this.selectors.productContainer, + productName: this.selectors.productName, + productPrice: this.selectors.productPrice, + productImage: this.selectors.productImage, + productCategory: this.selectors.productCategory, + }; + result.pagination = { type: 'graphql' }; + } + } catch (error: any) { + result.errors.push(`Detection error: ${error.message}`); + } + + return result; + } + + /** + * Extract products from page/document + * Override in per-store crawlers for custom extraction + * + * @param document - DOM document, Puppeteer page, or raw products array + * @returns Array of extracted products + */ + 
async extractProducts(document: any): Promise { + // Default implementation: assume document is already an array of products + // from the GraphQL response + if (Array.isArray(document)) { + return document.map((product) => this.mapRawProduct(product)); + } + + // If document is a Puppeteer page, extract from DOM + if (document && typeof document.evaluate === 'function') { + return this.extractProductsFromPage(document); + } + + return []; + } + + /** + * Extract products from Puppeteer page + * Override for custom DOM extraction + */ + protected async extractProductsFromPage(page: any): Promise { + const products = await page.evaluate((selectors: DutchieSelectors) => { + const containers = document.querySelectorAll(selectors.productContainer || ''); + return Array.from(containers).map((container) => { + const nameEl = container.querySelector(selectors.productName || ''); + const priceEl = container.querySelector(selectors.productPrice || ''); + const imageEl = container.querySelector(selectors.productImage || '') as HTMLImageElement; + const brandEl = container.querySelector(selectors.productBrand || ''); + + return { + name: nameEl?.textContent?.trim() || '', + price: priceEl?.textContent?.trim() || '', + imageUrl: imageEl?.src || '', + brand: brandEl?.textContent?.trim() || '', + }; + }); + }, this.selectors); + + return products.map((p: any, i: number) => ({ + externalId: `dom-product-${i}`, + name: p.name, + brand: p.brand, + price: this.parsePrice(p.price), + imageUrl: p.imageUrl, + stockStatus: 'unknown' as const, + })); + } + + /** + * Map raw product from GraphQL to ExtractedProduct + * Override for custom mapping + */ + protected mapRawProduct(raw: any): ExtractedProduct { + return { + externalId: raw.id || raw._id || raw.externalId, + name: raw.name || raw.Name, + brand: raw.brand?.name || raw.brandName || raw.brand, + category: raw.type || raw.category || raw.Category, + subcategory: raw.subcategory || raw.Subcategory, + price: raw.recPrice || raw.price || raw.Price, + priceRec: raw.recPrice || raw.Prices?.rec, + priceMed: raw.medPrice || raw.Prices?.med, + weight: raw.weight || raw.Weight, + thcContent: raw.potencyThc?.formatted || raw.THCContent?.formatted, + cbdContent: raw.potencyCbd?.formatted || raw.CBDContent?.formatted, + description: raw.description || raw.Description, + imageUrl: raw.image || raw.Image, + stockStatus: this.mapStockStatus(raw), + quantity: raw.quantity || raw.Quantity, + raw, + }; + } + + /** + * Map raw stock status to standardized value + */ + protected mapStockStatus(raw: any): 'in_stock' | 'out_of_stock' | 'low_stock' | 'unknown' { + const status = raw.Status || raw.status || raw.stockStatus; + if (status === 'Active' || status === 'active' || status === 'in_stock') { + return 'in_stock'; + } + if (status === 'Inactive' || status === 'inactive' || status === 'out_of_stock') { + return 'out_of_stock'; + } + if (status === 'low_stock') { + return 'low_stock'; + } + return 'unknown'; + } + + /** + * Parse price string to number + */ + protected parsePrice(priceStr: string): number | undefined { + if (!priceStr) return undefined; + const cleaned = priceStr.replace(/[^0-9.]/g, ''); + const num = parseFloat(cleaned); + return isNaN(num) ? 
undefined : num;
+  }
+
+  /**
+   * Extract images from document
+   * Override for custom image extraction
+   *
+   * @param document - DOM document, Puppeteer page, or products array
+   * @returns Array of extracted images
+   */
+  async extractImages(document: any): Promise<ExtractedImage[]> {
+    if (Array.isArray(document)) {
+      return document
+        .filter((p) => p.image || p.Image || p.imageUrl)
+        .map((p, i) => ({
+          productId: p.id || p._id || `product-${i}`,
+          imageUrl: p.image || p.Image || p.imageUrl,
+          isPrimary: true,
+          position: 0,
+        }));
+    }
+
+    // Puppeteer page extraction
+    if (document && typeof document.evaluate === 'function') {
+      return this.extractImagesFromPage(document);
+    }
+
+    return [];
+  }
+
+  /**
+   * Extract images from Puppeteer page
+   */
+  protected async extractImagesFromPage(page: any): Promise<ExtractedImage[]> {
+    const images = await page.evaluate((selector: string) => {
+      const imgs = document.querySelectorAll(selector);
+      return Array.from(imgs).map((img, i) => ({
+        src: (img as HTMLImageElement).src,
+        position: i,
+      }));
+    }, this.selectors.productImage || 'img');
+
+    return images.map((img: any, i: number) => ({
+      productId: `dom-product-${i}`,
+      imageUrl: img.src,
+      isPrimary: i === 0,
+      position: img.position,
+    }));
+  }
+
+  /**
+   * Extract stock information from document
+   * Override for custom stock extraction
+   *
+   * @param document - DOM document, Puppeteer page, or products array
+   * @returns Array of extracted stock statuses
+   */
+  async extractStock(document: any): Promise<ExtractedStock[]> {
+    if (Array.isArray(document)) {
+      return document.map((p) => ({
+        productId: p.id || p._id || p.externalId,
+        status: this.mapStockStatus(p),
+        quantity: p.quantity || p.Quantity,
+        lastChecked: new Date(),
+      }));
+    }
+
+    return [];
+  }
+
+  /**
+   * Extract pagination information from document
+   * Override for custom pagination handling
+   *
+   * @param document - DOM document, Puppeteer page, or GraphQL response
+   * @returns Pagination info
+   */
+  async extractPagination(document: any): Promise<ExtractedPagination> {
+    // Default: check for page info in GraphQL response
+    if (document && document.pageInfo) {
+      return {
+        hasNextPage: document.pageInfo.hasNextPage || false,
+        currentPage: document.pageInfo.currentPage,
+        totalPages: document.pageInfo.totalPages,
+        totalProducts: document.pageInfo.totalCount || document.totalCount,
+        nextCursor: document.pageInfo.endCursor,
+      };
+    }
+
+    // Default: no pagination
+    return {
+      hasNextPage: false,
+    };
+  }
+
+  /**
+   * Get the cName (Dutchie slug) for this dispensary
+   * Override to customize cName extraction
+   */
+  getCName(): string {
+    if (this.dispensary.menuUrl) {
+      try {
+        const url = new URL(this.dispensary.menuUrl);
+        const segments = url.pathname.split('/').filter(Boolean);
+        if (segments.length >= 2) {
+          return segments[segments.length - 1];
+        }
+      } catch {
+        // Fall through to default
+      }
+    }
+    return this.dispensary.slug || '';
+  }
+
+  /**
+   * Get custom headers for API requests
+   * Override for store-specific headers
+   */
+  getCustomHeaders(): Record<string, string> {
+    const cName = this.getCName();
+    return {
+      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
+      Origin: 'https://dutchie.com',
+      Referer: `https://dutchie.com/embedded-menu/${cName}`,
+    };
+  }
+}
+
+// ============================================================
+// FACTORY FUNCTION
+// ============================================================
+
+/**
+ * Create a base Dutchie crawler instance
+ * This is the default export used when no per-store override exists
+ */
+export function createCrawler(
+  dispensary: Dispensary,
+  options: StoreCrawlOptions = {},
+  hooks: DutchieCrawlerHooks = {},
+  selectors: DutchieSelectors = {}
+): BaseDutchieCrawler {
+  return new BaseDutchieCrawler(dispensary, options, hooks, selectors);
+}
+
+// ============================================================
+// STANDALONE FUNCTIONS (required exports for orchestrator)
+// ============================================================
+
+/**
+ * Crawl products using the base Dutchie logic
+ * Per-store files can call this or override it completely
+ */
+export async function crawlProducts(
+  dispensary: Dispensary,
+  options: StoreCrawlOptions = {}
+): Promise<CrawlResult> {
+  const crawler = createCrawler(dispensary, options);
+  return crawler.crawlProducts();
+}
+
+/**
+ * Detect structure using the base Dutchie logic
+ */
+export async function detectStructure(
+  page: any,
+  dispensary?: Dispensary
+): Promise<StructureDetectionResult> {
+  const crawler = createCrawler(dispensary || ({} as Dispensary));
+  return crawler.detectStructure(page);
+}
+
+/**
+ * Extract products using the base Dutchie logic
+ */
+export async function extractProducts(
+  document: any,
+  dispensary?: Dispensary
+): Promise<ExtractedProduct[]> {
+  const crawler = createCrawler(dispensary || ({} as Dispensary));
+  return crawler.extractProducts(document);
+}
+
+/**
+ * Extract images using the base Dutchie logic
+ */
+export async function extractImages(
+  document: any,
+  dispensary?: Dispensary
+): Promise<ExtractedImage[]> {
+  const crawler = createCrawler(dispensary || ({} as Dispensary));
+  return crawler.extractImages(document);
+}
+
+/**
+ * Extract stock using the base Dutchie logic
+ */
+export async function extractStock(
+  document: any,
+  dispensary?: Dispensary
+): Promise<ExtractedStock[]> {
+  const crawler = createCrawler(dispensary || ({} as Dispensary));
+  return crawler.extractStock(document);
+}
+
+/**
+ * Extract pagination using the base Dutchie logic
+ */
+export async function extractPagination(
+  document: any,
+  dispensary?: Dispensary
+): Promise<ExtractedPagination> {
+  const crawler = createCrawler(dispensary || ({} as Dispensary));
+  return crawler.extractPagination(document);
+}
diff --git a/backend/src/crawlers/base/base-jane.ts b/backend/src/crawlers/base/base-jane.ts
new file mode 100644
index 00000000..ffc46d8b
--- /dev/null
+++ b/backend/src/crawlers/base/base-jane.ts
@@ -0,0 +1,330 @@
+/**
+ * Base Jane Crawler Template (PLACEHOLDER)
+ *
+ * This is the base template for all Jane (iheartjane) store crawlers.
+ * Per-store crawlers extend this by overriding specific methods.
+ * + * TODO: Implement Jane-specific crawling logic (Algolia-based) + */ + +import { Dispensary } from '../../dutchie-az/types'; +import { + StoreCrawlOptions, + CrawlResult, + StructureDetectionResult, + ExtractedProduct, + ExtractedImage, + ExtractedStock, + ExtractedPagination, +} from './base-dutchie'; + +// Re-export types +export { + StoreCrawlOptions, + CrawlResult, + StructureDetectionResult, + ExtractedProduct, + ExtractedImage, + ExtractedStock, + ExtractedPagination, +}; + +// ============================================================ +// JANE-SPECIFIC TYPES +// ============================================================ + +export interface JaneConfig { + algoliaAppId?: string; + algoliaApiKey?: string; + algoliaIndex?: string; + storeId?: string; +} + +export interface JaneSelectors { + productContainer?: string; + productName?: string; + productPrice?: string; + productImage?: string; + productCategory?: string; + productBrand?: string; + pagination?: string; + loadMore?: string; +} + +export const DEFAULT_JANE_SELECTORS: JaneSelectors = { + productContainer: '[data-testid="product-card"], .product-card', + productName: '[data-testid="product-name"], .product-name', + productPrice: '[data-testid="product-price"], .product-price', + productImage: '.product-image img, [data-testid="product-image"] img', + productCategory: '.product-category', + productBrand: '.product-brand, [data-testid="brand-name"]', + loadMore: '[data-testid="load-more"], .load-more-btn', +}; + +// ============================================================ +// BASE JANE CRAWLER CLASS +// ============================================================ + +export class BaseJaneCrawler { + protected dispensary: Dispensary; + protected options: StoreCrawlOptions; + protected selectors: JaneSelectors; + protected janeConfig: JaneConfig; + + constructor( + dispensary: Dispensary, + options: StoreCrawlOptions = {}, + selectors: JaneSelectors = {}, + janeConfig: JaneConfig = {} + ) { + this.dispensary = dispensary; + this.options = { + pricingType: 'rec', + useBothModes: false, + downloadImages: true, + trackStock: true, + timeoutMs: 30000, + ...options, + }; + this.selectors = { ...DEFAULT_JANE_SELECTORS, ...selectors }; + this.janeConfig = janeConfig; + } + + /** + * Main entry point - crawl products for this dispensary + * TODO: Implement Jane/Algolia-specific crawling + */ + async crawlProducts(): Promise { + const startTime = Date.now(); + console.warn(`[BaseJaneCrawler] Jane crawling not yet implemented for ${this.dispensary.name}`); + return { + success: false, + dispensaryId: this.dispensary.id || 0, + productsFound: 0, + productsFetched: 0, + productsUpserted: 0, + snapshotsCreated: 0, + imagesDownloaded: 0, + errorMessage: 'Jane crawler not yet implemented', + durationMs: Date.now() - startTime, + }; + } + + /** + * Detect page structure for sandbox discovery mode + * Jane uses Algolia, so we look for Algolia config + */ + async detectStructure(page: any): Promise { + const result: StructureDetectionResult = { + success: false, + menuType: 'unknown', + selectors: {}, + pagination: { type: 'none' }, + errors: [], + metadata: {}, + }; + + try { + if (page && typeof page.evaluate === 'function') { + // Look for Jane/Algolia indicators + const detection = await page.evaluate(() => { + // Check for iheartjane in page + const hasJane = document.documentElement.innerHTML.includes('iheartjane') || + document.documentElement.innerHTML.includes('jane-menu'); + + // Look for Algolia config + const scripts = 
Array.from(document.querySelectorAll('script')); + let algoliaConfig: any = null; + + for (const script of scripts) { + const content = script.textContent || ''; + if (content.includes('algolia') || content.includes('ALGOLIA')) { + // Try to extract config + const appIdMatch = content.match(/applicationId['":\s]+['"]([^'"]+)['"]/); + const apiKeyMatch = content.match(/apiKey['":\s]+['"]([^'"]+)['"]/); + if (appIdMatch && apiKeyMatch) { + algoliaConfig = { + appId: appIdMatch[1], + apiKey: apiKeyMatch[1], + }; + } + } + } + + return { + hasJane, + algoliaConfig, + }; + }); + + if (detection.hasJane) { + result.menuType = 'jane'; + result.success = true; + result.metadata = detection; + + if (detection.algoliaConfig) { + result.metadata.algoliaAppId = detection.algoliaConfig.appId; + result.metadata.algoliaApiKey = detection.algoliaConfig.apiKey; + } + } + } + } catch (error: any) { + result.errors.push(`Detection error: ${error.message}`); + } + + return result; + } + + /** + * Extract products from Algolia response or page + */ + async extractProducts(document: any): Promise { + // If document is Algolia hits array + if (Array.isArray(document)) { + return document.map((hit) => this.mapAlgoliaHit(hit)); + } + + console.warn('[BaseJaneCrawler] extractProducts not yet fully implemented'); + return []; + } + + /** + * Map Algolia hit to ExtractedProduct + */ + protected mapAlgoliaHit(hit: any): ExtractedProduct { + return { + externalId: hit.objectID || hit.id || hit.product_id, + name: hit.name || hit.product_name, + brand: hit.brand || hit.brand_name, + category: hit.category || hit.kind, + subcategory: hit.subcategory, + price: hit.price || hit.bucket_price, + priceRec: hit.prices?.rec || hit.price_rec, + priceMed: hit.prices?.med || hit.price_med, + weight: hit.weight || hit.amount, + thcContent: hit.percent_thc ? `${hit.percent_thc}%` : undefined, + cbdContent: hit.percent_cbd ? `${hit.percent_cbd}%` : undefined, + description: hit.description, + imageUrl: hit.image_url || hit.product_image_url, + stockStatus: hit.available ? 'in_stock' : 'out_of_stock', + quantity: hit.quantity_available, + raw: hit, + }; + } + + /** + * Extract images from document + */ + async extractImages(document: any): Promise { + if (Array.isArray(document)) { + return document + .filter((hit) => hit.image_url || hit.product_image_url) + .map((hit, i) => ({ + productId: hit.objectID || hit.id || `jane-product-${i}`, + imageUrl: hit.image_url || hit.product_image_url, + isPrimary: true, + position: 0, + })); + } + + return []; + } + + /** + * Extract stock information from document + */ + async extractStock(document: any): Promise { + if (Array.isArray(document)) { + return document.map((hit) => ({ + productId: hit.objectID || hit.id, + status: hit.available ? 
'in_stock' as const : 'out_of_stock' as const, + quantity: hit.quantity_available, + lastChecked: new Date(), + })); + } + + return []; + } + + /** + * Extract pagination information + * Algolia uses cursor-based pagination + */ + async extractPagination(document: any): Promise { + if (document && typeof document === 'object' && !Array.isArray(document)) { + return { + hasNextPage: document.page < document.nbPages - 1, + currentPage: document.page, + totalPages: document.nbPages, + totalProducts: document.nbHits, + }; + } + + return { hasNextPage: false }; + } +} + +// ============================================================ +// FACTORY FUNCTION +// ============================================================ + +export function createCrawler( + dispensary: Dispensary, + options: StoreCrawlOptions = {}, + selectors: JaneSelectors = {}, + janeConfig: JaneConfig = {} +): BaseJaneCrawler { + return new BaseJaneCrawler(dispensary, options, selectors, janeConfig); +} + +// ============================================================ +// STANDALONE FUNCTIONS +// ============================================================ + +export async function crawlProducts( + dispensary: Dispensary, + options: StoreCrawlOptions = {} +): Promise { + const crawler = createCrawler(dispensary, options); + return crawler.crawlProducts(); +} + +export async function detectStructure( + page: any, + dispensary?: Dispensary +): Promise { + const crawler = createCrawler(dispensary || ({} as Dispensary)); + return crawler.detectStructure(page); +} + +export async function extractProducts( + document: any, + dispensary?: Dispensary +): Promise { + const crawler = createCrawler(dispensary || ({} as Dispensary)); + return crawler.extractProducts(document); +} + +export async function extractImages( + document: any, + dispensary?: Dispensary +): Promise { + const crawler = createCrawler(dispensary || ({} as Dispensary)); + return crawler.extractImages(document); +} + +export async function extractStock( + document: any, + dispensary?: Dispensary +): Promise { + const crawler = createCrawler(dispensary || ({} as Dispensary)); + return crawler.extractStock(document); +} + +export async function extractPagination( + document: any, + dispensary?: Dispensary +): Promise { + const crawler = createCrawler(dispensary || ({} as Dispensary)); + return crawler.extractPagination(document); +} diff --git a/backend/src/crawlers/base/base-treez.ts b/backend/src/crawlers/base/base-treez.ts new file mode 100644 index 00000000..b930f903 --- /dev/null +++ b/backend/src/crawlers/base/base-treez.ts @@ -0,0 +1,212 @@ +/** + * Base Treez Crawler Template (PLACEHOLDER) + * + * This is the base template for all Treez store crawlers. + * Per-store crawlers extend this by overriding specific methods. 
+ * + * TODO: Implement Treez-specific crawling logic + */ + +import { Dispensary } from '../../dutchie-az/types'; +import { + StoreCrawlOptions, + CrawlResult, + StructureDetectionResult, + ExtractedProduct, + ExtractedImage, + ExtractedStock, + ExtractedPagination, +} from './base-dutchie'; + +// Re-export types +export { + StoreCrawlOptions, + CrawlResult, + StructureDetectionResult, + ExtractedProduct, + ExtractedImage, + ExtractedStock, + ExtractedPagination, +}; + +// ============================================================ +// TREEZ-SPECIFIC TYPES +// ============================================================ + +export interface TreezSelectors { + productContainer?: string; + productName?: string; + productPrice?: string; + productImage?: string; + productCategory?: string; + productBrand?: string; + addToCart?: string; + pagination?: string; +} + +export const DEFAULT_TREEZ_SELECTORS: TreezSelectors = { + productContainer: '.product-tile, [class*="ProductCard"]', + productName: '.product-name, [class*="ProductName"]', + productPrice: '.product-price, [class*="ProductPrice"]', + productImage: '.product-image img', + productCategory: '.product-category', + productBrand: '.product-brand', + addToCart: '.add-to-cart-btn', + pagination: '.pagination', +}; + +// ============================================================ +// BASE TREEZ CRAWLER CLASS +// ============================================================ + +export class BaseTreezCrawler { + protected dispensary: Dispensary; + protected options: StoreCrawlOptions; + protected selectors: TreezSelectors; + + constructor( + dispensary: Dispensary, + options: StoreCrawlOptions = {}, + selectors: TreezSelectors = {} + ) { + this.dispensary = dispensary; + this.options = { + pricingType: 'rec', + useBothModes: false, + downloadImages: true, + trackStock: true, + timeoutMs: 30000, + ...options, + }; + this.selectors = { ...DEFAULT_TREEZ_SELECTORS, ...selectors }; + } + + /** + * Main entry point - crawl products for this dispensary + * TODO: Implement Treez-specific crawling + */ + async crawlProducts(): Promise { + const startTime = Date.now(); + console.warn(`[BaseTreezCrawler] Treez crawling not yet implemented for ${this.dispensary.name}`); + return { + success: false, + dispensaryId: this.dispensary.id || 0, + productsFound: 0, + productsFetched: 0, + productsUpserted: 0, + snapshotsCreated: 0, + imagesDownloaded: 0, + errorMessage: 'Treez crawler not yet implemented', + durationMs: Date.now() - startTime, + }; + } + + /** + * Detect page structure for sandbox discovery mode + */ + async detectStructure(page: any): Promise { + return { + success: false, + menuType: 'unknown', + selectors: {}, + pagination: { type: 'none' }, + errors: ['Treez structure detection not yet implemented'], + metadata: {}, + }; + } + + /** + * Extract products from page/document + */ + async extractProducts(document: any): Promise { + console.warn('[BaseTreezCrawler] extractProducts not yet implemented'); + return []; + } + + /** + * Extract images from document + */ + async extractImages(document: any): Promise { + console.warn('[BaseTreezCrawler] extractImages not yet implemented'); + return []; + } + + /** + * Extract stock information from document + */ + async extractStock(document: any): Promise { + console.warn('[BaseTreezCrawler] extractStock not yet implemented'); + return []; + } + + /** + * Extract pagination information from document + */ + async extractPagination(document: any): Promise { + return { hasNextPage: false }; + } +} + +// 
============================================================ +// FACTORY FUNCTION +// ============================================================ + +export function createCrawler( + dispensary: Dispensary, + options: StoreCrawlOptions = {}, + selectors: TreezSelectors = {} +): BaseTreezCrawler { + return new BaseTreezCrawler(dispensary, options, selectors); +} + +// ============================================================ +// STANDALONE FUNCTIONS +// ============================================================ + +export async function crawlProducts( + dispensary: Dispensary, + options: StoreCrawlOptions = {} +): Promise { + const crawler = createCrawler(dispensary, options); + return crawler.crawlProducts(); +} + +export async function detectStructure( + page: any, + dispensary?: Dispensary +): Promise { + const crawler = createCrawler(dispensary || ({} as Dispensary)); + return crawler.detectStructure(page); +} + +export async function extractProducts( + document: any, + dispensary?: Dispensary +): Promise { + const crawler = createCrawler(dispensary || ({} as Dispensary)); + return crawler.extractProducts(document); +} + +export async function extractImages( + document: any, + dispensary?: Dispensary +): Promise { + const crawler = createCrawler(dispensary || ({} as Dispensary)); + return crawler.extractImages(document); +} + +export async function extractStock( + document: any, + dispensary?: Dispensary +): Promise { + const crawler = createCrawler(dispensary || ({} as Dispensary)); + return crawler.extractStock(document); +} + +export async function extractPagination( + document: any, + dispensary?: Dispensary +): Promise { + const crawler = createCrawler(dispensary || ({} as Dispensary)); + return crawler.extractPagination(document); +} diff --git a/backend/src/crawlers/base/index.ts b/backend/src/crawlers/base/index.ts new file mode 100644 index 00000000..19142cfe --- /dev/null +++ b/backend/src/crawlers/base/index.ts @@ -0,0 +1,27 @@ +/** + * Base Crawler Templates Index + * + * Exports all base crawler templates for easy importing. + */ + +// Dutchie base (primary implementation) +export * from './base-dutchie'; + +// Treez base (placeholder) +export * as Treez from './base-treez'; + +// Jane base (placeholder) +export * as Jane from './base-jane'; + +// Re-export common types from dutchie for convenience +export type { + StoreCrawlOptions, + CrawlResult, + StructureDetectionResult, + ExtractedProduct, + ExtractedImage, + ExtractedStock, + ExtractedPagination, + DutchieCrawlerHooks, + DutchieSelectors, +} from './base-dutchie'; diff --git a/backend/src/crawlers/dutchie/base-dutchie.ts b/backend/src/crawlers/dutchie/base-dutchie.ts new file mode 100644 index 00000000..01dd3323 --- /dev/null +++ b/backend/src/crawlers/dutchie/base-dutchie.ts @@ -0,0 +1,9 @@ +/** + * Base Dutchie Crawler Template (Re-export for backward compatibility) + * + * DEPRECATED: Import from '../base/base-dutchie' instead. + * This file re-exports everything from the new location for existing code. 
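+ *
+ * Illustrative example — new code should import from the base location directly
+ * (relative path shown as it would be from within src/crawlers/dutchie/):
+ *
+ *   import { BaseDutchieCrawler, crawlProducts } from '../base/base-dutchie';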
+ */ + +// Re-export everything from the new base location +export * from '../base/base-dutchie'; diff --git a/backend/src/crawlers/dutchie/stores/trulieve-scottsdale.ts b/backend/src/crawlers/dutchie/stores/trulieve-scottsdale.ts new file mode 100644 index 00000000..142cc242 --- /dev/null +++ b/backend/src/crawlers/dutchie/stores/trulieve-scottsdale.ts @@ -0,0 +1,118 @@ +/** + * Trulieve Scottsdale - Per-Store Dutchie Crawler + * + * Store ID: 101 + * Profile Key: trulieve-scottsdale + * Platform Dispensary ID: 5eaf489fa8a61801212577cc + * + * Phase 1: Identity implementation - no overrides, just uses base Dutchie logic. + * Future: Add store-specific selectors, timing, or custom logic as needed. + */ + +import { + BaseDutchieCrawler, + StoreCrawlOptions, + CrawlResult, + DutchieSelectors, + crawlProducts as baseCrawlProducts, +} from '../../base/base-dutchie'; +import { Dispensary } from '../../../dutchie-az/types'; + +// Re-export CrawlResult for the orchestrator +export { CrawlResult }; + +// ============================================================ +// STORE CONFIGURATION +// ============================================================ + +/** + * Store-specific configuration + * These can be used to customize crawler behavior for this store + */ +export const STORE_CONFIG = { + storeId: 101, + profileKey: 'trulieve-scottsdale', + name: 'Trulieve of Scottsdale Dispensary', + platformDispensaryId: '5eaf489fa8a61801212577cc', + + // Store-specific overrides (none for Phase 1) + customOptions: { + // Example future overrides: + // pricingType: 'rec', + // useBothModes: true, + // customHeaders: {}, + // maxRetries: 3, + }, +}; + +// ============================================================ +// STORE CRAWLER CLASS +// ============================================================ + +/** + * TrulieveScottsdaleCrawler - Per-store crawler for Trulieve Scottsdale + * + * Phase 1: Identity implementation - extends BaseDutchieCrawler with no overrides. + * Future phases can override methods like: + * - getCName() for custom slug handling + * - crawlProducts() for completely custom logic + * - Add hooks for pre/post processing + */ +export class TrulieveScottsdaleCrawler extends BaseDutchieCrawler { + constructor(dispensary: Dispensary, options: StoreCrawlOptions = {}) { + // Merge store-specific options with provided options + const mergedOptions: StoreCrawlOptions = { + ...STORE_CONFIG.customOptions, + ...options, + }; + + super(dispensary, mergedOptions); + } + + // Phase 1: No overrides - use base implementation + // Future phases can add overrides here: + // + // async crawlProducts(): Promise { + // // Custom pre-processing + // // ... + // const result = await super.crawlProducts(); + // // Custom post-processing + // // ... + // return result; + // } +} + +// ============================================================ +// EXPORTED CRAWL FUNCTION +// ============================================================ + +/** + * Main entry point for the orchestrator + * + * The orchestrator calls: mod.crawlProducts(dispensary, options) + * This function creates a TrulieveScottsdaleCrawler and runs it. 
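+ *
+ * Hypothetical usage sketch (the dispensary record and option values are
+ * placeholders, not part of this change):
+ *
+ *   const result = await crawlProducts(dispensary, { pricingType: 'rec' });
+ *   console.log(`${result.productsFound} products found`);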
+ */ +export async function crawlProducts( + dispensary: Dispensary, + options: StoreCrawlOptions = {} +): Promise { + console.log(`[TrulieveScottsdale] Using per-store crawler for ${dispensary.name}`); + + const crawler = new TrulieveScottsdaleCrawler(dispensary, options); + return crawler.crawlProducts(); +} + +// ============================================================ +// FACTORY FUNCTION (alternative API) +// ============================================================ + +/** + * Create a crawler instance without running it + * Useful for testing or when you need to configure before running + */ +export function createCrawler( + dispensary: Dispensary, + options: StoreCrawlOptions = {} +): TrulieveScottsdaleCrawler { + return new TrulieveScottsdaleCrawler(dispensary, options); +} diff --git a/backend/src/db/add-jobs-table.ts b/backend/src/db/add-jobs-table.ts index f308172d..12b67bed 100755 --- a/backend/src/db/add-jobs-table.ts +++ b/backend/src/db/add-jobs-table.ts @@ -1,4 +1,4 @@ -import { pool } from './migrate'; +import { pool } from './pool'; async function addJobsTable() { const client = await pool.connect(); diff --git a/backend/src/db/migrate.ts b/backend/src/db/migrate.ts index 0f8eaf7e..0fc6fa8a 100755 --- a/backend/src/db/migrate.ts +++ b/backend/src/db/migrate.ts @@ -1,23 +1,63 @@ +/** + * Database Migration Script (CLI-ONLY) + * + * This file is for running migrations via CLI only: + * npx tsx src/db/migrate.ts + * + * DO NOT import this file from runtime code. + * Runtime code should import from src/db/pool.ts instead. + */ + import { Pool } from 'pg'; +import dotenv from 'dotenv'; -// Consolidated DB connection: -// - Prefer CRAWLSY_DATABASE_URL (e.g., crawlsy_local, crawlsy_prod) -// - Then DATABASE_URL (default) -const DATABASE_URL = - process.env.CRAWLSY_DATABASE_URL || - process.env.DATABASE_URL || - 'postgresql://dutchie:dutchie_local_pass@localhost:54320/crawlsy_local'; +// Load .env BEFORE any env var access +dotenv.config(); -const pool = new Pool({ - connectionString: DATABASE_URL, -}); +/** + * Get the database connection string from environment variables. + * Strict validation - will throw if required vars are missing. 
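+ *
+ * Illustrative example — placeholder values only, not real credentials:
+ *
+ *   CANNAIQ_DB_URL=postgresql://user:pass@localhost:54320/cannaiq
+ *
+ * or the equivalent CANNAIQ_DB_HOST / CANNAIQ_DB_PORT / CANNAIQ_DB_NAME /
+ * CANNAIQ_DB_USER / CANNAIQ_DB_PASS variables.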
+ */ +function getConnectionString(): string { + // Priority 1: Full connection URL + if (process.env.CANNAIQ_DB_URL) { + return process.env.CANNAIQ_DB_URL; + } + + // Priority 2: Build from individual env vars (all required) + const required = ['CANNAIQ_DB_HOST', 'CANNAIQ_DB_PORT', 'CANNAIQ_DB_NAME', 'CANNAIQ_DB_USER', 'CANNAIQ_DB_PASS']; + const missing = required.filter((key) => !process.env[key]); + + if (missing.length > 0) { + throw new Error( + `[Migrate] Missing required environment variables: ${missing.join(', ')}\n` + + `Either set CANNAIQ_DB_URL or all of: CANNAIQ_DB_HOST, CANNAIQ_DB_PORT, CANNAIQ_DB_NAME, CANNAIQ_DB_USER, CANNAIQ_DB_PASS` + ); + } + + const host = process.env.CANNAIQ_DB_HOST!; + const port = process.env.CANNAIQ_DB_PORT!; + const name = process.env.CANNAIQ_DB_NAME!; + const user = process.env.CANNAIQ_DB_USER!; + const pass = process.env.CANNAIQ_DB_PASS!; + + return `postgresql://${user}:${pass}@${host}:${port}/${name}`; +} + +/** + * Run all database migrations + */ +async function runMigrations() { + // Create pool only when migrations are actually run + const pool = new Pool({ + connectionString: getConnectionString(), + }); -export async function runMigrations() { const client = await pool.connect(); - + try { await client.query('BEGIN'); - + // Users table await client.query(` CREATE TABLE IF NOT EXISTS users ( @@ -29,7 +69,7 @@ export async function runMigrations() { updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); `); - + // Stores table await client.query(` CREATE TABLE IF NOT EXISTS stores ( @@ -44,7 +84,7 @@ export async function runMigrations() { updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); `); - + // Categories table (shop, brands, specials) await client.query(` CREATE TABLE IF NOT EXISTS categories ( @@ -59,7 +99,7 @@ export async function runMigrations() { UNIQUE(store_id, slug) ); `); - + // Products table await client.query(` CREATE TABLE IF NOT EXISTS products ( @@ -90,7 +130,7 @@ export async function runMigrations() { UNIQUE(store_id, dutchie_product_id) ); `); - + // Campaigns table await client.query(` CREATE TABLE IF NOT EXISTS campaigns ( @@ -106,7 +146,7 @@ export async function runMigrations() { updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); `); - + // Add variant column to products table (for different sizes/options of same product) await client.query(` ALTER TABLE products ADD COLUMN IF NOT EXISTS variant VARCHAR(255); @@ -226,7 +266,7 @@ export async function runMigrations() { UNIQUE(campaign_id, product_id) ); `); - + // Click tracking await client.query(` CREATE TABLE IF NOT EXISTS clicks ( @@ -239,14 +279,14 @@ export async function runMigrations() { clicked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); `); - + // Create index on clicked_at for analytics queries await client.query(` CREATE INDEX IF NOT EXISTS idx_clicks_clicked_at ON clicks(clicked_at); CREATE INDEX IF NOT EXISTS idx_clicks_product_id ON clicks(product_id); CREATE INDEX IF NOT EXISTS idx_clicks_campaign_id ON clicks(campaign_id); `); - + // Proxies table await client.query(` CREATE TABLE IF NOT EXISTS proxies ( @@ -310,7 +350,7 @@ export async function runMigrations() { CREATE INDEX IF NOT EXISTS idx_proxy_test_jobs_status ON proxy_test_jobs(status); CREATE INDEX IF NOT EXISTS idx_proxy_test_jobs_created_at ON proxy_test_jobs(created_at DESC); `); - + // Settings table await client.query(` CREATE TABLE IF NOT EXISTS settings ( @@ -320,7 +360,7 @@ export async function runMigrations() { updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); `); - + // Insert default 
settings await client.query(` INSERT INTO settings (key, value, description) VALUES @@ -331,7 +371,7 @@ export async function runMigrations() { ('proxy_test_url', 'https://httpbin.org/ip', 'URL to test proxies against') ON CONFLICT (key) DO NOTHING; `); - + await client.query('COMMIT'); console.log('✅ Migrations completed successfully'); } catch (error) { @@ -340,12 +380,12 @@ export async function runMigrations() { throw error; } finally { client.release(); + await pool.end(); } } -export { pool }; - -// Run migrations if this file is executed directly +// Only run when executed directly (CLI mode) +// DO NOT export pool - runtime code must use src/db/pool.ts if (require.main === module) { runMigrations() .then(() => process.exit(0)) diff --git a/backend/src/db/run-notifications-migration.ts b/backend/src/db/run-notifications-migration.ts index 30482826..745da935 100644 --- a/backend/src/db/run-notifications-migration.ts +++ b/backend/src/db/run-notifications-migration.ts @@ -1,4 +1,4 @@ -import { pool } from './migrate'; +import { pool } from './pool'; import * as fs from 'fs'; import * as path from 'path'; diff --git a/backend/src/db/seed.ts b/backend/src/db/seed.ts index 8235f049..25f790d5 100755 --- a/backend/src/db/seed.ts +++ b/backend/src/db/seed.ts @@ -1,4 +1,4 @@ -import { pool } from './migrate'; +import { pool } from './pool'; import bcrypt from 'bcrypt'; export async function seedDatabase() { diff --git a/backend/src/db/update-categories-hierarchy.ts b/backend/src/db/update-categories-hierarchy.ts index 9e0d833e..5145c177 100644 --- a/backend/src/db/update-categories-hierarchy.ts +++ b/backend/src/db/update-categories-hierarchy.ts @@ -1,4 +1,4 @@ -import { pool } from './migrate'; +import { pool } from './pool'; async function updateCategoriesHierarchy() { const client = await pool.connect(); diff --git a/backend/src/discovery/city-discovery.ts b/backend/src/discovery/city-discovery.ts new file mode 100644 index 00000000..9a63d069 --- /dev/null +++ b/backend/src/discovery/city-discovery.ts @@ -0,0 +1,474 @@ +/** + * Dutchie City Discovery Service + * + * Discovers cities from the Dutchie cities page. + * Each city can contain multiple dispensary locations. + * + * Source: https://dutchie.com/cities + * + * This module ONLY handles city discovery and upserts to dutchie_discovery_cities. + * It does NOT create any dispensary records. + */ + +import { Pool } from 'pg'; +import axios from 'axios'; +import * as cheerio from 'cheerio'; +import { + DiscoveryCity, + DiscoveryCityRow, + DutchieCityResponse, + CityDiscoveryResult, + mapCityRowToCity, +} from './types'; + +const CITIES_PAGE_URL = 'https://dutchie.com/cities'; +const PLATFORM = 'dutchie'; + +// ============================================================ +// CITY PAGE SCRAPING +// ============================================================ + +/** + * Fetch and parse the Dutchie cities page. + * Returns a list of cities with their slugs and states. 
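+ *
+ * Example of a single parsed entry (illustrative), as produced from a link
+ * of the form /dispensaries/az/phoenix:
+ *
+ *   { slug: 'phoenix', name: 'Phoenix', stateCode: 'AZ', countryCode: 'US' }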
+ */ +export async function fetchCitiesFromPage(): Promise { + console.log(`[CityDiscovery] Fetching cities from ${CITIES_PAGE_URL}...`); + + const response = await axios.get(CITIES_PAGE_URL, { + headers: { + 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36', + 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', + 'Accept-Language': 'en-US,en;q=0.9', + }, + timeout: 30000, + }); + + const $ = cheerio.load(response.data); + const cities: DutchieCityResponse[] = []; + + // Look for city links in various possible structures + // Structure 1: Links in /dispensaries/{state}/{city} format + $('a[href*="/dispensaries/"]').each((_, element) => { + const href = $(element).attr('href') || ''; + const text = $(element).text().trim(); + + // Match /dispensaries/{state}/{city} pattern + const match = href.match(/\/dispensaries\/([a-z]{2,3})\/([a-z0-9-]+)/i); + if (match) { + const [, stateCode, citySlug] = match; + cities.push({ + slug: citySlug, + name: text || citySlug.replace(/-/g, ' '), + stateCode: stateCode.toUpperCase(), + countryCode: stateCode.length === 2 ? 'US' : 'CA', // 2-letter = US state, 3+ = Canadian province + }); + } + }); + + // Structure 2: Links in /city/{slug} format + $('a[href*="/city/"]').each((_, element) => { + const href = $(element).attr('href') || ''; + const text = $(element).text().trim(); + + const match = href.match(/\/city\/([a-z0-9-]+)/i); + if (match) { + const [, citySlug] = match; + cities.push({ + slug: citySlug, + name: text || citySlug.replace(/-/g, ' '), + }); + } + }); + + // Dedupe by slug + const uniqueCities = new Map(); + for (const city of cities) { + const key = `${city.countryCode || 'unknown'}-${city.stateCode || 'unknown'}-${city.slug}`; + if (!uniqueCities.has(key)) { + uniqueCities.set(key, city); + } + } + + const result = Array.from(uniqueCities.values()); + console.log(`[CityDiscovery] Found ${result.length} unique cities`); + + return result; +} + +/** + * Alternative: Fetch cities from Dutchie's internal API/GraphQL + * This is a fallback if the HTML scraping doesn't work. 
+ */ +export async function fetchCitiesFromApi(): Promise { + console.log('[CityDiscovery] Attempting to fetch cities from API...'); + + // Try to find the cities endpoint - this is exploratory + // Dutchie may expose cities via their public API + + // Common patterns to try: + const possibleEndpoints = [ + 'https://dutchie.com/api/cities', + 'https://dutchie.com/api-3/cities', + 'https://api.dutchie.com/v1/cities', + ]; + + for (const endpoint of possibleEndpoints) { + try { + const response = await axios.get(endpoint, { + headers: { + 'Accept': 'application/json', + 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36', + }, + timeout: 10000, + validateStatus: () => true, + }); + + if (response.status === 200 && Array.isArray(response.data)) { + console.log(`[CityDiscovery] Found cities at ${endpoint}`); + return response.data.map((city: any) => ({ + slug: city.slug || city.city_slug, + name: city.name || city.city_name, + stateCode: city.stateCode || city.state_code || city.state, + countryCode: city.countryCode || city.country_code || city.country || 'US', + })); + } + } catch { + // Continue to next endpoint + } + } + + console.log('[CityDiscovery] No API endpoint found, falling back to page scraping'); + return []; +} + +// ============================================================ +// DATABASE OPERATIONS +// ============================================================ + +/** + * Upsert a city into dutchie_discovery_cities. + * Returns the city ID. + */ +export async function upsertCity( + pool: Pool, + city: DutchieCityResponse +): Promise<{ id: number; isNew: boolean }> { + const result = await pool.query( + `INSERT INTO dutchie_discovery_cities ( + platform, + city_name, + city_slug, + state_code, + country_code, + updated_at + ) VALUES ($1, $2, $3, $4, $5, NOW()) + ON CONFLICT (platform, country_code, state_code, city_slug) + DO UPDATE SET + city_name = EXCLUDED.city_name, + updated_at = NOW() + RETURNING id, (xmax = 0) as is_new`, + [ + PLATFORM, + city.name, + city.slug, + city.stateCode || null, + city.countryCode || 'US', + ] + ); + + return { + id: result.rows[0].id, + isNew: result.rows[0].is_new, + }; +} + +/** + * Mark a city as crawled and update location count. + */ +export async function markCityCrawled( + pool: Pool, + cityId: number, + locationCount: number +): Promise { + await pool.query( + `UPDATE dutchie_discovery_cities + SET last_crawled_at = NOW(), + location_count = $2, + updated_at = NOW() + WHERE id = $1`, + [cityId, locationCount] + ); +} + +/** + * Get all cities that need to be crawled. 
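+ *
+ * Hypothetical usage sketch (option values are placeholders):
+ *
+ *   const stale = await getCitiesToCrawl(pool, {
+ *     stateCode: 'AZ',
+ *     onlyStale: true,
+ *     staleDays: 7,
+ *     limit: 25,
+ *   });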
+ */
+export async function getCitiesToCrawl(
+  pool: Pool,
+  options: {
+    stateCode?: string;
+    countryCode?: string;
+    limit?: number;
+    onlyStale?: boolean;
+    staleDays?: number;
+  } = {}
+): Promise<DiscoveryCity[]> {
+  const {
+    stateCode,
+    countryCode,
+    limit = 100,
+    onlyStale = false,
+    staleDays = 7,
+  } = options;
+
+  let query = `
+    SELECT *
+    FROM dutchie_discovery_cities
+    WHERE crawl_enabled = TRUE
+  `;
+  const params: any[] = [];
+  let paramIdx = 1;
+
+  if (stateCode) {
+    query += ` AND state_code = $${paramIdx}`;
+    params.push(stateCode);
+    paramIdx++;
+  }
+
+  if (countryCode) {
+    query += ` AND country_code = $${paramIdx}`;
+    params.push(countryCode);
+    paramIdx++;
+  }
+
+  if (onlyStale) {
+    query += ` AND (last_crawled_at IS NULL OR last_crawled_at < NOW() - INTERVAL '${staleDays} days')`;
+  }
+
+  query += ` ORDER BY last_crawled_at ASC NULLS FIRST LIMIT $${paramIdx}`;
+  params.push(limit);
+
+  const result = await pool.query(query, params);
+  return result.rows.map(mapCityRowToCity);
+}
+
+/**
+ * Get a city by ID.
+ */
+export async function getCityById(
+  pool: Pool,
+  id: number
+): Promise<DiscoveryCity | null> {
+  const result = await pool.query(
+    `SELECT * FROM dutchie_discovery_cities WHERE id = $1`,
+    [id]
+  );
+
+  if (result.rows.length === 0) {
+    return null;
+  }
+
+  return mapCityRowToCity(result.rows[0]);
+}
+
+/**
+ * Get a city by slug.
+ */
+export async function getCityBySlug(
+  pool: Pool,
+  slug: string,
+  stateCode?: string,
+  countryCode: string = 'US'
+): Promise<DiscoveryCity | null> {
+  let query = `
+    SELECT * FROM dutchie_discovery_cities
+    WHERE platform = $1 AND city_slug = $2 AND country_code = $3
+  `;
+  const params: any[] = [PLATFORM, slug, countryCode];
+
+  if (stateCode) {
+    query += ` AND state_code = $4`;
+    params.push(stateCode);
+  }
+
+  const result = await pool.query(query, params);
+
+  if (result.rows.length === 0) {
+    return null;
+  }
+
+  return mapCityRowToCity(result.rows[0]);
+}
+
+// ============================================================
+// MAIN DISCOVERY FUNCTION
+// ============================================================
+
+/**
+ * Run the full city discovery process.
+ * Fetches cities from Dutchie and upserts them into the database.
+ */
+export async function discoverCities(
+  pool: Pool,
+  options: {
+    dryRun?: boolean;
+    verbose?: boolean;
+  } = {}
+): Promise<CityDiscoveryResult> {
+  const startTime = Date.now();
+  const { dryRun = false, verbose = false } = options;
+  const errors: string[] = [];
+
+  console.log('[CityDiscovery] Starting city discovery...');
+  console.log(`[CityDiscovery] Mode: ${dryRun ? 'DRY RUN' : 'LIVE'}`);
+
+  // Try API first, fall back to page scraping
+  let cities = await fetchCitiesFromApi();
+  if (cities.length === 0) {
+    cities = await fetchCitiesFromPage();
+  }
+
+  if (cities.length === 0) {
+    console.log('[CityDiscovery] No cities found');
+    return {
+      citiesFound: 0,
+      citiesUpserted: 0,
+      citiesSkipped: 0,
+      errors: ['No cities found from page or API'],
+      durationMs: Date.now() - startTime,
+    };
+  }
+
+  let upserted = 0;
+  let skipped = 0;
+
+  for (const city of cities) {
+    try {
+      if (dryRun) {
+        if (verbose) {
+          console.log(`[CityDiscovery][DryRun] Would upsert: ${city.name} (${city.stateCode}, ${city.countryCode})`);
+        }
+        upserted++;
+        continue;
+      }
+
+      const result = await upsertCity(pool, city);
+      upserted++;
+
+      if (verbose) {
+        const action = result.isNew ?
'Created' : 'Updated'; + console.log(`[CityDiscovery] ${action}: ${city.name} (${city.stateCode}, ${city.countryCode}) -> ID ${result.id}`); + } + } catch (error: any) { + errors.push(`City ${city.slug}: ${error.message}`); + skipped++; + } + } + + const durationMs = Date.now() - startTime; + + console.log(`[CityDiscovery] Complete: ${upserted} upserted, ${skipped} skipped, ${errors.length} errors in ${durationMs}ms`); + + return { + citiesFound: cities.length, + citiesUpserted: upserted, + citiesSkipped: skipped, + errors, + durationMs, + }; +} + +// ============================================================ +// MANUAL CITY SEEDING +// ============================================================ + +/** + * Seed known cities manually. + * Use this when the cities page doesn't expose all cities. + */ +export async function seedKnownCities( + pool: Pool, + cities: Array<{ + name: string; + slug: string; + stateCode: string; + countryCode?: string; + }> +): Promise<{ created: number; updated: number }> { + let created = 0; + let updated = 0; + + for (const city of cities) { + const result = await upsertCity(pool, { + name: city.name, + slug: city.slug, + stateCode: city.stateCode, + countryCode: city.countryCode || 'US', + }); + + if (result.isNew) { + created++; + } else { + updated++; + } + } + + return { created, updated }; +} + +/** + * Pre-defined Arizona cities for seeding. + */ +export const ARIZONA_CITIES = [ + { name: 'Phoenix', slug: 'phoenix', stateCode: 'AZ' }, + { name: 'Tucson', slug: 'tucson', stateCode: 'AZ' }, + { name: 'Mesa', slug: 'mesa', stateCode: 'AZ' }, + { name: 'Chandler', slug: 'chandler', stateCode: 'AZ' }, + { name: 'Scottsdale', slug: 'scottsdale', stateCode: 'AZ' }, + { name: 'Glendale', slug: 'glendale', stateCode: 'AZ' }, + { name: 'Gilbert', slug: 'gilbert', stateCode: 'AZ' }, + { name: 'Tempe', slug: 'tempe', stateCode: 'AZ' }, + { name: 'Peoria', slug: 'peoria', stateCode: 'AZ' }, + { name: 'Surprise', slug: 'surprise', stateCode: 'AZ' }, + { name: 'Yuma', slug: 'yuma', stateCode: 'AZ' }, + { name: 'Avondale', slug: 'avondale', stateCode: 'AZ' }, + { name: 'Flagstaff', slug: 'flagstaff', stateCode: 'AZ' }, + { name: 'Goodyear', slug: 'goodyear', stateCode: 'AZ' }, + { name: 'Lake Havasu City', slug: 'lake-havasu-city', stateCode: 'AZ' }, + { name: 'Buckeye', slug: 'buckeye', stateCode: 'AZ' }, + { name: 'Casa Grande', slug: 'casa-grande', stateCode: 'AZ' }, + { name: 'Sierra Vista', slug: 'sierra-vista', stateCode: 'AZ' }, + { name: 'Maricopa', slug: 'maricopa', stateCode: 'AZ' }, + { name: 'Oro Valley', slug: 'oro-valley', stateCode: 'AZ' }, + { name: 'Prescott', slug: 'prescott', stateCode: 'AZ' }, + { name: 'Bullhead City', slug: 'bullhead-city', stateCode: 'AZ' }, + { name: 'Prescott Valley', slug: 'prescott-valley', stateCode: 'AZ' }, + { name: 'Apache Junction', slug: 'apache-junction', stateCode: 'AZ' }, + { name: 'Marana', slug: 'marana', stateCode: 'AZ' }, + { name: 'El Mirage', slug: 'el-mirage', stateCode: 'AZ' }, + { name: 'Kingman', slug: 'kingman', stateCode: 'AZ' }, + { name: 'Queen Creek', slug: 'queen-creek', stateCode: 'AZ' }, + { name: 'San Luis', slug: 'san-luis', stateCode: 'AZ' }, + { name: 'Sahuarita', slug: 'sahuarita', stateCode: 'AZ' }, + { name: 'Fountain Hills', slug: 'fountain-hills', stateCode: 'AZ' }, + { name: 'Nogales', slug: 'nogales', stateCode: 'AZ' }, + { name: 'Douglas', slug: 'douglas', stateCode: 'AZ' }, + { name: 'Eloy', slug: 'eloy', stateCode: 'AZ' }, + { name: 'Somerton', slug: 'somerton', stateCode: 'AZ' }, + { 
name: 'Paradise Valley', slug: 'paradise-valley', stateCode: 'AZ' },
+  { name: 'Coolidge', slug: 'coolidge', stateCode: 'AZ' },
+  { name: 'Cottonwood', slug: 'cottonwood', stateCode: 'AZ' },
+  { name: 'Camp Verde', slug: 'camp-verde', stateCode: 'AZ' },
+  { name: 'Show Low', slug: 'show-low', stateCode: 'AZ' },
+  { name: 'Payson', slug: 'payson', stateCode: 'AZ' },
+  { name: 'Sedona', slug: 'sedona', stateCode: 'AZ' },
+  { name: 'Winslow', slug: 'winslow', stateCode: 'AZ' },
+  { name: 'Globe', slug: 'globe', stateCode: 'AZ' },
+  { name: 'Safford', slug: 'safford', stateCode: 'AZ' },
+  { name: 'Bisbee', slug: 'bisbee', stateCode: 'AZ' },
+  { name: 'Wickenburg', slug: 'wickenburg', stateCode: 'AZ' },
+  { name: 'Page', slug: 'page', stateCode: 'AZ' },
+  { name: 'Holbrook', slug: 'holbrook', stateCode: 'AZ' },
+  { name: 'Willcox', slug: 'willcox', stateCode: 'AZ' },
+];
diff --git a/backend/src/discovery/discovery-crawler.ts b/backend/src/discovery/discovery-crawler.ts
new file mode 100644
index 00000000..4f6f9576
--- /dev/null
+++ b/backend/src/discovery/discovery-crawler.ts
@@ -0,0 +1,327 @@
+/**
+ * Dutchie Discovery Crawler
+ *
+ * Main orchestrator for the Dutchie store discovery pipeline.
+ *
+ * Flow:
+ * 1. Discover cities from Dutchie (or use seeded cities)
+ * 2. For each city, discover store locations
+ * 3. Upsert all data to discovery tables
+ * 4. Admin verifies locations manually
+ * 5. Verified locations are promoted to canonical dispensaries
+ *
+ * This module does NOT create canonical dispensaries automatically.
+ */
+
+import { Pool } from 'pg';
+import {
+  FullDiscoveryResult,
+  LocationDiscoveryResult,
+  DiscoveryCity,
+} from './types';
+import {
+  discoverCities,
+  getCitiesToCrawl,
+  getCityBySlug,
+  seedKnownCities,
+  ARIZONA_CITIES,
+} from './city-discovery';
+import {
+  discoverLocationsForCity,
+} from './location-discovery';
+
+// ============================================================
+// FULL DISCOVERY
+// ============================================================
+
+export interface DiscoveryCrawlerOptions {
+  dryRun?: boolean;
+  verbose?: boolean;
+  stateCode?: string;
+  countryCode?: string;
+  cityLimit?: number;
+  skipCityDiscovery?: boolean;
+  onlyStale?: boolean;
+  staleDays?: number;
+}
+
+/**
+ * Run the full discovery pipeline:
+ * 1. Discover/refresh cities
+ * 2. For each city, discover locations
+ */
+export async function runFullDiscovery(
+  pool: Pool,
+  options: DiscoveryCrawlerOptions = {}
+): Promise<FullDiscoveryResult> {
+  const startTime = Date.now();
+  const {
+    dryRun = false,
+    verbose = false,
+    stateCode,
+    countryCode = 'US',
+    cityLimit = 50,
+    skipCityDiscovery = false,
+    onlyStale = true,
+    staleDays = 7,
+  } = options;
+
+  console.log('='.repeat(60));
+  console.log('DUTCHIE DISCOVERY CRAWLER');
+  console.log('='.repeat(60));
+  console.log(`Mode: ${dryRun ?
'DRY RUN' : 'LIVE'}`); + if (stateCode) console.log(`State: ${stateCode}`); + console.log(`Country: ${countryCode}`); + console.log(`City limit: ${cityLimit}`); + console.log(''); + + // Step 1: Discover/refresh cities + let cityResult = { + citiesFound: 0, + citiesUpserted: 0, + citiesSkipped: 0, + errors: [] as string[], + durationMs: 0, + }; + + if (!skipCityDiscovery) { + console.log('[Discovery] Step 1: Discovering cities...'); + cityResult = await discoverCities(pool, { dryRun, verbose }); + } else { + console.log('[Discovery] Step 1: Skipping city discovery (using existing cities)'); + } + + // Step 2: Get cities to crawl + console.log('[Discovery] Step 2: Getting cities to crawl...'); + const cities = await getCitiesToCrawl(pool, { + stateCode, + countryCode, + limit: cityLimit, + onlyStale, + staleDays, + }); + + console.log(`[Discovery] Found ${cities.length} cities to crawl`); + + // Step 3: Discover locations for each city + console.log('[Discovery] Step 3: Discovering locations...'); + const locationResults: LocationDiscoveryResult[] = []; + let totalLocationsFound = 0; + let totalLocationsUpserted = 0; + + for (let i = 0; i < cities.length; i++) { + const city = cities[i]; + console.log(`\n[Discovery] City ${i + 1}/${cities.length}: ${city.cityName}, ${city.stateCode}`); + + try { + const result = await discoverLocationsForCity(pool, city, { dryRun, verbose }); + locationResults.push(result); + totalLocationsFound += result.locationsFound; + totalLocationsUpserted += result.locationsUpserted; + + // Rate limiting between cities + if (i < cities.length - 1) { + await new Promise((r) => setTimeout(r, 2000)); + } + } catch (error: any) { + console.error(`[Discovery] Error crawling ${city.cityName}: ${error.message}`); + locationResults.push({ + cityId: city.id, + citySlug: city.citySlug, + locationsFound: 0, + locationsUpserted: 0, + locationsNew: 0, + locationsUpdated: 0, + errors: [error.message], + durationMs: 0, + }); + } + } + + const durationMs = Date.now() - startTime; + + // Summary + console.log('\n' + '='.repeat(60)); + console.log('DISCOVERY COMPLETE'); + console.log('='.repeat(60)); + console.log(`Duration: ${(durationMs / 1000).toFixed(1)}s`); + console.log(''); + console.log('Cities:'); + console.log(` Discovered: ${cityResult.citiesFound}`); + console.log(` Upserted: ${cityResult.citiesUpserted}`); + console.log(` Crawled: ${cities.length}`); + console.log(''); + console.log('Locations:'); + console.log(` Found: ${totalLocationsFound}`); + console.log(` Upserted: ${totalLocationsUpserted}`); + console.log(''); + + const totalErrors = cityResult.errors.length + + locationResults.reduce((sum, r) => sum + r.errors.length, 0); + if (totalErrors > 0) { + console.log(`Errors: ${totalErrors}`); + } + + return { + cities: cityResult, + locations: locationResults, + totalLocationsFound, + totalLocationsUpserted, + durationMs, + }; +} + +// ============================================================ +// SINGLE CITY DISCOVERY +// ============================================================ + +/** + * Discover locations for a single city by slug. 
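+ *
+ * @example
+ * // Illustrative call; 'phoenix' and 'AZ' are sample values from the seed list:
+ * const result = await discoverCity(pool, 'phoenix', { stateCode: 'AZ', dryRun: true });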
+ */
+export async function discoverCity(
+  pool: Pool,
+  citySlug: string,
+  options: {
+    stateCode?: string;
+    countryCode?: string;
+    dryRun?: boolean;
+    verbose?: boolean;
+  } = {}
+): Promise<LocationDiscoveryResult | null> {
+  const { stateCode, countryCode = 'US', dryRun = false, verbose = false } = options;
+
+  // Find the city
+  let city = await getCityBySlug(pool, citySlug, stateCode, countryCode);
+
+  if (!city) {
+    // Try to create it if we have enough info
+    if (stateCode) {
+      console.log(`[Discovery] City ${citySlug} not found, creating...`);
+      await seedKnownCities(pool, [{
+        name: citySlug.replace(/-/g, ' ').replace(/\b\w/g, c => c.toUpperCase()),
+        slug: citySlug,
+        stateCode,
+        countryCode,
+      }]);
+      city = await getCityBySlug(pool, citySlug, stateCode, countryCode);
+    }
+
+    if (!city) {
+      console.log(`[Discovery] City ${citySlug} not found and could not be created`);
+      return null;
+    }
+  }
+
+  return await discoverLocationsForCity(pool, city, { dryRun, verbose });
+}
+
+// ============================================================
+// STATE-WIDE DISCOVERY
+// ============================================================
+
+/**
+ * Seed and discover all cities for a state.
+ */
+export async function discoverState(
+  pool: Pool,
+  stateCode: string,
+  options: {
+    dryRun?: boolean;
+    verbose?: boolean;
+    cityLimit?: number;
+  } = {}
+): Promise<FullDiscoveryResult> {
+  const { dryRun = false, verbose = false, cityLimit = 100 } = options;
+
+  console.log(`[Discovery] Discovering state: ${stateCode}`);
+
+  // Seed known cities for this state
+  if (stateCode === 'AZ') {
+    console.log('[Discovery] Seeding Arizona cities...');
+    const seeded = await seedKnownCities(pool, ARIZONA_CITIES);
+    console.log(`[Discovery] Seeded ${seeded.created} new cities, ${seeded.updated} updated`);
+  }
+
+  // Run full discovery for this state
+  return await runFullDiscovery(pool, {
+    dryRun,
+    verbose,
+    stateCode,
+    countryCode: 'US',
+    cityLimit,
+    skipCityDiscovery: true, // Use seeded cities
+    onlyStale: false, // Crawl all
+  });
+}
+
+// ============================================================
+// STATISTICS
+// ============================================================
+
+export interface DiscoveryStats {
+  cities: {
+    total: number;
+    crawledLast24h: number;
+    neverCrawled: number;
+  };
+  locations: {
+    total: number;
+    discovered: number;
+    verified: number;
+    rejected: number;
+    merged: number;
+    byState: Array<{ stateCode: string; count: number }>;
+  };
+}
+
+/**
+ * Get discovery statistics.
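+ *
+ * @example
+ * // Illustrative read of the returned shape:
+ * const stats = await getDiscoveryStats(pool);
+ * console.log(`${stats.locations.verified} verified of ${stats.locations.total} locations`);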
+ */
+export async function getDiscoveryStats(pool: Pool): Promise<DiscoveryStats> {
+  const [citiesTotal, citiesRecent, citiesNever] = await Promise.all([
+    pool.query('SELECT COUNT(*) as cnt FROM dutchie_discovery_cities'),
+    pool.query(`SELECT COUNT(*) as cnt FROM dutchie_discovery_cities WHERE last_crawled_at > NOW() - INTERVAL '24 hours'`),
+    pool.query('SELECT COUNT(*) as cnt FROM dutchie_discovery_cities WHERE last_crawled_at IS NULL'),
+  ]);
+
+  const [locsTotal, locsByStatus, locsByState] = await Promise.all([
+    pool.query('SELECT COUNT(*) as cnt FROM dutchie_discovery_locations WHERE active = TRUE'),
+    pool.query(`
+      SELECT status, COUNT(*) as cnt
+      FROM dutchie_discovery_locations
+      WHERE active = TRUE
+      GROUP BY status
+    `),
+    pool.query(`
+      SELECT state_code, COUNT(*) as cnt
+      FROM dutchie_discovery_locations
+      WHERE active = TRUE AND state_code IS NOT NULL
+      GROUP BY state_code
+      ORDER BY cnt DESC
+    `),
+  ]);
+
+  const statusCounts = locsByStatus.rows.reduce((acc, row) => {
+    acc[row.status] = parseInt(row.cnt, 10);
+    return acc;
+  }, {} as Record<string, number>);
+
+  return {
+    cities: {
+      total: parseInt(citiesTotal.rows[0].cnt, 10),
+      crawledLast24h: parseInt(citiesRecent.rows[0].cnt, 10),
+      neverCrawled: parseInt(citiesNever.rows[0].cnt, 10),
+    },
+    locations: {
+      total: parseInt(locsTotal.rows[0].cnt, 10),
+      discovered: statusCounts.discovered || 0,
+      verified: statusCounts.verified || 0,
+      rejected: statusCounts.rejected || 0,
+      merged: statusCounts.merged || 0,
+      byState: locsByState.rows.map(row => ({
+        stateCode: row.state_code,
+        count: parseInt(row.cnt, 10),
+      })),
+    },
+  };
+}
diff --git a/backend/src/discovery/index.ts b/backend/src/discovery/index.ts
new file mode 100644
index 00000000..6ab74bba
--- /dev/null
+++ b/backend/src/discovery/index.ts
@@ -0,0 +1,37 @@
+/**
+ * Dutchie Discovery Module
+ *
+ * Exports all discovery-related functionality for use in the main application.
+ */
+
+// Types
+export * from './types';
+
+// City Discovery
+export {
+  discoverCities,
+  getCitiesToCrawl,
+  getCityBySlug,
+  seedKnownCities,
+  ARIZONA_CITIES,
+} from './city-discovery';
+
+// Location Discovery
+export {
+  discoverLocationsForCity,
+  fetchLocationsForCity,
+  upsertLocation,
+} from './location-discovery';
+
+// Discovery Crawler (Orchestrator)
+export {
+  runFullDiscovery,
+  discoverCity,
+  discoverState,
+  getDiscoveryStats,
+  DiscoveryCrawlerOptions,
+  DiscoveryStats,
+} from './discovery-crawler';
+
+// Routes
+export { createDiscoveryRoutes } from './routes';
diff --git a/backend/src/discovery/location-discovery.ts b/backend/src/discovery/location-discovery.ts
new file mode 100644
index 00000000..1e927b4a
--- /dev/null
+++ b/backend/src/discovery/location-discovery.ts
@@ -0,0 +1,686 @@
+/**
+ * Dutchie Location Discovery Service
+ *
+ * Discovers store locations from Dutchie city pages.
+ * Each city can contain multiple dispensary locations.
+ *
+ * This module:
+ * 1. Fetches location listings for a given city
+ * 2. Upserts locations into dutchie_discovery_locations
+ * 3. Does NOT create any canonical dispensary records
+ *
+ * Locations remain in "discovered" status until manually verified.
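+ *
+ * Typical entry point (illustrative sketch, assuming a city row already exists):
+ *   const result = await discoverLocationsForCity(pool, city, { dryRun: true });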
+ */
+
+import { Pool } from 'pg';
+import axios from 'axios';
+import puppeteer from 'puppeteer-extra';
+import type { Browser, Page, Protocol } from 'puppeteer';
+import StealthPlugin from 'puppeteer-extra-plugin-stealth';
+import {
+  DiscoveryCity,
+  DiscoveryLocation,
+  DiscoveryLocationRow,
+  DutchieLocationResponse,
+  LocationDiscoveryResult,
+  DiscoveryStatus,
+  mapLocationRowToLocation,
+} from './types';
+
+puppeteer.use(StealthPlugin());
+
+const PLATFORM = 'dutchie';
+
+// ============================================================
+// GRAPHQL / API FETCHING
+// ============================================================
+
+interface SessionCredentials {
+  cookies: string;
+  userAgent: string;
+  browser: Browser;
+  page: Page;
+}
+
+/**
+ * Create a browser session for fetching location data.
+ */
+async function createSession(citySlug: string): Promise<SessionCredentials> {
+  const browser = await puppeteer.launch({
+    headless: 'new',
+    args: [
+      '--no-sandbox',
+      '--disable-setuid-sandbox',
+      '--disable-dev-shm-usage',
+      '--disable-blink-features=AutomationControlled',
+    ],
+  });
+
+  const page = await browser.newPage();
+  const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
+
+  await page.setUserAgent(userAgent);
+  await page.setViewport({ width: 1920, height: 1080 });
+  await page.evaluateOnNewDocument(() => {
+    Object.defineProperty(navigator, 'webdriver', { get: () => false });
+    (window as any).chrome = { runtime: {} };
+  });
+
+  // Navigate to a dispensaries page to get cookies
+  const url = `https://dutchie.com/dispensaries/az/${citySlug}`;
+  console.log(`[LocationDiscovery] Loading ${url} to establish session...`);
+
+  try {
+    await page.goto(url, {
+      waitUntil: 'networkidle2',
+      timeout: 60000,
+    });
+    await new Promise((r) => setTimeout(r, 2000));
+  } catch (error: any) {
+    console.warn(`[LocationDiscovery] Navigation warning: ${error.message}`);
+  }
+
+  const cookies = await page.cookies();
+  const cookieString = cookies.map((c: Protocol.Network.Cookie) => `${c.name}=${c.value}`).join('; ');
+
+  return { cookies: cookieString, userAgent, browser, page };
+}
+
+async function closeSession(session: SessionCredentials): Promise<void> {
+  await session.browser.close();
+}
+
+/**
+ * Fetch locations for a city using Dutchie's internal search API.
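+ *
+ * @example
+ * // Illustrative call; when no session is passed, one is created and closed automatically:
+ * const locations = await fetchLocationsForCity(city, { verbose: true });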
+ */
+export async function fetchLocationsForCity(
+  city: DiscoveryCity,
+  options: {
+    session?: SessionCredentials;
+    verbose?: boolean;
+  } = {}
+): Promise<DutchieLocationResponse[]> {
+  const { verbose = false } = options;
+  let session = options.session;
+  let shouldCloseSession = false;
+
+  if (!session) {
+    session = await createSession(city.citySlug);
+    shouldCloseSession = true;
+  }
+
+  try {
+    console.log(`[LocationDiscovery] Fetching locations for ${city.cityName}, ${city.stateCode}...`);
+
+    // Try multiple approaches to get location data
+
+    // Approach 1: Extract from page __NEXT_DATA__ or similar
+    const locations = await extractLocationsFromPage(session.page, verbose);
+    if (locations.length > 0) {
+      console.log(`[LocationDiscovery] Found ${locations.length} locations from page data`);
+      return locations;
+    }
+
+    // Approach 2: Try the geo-based GraphQL query
+    const geoLocations = await fetchLocationsViaGraphQL(session, city, verbose);
+    if (geoLocations.length > 0) {
+      console.log(`[LocationDiscovery] Found ${geoLocations.length} locations from GraphQL`);
+      return geoLocations;
+    }
+
+    // Approach 3: Scrape visible location cards
+    const scrapedLocations = await scrapeLocationCards(session.page, verbose);
+    if (scrapedLocations.length > 0) {
+      console.log(`[LocationDiscovery] Found ${scrapedLocations.length} locations from scraping`);
+      return scrapedLocations;
+    }
+
+    console.log(`[LocationDiscovery] No locations found for ${city.cityName}`);
+    return [];
+  } finally {
+    if (shouldCloseSession) {
+      await closeSession(session);
+    }
+  }
+}
+
+/**
+ * Extract locations from page's embedded data (__NEXT_DATA__, window.*, etc.)
+ */
+async function extractLocationsFromPage(
+  page: Page,
+  verbose: boolean
+): Promise<DutchieLocationResponse[]> {
+  try {
+    const data = await page.evaluate(() => {
+      // Try __NEXT_DATA__
+      const nextDataEl = document.querySelector('#__NEXT_DATA__');
+      if (nextDataEl?.textContent) {
+        try {
+          const nextData = JSON.parse(nextDataEl.textContent);
+          // Look for dispensaries in various paths
+          const dispensaries =
+            nextData?.props?.pageProps?.dispensaries ||
+            nextData?.props?.pageProps?.initialDispensaries ||
+            nextData?.props?.pageProps?.data?.dispensaries ||
+            [];
+          if (Array.isArray(dispensaries) && dispensaries.length > 0) {
+            return { source: '__NEXT_DATA__', dispensaries };
+          }
+        } catch {
+          // Ignore parse errors
+        }
+      }
+
+      // Try window variables
+      const win = window as any;
+      if (win.__APOLLO_STATE__) {
+        // Extract from Apollo cache
+        const entries = Object.entries(win.__APOLLO_STATE__).filter(
+          ([key]) => key.startsWith('Dispensary:')
+        );
+        if (entries.length > 0) {
+          return { source: 'APOLLO_STATE', dispensaries: entries.map(([, v]) => v) };
+        }
+      }
+
+      return { source: 'none', dispensaries: [] };
+    });
+
+    if (verbose) {
+      console.log(`[LocationDiscovery] Page data source: ${data.source}, count: ${data.dispensaries.length}`);
+    }
+
+    return data.dispensaries.map((d: any) => normalizeLocationResponse(d));
+  } catch (error: any) {
+    if (verbose) {
+      console.log(`[LocationDiscovery] Could not extract from page data: ${error.message}`);
+    }
+    return [];
+  }
+}
+
+/**
+ * Fetch locations via GraphQL geo-based query.
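+ * If the persisted-query hash rotates or the request fails, this returns an
+ * empty list and the caller falls back to scraping the page instead.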
+ */
+async function fetchLocationsViaGraphQL(
+  session: SessionCredentials,
+  city: DiscoveryCity,
+  verbose: boolean
+): Promise<DutchieLocationResponse[]> {
+  // Use a known center point for the city or default to a central US location
+  const CITY_COORDS: Record<string, { lat: number; lng: number }> = {
+    'phoenix': { lat: 33.4484, lng: -112.074 },
+    'tucson': { lat: 32.2226, lng: -110.9747 },
+    'scottsdale': { lat: 33.4942, lng: -111.9261 },
+    'mesa': { lat: 33.4152, lng: -111.8315 },
+    'tempe': { lat: 33.4255, lng: -111.94 },
+    'flagstaff': { lat: 35.1983, lng: -111.6513 },
+    // Add more as needed
+  };
+
+  const coords = CITY_COORDS[city.citySlug] || { lat: 33.4484, lng: -112.074 };
+
+  const variables = {
+    dispensariesFilter: {
+      latitude: coords.lat,
+      longitude: coords.lng,
+      distance: 50, // miles
+      state: city.stateCode,
+      city: city.cityName,
+    },
+  };
+
+  const hash = '0a5bfa6ca1d64ae47bcccb7c8077c87147cbc4e6982c17ceec97a2a4948b311b';
+
+  try {
+    const response = await axios.post(
+      'https://dutchie.com/api-3/graphql',
+      {
+        operationName: 'ConsumerDispensaries',
+        variables,
+        extensions: {
+          persistedQuery: { version: 1, sha256Hash: hash },
+        },
+      },
+      {
+        headers: {
+          'content-type': 'application/json',
+          'origin': 'https://dutchie.com',
+          'referer': `https://dutchie.com/dispensaries/${city.stateCode?.toLowerCase()}/${city.citySlug}`,
+          'user-agent': session.userAgent,
+          'cookie': session.cookies,
+        },
+        timeout: 30000,
+        validateStatus: () => true,
+      }
+    );
+
+    if (response.status !== 200) {
+      if (verbose) {
+        console.log(`[LocationDiscovery] GraphQL returned ${response.status}`);
+      }
+      return [];
+    }
+
+    const dispensaries = response.data?.data?.consumerDispensaries || [];
+    return dispensaries.map((d: any) => normalizeLocationResponse(d));
+  } catch (error: any) {
+    if (verbose) {
+      console.log(`[LocationDiscovery] GraphQL error: ${error.message}`);
+    }
+    return [];
+  }
+}
+
+/**
+ * Scrape location cards from the visible page.
+ */
+async function scrapeLocationCards(
+  page: Page,
+  verbose: boolean
+): Promise<DutchieLocationResponse[]> {
+  try {
+    const locations = await page.evaluate(() => {
+      const cards: any[] = [];
+
+      // Look for common dispensary card patterns
+      const selectors = [
+        '[data-testid="dispensary-card"]',
+        '.dispensary-card',
+        'a[href*="/dispensary/"]',
+        '[class*="DispensaryCard"]',
+      ];
+
+      for (const selector of selectors) {
+        const elements = document.querySelectorAll(selector);
+        if (elements.length > 0) {
+          elements.forEach((el) => {
+            const link = el.querySelector('a')?.href || (el as HTMLAnchorElement).href || '';
+            const name = el.querySelector('h2, h3, [class*="name"]')?.textContent?.trim() || '';
+            const address = el.querySelector('[class*="address"], address')?.textContent?.trim() || '';
+
+            // Extract slug from URL
+            const slugMatch = link.match(/\/dispensary\/([^/?]+)/);
+            const slug = slugMatch ? slugMatch[1] : '';
+
+            if (slug && name) {
+              cards.push({
+                slug,
+                name,
+                address,
+                menuUrl: link,
+              });
+            }
+          });
+          break; // Stop after first successful selector
+        }
+      }
+
+      return cards;
+    });
+
+    return locations.map((d: any) => ({
+      id: '',
+      name: d.name,
+      slug: d.slug,
+      address: d.address,
+      menuUrl: d.menuUrl,
+    }));
+  } catch (error: any) {
+    if (verbose) {
+      console.log(`[LocationDiscovery] Scraping error: ${error.message}`);
+    }
+    return [];
+  }
+}
+
+/**
+ * Normalize a raw location response to a consistent format.
+ */ +function normalizeLocationResponse(raw: any): DutchieLocationResponse { + const slug = raw.slug || raw.cName || raw.urlSlug || ''; + const id = raw.id || raw._id || raw.dispensaryId || ''; + + return { + id, + name: raw.name || raw.dispensaryName || '', + slug, + address: raw.address || raw.fullAddress || '', + address1: raw.address1 || raw.addressLine1 || raw.streetAddress || '', + address2: raw.address2 || raw.addressLine2 || '', + city: raw.city || '', + state: raw.state || raw.stateCode || '', + zip: raw.zip || raw.zipCode || raw.postalCode || '', + country: raw.country || raw.countryCode || 'US', + latitude: raw.latitude || raw.lat || raw.location?.latitude, + longitude: raw.longitude || raw.lng || raw.location?.longitude, + timezone: raw.timezone || raw.tz || '', + menuUrl: raw.menuUrl || (slug ? `https://dutchie.com/dispensary/${slug}` : ''), + retailType: raw.retailType || raw.type || '', + offerPickup: raw.offerPickup ?? raw.storeSettings?.offerPickup ?? true, + offerDelivery: raw.offerDelivery ?? raw.storeSettings?.offerDelivery ?? false, + isRecreational: raw.isRecreational ?? raw.retailType?.includes('Recreational') ?? true, + isMedical: raw.isMedical ?? raw.retailType?.includes('Medical') ?? true, + // Preserve raw data + ...raw, + }; +} + +// ============================================================ +// DATABASE OPERATIONS +// ============================================================ + +/** + * Upsert a location into dutchie_discovery_locations. + */ +export async function upsertLocation( + pool: Pool, + location: DutchieLocationResponse, + cityId: number | null +): Promise<{ id: number; isNew: boolean }> { + const platformLocationId = location.id || location.slug; + const menuUrl = location.menuUrl || `https://dutchie.com/dispensary/${location.slug}`; + + const result = await pool.query( + `INSERT INTO dutchie_discovery_locations ( + platform, + platform_location_id, + platform_slug, + platform_menu_url, + name, + raw_address, + address_line1, + address_line2, + city, + state_code, + postal_code, + country_code, + latitude, + longitude, + timezone, + discovery_city_id, + metadata, + offers_delivery, + offers_pickup, + is_recreational, + is_medical, + last_seen_at, + updated_at + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, NOW(), NOW()) + ON CONFLICT (platform, platform_location_id) + DO UPDATE SET + name = EXCLUDED.name, + platform_menu_url = EXCLUDED.platform_menu_url, + raw_address = COALESCE(EXCLUDED.raw_address, dutchie_discovery_locations.raw_address), + address_line1 = COALESCE(EXCLUDED.address_line1, dutchie_discovery_locations.address_line1), + city = COALESCE(EXCLUDED.city, dutchie_discovery_locations.city), + state_code = COALESCE(EXCLUDED.state_code, dutchie_discovery_locations.state_code), + postal_code = COALESCE(EXCLUDED.postal_code, dutchie_discovery_locations.postal_code), + latitude = COALESCE(EXCLUDED.latitude, dutchie_discovery_locations.latitude), + longitude = COALESCE(EXCLUDED.longitude, dutchie_discovery_locations.longitude), + timezone = COALESCE(EXCLUDED.timezone, dutchie_discovery_locations.timezone), + metadata = EXCLUDED.metadata, + offers_delivery = COALESCE(EXCLUDED.offers_delivery, dutchie_discovery_locations.offers_delivery), + offers_pickup = COALESCE(EXCLUDED.offers_pickup, dutchie_discovery_locations.offers_pickup), + is_recreational = COALESCE(EXCLUDED.is_recreational, dutchie_discovery_locations.is_recreational), + is_medical = COALESCE(EXCLUDED.is_medical, 
dutchie_discovery_locations.is_medical),
+      last_seen_at = NOW(),
+      updated_at = NOW()
+    RETURNING id, (xmax = 0) as is_new`,
+    [
+      PLATFORM,
+      platformLocationId,
+      location.slug,
+      menuUrl,
+      location.name,
+      location.address || null,
+      location.address1 || null,
+      location.address2 || null,
+      location.city || null,
+      location.state || null,
+      location.zip || null,
+      location.country || 'US',
+      location.latitude || null,
+      location.longitude || null,
+      location.timezone || null,
+      cityId,
+      JSON.stringify(location),
+      location.offerDelivery ?? null,
+      location.offerPickup ?? null,
+      location.isRecreational ?? null,
+      location.isMedical ?? null,
+    ]
+  );
+
+  return {
+    id: result.rows[0].id,
+    isNew: result.rows[0].is_new,
+  };
+}
+
+/**
+ * Get locations by status.
+ */
+export async function getLocationsByStatus(
+  pool: Pool,
+  status: DiscoveryStatus,
+  options: {
+    stateCode?: string;
+    countryCode?: string;
+    limit?: number;
+    offset?: number;
+  } = {}
+): Promise<DiscoveryLocation[]> {
+  const { stateCode, countryCode, limit = 100, offset = 0 } = options;
+
+  let query = `
+    SELECT * FROM dutchie_discovery_locations
+    WHERE status = $1 AND active = TRUE
+  `;
+  const params: any[] = [status];
+  let paramIdx = 2;
+
+  if (stateCode) {
+    query += ` AND state_code = $${paramIdx}`;
+    params.push(stateCode);
+    paramIdx++;
+  }
+
+  if (countryCode) {
+    query += ` AND country_code = $${paramIdx}`;
+    params.push(countryCode);
+    paramIdx++;
+  }
+
+  query += ` ORDER BY first_seen_at DESC LIMIT $${paramIdx} OFFSET $${paramIdx + 1}`;
+  params.push(limit, offset);
+
+  const result = await pool.query(query, params);
+  return result.rows.map(mapLocationRowToLocation);
+}
+
+/**
+ * Get a location by ID.
+ */
+export async function getLocationById(
+  pool: Pool,
+  id: number
+): Promise<DiscoveryLocation | null> {
+  const result = await pool.query(
+    `SELECT * FROM dutchie_discovery_locations WHERE id = $1`,
+    [id]
+  );
+
+  if (result.rows.length === 0) {
+    return null;
+  }
+
+  return mapLocationRowToLocation(result.rows[0]);
+}
+
+/**
+ * Update location status.
+ */
+export async function updateLocationStatus(
+  pool: Pool,
+  locationId: number,
+  status: DiscoveryStatus,
+  options: {
+    dispensaryId?: number;
+    verifiedBy?: string;
+    notes?: string;
+  } = {}
+): Promise<void> {
+  const { dispensaryId, verifiedBy, notes } = options;
+
+  await pool.query(
+    `UPDATE dutchie_discovery_locations
+     SET status = $2,
+         dispensary_id = COALESCE($3, dispensary_id),
+         verified_at = CASE WHEN $2 IN ('verified', 'merged') THEN NOW() ELSE verified_at END,
+         verified_by = COALESCE($4, verified_by),
+         notes = COALESCE($5, notes),
+         updated_at = NOW()
+     WHERE id = $1`,
+    [locationId, status, dispensaryId || null, verifiedBy || null, notes || null]
+  );
+}
+
+/**
+ * Search locations by name or address.
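+ *
+ * @example
+ * // Illustrative search; 'harvest' is an arbitrary sample term:
+ * const hits = await searchLocations(pool, 'harvest', { status: 'discovered', stateCode: 'AZ' });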
+ */
+export async function searchLocations(
+  pool: Pool,
+  query: string,
+  options: {
+    status?: DiscoveryStatus;
+    stateCode?: string;
+    limit?: number;
+  } = {}
+): Promise<DiscoveryLocation[]> {
+  const { status, stateCode, limit = 50 } = options;
+  const searchPattern = `%${query}%`;
+
+  let sql = `
+    SELECT * FROM dutchie_discovery_locations
+    WHERE active = TRUE
+      AND (name ILIKE $1 OR city ILIKE $1 OR raw_address ILIKE $1 OR platform_slug ILIKE $1)
+  `;
+  const params: any[] = [searchPattern];
+  let paramIdx = 2;
+
+  if (status) {
+    sql += ` AND status = $${paramIdx}`;
+    params.push(status);
+    paramIdx++;
+  }
+
+  if (stateCode) {
+    sql += ` AND state_code = $${paramIdx}`;
+    params.push(stateCode);
+    paramIdx++;
+  }
+
+  sql += ` ORDER BY name LIMIT $${paramIdx}`;
+  params.push(limit);
+
+  const result = await pool.query(sql, params);
+  return result.rows.map(mapLocationRowToLocation);
+}
+
+// ============================================================
+// MAIN DISCOVERY FUNCTION
+// ============================================================
+
+/**
+ * Discover locations for a specific city.
+ */
+export async function discoverLocationsForCity(
+  pool: Pool,
+  city: DiscoveryCity,
+  options: {
+    dryRun?: boolean;
+    verbose?: boolean;
+  } = {}
+): Promise<LocationDiscoveryResult> {
+  const startTime = Date.now();
+  const { dryRun = false, verbose = false } = options;
+  const errors: string[] = [];
+
+  console.log(`[LocationDiscovery] Discovering locations for ${city.cityName}, ${city.stateCode}...`);
+  console.log(`[LocationDiscovery] Mode: ${dryRun ? 'DRY RUN' : 'LIVE'}`);
+
+  const locations = await fetchLocationsForCity(city, { verbose });
+
+  if (locations.length === 0) {
+    console.log(`[LocationDiscovery] No locations found for ${city.cityName}`);
+    return {
+      cityId: city.id,
+      citySlug: city.citySlug,
+      locationsFound: 0,
+      locationsUpserted: 0,
+      locationsNew: 0,
+      locationsUpdated: 0,
+      errors: [],
+      durationMs: Date.now() - startTime,
+    };
+  }
+
+  let newCount = 0;
+  let updatedCount = 0;
+
+  for (const location of locations) {
+    try {
+      if (dryRun) {
+        if (verbose) {
+          console.log(`[LocationDiscovery][DryRun] Would upsert: ${location.name} (${location.slug})`);
+        }
+        newCount++;
+        continue;
+      }
+
+      const result = await upsertLocation(pool, location, city.id);
+
+      if (result.isNew) {
+        newCount++;
+      } else {
+        updatedCount++;
+      }
+
+      if (verbose) {
+        const action = result.isNew ?
'Created' : 'Updated'; + console.log(`[LocationDiscovery] ${action}: ${location.name} -> ID ${result.id}`); + } + } catch (error: any) { + errors.push(`Location ${location.slug}: ${error.message}`); + } + } + + // Update city crawl status + if (!dryRun) { + await pool.query( + `UPDATE dutchie_discovery_cities + SET last_crawled_at = NOW(), + location_count = $2, + updated_at = NOW() + WHERE id = $1`, + [city.id, locations.length] + ); + } + + const durationMs = Date.now() - startTime; + + console.log(`[LocationDiscovery] Complete for ${city.cityName}: ${newCount} new, ${updatedCount} updated, ${errors.length} errors in ${durationMs}ms`); + + return { + cityId: city.id, + citySlug: city.citySlug, + locationsFound: locations.length, + locationsUpserted: newCount + updatedCount, + locationsNew: newCount, + locationsUpdated: updatedCount, + errors, + durationMs, + }; +} diff --git a/backend/src/discovery/routes.ts b/backend/src/discovery/routes.ts new file mode 100644 index 00000000..837f7ee0 --- /dev/null +++ b/backend/src/discovery/routes.ts @@ -0,0 +1,840 @@ +/** + * Dutchie Discovery API Routes + * + * Express routes for the Dutchie store discovery pipeline. + * Provides endpoints for discovering, listing, and verifying locations. + */ + +import { Router, Request, Response } from 'express'; +import { Pool } from 'pg'; +import { + runFullDiscovery, + discoverCity, + discoverState, + getDiscoveryStats, +} from './discovery-crawler'; +import { + discoverCities, + getCitiesToCrawl, + getCityBySlug, + seedKnownCities, + ARIZONA_CITIES, +} from './city-discovery'; +import { + DiscoveryLocation, + DiscoveryCity, + DiscoveryStatus, + mapLocationRowToLocation, + mapCityRowToCity, +} from './types'; + +export function createDiscoveryRoutes(pool: Pool): Router { + const router = Router(); + + // ============================================================ + // DISCOVERY LOCATIONS + // ============================================================ + + /** + * GET /api/discovery/locations + * List discovered locations with filtering + */ + router.get('/locations', async (req: Request, res: Response) => { + try { + const { + status, + stateCode, + countryCode, + city, + platform = 'dutchie', + search, + hasDispensary, + limit = '50', + offset = '0', + } = req.query; + + let whereClause = 'WHERE platform = $1 AND active = TRUE'; + const params: any[] = [platform]; + let paramIndex = 2; + + if (status) { + whereClause += ` AND status = $${paramIndex}`; + params.push(status); + paramIndex++; + } + + if (stateCode) { + whereClause += ` AND state_code = $${paramIndex}`; + params.push(stateCode); + paramIndex++; + } + + if (countryCode) { + whereClause += ` AND country_code = $${paramIndex}`; + params.push(countryCode); + paramIndex++; + } + + if (city) { + whereClause += ` AND city ILIKE $${paramIndex}`; + params.push(`%${city}%`); + paramIndex++; + } + + if (search) { + whereClause += ` AND (name ILIKE $${paramIndex} OR platform_slug ILIKE $${paramIndex})`; + params.push(`%${search}%`); + paramIndex++; + } + + if (hasDispensary === 'true') { + whereClause += ' AND dispensary_id IS NOT NULL'; + } else if (hasDispensary === 'false') { + whereClause += ' AND dispensary_id IS NULL'; + } + + params.push(parseInt(limit as string, 10), parseInt(offset as string, 10)); + + const { rows } = await pool.query( + ` + SELECT + dl.*, + d.name as dispensary_name, + dc.city_name as discovery_city_name + FROM dutchie_discovery_locations dl + LEFT JOIN dispensaries d ON dl.dispensary_id = d.id + LEFT JOIN 
dutchie_discovery_cities dc ON dl.discovery_city_id = dc.id + ${whereClause} + ORDER BY dl.first_seen_at DESC + LIMIT $${paramIndex} OFFSET $${paramIndex + 1} + `, + params + ); + + const { rows: countRows } = await pool.query( + `SELECT COUNT(*) as total FROM dutchie_discovery_locations dl ${whereClause}`, + params.slice(0, -2) + ); + + const locations = rows.map((row: any) => ({ + ...mapLocationRowToLocation(row), + dispensaryName: row.dispensary_name, + discoveryCityName: row.discovery_city_name, + })); + + res.json({ + locations, + total: parseInt(countRows[0]?.total || '0', 10), + limit: parseInt(limit as string, 10), + offset: parseInt(offset as string, 10), + }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } + }); + + /** + * GET /api/discovery/locations/:id + * Get a single discovery location + */ + router.get('/locations/:id', async (req: Request, res: Response) => { + try { + const { id } = req.params; + + const { rows } = await pool.query( + ` + SELECT + dl.*, + d.name as dispensary_name, + d.menu_url as dispensary_menu_url, + dc.city_name as discovery_city_name + FROM dutchie_discovery_locations dl + LEFT JOIN dispensaries d ON dl.dispensary_id = d.id + LEFT JOIN dutchie_discovery_cities dc ON dl.discovery_city_id = dc.id + WHERE dl.id = $1 + `, + [parseInt(id, 10)] + ); + + if (rows.length === 0) { + return res.status(404).json({ error: 'Location not found' }); + } + + res.json({ + ...mapLocationRowToLocation(rows[0]), + dispensaryName: rows[0].dispensary_name, + dispensaryMenuUrl: rows[0].dispensary_menu_url, + discoveryCityName: rows[0].discovery_city_name, + }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } + }); + + /** + * GET /api/discovery/locations/pending + * Get locations awaiting verification + */ + router.get('/locations/pending', async (req: Request, res: Response) => { + try { + const { stateCode, countryCode, limit = '100' } = req.query; + + let whereClause = `WHERE status = 'discovered' AND active = TRUE`; + const params: any[] = []; + let paramIndex = 1; + + if (stateCode) { + whereClause += ` AND state_code = $${paramIndex}`; + params.push(stateCode); + paramIndex++; + } + + if (countryCode) { + whereClause += ` AND country_code = $${paramIndex}`; + params.push(countryCode); + paramIndex++; + } + + params.push(parseInt(limit as string, 10)); + + const { rows } = await pool.query( + ` + SELECT * FROM dutchie_discovery_locations + ${whereClause} + ORDER BY state_code, city, name + LIMIT $${paramIndex} + `, + params + ); + + res.json({ + locations: rows.map(mapLocationRowToLocation), + total: rows.length, + }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } + }); + + // ============================================================ + // DISCOVERY CITIES + // ============================================================ + + /** + * GET /api/discovery/cities + * List discovery cities + */ + router.get('/cities', async (req: Request, res: Response) => { + try { + const { + stateCode, + countryCode, + crawlEnabled, + platform = 'dutchie', + limit = '100', + offset = '0', + } = req.query; + + let whereClause = 'WHERE platform = $1'; + const params: any[] = [platform]; + let paramIndex = 2; + + if (stateCode) { + whereClause += ` AND state_code = $${paramIndex}`; + params.push(stateCode); + paramIndex++; + } + + if (countryCode) { + whereClause += ` AND country_code = $${paramIndex}`; + params.push(countryCode); + paramIndex++; + } + + if (crawlEnabled === 'true') { + 
whereClause += ' AND crawl_enabled = TRUE'; + } else if (crawlEnabled === 'false') { + whereClause += ' AND crawl_enabled = FALSE'; + } + + params.push(parseInt(limit as string, 10), parseInt(offset as string, 10)); + + const { rows } = await pool.query( + ` + SELECT + dc.*, + (SELECT COUNT(*) FROM dutchie_discovery_locations dl WHERE dl.discovery_city_id = dc.id) as actual_location_count + FROM dutchie_discovery_cities dc + ${whereClause} + ORDER BY dc.country_code, dc.state_code, dc.city_name + LIMIT $${paramIndex} OFFSET $${paramIndex + 1} + `, + params + ); + + const { rows: countRows } = await pool.query( + `SELECT COUNT(*) as total FROM dutchie_discovery_cities dc ${whereClause}`, + params.slice(0, -2) + ); + + const cities = rows.map((row: any) => ({ + ...mapCityRowToCity(row), + actualLocationCount: parseInt(row.actual_location_count || '0', 10), + })); + + res.json({ + cities, + total: parseInt(countRows[0]?.total || '0', 10), + limit: parseInt(limit as string, 10), + offset: parseInt(offset as string, 10), + }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } + }); + + // ============================================================ + // STATISTICS + // ============================================================ + + /** + * GET /api/discovery/stats + * Get discovery statistics + */ + router.get('/stats', async (_req: Request, res: Response) => { + try { + const stats = await getDiscoveryStats(pool); + res.json(stats); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } + }); + + // ============================================================ + // VERIFICATION ACTIONS + // ============================================================ + + /** + * POST /api/discovery/locations/:id/verify + * Verify a discovered location and create a new canonical dispensary + */ + router.post('/locations/:id/verify', async (req: Request, res: Response) => { + try { + const { id } = req.params; + const { verifiedBy = 'admin' } = req.body; + + // Get the discovery location + const { rows: locRows } = await pool.query( + `SELECT * FROM dutchie_discovery_locations WHERE id = $1`, + [parseInt(id, 10)] + ); + + if (locRows.length === 0) { + return res.status(404).json({ error: 'Location not found' }); + } + + const location = locRows[0]; + + if (location.status !== 'discovered') { + return res.status(400).json({ + error: `Location already has status: ${location.status}`, + }); + } + + // Create the canonical dispensary + const { rows: dispRows } = await pool.query( + ` + INSERT INTO dispensaries ( + name, + slug, + address, + city, + state, + zip, + latitude, + longitude, + timezone, + menu_type, + menu_url, + platform_dispensary_id, + active, + created_at, + updated_at + ) VALUES ( + $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, TRUE, NOW(), NOW() + ) + RETURNING id + `, + [ + location.name, + location.platform_slug, + location.address_line1, + location.city, + location.state_code, + location.postal_code, + location.latitude, + location.longitude, + location.timezone, + location.platform, + location.platform_menu_url, + location.platform_location_id, + ] + ); + + const dispensaryId = dispRows[0].id; + + // Update the discovery location + await pool.query( + ` + UPDATE dutchie_discovery_locations + SET status = 'verified', + dispensary_id = $1, + verified_at = NOW(), + verified_by = $2, + updated_at = NOW() + WHERE id = $3 + `, + [dispensaryId, verifiedBy, id] + ); + + res.json({ + success: true, + action: 'created', + discoveryId: parseInt(id, 
10), + dispensaryId, + message: `Created new dispensary (ID: ${dispensaryId})`, + }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } + }); + + /** + * POST /api/discovery/locations/:id/link + * Link a discovered location to an existing dispensary + */ + router.post('/locations/:id/link', async (req: Request, res: Response) => { + try { + const { id } = req.params; + const { dispensaryId, verifiedBy = 'admin' } = req.body; + + if (!dispensaryId) { + return res.status(400).json({ error: 'dispensaryId is required' }); + } + + // Verify dispensary exists + const { rows: dispRows } = await pool.query( + `SELECT id, name FROM dispensaries WHERE id = $1`, + [dispensaryId] + ); + + if (dispRows.length === 0) { + return res.status(404).json({ error: 'Dispensary not found' }); + } + + // Get the discovery location + const { rows: locRows } = await pool.query( + `SELECT * FROM dutchie_discovery_locations WHERE id = $1`, + [parseInt(id, 10)] + ); + + if (locRows.length === 0) { + return res.status(404).json({ error: 'Location not found' }); + } + + const location = locRows[0]; + + if (location.status !== 'discovered') { + return res.status(400).json({ + error: `Location already has status: ${location.status}`, + }); + } + + // Update dispensary with platform info if missing + await pool.query( + ` + UPDATE dispensaries + SET platform_dispensary_id = COALESCE(platform_dispensary_id, $1), + menu_url = COALESCE(menu_url, $2), + menu_type = COALESCE(menu_type, $3), + updated_at = NOW() + WHERE id = $4 + `, + [ + location.platform_location_id, + location.platform_menu_url, + location.platform, + dispensaryId, + ] + ); + + // Update the discovery location + await pool.query( + ` + UPDATE dutchie_discovery_locations + SET status = 'merged', + dispensary_id = $1, + verified_at = NOW(), + verified_by = $2, + updated_at = NOW() + WHERE id = $3 + `, + [dispensaryId, verifiedBy, id] + ); + + res.json({ + success: true, + action: 'linked', + discoveryId: parseInt(id, 10), + dispensaryId, + dispensaryName: dispRows[0].name, + message: `Linked to existing dispensary: ${dispRows[0].name}`, + }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } + }); + + /** + * POST /api/discovery/locations/:id/reject + * Reject a discovered location + */ + router.post('/locations/:id/reject', async (req: Request, res: Response) => { + try { + const { id } = req.params; + const { reason, verifiedBy = 'admin' } = req.body; + + const { rows } = await pool.query( + `SELECT status FROM dutchie_discovery_locations WHERE id = $1`, + [parseInt(id, 10)] + ); + + if (rows.length === 0) { + return res.status(404).json({ error: 'Location not found' }); + } + + if (rows[0].status !== 'discovered') { + return res.status(400).json({ + error: `Location already has status: ${rows[0].status}`, + }); + } + + await pool.query( + ` + UPDATE dutchie_discovery_locations + SET status = 'rejected', + verified_at = NOW(), + verified_by = $1, + notes = $2, + updated_at = NOW() + WHERE id = $3 + `, + [verifiedBy, reason || 'Rejected by admin', id] + ); + + res.json({ + success: true, + action: 'rejected', + discoveryId: parseInt(id, 10), + message: 'Location rejected', + }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } + }); + + /** + * POST /api/discovery/locations/:id/unreject + * Restore a rejected location back to discovered status + */ + router.post('/locations/:id/unreject', async (req: Request, res: Response) => { + try { + const { id } = req.params; + + 
const { rows } = await pool.query( + `SELECT status FROM dutchie_discovery_locations WHERE id = $1`, + [parseInt(id, 10)] + ); + + if (rows.length === 0) { + return res.status(404).json({ error: 'Location not found' }); + } + + if (rows[0].status !== 'rejected') { + return res.status(400).json({ + error: `Location is not rejected. Current status: ${rows[0].status}`, + }); + } + + await pool.query( + ` + UPDATE dutchie_discovery_locations + SET status = 'discovered', + verified_at = NULL, + verified_by = NULL, + updated_at = NOW() + WHERE id = $1 + `, + [id] + ); + + res.json({ + success: true, + action: 'unrejected', + discoveryId: parseInt(id, 10), + message: 'Location restored to discovered status', + }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } + }); + + // ============================================================ + // DISCOVERY ADMIN ACTIONS + // ============================================================ + + /** + * POST /api/discovery/admin/discover-state + * Run discovery for an entire state + */ + router.post('/admin/discover-state', async (req: Request, res: Response) => { + try { + const { stateCode, dryRun = false, cityLimit = 100 } = req.body; + + if (!stateCode) { + return res.status(400).json({ error: 'stateCode is required' }); + } + + console.log(`[Discovery API] Starting state discovery for ${stateCode}`); + const result = await discoverState(pool, stateCode, { + dryRun, + cityLimit, + verbose: true, + }); + + res.json({ + success: true, + stateCode, + result, + }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } + }); + + /** + * POST /api/discovery/admin/discover-city + * Run discovery for a single city + */ + router.post('/admin/discover-city', async (req: Request, res: Response) => { + try { + const { citySlug, stateCode, countryCode = 'US', dryRun = false } = req.body; + + if (!citySlug) { + return res.status(400).json({ error: 'citySlug is required' }); + } + + console.log(`[Discovery API] Starting city discovery for ${citySlug}`); + const result = await discoverCity(pool, citySlug, { + stateCode, + countryCode, + dryRun, + verbose: true, + }); + + if (!result) { + return res.status(404).json({ error: `City not found: ${citySlug}` }); + } + + res.json({ + success: true, + citySlug, + result, + }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } + }); + + /** + * POST /api/discovery/admin/run-full + * Run full discovery pipeline + */ + router.post('/admin/run-full', async (req: Request, res: Response) => { + try { + const { + stateCode, + countryCode = 'US', + cityLimit = 50, + skipCityDiscovery = false, + onlyStale = true, + staleDays = 7, + dryRun = false, + } = req.body; + + console.log(`[Discovery API] Starting full discovery`); + const result = await runFullDiscovery(pool, { + stateCode, + countryCode, + cityLimit, + skipCityDiscovery, + onlyStale, + staleDays, + dryRun, + verbose: true, + }); + + res.json({ + success: true, + result, + }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } + }); + + /** + * POST /api/discovery/admin/seed-cities + * Seed known cities for a state + */ + router.post('/admin/seed-cities', async (req: Request, res: Response) => { + try { + const { stateCode } = req.body; + + if (!stateCode) { + return res.status(400).json({ error: 'stateCode is required' }); + } + + let cities: any[] = []; + if (stateCode === 'AZ') { + cities = ARIZONA_CITIES; + } else { + return res.status(400).json({ + error: `No 
predefined cities for state: ${stateCode}. Add cities to city-discovery.ts`, + }); + } + + const result = await seedKnownCities(pool, cities); + + res.json({ + success: true, + stateCode, + ...result, + }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } + }); + + /** + * GET /api/discovery/admin/match-candidates/:id + * Find potential dispensary matches for a discovery location + */ + router.get('/admin/match-candidates/:id', async (req: Request, res: Response) => { + try { + const { id } = req.params; + + // Get the discovery location + const { rows: locRows } = await pool.query( + `SELECT * FROM dutchie_discovery_locations WHERE id = $1`, + [parseInt(id, 10)] + ); + + if (locRows.length === 0) { + return res.status(404).json({ error: 'Location not found' }); + } + + const location = locRows[0]; + + // Find potential matches by name similarity and location + const { rows: candidates } = await pool.query( + ` + SELECT + d.id, + d.name, + d.city, + d.state, + d.address, + d.menu_type, + d.platform_dispensary_id, + d.menu_url, + d.latitude, + d.longitude, + CASE + WHEN d.name ILIKE $1 THEN 'exact_name' + WHEN d.name ILIKE $2 THEN 'partial_name' + WHEN d.city ILIKE $3 AND d.state = $4 THEN 'same_city' + ELSE 'location_match' + END as match_type, + -- Distance in miles if coordinates available + CASE + WHEN d.latitude IS NOT NULL AND d.longitude IS NOT NULL + AND $5::float IS NOT NULL AND $6::float IS NOT NULL + THEN (3959 * acos( + cos(radians($5::float)) * cos(radians(d.latitude)) * + cos(radians(d.longitude) - radians($6::float)) + + sin(radians($5::float)) * sin(radians(d.latitude)) + )) + ELSE NULL + END as distance_miles + FROM dispensaries d + WHERE d.state = $4 + AND ( + d.name ILIKE $1 + OR d.name ILIKE $2 + OR d.city ILIKE $3 + OR ( + d.latitude IS NOT NULL + AND d.longitude IS NOT NULL + AND $5::float IS NOT NULL + AND $6::float IS NOT NULL + AND (3959 * acos( + cos(radians($5::float)) * cos(radians(d.latitude)) * + cos(radians(d.longitude) - radians($6::float)) + + sin(radians($5::float)) * sin(radians(d.latitude)) + )) < 5 + ) + ) + ORDER BY + CASE + WHEN d.name ILIKE $1 THEN 1 + WHEN d.name ILIKE $2 THEN 2 + ELSE 3 + END, + distance_miles NULLS LAST + LIMIT 10 + `, + [ + location.name, + `%${location.name.split(' ')[0]}%`, + location.city, + location.state_code, + location.latitude, + location.longitude, + ] + ); + + res.json({ + location: mapLocationRowToLocation(location), + candidates: candidates.map((c: any) => ({ + id: c.id, + name: c.name, + city: c.city, + state: c.state, + address: c.address, + menuType: c.menu_type, + platformDispensaryId: c.platform_dispensary_id, + menuUrl: c.menu_url, + matchType: c.match_type, + distanceMiles: c.distance_miles ? Math.round(c.distance_miles * 10) / 10 : null, + })), + }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } + }); + + return router; +} + +export default createDiscoveryRoutes; diff --git a/backend/src/discovery/types.ts b/backend/src/discovery/types.ts new file mode 100644 index 00000000..f8cd626d --- /dev/null +++ b/backend/src/discovery/types.ts @@ -0,0 +1,269 @@ +/** + * Dutchie Discovery Types + * + * Type definitions for the Dutchie store discovery pipeline. 
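+ *
+ * Row interfaces mirror the snake_case database columns; the mapper functions
+ * at the bottom of this file convert rows into the camelCase domain shapes.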
+ */ + +// ============================================================ +// DISCOVERY CITY +// ============================================================ + +export interface DiscoveryCity { + id: number; + platform: string; + cityName: string; + citySlug: string; + stateCode: string | null; + countryCode: string; + lastCrawledAt: Date | null; + crawlEnabled: boolean; + locationCount: number | null; + notes: string | null; + metadata: Record | null; + createdAt: Date; + updatedAt: Date; +} + +export interface DiscoveryCityRow { + id: number; + platform: string; + city_name: string; + city_slug: string; + state_code: string | null; + country_code: string; + last_crawled_at: Date | null; + crawl_enabled: boolean; + location_count: number | null; + notes: string | null; + metadata: Record | null; + created_at: Date; + updated_at: Date; +} + +// ============================================================ +// DISCOVERY LOCATION +// ============================================================ + +export type DiscoveryStatus = 'discovered' | 'verified' | 'rejected' | 'merged'; + +export interface DiscoveryLocation { + id: number; + platform: string; + platformLocationId: string; + platformSlug: string; + platformMenuUrl: string; + name: string; + rawAddress: string | null; + addressLine1: string | null; + addressLine2: string | null; + city: string | null; + stateCode: string | null; + postalCode: string | null; + countryCode: string | null; + latitude: number | null; + longitude: number | null; + timezone: string | null; + status: DiscoveryStatus; + dispensaryId: number | null; + discoveryCityId: number | null; + metadata: Record | null; + notes: string | null; + offersDelivery: boolean | null; + offersPickup: boolean | null; + isRecreational: boolean | null; + isMedical: boolean | null; + firstSeenAt: Date; + lastSeenAt: Date; + lastCheckedAt: Date | null; + verifiedAt: Date | null; + verifiedBy: string | null; + active: boolean; + createdAt: Date; + updatedAt: Date; +} + +export interface DiscoveryLocationRow { + id: number; + platform: string; + platform_location_id: string; + platform_slug: string; + platform_menu_url: string; + name: string; + raw_address: string | null; + address_line1: string | null; + address_line2: string | null; + city: string | null; + state_code: string | null; + postal_code: string | null; + country_code: string | null; + latitude: number | null; + longitude: number | null; + timezone: string | null; + status: DiscoveryStatus; + dispensary_id: number | null; + discovery_city_id: number | null; + metadata: Record | null; + notes: string | null; + offers_delivery: boolean | null; + offers_pickup: boolean | null; + is_recreational: boolean | null; + is_medical: boolean | null; + first_seen_at: Date; + last_seen_at: Date; + last_checked_at: Date | null; + verified_at: Date | null; + verified_by: string | null; + active: boolean; + created_at: Date; + updated_at: Date; +} + +// ============================================================ +// RAW API RESPONSES +// ============================================================ + +export interface DutchieCityResponse { + slug: string; + name: string; + state?: string; + stateCode?: string; + country?: string; + countryCode?: string; +} + +export interface DutchieLocationResponse { + id: string; + name: string; + slug: string; + address?: string; + address1?: string; + address2?: string; + city?: string; + state?: string; + zip?: string; + zipCode?: string; + country?: string; + latitude?: number; + longitude?: number; + 
timezone?: string; + menuUrl?: string; + retailType?: string; + offerPickup?: boolean; + offerDelivery?: boolean; + isRecreational?: boolean; + isMedical?: boolean; + // Raw response preserved + [key: string]: any; +} + +// ============================================================ +// DISCOVERY RESULTS +// ============================================================ + +export interface CityDiscoveryResult { + citiesFound: number; + citiesUpserted: number; + citiesSkipped: number; + errors: string[]; + durationMs: number; +} + +export interface LocationDiscoveryResult { + cityId: number; + citySlug: string; + locationsFound: number; + locationsUpserted: number; + locationsNew: number; + locationsUpdated: number; + errors: string[]; + durationMs: number; +} + +export interface FullDiscoveryResult { + cities: CityDiscoveryResult; + locations: LocationDiscoveryResult[]; + totalLocationsFound: number; + totalLocationsUpserted: number; + durationMs: number; +} + +// ============================================================ +// VERIFICATION +// ============================================================ + +export interface VerificationResult { + success: boolean; + discoveryId: number; + dispensaryId: number | null; + action: 'created' | 'linked' | 'rejected'; + error?: string; +} + +export interface PromotionResult { + success: boolean; + discoveryId: number; + dispensaryId: number; + crawlProfileId?: number; + scheduleId?: number; + error?: string; +} + +// ============================================================ +// MAPPER FUNCTIONS +// ============================================================ + +export function mapCityRowToCity(row: DiscoveryCityRow): DiscoveryCity { + return { + id: row.id, + platform: row.platform, + cityName: row.city_name, + citySlug: row.city_slug, + stateCode: row.state_code, + countryCode: row.country_code, + lastCrawledAt: row.last_crawled_at, + crawlEnabled: row.crawl_enabled, + locationCount: row.location_count, + notes: row.notes, + metadata: row.metadata, + createdAt: row.created_at, + updatedAt: row.updated_at, + }; +} + +export function mapLocationRowToLocation(row: DiscoveryLocationRow): DiscoveryLocation { + return { + id: row.id, + platform: row.platform, + platformLocationId: row.platform_location_id, + platformSlug: row.platform_slug, + platformMenuUrl: row.platform_menu_url, + name: row.name, + rawAddress: row.raw_address, + addressLine1: row.address_line1, + addressLine2: row.address_line2, + city: row.city, + stateCode: row.state_code, + postalCode: row.postal_code, + countryCode: row.country_code, + latitude: row.latitude, + longitude: row.longitude, + timezone: row.timezone, + status: row.status, + dispensaryId: row.dispensary_id, + discoveryCityId: row.discovery_city_id, + metadata: row.metadata, + notes: row.notes, + offersDelivery: row.offers_delivery, + offersPickup: row.offers_pickup, + isRecreational: row.is_recreational, + isMedical: row.is_medical, + firstSeenAt: row.first_seen_at, + lastSeenAt: row.last_seen_at, + lastCheckedAt: row.last_checked_at, + verifiedAt: row.verified_at, + verifiedBy: row.verified_by, + active: row.active, + createdAt: row.created_at, + updatedAt: row.updated_at, + }; +} diff --git a/backend/src/dutchie-az/db/connection.ts b/backend/src/dutchie-az/db/connection.ts index 29d31412..5f470889 100644 --- a/backend/src/dutchie-az/db/connection.ts +++ b/backend/src/dutchie-az/db/connection.ts @@ -1,50 +1,99 @@ /** - * Dutchie AZ Database Connection + * CannaiQ Database Connection * - * Isolated database 
connection for Dutchie Arizona data. - * Uses a separate database/schema to prevent cross-contamination with main app data. + * All database access for the CannaiQ platform goes through this module. + * + * SINGLE DATABASE ARCHITECTURE: + * - All services (auth, orchestrator, crawlers, admin) use this ONE database + * - States are modeled via states table + state_id on dispensaries (not separate DBs) + * + * CONFIGURATION (in priority order): + * 1. CANNAIQ_DB_URL - Full connection string (preferred) + * 2. Individual vars: CANNAIQ_DB_HOST, CANNAIQ_DB_PORT, CANNAIQ_DB_NAME, CANNAIQ_DB_USER, CANNAIQ_DB_PASS + * 3. DATABASE_URL - Legacy fallback for K8s compatibility + * + * IMPORTANT: + * - Do NOT create separate pools elsewhere + * - All services should import from this module */ import { Pool, PoolClient } from 'pg'; -// Consolidated DB naming: -// - Prefer CRAWLSY_DATABASE_URL (e.g., crawlsy_local, crawlsy_prod) -// - Then DUTCHIE_AZ_DATABASE_URL (legacy) -// - Finally DATABASE_URL (legacy main DB) -const DUTCHIE_AZ_DATABASE_URL = - process.env.CRAWLSY_DATABASE_URL || - process.env.DUTCHIE_AZ_DATABASE_URL || - process.env.DATABASE_URL || - 'postgresql://dutchie:dutchie_local_pass@localhost:54320/crawlsy_local'; +/** + * Get the database connection string from environment variables. + * Supports multiple configuration methods with fallback for legacy compatibility. + */ +function getConnectionString(): string { + // Priority 1: Full CANNAIQ connection URL + if (process.env.CANNAIQ_DB_URL) { + return process.env.CANNAIQ_DB_URL; + } + + // Priority 2: Build from individual CANNAIQ env vars + const host = process.env.CANNAIQ_DB_HOST; + const port = process.env.CANNAIQ_DB_PORT; + const name = process.env.CANNAIQ_DB_NAME; + const user = process.env.CANNAIQ_DB_USER; + const pass = process.env.CANNAIQ_DB_PASS; + + if (host && port && name && user && pass) { + return `postgresql://${user}:${pass}@${host}:${port}/${name}`; + } + + // Priority 3: Fallback to DATABASE_URL for legacy/K8s compatibility + if (process.env.DATABASE_URL) { + return process.env.DATABASE_URL; + } + + // Report what's missing + const required = ['CANNAIQ_DB_HOST', 'CANNAIQ_DB_PORT', 'CANNAIQ_DB_NAME', 'CANNAIQ_DB_USER', 'CANNAIQ_DB_PASS']; + const missing = required.filter((key) => !process.env[key]); + + throw new Error( + `[CannaiQ DB] Missing database configuration.\n` + + `Set CANNAIQ_DB_URL, DATABASE_URL, or all of: ${missing.join(', ')}` + ); +} let pool: Pool | null = null; /** - * Get the Dutchie AZ database pool (singleton) + * Get the CannaiQ database pool (singleton) + * + * This is the canonical pool for all CannaiQ services. + * Do NOT create separate pools elsewhere. */ -export function getDutchieAZPool(): Pool { +export function getPool(): Pool { if (!pool) { pool = new Pool({ - connectionString: DUTCHIE_AZ_DATABASE_URL, + connectionString: getConnectionString(), max: 10, idleTimeoutMillis: 30000, connectionTimeoutMillis: 5000, }); pool.on('error', (err) => { - console.error('[DutchieAZ DB] Unexpected error on idle client:', err); + console.error('[CannaiQ DB] Unexpected error on idle client:', err); }); - console.log('[DutchieAZ DB] Pool initialized'); + console.log('[CannaiQ DB] Pool initialized'); } return pool; } /** - * Execute a query on the Dutchie AZ database + * @deprecated Use getPool() instead + */ +export function getDutchieAZPool(): Pool { + console.warn('[CannaiQ DB] getDutchieAZPool() is deprecated. 
Use getPool() instead.'); + return getPool(); +} + +/** + * Execute a query on the CannaiQ database */ export async function query(text: string, params?: any[]): Promise<{ rows: T[]; rowCount: number }> { - const p = getDutchieAZPool(); + const p = getPool(); const result = await p.query(text, params); return { rows: result.rows as T[], rowCount: result.rowCount || 0 }; } @@ -53,7 +102,7 @@ export async function query(text: string, params?: any[]): Promise<{ ro * Get a client from the pool for transaction use */ export async function getClient(): Promise { - const p = getDutchieAZPool(); + const p = getPool(); return p.connect(); } @@ -64,7 +113,7 @@ export async function closePool(): Promise { if (pool) { await pool.end(); pool = null; - console.log('[DutchieAZ DB] Pool closed'); + console.log('[CannaiQ DB] Pool closed'); } } @@ -76,7 +125,7 @@ export async function healthCheck(): Promise { const result = await query('SELECT 1 as ok'); return result.rows.length > 0 && result.rows[0].ok === 1; } catch (error) { - console.error('[DutchieAZ DB] Health check failed:', error); + console.error('[CannaiQ DB] Health check failed:', error); return false; } } diff --git a/backend/src/dutchie-az/db/dispensary-columns.ts b/backend/src/dutchie-az/db/dispensary-columns.ts new file mode 100644 index 00000000..e6caada1 --- /dev/null +++ b/backend/src/dutchie-az/db/dispensary-columns.ts @@ -0,0 +1,137 @@ +/** + * Dispensary Column Definitions + * + * Centralized column list for dispensaries table queries. + * Handles optional columns that may not exist in all environments. + * + * USAGE: + * import { DISPENSARY_COLUMNS, DISPENSARY_COLUMNS_WITH_FAILED } from '../db/dispensary-columns'; + * const result = await query(`SELECT ${DISPENSARY_COLUMNS} FROM dispensaries WHERE ...`); + */ + +/** + * Core dispensary columns that always exist. + * These are guaranteed to be present in all environments. + */ +const CORE_COLUMNS = ` + id, name, slug, city, state, zip, address, latitude, longitude, + menu_type, menu_url, platform_dispensary_id, website, + created_at, updated_at +`; + +/** + * Optional columns with NULL fallback. + * + * provider_detection_data: Added in migration 044 + * active_crawler_profile_id: Added in migration 041 + * + * Using COALESCE ensures the query works whether or not the column exists: + * - If column exists: returns the actual value + * - If column doesn't exist: query fails (but migration should be run) + * + * For pre-migration compatibility, we select NULL::jsonb which always works. + * After migration 044 is applied, this can be changed to the real column. + */ + +// TEMPORARY: Use NULL fallback until migration 044 is applied +// After running 044, change this to: provider_detection_data +const PROVIDER_DETECTION_COLUMN = `NULL::jsonb AS provider_detection_data`; + +// After migration 044 is applied, uncomment this line and remove the above: +// const PROVIDER_DETECTION_COLUMN = `provider_detection_data`; + +/** + * Standard dispensary columns for most queries. + * Includes provider_detection_data with NULL fallback for pre-migration compatibility. + */ +export const DISPENSARY_COLUMNS = `${CORE_COLUMNS.trim()}, + ${PROVIDER_DETECTION_COLUMN}`; + +/** + * Dispensary columns including active_crawler_profile_id. + * Used by routes that need profile information. + */ +export const DISPENSARY_COLUMNS_WITH_PROFILE = `${CORE_COLUMNS.trim()}, + ${PROVIDER_DETECTION_COLUMN}, + active_crawler_profile_id`; + +/** + * Dispensary columns including failed_at. 
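`getClient()` above returns a raw `pg` client so callers can run multi-statement transactions; the caller must release it. A hedged usage sketch (the import path is relative to the caller, and the UPDATE is a placeholder, not a query from this diff):

```typescript
import { getClient } from '../dutchie-az/db/connection'; // adjust path for the calling module

// Minimal transaction pattern around the pooled client returned by getClient().
async function touchDispensary(dispensaryId: number): Promise<void> {
  const client = await getClient();
  try {
    await client.query('BEGIN');
    await client.query(
      'UPDATE dispensaries SET updated_at = NOW() WHERE id = $1',
      [dispensaryId]
    );
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    // Always return the client to the pool, even on failure.
    client.release();
  }
}
```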
+ * Used by worker for compatibility checks. + */ +export const DISPENSARY_COLUMNS_WITH_FAILED = `${CORE_COLUMNS.trim()}, + ${PROVIDER_DETECTION_COLUMN}, + failed_at`; + +/** + * NOTE: After migration 044 is applied, update PROVIDER_DETECTION_COLUMN above + * to use the real column instead of NULL fallback. + * + * To verify migration status: + * SELECT column_name FROM information_schema.columns + * WHERE table_name = 'dispensaries' AND column_name = 'provider_detection_data'; + */ + +// Cache for column existence check +let _providerDetectionColumnExists: boolean | null = null; + +/** + * Check if provider_detection_data column exists in dispensaries table. + * Result is cached after first check. + */ +export async function hasProviderDetectionColumn(pool: { query: (sql: string) => Promise<{ rows: any[] }> }): Promise { + if (_providerDetectionColumnExists !== null) { + return _providerDetectionColumnExists; + } + + try { + const result = await pool.query(` + SELECT 1 FROM information_schema.columns + WHERE table_name = 'dispensaries' AND column_name = 'provider_detection_data' + `); + _providerDetectionColumnExists = result.rows.length > 0; + } catch { + _providerDetectionColumnExists = false; + } + + return _providerDetectionColumnExists; +} + +/** + * Safely update provider_detection_data column. + * If column doesn't exist, logs a warning but doesn't crash. + * + * @param pool - Database pool with query method + * @param dispensaryId - ID of dispensary to update + * @param data - JSONB data to merge into provider_detection_data + * @returns true if update succeeded, false if column doesn't exist + */ +export async function safeUpdateProviderDetectionData( + pool: { query: (sql: string, params?: any[]) => Promise }, + dispensaryId: number, + data: Record +): Promise { + const hasColumn = await hasProviderDetectionColumn(pool); + + if (!hasColumn) { + console.warn(`[DispensaryColumns] provider_detection_data column not found. Run migration 044 to add it.`); + return false; + } + + try { + await pool.query( + `UPDATE dispensaries + SET provider_detection_data = COALESCE(provider_detection_data, '{}'::jsonb) || $1::jsonb, + updated_at = NOW() + WHERE id = $2`, + [JSON.stringify(data), dispensaryId] + ); + return true; + } catch (error: any) { + if (error.message?.includes('provider_detection_data')) { + console.warn(`[DispensaryColumns] Failed to update provider_detection_data: ${error.message}`); + return false; + } + throw error; + } +} diff --git a/backend/src/dutchie-az/discovery/DtCityDiscoveryService.ts b/backend/src/dutchie-az/discovery/DtCityDiscoveryService.ts new file mode 100644 index 00000000..99d38d7a --- /dev/null +++ b/backend/src/dutchie-az/discovery/DtCityDiscoveryService.ts @@ -0,0 +1,403 @@ +/** + * DtCityDiscoveryService + * + * Core service for Dutchie city discovery. + * Contains shared logic used by multiple entrypoints. 
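Looking back at `dispensary-columns.ts`: a sketch of how a detection step might record results through `safeUpdateProviderDetectionData`, which degrades gracefully on databases that have not yet run migration 044. The import paths and the payload shape are assumptions for illustration:

```typescript
import { getPool } from '../db/connection';                       // assumed relative path
import { safeUpdateProviderDetectionData } from '../db/dispensary-columns';

// Persist a provider-detection result; does not throw if the
// provider_detection_data column is missing (the helper returns false instead).
async function recordDetection(dispensaryId: number): Promise<void> {
  const ok = await safeUpdateProviderDetectionData(getPool(), dispensaryId, {
    provider: 'dutchie',                       // illustrative payload fields
    detectedAt: new Date().toISOString(),
  });

  if (!ok) {
    console.warn(`[Detection] provider_detection_data not written for dispensary ${dispensaryId}; run migration 044`);
  }
}
```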
+ * + * Responsibilities: + * - Browser/API-based city fetching + * - Manual city seeding + * - City upsert operations + */ + +import { Pool } from 'pg'; +import axios from 'axios'; +import puppeteer from 'puppeteer-extra'; +import StealthPlugin from 'puppeteer-extra-plugin-stealth'; + +puppeteer.use(StealthPlugin()); + +// ============================================================ +// TYPES +// ============================================================ + +export interface DutchieCity { + name: string; + slug: string; + stateCode: string | null; + countryCode: string; + url?: string; +} + +export interface CityDiscoveryResult { + citiesFound: number; + citiesInserted: number; + citiesUpdated: number; + errors: string[]; + durationMs: number; +} + +export interface ManualSeedResult { + city: DutchieCity; + id: number; + wasInserted: boolean; +} + +// ============================================================ +// US STATE CODE MAPPING +// ============================================================ + +export const US_STATE_MAP: Record = { + 'alabama': 'AL', 'alaska': 'AK', 'arizona': 'AZ', 'arkansas': 'AR', + 'california': 'CA', 'colorado': 'CO', 'connecticut': 'CT', 'delaware': 'DE', + 'florida': 'FL', 'georgia': 'GA', 'hawaii': 'HI', 'idaho': 'ID', + 'illinois': 'IL', 'indiana': 'IN', 'iowa': 'IA', 'kansas': 'KS', + 'kentucky': 'KY', 'louisiana': 'LA', 'maine': 'ME', 'maryland': 'MD', + 'massachusetts': 'MA', 'michigan': 'MI', 'minnesota': 'MN', 'mississippi': 'MS', + 'missouri': 'MO', 'montana': 'MT', 'nebraska': 'NE', 'nevada': 'NV', + 'new-hampshire': 'NH', 'new-jersey': 'NJ', 'new-mexico': 'NM', 'new-york': 'NY', + 'north-carolina': 'NC', 'north-dakota': 'ND', 'ohio': 'OH', 'oklahoma': 'OK', + 'oregon': 'OR', 'pennsylvania': 'PA', 'rhode-island': 'RI', 'south-carolina': 'SC', + 'south-dakota': 'SD', 'tennessee': 'TN', 'texas': 'TX', 'utah': 'UT', + 'vermont': 'VT', 'virginia': 'VA', 'washington': 'WA', 'west-virginia': 'WV', + 'wisconsin': 'WI', 'wyoming': 'WY', 'district-of-columbia': 'DC', +}; + +// Canadian province mapping +export const CA_PROVINCE_MAP: Record = { + 'alberta': 'AB', 'british-columbia': 'BC', 'manitoba': 'MB', + 'new-brunswick': 'NB', 'newfoundland-and-labrador': 'NL', + 'northwest-territories': 'NT', 'nova-scotia': 'NS', 'nunavut': 'NU', + 'ontario': 'ON', 'prince-edward-island': 'PE', 'quebec': 'QC', + 'saskatchewan': 'SK', 'yukon': 'YT', +}; + +// ============================================================ +// CITY FETCHING (AUTO DISCOVERY) +// ============================================================ + +/** + * Fetch cities from Dutchie's /cities page using Puppeteer. 
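The two maps above drive the slug-to-region resolution used inside `fetchCitiesFromBrowser` below (US states first, then Canadian provinces, then a raw two-letter fallback). A standalone sketch of that lookup, handy for unit tests; the helper itself is illustrative and not exported by the service:

```typescript
import { US_STATE_MAP, CA_PROVINCE_MAP } from './DtCityDiscoveryService'; // assumed relative path

// Resolve a state/province slug (e.g. "new-york", "british-columbia", "az")
// to a region code and country, mirroring the order used in fetchCitiesFromBrowser.
function resolveRegion(stateSlug: string): { stateCode: string | null; countryCode: string } {
  const slug = stateSlug.toLowerCase();

  if (US_STATE_MAP[slug]) return { stateCode: US_STATE_MAP[slug], countryCode: 'US' };
  if (CA_PROVINCE_MAP[slug]) return { stateCode: CA_PROVINCE_MAP[slug], countryCode: 'CA' };

  if (slug.length === 2) {
    const code = slug.toUpperCase();
    const isCanadian = Object.values(CA_PROVINCE_MAP).includes(code);
    return { stateCode: code, countryCode: isCanadian ? 'CA' : 'US' };
  }

  return { stateCode: null, countryCode: 'US' };
}

// resolveRegion('arizona')          -> { stateCode: 'AZ', countryCode: 'US' }
// resolveRegion('british-columbia') -> { stateCode: 'BC', countryCode: 'CA' }
```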
+ */ +export async function fetchCitiesFromBrowser(): Promise { + console.log('[DtCityDiscoveryService] Launching browser to fetch cities...'); + + const browser = await puppeteer.launch({ + headless: 'new', + args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'], + }); + + try { + const page = await browser.newPage(); + await page.setUserAgent( + 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36' + ); + + console.log('[DtCityDiscoveryService] Navigating to https://dutchie.com/cities...'); + await page.goto('https://dutchie.com/cities', { + waitUntil: 'networkidle2', + timeout: 60000, + }); + + await new Promise((r) => setTimeout(r, 3000)); + + const cities = await page.evaluate(() => { + const cityLinks: Array<{ + name: string; + slug: string; + url: string; + stateSlug: string | null; + }> = []; + + const links = document.querySelectorAll('a[href*="/city/"]'); + links.forEach((link) => { + const href = (link as HTMLAnchorElement).href; + const text = (link as HTMLElement).innerText?.trim(); + + const match = href.match(/\/city\/([^/]+)\/([^/?]+)/); + if (match && text) { + cityLinks.push({ + name: text, + slug: match[2], + url: href, + stateSlug: match[1], + }); + } + }); + + return cityLinks; + }); + + console.log(`[DtCityDiscoveryService] Extracted ${cities.length} city links from page`); + + return cities.map((city) => { + let countryCode = 'US'; + let stateCode: string | null = null; + + if (city.stateSlug) { + if (US_STATE_MAP[city.stateSlug]) { + stateCode = US_STATE_MAP[city.stateSlug]; + countryCode = 'US'; + } else if (CA_PROVINCE_MAP[city.stateSlug]) { + stateCode = CA_PROVINCE_MAP[city.stateSlug]; + countryCode = 'CA'; + } else if (city.stateSlug.length === 2) { + stateCode = city.stateSlug.toUpperCase(); + if (Object.values(CA_PROVINCE_MAP).includes(stateCode)) { + countryCode = 'CA'; + } + } + } + + return { + name: city.name, + slug: city.slug, + stateCode, + countryCode, + url: city.url, + }; + }); + } finally { + await browser.close(); + } +} + +/** + * Fetch cities via API endpoints (fallback). 
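The in-page extraction above keys off anchors whose `href` matches `/city/{stateSlug}/{citySlug}`. A tiny pure-function sketch of that URL parsing, useful for testing the regex outside the browser; the function and example URLs are illustrative:

```typescript
// Parse a Dutchie city link using the same pattern as the in-page extraction:
// /\/city\/([^/]+)\/([^/?]+)/  ->  { stateSlug, citySlug }
function parseCityHref(href: string): { stateSlug: string; citySlug: string } | null {
  const match = href.match(/\/city\/([^/]+)\/([^/?]+)/);
  if (!match) return null;
  return { stateSlug: match[1], citySlug: match[2] };
}

// parseCityHref('https://dutchie.com/city/arizona/phoenix')
//   -> { stateSlug: 'arizona', citySlug: 'phoenix' }
// parseCityHref('https://dutchie.com/cities') -> null
```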
+ */ +export async function fetchCitiesFromAPI(): Promise { + console.log('[DtCityDiscoveryService] Attempting API-based city discovery...'); + + const apiEndpoints = [ + 'https://dutchie.com/api/cities', + 'https://api.dutchie.com/v1/cities', + ]; + + for (const endpoint of apiEndpoints) { + try { + const response = await axios.get(endpoint, { + headers: { + 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0', + Accept: 'application/json', + }, + timeout: 15000, + }); + + if (response.data && Array.isArray(response.data)) { + console.log(`[DtCityDiscoveryService] API returned ${response.data.length} cities`); + return response.data.map((c: any) => ({ + name: c.name || c.city, + slug: c.slug || c.citySlug, + stateCode: c.stateCode || c.state, + countryCode: c.countryCode || c.country || 'US', + })); + } + } catch (error: any) { + console.log(`[DtCityDiscoveryService] API ${endpoint} failed: ${error.message}`); + } + } + + return []; +} + +// ============================================================ +// DATABASE OPERATIONS +// ============================================================ + +/** + * Upsert a city into dutchie_discovery_cities + */ +export async function upsertCity( + pool: Pool, + city: DutchieCity +): Promise<{ id: number; inserted: boolean; updated: boolean }> { + const result = await pool.query( + ` + INSERT INTO dutchie_discovery_cities ( + platform, + city_name, + city_slug, + state_code, + country_code, + crawl_enabled, + created_at, + updated_at + ) VALUES ( + 'dutchie', + $1, + $2, + $3, + $4, + TRUE, + NOW(), + NOW() + ) + ON CONFLICT (platform, country_code, state_code, city_slug) + DO UPDATE SET + city_name = EXCLUDED.city_name, + crawl_enabled = TRUE, + updated_at = NOW() + RETURNING id, (xmax = 0) AS inserted + `, + [city.name, city.slug, city.stateCode, city.countryCode] + ); + + const inserted = result.rows[0]?.inserted === true; + return { + id: result.rows[0]?.id, + inserted, + updated: !inserted, + }; +} + +// ============================================================ +// MAIN SERVICE CLASS +// ============================================================ + +export class DtCityDiscoveryService { + constructor(private pool: Pool) {} + + /** + * Run auto-discovery (browser + API fallback) + */ + async runAutoDiscovery(): Promise { + const startTime = Date.now(); + const errors: string[] = []; + let citiesFound = 0; + let citiesInserted = 0; + let citiesUpdated = 0; + + console.log('[DtCityDiscoveryService] Starting auto city discovery...'); + + try { + let cities = await fetchCitiesFromBrowser(); + + if (cities.length === 0) { + console.log('[DtCityDiscoveryService] Browser returned 0 cities, trying API...'); + cities = await fetchCitiesFromAPI(); + } + + citiesFound = cities.length; + console.log(`[DtCityDiscoveryService] Found ${citiesFound} cities`); + + for (const city of cities) { + try { + const result = await upsertCity(this.pool, city); + if (result.inserted) citiesInserted++; + else if (result.updated) citiesUpdated++; + } catch (error: any) { + const msg = `Failed to upsert city ${city.slug}: ${error.message}`; + console.error(`[DtCityDiscoveryService] ${msg}`); + errors.push(msg); + } + } + } catch (error: any) { + const msg = `Auto discovery failed: ${error.message}`; + console.error(`[DtCityDiscoveryService] ${msg}`); + errors.push(msg); + } + + const durationMs = Date.now() - startTime; + + return { + citiesFound, + citiesInserted, + citiesUpdated, + errors, + durationMs, + }; + } + + /** + * Seed a 
single city manually + */ + async seedCity(city: DutchieCity): Promise { + console.log(`[DtCityDiscoveryService] Seeding city: ${city.name} (${city.slug}), ${city.stateCode}, ${city.countryCode}`); + + const result = await upsertCity(this.pool, city); + + return { + city, + id: result.id, + wasInserted: result.inserted, + }; + } + + /** + * Seed multiple cities from a list + */ + async seedCities(cities: DutchieCity[]): Promise<{ + results: ManualSeedResult[]; + errors: string[]; + }> { + const results: ManualSeedResult[] = []; + const errors: string[] = []; + + for (const city of cities) { + try { + const result = await this.seedCity(city); + results.push(result); + } catch (error: any) { + errors.push(`${city.slug}: ${error.message}`); + } + } + + return { results, errors }; + } + + /** + * Get statistics about discovered cities + */ + async getStats(): Promise<{ + total: number; + byCountry: Array<{ countryCode: string; count: number }>; + byState: Array<{ stateCode: string; countryCode: string; count: number }>; + crawlEnabled: number; + neverCrawled: number; + }> { + const [totalRes, byCountryRes, byStateRes, enabledRes, neverRes] = await Promise.all([ + this.pool.query('SELECT COUNT(*) as cnt FROM dutchie_discovery_cities WHERE platform = \'dutchie\''), + this.pool.query(` + SELECT country_code, COUNT(*) as cnt + FROM dutchie_discovery_cities + WHERE platform = 'dutchie' + GROUP BY country_code + ORDER BY cnt DESC + `), + this.pool.query(` + SELECT state_code, country_code, COUNT(*) as cnt + FROM dutchie_discovery_cities + WHERE platform = 'dutchie' AND state_code IS NOT NULL + GROUP BY state_code, country_code + ORDER BY cnt DESC + `), + this.pool.query(` + SELECT COUNT(*) as cnt + FROM dutchie_discovery_cities + WHERE platform = 'dutchie' AND crawl_enabled = TRUE + `), + this.pool.query(` + SELECT COUNT(*) as cnt + FROM dutchie_discovery_cities + WHERE platform = 'dutchie' AND last_crawled_at IS NULL + `), + ]); + + return { + total: parseInt(totalRes.rows[0]?.cnt || '0', 10), + byCountry: byCountryRes.rows.map((r) => ({ + countryCode: r.country_code, + count: parseInt(r.cnt, 10), + })), + byState: byStateRes.rows.map((r) => ({ + stateCode: r.state_code, + countryCode: r.country_code, + count: parseInt(r.cnt, 10), + })), + crawlEnabled: parseInt(enabledRes.rows[0]?.cnt || '0', 10), + neverCrawled: parseInt(neverRes.rows[0]?.cnt || '0', 10), + }; + } +} + +export default DtCityDiscoveryService; diff --git a/backend/src/dutchie-az/discovery/DtLocationDiscoveryService.ts b/backend/src/dutchie-az/discovery/DtLocationDiscoveryService.ts new file mode 100644 index 00000000..d6661531 --- /dev/null +++ b/backend/src/dutchie-az/discovery/DtLocationDiscoveryService.ts @@ -0,0 +1,1249 @@ +/** + * DtLocationDiscoveryService + * + * Core service for Dutchie location discovery. + * Contains shared logic used by multiple entrypoints. 
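A short sketch of driving the city service from a one-off script: instantiate it with the shared pool, seed a couple of cities, then print stats. The pool import path and the specific seed entries are assumptions (the routes above seed Arizona from `ARIZONA_CITIES`):

```typescript
import { getPool } from '../db/connection';               // assumed relative path
import { DtCityDiscoveryService } from './DtCityDiscoveryService';

async function main(): Promise<void> {
  const service = new DtCityDiscoveryService(getPool());

  // Manual seeding; slugs and names here are illustrative examples only.
  const { results, errors } = await service.seedCities([
    { name: 'Phoenix', slug: 'phoenix', stateCode: 'AZ', countryCode: 'US' },
    { name: 'Tucson', slug: 'tucson', stateCode: 'AZ', countryCode: 'US' },
  ]);
  console.log(`Seeded ${results.length} cities (${errors.length} errors)`);

  const stats = await service.getStats();
  console.log(`Total cities: ${stats.total}, crawl-enabled: ${stats.crawlEnabled}, never crawled: ${stats.neverCrawled}`);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```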
+ * + * Responsibilities: + * - Fetch locations from city pages + * - Extract geo coordinates when available + * - Upsert locations to dutchie_discovery_locations + * - DO NOT overwrite protected statuses or existing lat/lng + */ + +import { Pool } from 'pg'; +import puppeteer from 'puppeteer-extra'; +import StealthPlugin from 'puppeteer-extra-plugin-stealth'; + +puppeteer.use(StealthPlugin()); + +// ============================================================ +// TYPES +// ============================================================ + +export interface DiscoveryCity { + id: number; + platform: string; + cityName: string; + citySlug: string; + stateCode: string | null; + countryCode: string; + crawlEnabled: boolean; +} + +export interface DutchieLocation { + platformLocationId: string; + platformSlug: string; + platformMenuUrl: string; + name: string; + rawAddress: string | null; + addressLine1: string | null; + addressLine2: string | null; + city: string | null; + stateCode: string | null; + postalCode: string | null; + countryCode: string | null; + latitude: number | null; + longitude: number | null; + timezone: string | null; + offersDelivery: boolean | null; + offersPickup: boolean | null; + isRecreational: boolean | null; + isMedical: boolean | null; + metadata: Record; +} + +export interface LocationDiscoveryResult { + cityId: number; + citySlug: string; + locationsFound: number; + locationsInserted: number; + locationsUpdated: number; + locationsSkipped: number; + reportedStoreCount: number | null; + errors: string[]; + durationMs: number; +} + +interface FetchResult { + locations: DutchieLocation[]; + reportedStoreCount: number | null; +} + +export interface BatchDiscoveryResult { + totalCities: number; + totalLocationsFound: number; + totalInserted: number; + totalUpdated: number; + totalSkipped: number; + errors: string[]; + durationMs: number; +} + +// ============================================================ +// COORDINATE EXTRACTION HELPERS +// ============================================================ + +/** + * Extract latitude from various payload formats + */ +function extractLatitude(data: any): number | null { + // Direct lat/latitude fields + if (typeof data.lat === 'number') return data.lat; + if (typeof data.latitude === 'number') return data.latitude; + + // Nested in location object + if (data.location) { + if (typeof data.location.lat === 'number') return data.location.lat; + if (typeof data.location.latitude === 'number') return data.location.latitude; + } + + // Nested in coordinates object + if (data.coordinates) { + if (typeof data.coordinates.lat === 'number') return data.coordinates.lat; + if (typeof data.coordinates.latitude === 'number') return data.coordinates.latitude; + // GeoJSON format [lng, lat] + if (Array.isArray(data.coordinates) && data.coordinates.length >= 2) { + return data.coordinates[1]; + } + } + + // Geometry object (GeoJSON) + if (data.geometry?.coordinates && Array.isArray(data.geometry.coordinates)) { + return data.geometry.coordinates[1]; + } + + // Nested in address + if (data.address) { + if (typeof data.address.lat === 'number') return data.address.lat; + if (typeof data.address.latitude === 'number') return data.address.latitude; + } + + // geo object + if (data.geo) { + if (typeof data.geo.lat === 'number') return data.geo.lat; + if (typeof data.geo.latitude === 'number') return data.geo.latitude; + } + + return null; +} + +/** + * Extract longitude from various payload formats + */ +function extractLongitude(data: any): 
number | null { + // Direct lng/longitude fields + if (typeof data.lng === 'number') return data.lng; + if (typeof data.lon === 'number') return data.lon; + if (typeof data.longitude === 'number') return data.longitude; + + // Nested in location object + if (data.location) { + if (typeof data.location.lng === 'number') return data.location.lng; + if (typeof data.location.lon === 'number') return data.location.lon; + if (typeof data.location.longitude === 'number') return data.location.longitude; + } + + // Nested in coordinates object + if (data.coordinates) { + if (typeof data.coordinates.lng === 'number') return data.coordinates.lng; + if (typeof data.coordinates.lon === 'number') return data.coordinates.lon; + if (typeof data.coordinates.longitude === 'number') return data.coordinates.longitude; + // GeoJSON format [lng, lat] + if (Array.isArray(data.coordinates) && data.coordinates.length >= 2) { + return data.coordinates[0]; + } + } + + // Geometry object (GeoJSON) + if (data.geometry?.coordinates && Array.isArray(data.geometry.coordinates)) { + return data.geometry.coordinates[0]; + } + + // Nested in address + if (data.address) { + if (typeof data.address.lng === 'number') return data.address.lng; + if (typeof data.address.lon === 'number') return data.address.lon; + if (typeof data.address.longitude === 'number') return data.address.longitude; + } + + // geo object + if (data.geo) { + if (typeof data.geo.lng === 'number') return data.geo.lng; + if (typeof data.geo.lon === 'number') return data.geo.lon; + if (typeof data.geo.longitude === 'number') return data.geo.longitude; + } + + return null; +} + +// ============================================================ +// LOCATION FETCHING +// ============================================================ + +/** + * Parse dispensary data from Dutchie's API/JSON response with coordinate extraction + */ +function parseDispensaryData(d: any, city: DiscoveryCity): DutchieLocation { + const id = d.id || d._id || d.dispensaryId || ''; + const slug = d.slug || d.cName || d.name?.toLowerCase().replace(/\s+/g, '-') || ''; + + // Build menu URL + let menuUrl = `https://dutchie.com/dispensary/${slug}`; + if (d.menuUrl) { + menuUrl = d.menuUrl; + } else if (d.embeddedMenuUrl) { + menuUrl = d.embeddedMenuUrl; + } + + // Parse address + const address = d.address || d.location?.address || {}; + const rawAddress = [ + address.line1 || address.street1 || d.address1, + address.line2 || address.street2 || d.address2, + [ + address.city || d.city, + address.state || address.stateCode || d.state, + address.zip || address.zipCode || address.postalCode || d.zip, + ] + .filter(Boolean) + .join(' '), + ] + .filter(Boolean) + .join(', '); + + // Extract coordinates from various possible locations in the payload + const latitude = extractLatitude(d); + const longitude = extractLongitude(d); + + if (latitude !== null && longitude !== null) { + console.log(`[DtLocationDiscoveryService] Extracted coordinates for ${slug}: ${latitude}, ${longitude}`); + } + + return { + platformLocationId: id, + platformSlug: slug, + platformMenuUrl: menuUrl, + name: d.name || d.dispensaryName || '', + rawAddress: rawAddress || null, + addressLine1: address.line1 || address.street1 || d.address1 || null, + addressLine2: address.line2 || address.street2 || d.address2 || null, + city: address.city || d.city || city.cityName, + stateCode: address.state || address.stateCode || d.state || city.stateCode, + postalCode: address.zip || address.zipCode || address.postalCode || d.zip || null, + 
countryCode: address.country || address.countryCode || d.country || city.countryCode, + latitude, + longitude, + timezone: d.timezone || d.timeZone || null, + offersDelivery: d.offerDelivery ?? d.offersDelivery ?? d.delivery ?? null, + offersPickup: d.offerPickup ?? d.offersPickup ?? d.pickup ?? null, + isRecreational: d.isRecreational ?? d.recreational ?? (d.retailType === 'recreational' || d.retailType === 'both'), + isMedical: d.isMedical ?? d.medical ?? (d.retailType === 'medical' || d.retailType === 'both'), + metadata: { + source: 'next_data', + retailType: d.retailType, + brand: d.brand, + logo: d.logo || d.logoUrl, + raw: d, + }, + }; +} + +/** + * Fetch locations for a city using Puppeteer + * Returns both locations and Dutchie's reported store count from page header + */ +async function fetchLocationsForCity(city: DiscoveryCity): Promise { + console.log(`[DtLocationDiscoveryService] Fetching locations for ${city.cityName}, ${city.stateCode}...`); + + const browser = await puppeteer.launch({ + headless: 'new', + args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'], + }); + + try { + const page = await browser.newPage(); + await page.setUserAgent( + 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36' + ); + + // Use the /us/dispensaries/{city_slug} pattern (NOT /city/{state}/{slug}) + const cityUrl = `https://dutchie.com/us/dispensaries/${city.citySlug}`; + console.log(`[DtLocationDiscoveryService] Navigating to ${cityUrl}...`); + + await page.goto(cityUrl, { + waitUntil: 'networkidle2', + timeout: 60000, + }); + + await new Promise((r) => setTimeout(r, 3000)); + + // Extract reported store count from page header (e.g., "18 dispensaries") + const reportedStoreCount = await page.evaluate(() => { + // Look for patterns like "18 dispensaries", "18 stores", "18 results" + const headerSelectors = [ + 'h1', 'h2', '[data-testid="city-header"]', '[data-testid="results-count"]', + '.results-header', '.city-header', '.page-header' + ]; + + for (const selector of headerSelectors) { + const elements = Array.from(document.querySelectorAll(selector)); + for (const el of elements) { + const text = el.textContent || ''; + // Match patterns like "18 dispensaries", "18 stores", "18 results", or just "18" followed by word + const match = text.match(/(\d+)\s*(?:dispensar(?:y|ies)|stores?|results?|locations?)/i); + if (match) { + return parseInt(match[1], 10); + } + } + } + + // Also check for count in any element containing "dispensaries" or "stores" + const allText = document.body.innerText; + const globalMatch = allText.match(/(\d+)\s+dispensar(?:y|ies)/i); + if (globalMatch) { + return parseInt(globalMatch[1], 10); + } + + return null; + }); + + if (reportedStoreCount !== null) { + console.log(`[DtLocationDiscoveryService] Dutchie reports ${reportedStoreCount} stores for ${city.citySlug}`); + } + + // Try to extract __NEXT_DATA__ + const nextData = await page.evaluate(() => { + const script = document.querySelector('script#__NEXT_DATA__'); + if (script) { + try { + return JSON.parse(script.textContent || '{}'); + } catch { + return null; + } + } + return null; + }); + + let locations: DutchieLocation[] = []; + + if (nextData?.props?.pageProps?.dispensaries) { + const dispensaries = nextData.props.pageProps.dispensaries; + console.log(`[DtLocationDiscoveryService] Found ${dispensaries.length} dispensaries in __NEXT_DATA__`); + locations = dispensaries.map((d: any) => parseDispensaryData(d, city)); + } else { + // 
Fall back to DOM scraping + console.log('[DtLocationDiscoveryService] No __NEXT_DATA__, trying DOM scraping...'); + + const scrapedData = await page.evaluate(() => { + const stores: Array<{ + name: string; + href: string; + address: string | null; + }> = []; + + const cards = document.querySelectorAll('[data-testid="dispensary-card"], .dispensary-card, a[href*="/dispensary/"]'); + cards.forEach((card) => { + const link = card.querySelector('a[href*="/dispensary/"]') || (card as HTMLAnchorElement); + const href = (link as HTMLAnchorElement).href || ''; + const name = + card.querySelector('[data-testid="dispensary-name"]')?.textContent || + card.querySelector('h2, h3, .name')?.textContent || + link.textContent || + ''; + const address = card.querySelector('[data-testid="dispensary-address"], .address')?.textContent || null; + + if (href && name) { + stores.push({ + name: name.trim(), + href, + address: address?.trim() || null, + }); + } + }); + + return stores; + }); + + console.log(`[DtLocationDiscoveryService] DOM scraping found ${scrapedData.length} raw store cards`); + + locations = scrapedData.map((s) => { + const match = s.href.match(/\/dispensary\/([^/?]+)/); + const slug = match ? match[1] : s.name.toLowerCase().replace(/\s+/g, '-'); + + return { + platformLocationId: slug, + platformSlug: slug, + platformMenuUrl: `https://dutchie.com/dispensary/${slug}`, + name: s.name, + rawAddress: s.address, + addressLine1: null, + addressLine2: null, + city: city.cityName, + stateCode: city.stateCode, + postalCode: null, + countryCode: city.countryCode, + latitude: null, // Not available from DOM scraping + longitude: null, + timezone: null, + offersDelivery: null, + offersPickup: null, + isRecreational: null, + isMedical: null, + metadata: { source: 'dom_scrape', originalUrl: s.href }, + }; + }); + } + + // ========================================================================= + // FILTERING AND DEDUPLICATION + // ========================================================================= + + const beforeFilterCount = locations.length; + + // 1. Filter out ghost entries and marketing links + locations = locations.filter((loc) => { + // Filter out slug matching city slug (e.g., /dispensary/ak-anchorage) + if (loc.platformSlug === city.citySlug) { + console.log(`[DtLocationDiscoveryService] Filtering ghost entry: /dispensary/${loc.platformSlug} (matches city slug)`); + return false; + } + + // Filter out marketing/referral links (e.g., try.dutchie.com/dispensary/referral/) + if (!loc.platformMenuUrl.startsWith('https://dutchie.com/dispensary/')) { + console.log(`[DtLocationDiscoveryService] Filtering non-store URL: ${loc.platformMenuUrl}`); + return false; + } + + // Filter out generic marketing slugs + const marketingSlugs = ['referral', 'refer-a-dispensary', 'sign-up', 'signup']; + if (marketingSlugs.includes(loc.platformSlug.toLowerCase())) { + console.log(`[DtLocationDiscoveryService] Filtering marketing slug: ${loc.platformSlug}`); + return false; + } + + return true; + }); + + // 2. 
Deduplicate by platformMenuUrl (unique store URL) + const seenUrls = new Set(); + locations = locations.filter((loc) => { + if (seenUrls.has(loc.platformMenuUrl)) { + return false; + } + seenUrls.add(loc.platformMenuUrl); + return true; + }); + + const afterFilterCount = locations.length; + if (beforeFilterCount !== afterFilterCount) { + console.log(`[DtLocationDiscoveryService] Filtered: ${beforeFilterCount} -> ${afterFilterCount} (removed ${beforeFilterCount - afterFilterCount} ghost/duplicate entries)`); + } + + // Log comparison for QA + console.log(`[DtLocationDiscoveryService] [${city.citySlug}] reported_store_count=${reportedStoreCount ?? 'N/A'}, scraped_store_count=${afterFilterCount}`); + if (reportedStoreCount !== null && reportedStoreCount !== afterFilterCount) { + console.log(`[DtLocationDiscoveryService] [${city.citySlug}] MISMATCH: Dutchie reports ${reportedStoreCount}, we scraped ${afterFilterCount}`); + } + + return { locations, reportedStoreCount }; + } finally { + await browser.close(); + } +} + +// ============================================================ +// DATABASE OPERATIONS +// ============================================================ + +/** + * Upsert a location into dutchie_discovery_locations + * - Does NOT overwrite status if already verified/merged/rejected + * - Does NOT overwrite dispensary_id if already set + * - Does NOT overwrite existing lat/lng (only fills nulls) + */ +async function upsertLocation( + pool: Pool, + location: DutchieLocation, + cityId: number +): Promise<{ inserted: boolean; updated: boolean; skipped: boolean }> { + // First check if this location exists and has a protected status + const existing = await pool.query( + ` + SELECT id, status, dispensary_id, latitude, longitude + FROM dutchie_discovery_locations + WHERE platform = 'dutchie' AND platform_location_id = $1 + `, + [location.platformLocationId] + ); + + if (existing.rows.length > 0) { + const row = existing.rows[0]; + const protectedStatuses = ['verified', 'merged', 'rejected']; + + if (protectedStatuses.includes(row.status)) { + // Only update last_seen_at for protected statuses + // But still update coordinates if they were null and we now have them + await pool.query( + ` + UPDATE dutchie_discovery_locations + SET + last_seen_at = NOW(), + updated_at = NOW(), + latitude = CASE WHEN latitude IS NULL THEN $2 ELSE latitude END, + longitude = CASE WHEN longitude IS NULL THEN $3 ELSE longitude END + WHERE id = $1 + `, + [row.id, location.latitude, location.longitude] + ); + return { inserted: false, updated: false, skipped: true }; + } + + // Update existing discovered location + // Preserve existing lat/lng if already set (only fill nulls) + await pool.query( + ` + UPDATE dutchie_discovery_locations + SET + platform_slug = $2, + platform_menu_url = $3, + name = $4, + raw_address = COALESCE($5, raw_address), + address_line1 = COALESCE($6, address_line1), + address_line2 = COALESCE($7, address_line2), + city = COALESCE($8, city), + state_code = COALESCE($9, state_code), + postal_code = COALESCE($10, postal_code), + country_code = COALESCE($11, country_code), + latitude = CASE WHEN latitude IS NULL THEN $12 ELSE latitude END, + longitude = CASE WHEN longitude IS NULL THEN $13 ELSE longitude END, + timezone = COALESCE($14, timezone), + offers_delivery = COALESCE($15, offers_delivery), + offers_pickup = COALESCE($16, offers_pickup), + is_recreational = COALESCE($17, is_recreational), + is_medical = COALESCE($18, is_medical), + metadata = COALESCE($19, metadata), + 
discovery_city_id = $20, + last_seen_at = NOW(), + updated_at = NOW() + WHERE id = $1 + `, + [ + row.id, + location.platformSlug, + location.platformMenuUrl, + location.name, + location.rawAddress, + location.addressLine1, + location.addressLine2, + location.city, + location.stateCode, + location.postalCode, + location.countryCode, + location.latitude, + location.longitude, + location.timezone, + location.offersDelivery, + location.offersPickup, + location.isRecreational, + location.isMedical, + JSON.stringify(location.metadata), + cityId, + ] + ); + return { inserted: false, updated: true, skipped: false }; + } + + // Insert new location + await pool.query( + ` + INSERT INTO dutchie_discovery_locations ( + platform, + platform_location_id, + platform_slug, + platform_menu_url, + name, + raw_address, + address_line1, + address_line2, + city, + state_code, + postal_code, + country_code, + latitude, + longitude, + timezone, + status, + offers_delivery, + offers_pickup, + is_recreational, + is_medical, + metadata, + discovery_city_id, + first_seen_at, + last_seen_at, + active, + created_at, + updated_at + ) VALUES ( + 'dutchie', + $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, + 'discovered', + $15, $16, $17, $18, $19, $20, + NOW(), NOW(), TRUE, NOW(), NOW() + ) + `, + [ + location.platformLocationId, + location.platformSlug, + location.platformMenuUrl, + location.name, + location.rawAddress, + location.addressLine1, + location.addressLine2, + location.city, + location.stateCode, + location.postalCode, + location.countryCode, + location.latitude, + location.longitude, + location.timezone, + location.offersDelivery, + location.offersPickup, + location.isRecreational, + location.isMedical, + JSON.stringify(location.metadata), + cityId, + ] + ); + + return { inserted: true, updated: false, skipped: false }; +} + +// ============================================================ +// MAIN SERVICE CLASS +// ============================================================ + +export class DtLocationDiscoveryService { + constructor(private pool: Pool) {} + + /** + * Get a city by slug + */ + async getCityBySlug(citySlug: string): Promise { + const { rows } = await this.pool.query( + ` + SELECT id, platform, city_name, city_slug, state_code, country_code, crawl_enabled + FROM dutchie_discovery_cities + WHERE platform = 'dutchie' AND city_slug = $1 + LIMIT 1 + `, + [citySlug] + ); + + if (rows.length === 0) return null; + + const r = rows[0]; + return { + id: r.id, + platform: r.platform, + cityName: r.city_name, + citySlug: r.city_slug, + stateCode: r.state_code, + countryCode: r.country_code, + crawlEnabled: r.crawl_enabled, + }; + } + + /** + * Get all crawl-enabled cities + */ + async getEnabledCities(limit?: number): Promise { + const { rows } = await this.pool.query( + ` + SELECT id, platform, city_name, city_slug, state_code, country_code, crawl_enabled + FROM dutchie_discovery_cities + WHERE platform = 'dutchie' AND crawl_enabled = TRUE + ORDER BY last_crawled_at ASC NULLS FIRST, city_name ASC + ${limit ? 
`LIMIT ${limit}` : ''} + ` + ); + + return rows.map((r) => ({ + id: r.id, + platform: r.platform, + cityName: r.city_name, + citySlug: r.city_slug, + stateCode: r.state_code, + countryCode: r.country_code, + crawlEnabled: r.crawl_enabled, + })); + } + + /** + * Discover locations for a single city + */ + async discoverForCity(city: DiscoveryCity): Promise { + const startTime = Date.now(); + const errors: string[] = []; + let locationsFound = 0; + let locationsInserted = 0; + let locationsUpdated = 0; + let locationsSkipped = 0; + let reportedStoreCount: number | null = null; + + console.log(`[DtLocationDiscoveryService] Discovering locations for ${city.cityName}, ${city.stateCode}...`); + + try { + const fetchResult = await fetchLocationsForCity(city); + const locations = fetchResult.locations; + reportedStoreCount = fetchResult.reportedStoreCount; + + locationsFound = locations.length; + console.log(`[DtLocationDiscoveryService] Found ${locationsFound} locations`); + + // Count how many have coordinates + const withCoords = locations.filter(l => l.latitude !== null && l.longitude !== null).length; + if (withCoords > 0) { + console.log(`[DtLocationDiscoveryService] ${withCoords}/${locationsFound} locations have coordinates`); + } + + for (const location of locations) { + try { + const result = await upsertLocation(this.pool, location, city.id); + if (result.inserted) locationsInserted++; + else if (result.updated) locationsUpdated++; + else if (result.skipped) locationsSkipped++; + } catch (error: any) { + const msg = `Failed to upsert location ${location.platformSlug}: ${error.message}`; + console.error(`[DtLocationDiscoveryService] ${msg}`); + errors.push(msg); + } + } + + // Update city's last_crawled_at, location_count, and reported_store_count in metadata + await this.pool.query( + ` + UPDATE dutchie_discovery_cities + SET last_crawled_at = NOW(), + location_count = $1, + metadata = COALESCE(metadata, '{}')::jsonb || jsonb_build_object( + 'reported_store_count', $3::int, + 'scraped_store_count', $1::int, + 'last_discovery_at', NOW()::text + ), + updated_at = NOW() + WHERE id = $2 + `, + [locationsFound, city.id, reportedStoreCount] + ); + } catch (error: any) { + const msg = `Location discovery failed for ${city.citySlug}: ${error.message}`; + console.error(`[DtLocationDiscoveryService] ${msg}`); + errors.push(msg); + } + + const durationMs = Date.now() - startTime; + + console.log(`[DtLocationDiscoveryService] City ${city.citySlug} complete:`); + console.log(` Reported count: ${reportedStoreCount ?? 
'N/A'}`); + console.log(` Locations found: ${locationsFound}`); + console.log(` Inserted: ${locationsInserted}`); + console.log(` Updated: ${locationsUpdated}`); + console.log(` Skipped (protected): ${locationsSkipped}`); + console.log(` Errors: ${errors.length}`); + console.log(` Duration: ${(durationMs / 1000).toFixed(1)}s`); + + return { + cityId: city.id, + citySlug: city.citySlug, + locationsFound, + locationsInserted, + locationsUpdated, + locationsSkipped, + reportedStoreCount, + errors, + durationMs, + }; + } + + /** + * Discover locations for all enabled cities + */ + async discoverAllEnabled(options: { + limit?: number; + delayMs?: number; + } = {}): Promise { + const { limit, delayMs = 2000 } = options; + const startTime = Date.now(); + let totalLocationsFound = 0; + let totalInserted = 0; + let totalUpdated = 0; + let totalSkipped = 0; + const allErrors: string[] = []; + + const cities = await this.getEnabledCities(limit); + console.log(`[DtLocationDiscoveryService] Discovering locations for ${cities.length} cities...`); + + for (let i = 0; i < cities.length; i++) { + const city = cities[i]; + console.log(`\n[DtLocationDiscoveryService] City ${i + 1}/${cities.length}: ${city.cityName}, ${city.stateCode}`); + + try { + const result = await this.discoverForCity(city); + totalLocationsFound += result.locationsFound; + totalInserted += result.locationsInserted; + totalUpdated += result.locationsUpdated; + totalSkipped += result.locationsSkipped; + allErrors.push(...result.errors); + } catch (error: any) { + allErrors.push(`City ${city.citySlug} failed: ${error.message}`); + } + + if (i < cities.length - 1 && delayMs > 0) { + await new Promise((r) => setTimeout(r, delayMs)); + } + } + + const durationMs = Date.now() - startTime; + + return { + totalCities: cities.length, + totalLocationsFound, + totalInserted, + totalUpdated, + totalSkipped, + errors: allErrors, + durationMs, + }; + } + + /** + * Get location statistics + */ + async getStats(): Promise<{ + total: number; + withCoordinates: number; + byStatus: Array<{ status: string; count: number }>; + byState: Array<{ stateCode: string; count: number }>; + }> { + const [totalRes, coordsRes, byStatusRes, byStateRes] = await Promise.all([ + this.pool.query(` + SELECT COUNT(*) as cnt FROM dutchie_discovery_locations + WHERE platform = 'dutchie' AND active = TRUE + `), + this.pool.query(` + SELECT COUNT(*) as cnt FROM dutchie_discovery_locations + WHERE platform = 'dutchie' AND active = TRUE + AND latitude IS NOT NULL AND longitude IS NOT NULL + `), + this.pool.query(` + SELECT status, COUNT(*) as cnt + FROM dutchie_discovery_locations + WHERE platform = 'dutchie' AND active = TRUE + GROUP BY status + ORDER BY cnt DESC + `), + this.pool.query(` + SELECT state_code, COUNT(*) as cnt + FROM dutchie_discovery_locations + WHERE platform = 'dutchie' AND active = TRUE AND state_code IS NOT NULL + GROUP BY state_code + ORDER BY cnt DESC + LIMIT 20 + `), + ]); + + return { + total: parseInt(totalRes.rows[0]?.cnt || '0', 10), + withCoordinates: parseInt(coordsRes.rows[0]?.cnt || '0', 10), + byStatus: byStatusRes.rows.map((r) => ({ + status: r.status, + count: parseInt(r.cnt, 10), + })), + byState: byStateRes.rows.map((r) => ({ + stateCode: r.state_code, + count: parseInt(r.cnt, 10), + })), + }; + } + + // ============================================================ + // ALICE - FULL DISCOVERY FROM /CITIES PAGE + // ============================================================ + + /** + * Fetch all states and cities from https://dutchie.com/cities 
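The location service is driven the same way; a hedged runner sketch (pool path assumed) that crawls a handful of least-recently-crawled enabled cities with the default two-second delay between cities, then reports coordinate coverage:

```typescript
import { getPool } from '../db/connection';               // assumed relative path
import { DtLocationDiscoveryService } from './DtLocationDiscoveryService';

async function main(): Promise<void> {
  const service = new DtLocationDiscoveryService(getPool());

  // Limit to 5 cities; delayMs matches the service default of 2000ms.
  const result = await service.discoverAllEnabled({ limit: 5, delayMs: 2000 });
  console.log(
    `Cities: ${result.totalCities}, found: ${result.totalLocationsFound}, ` +
    `inserted: ${result.totalInserted}, updated: ${result.totalUpdated}, ` +
    `skipped: ${result.totalSkipped}, errors: ${result.errors.length}`
  );

  const stats = await service.getStats();
  console.log(`${stats.withCoordinates}/${stats.total} active locations have coordinates`);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```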
+ * Returns the complete hierarchy of states -> cities + */ + async fetchCitiesFromMasterPage(): Promise<{ + states: Array<{ + stateCode: string; + stateName: string; + cities: Array<{ cityName: string; citySlug: string; storeCount?: number }>; + }>; + errors: string[]; + }> { + console.log('[Alice] Fetching master cities page from https://dutchie.com/cities...'); + + const browser = await puppeteer.launch({ + headless: 'new', + args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'], + }); + + try { + const page = await browser.newPage(); + await page.setUserAgent( + 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36' + ); + + await page.goto('https://dutchie.com/cities', { + waitUntil: 'networkidle2', + timeout: 60000, + }); + + await new Promise((r) => setTimeout(r, 3000)); + + // Try to extract from __NEXT_DATA__ + const citiesData = await page.evaluate(() => { + const script = document.querySelector('script#__NEXT_DATA__'); + if (script) { + try { + const data = JSON.parse(script.textContent || '{}'); + return data?.props?.pageProps || null; + } catch { + return null; + } + } + return null; + }); + + const states: Array<{ + stateCode: string; + stateName: string; + cities: Array<{ cityName: string; citySlug: string; storeCount?: number }>; + }> = []; + const errors: string[] = []; + + if (citiesData?.states || citiesData?.regions) { + // Parse from structured data + const statesList = citiesData.states || citiesData.regions || []; + for (const state of statesList) { + const stateCode = state.code || state.stateCode || state.abbreviation || ''; + const stateName = state.name || state.stateName || ''; + const cities = (state.cities || []).map((c: any) => ({ + cityName: c.name || c.cityName || '', + citySlug: c.slug || c.citySlug || c.name?.toLowerCase().replace(/\s+/g, '-') || '', + storeCount: c.dispensaryCount || c.storeCount || undefined, + })); + if (stateCode && cities.length > 0) { + states.push({ stateCode, stateName, cities }); + } + } + } else { + // Fallback: DOM scraping + console.log('[Alice] No __NEXT_DATA__, attempting DOM scrape...'); + const scrapedStates = await page.evaluate(() => { + const result: Array<{ + stateCode: string; + stateName: string; + cities: Array<{ cityName: string; citySlug: string }>; + }> = []; + + // Look for state sections + const stateHeaders = document.querySelectorAll('h2, h3, [data-testid*="state"]'); + stateHeaders.forEach((header) => { + const stateName = header.textContent?.trim() || ''; + // Try to extract state code from data attributes or guess from name + const stateCode = (header as HTMLElement).dataset?.stateCode || + stateName.substring(0, 2).toUpperCase(); + + // Find city links following this header + const container = header.closest('section') || header.parentElement; + const cityLinks = container?.querySelectorAll('a[href*="/dispensaries/"]') || []; + const cities: Array<{ cityName: string; citySlug: string }> = []; + + cityLinks.forEach((link) => { + const href = (link as HTMLAnchorElement).href || ''; + const match = href.match(/\/dispensaries\/([^/?]+)/); + if (match) { + cities.push({ + cityName: link.textContent?.trim() || '', + citySlug: match[1], + }); + } + }); + + if (stateName && cities.length > 0) { + result.push({ stateCode, stateName, cities }); + } + }); + + return result; + }); + + states.push(...scrapedStates); + + if (states.length === 0) { + errors.push('Could not parse cities from master page'); + } + } + + console.log(`[Alice] 
Found ${states.length} states with cities from master page`);
+      return { states, errors };
+
+    } finally {
+      await browser.close();
+    }
+  }
+
+  /**
+   * Upsert cities from master page discovery
+   */
+  async upsertCitiesFromMaster(states: Array<{
+    stateCode: string;
+    stateName: string;
+    cities: Array<{ cityName: string; citySlug: string; storeCount?: number }>;
+  }>): Promise<{ inserted: number; updated: number }> {
+    let inserted = 0;
+    let updated = 0;
+
+    for (const state of states) {
+      for (const city of state.cities) {
+        const existing = await this.pool.query(
+          `SELECT id FROM dutchie_discovery_cities
+           WHERE platform = 'dutchie' AND city_slug = $1`,
+          [city.citySlug]
+        );
+
+        if (existing.rows.length === 0) {
+          // Insert new city
+          await this.pool.query(
+            `INSERT INTO dutchie_discovery_cities (
+              platform, city_name, city_slug, state_code, state_name,
+              country_code, crawl_enabled, discovered_at, last_verified_at,
+              store_count_reported, created_at, updated_at
+            ) VALUES ($1, $2, $3, $4, $5, $6, $7, NOW(), NOW(), $8, NOW(), NOW())`,
+            [
+              'dutchie',
+              city.cityName,
+              city.citySlug,
+              state.stateCode,
+              state.stateName,
+              'US',
+              true,
+              city.storeCount || null,
+            ]
+          );
+          inserted++;
+        } else {
+          // Update existing city
+          await this.pool.query(
+            `UPDATE dutchie_discovery_cities SET
+              city_name = COALESCE($2, city_name),
+              state_code = COALESCE($3, state_code),
+              state_name = COALESCE($4, state_name),
+              last_verified_at = NOW(),
+              store_count_reported = COALESCE($5, store_count_reported),
+              updated_at = NOW()
+            WHERE id = $1`,
+            [existing.rows[0].id, city.cityName, state.stateCode, state.stateName, city.storeCount]
+          );
+          updated++;
+        }
+      }
+    }
+
+    return { inserted, updated };
+  }
+
+  /**
+   * Detect stores that have been removed from source
+   * Mark them as retired instead of deleting
+   */
+  async detectAndMarkRemovedStores(
+    currentLocationIds: Set<string>
+  ): Promise<{ retiredCount: number; retiredIds: string[] }> {
+    // Get all active locations we know about
+    const { rows: existingLocations } = await this.pool.query<{
+      id: number;
+      platform_location_id: string;
+      name: string;
+    }>(`
+      SELECT id, platform_location_id, name
+      FROM dutchie_discovery_locations
+      WHERE platform = 'dutchie'
+        AND active = TRUE
+        AND retired_at IS NULL
+    `);
+
+    const retiredIds: string[] = [];
+
+    for (const loc of existingLocations) {
+      if (!currentLocationIds.has(loc.platform_location_id)) {
+        // This store no longer appears in source - mark as retired
+        await this.pool.query(
+          `UPDATE dutchie_discovery_locations SET
+            active = FALSE,
+            retired_at = NOW(),
+            retirement_reason = 'removed_from_source',
+            updated_at = NOW()
+          WHERE id = $1`,
+          [loc.id]
+        );
+        retiredIds.push(loc.platform_location_id);
+        console.log(`[Alice] Marked store as retired: ${loc.name} (${loc.platform_location_id})`);
+      }
+    }
+
+    return { retiredCount: retiredIds.length, retiredIds };
+  }
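+
+  // Illustrative usage (example values only): pass the platform IDs seen in the
+  // current run; anything previously active but absent is retired, not deleted.
+  //   const seen = new Set<string>(['5f1b2c...', '60aa91...']);
+  //   const { retiredCount } = await this.detectAndMarkRemovedStores(seen);
+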
+  /**
+   * Detect and track slug changes
+   */
+  async detectSlugChanges(
+    locationId: string,
+    newSlug: string
+  ): Promise<{ changed: boolean; previousSlug?: string }> {
+    const { rows } = await this.pool.query<{ platform_slug: string }>(
+      `SELECT platform_slug FROM dutchie_discovery_locations
+       WHERE platform = 'dutchie' AND platform_location_id = $1`,
+      [locationId]
+    );
+
+    if (rows.length === 0) return { changed: false };
+
+    const currentSlug = rows[0].platform_slug;
+    if (currentSlug && currentSlug !== newSlug) {
+      // Slug changed - update with tracking
+      await this.pool.query(
+        `UPDATE dutchie_discovery_locations SET
+          platform_slug = $1,
+          previous_slug = $2,
+          slug_changed_at = NOW(),
+          updated_at = NOW()
+        WHERE platform = 'dutchie' AND platform_location_id = $3`,
+        [newSlug, currentSlug, locationId]
+      );
+      console.log(`[Alice] Slug change detected: ${currentSlug} -> ${newSlug}`);
+      return { changed: true, previousSlug: currentSlug };
+    }
+
+    return { changed: false };
+  }
+
+  /**
+   * Full discovery run with change detection (Alice's main job)
+   * Fetches from /cities, discovers all stores, detects changes
+   */
+  async runFullDiscoveryWithChangeDetection(options: {
+    scope?: { states?: string[]; storeIds?: number[] };
+    delayMs?: number;
+  } = {}): Promise<{
+    statesDiscovered: number;
+    citiesDiscovered: number;
+    newStoreCount: number;
+    removedStoreCount: number;
+    updatedStoreCount: number;
+    slugChangedCount: number;
+    totalLocationsFound: number;
+    errors: string[];
+    durationMs: number;
+  }> {
+    const startTime = Date.now();
+    const { scope, delayMs = 2000 } = options;
+    const errors: string[] = [];
+    let slugChangedCount = 0;
+
+    console.log('[Alice] Starting full discovery with change detection...');
+    if (scope?.states) {
+      console.log(`[Alice] Scope limited to states: ${scope.states.join(', ')}`);
+    }
+
+    // Step 1: Fetch master cities page
+    const { states: masterStates, errors: fetchErrors } = await this.fetchCitiesFromMasterPage();
+    errors.push(...fetchErrors);
+
+    // Filter by scope if provided
+    const statesToProcess = scope?.states
+      ? masterStates.filter(s => scope.states!.includes(s.stateCode))
+      : masterStates;
+
+    // Step 2: Upsert cities
+    const citiesResult = await this.upsertCitiesFromMaster(statesToProcess);
+    console.log(`[Alice] Cities: ${citiesResult.inserted} new, ${citiesResult.updated} updated`);
+
+    // Step 3: Discover locations for each city
+    const allLocationIds = new Set<string>();
+    let totalLocationsFound = 0;
+    let totalInserted = 0;
+    let totalUpdated = 0;
+
+    const cities = await this.getEnabledCities();
+    const citiesToProcess = scope?.states
+      ? 
cities.filter(c => c.stateCode && scope.states!.includes(c.stateCode)) + : cities; + + for (let i = 0; i < citiesToProcess.length; i++) { + const city = citiesToProcess[i]; + console.log(`[Alice] City ${i + 1}/${citiesToProcess.length}: ${city.cityName}, ${city.stateCode}`); + + try { + const result = await this.discoverForCity(city); + totalLocationsFound += result.locationsFound; + totalInserted += result.locationsInserted; + totalUpdated += result.locationsUpdated; + errors.push(...result.errors); + + // Track all discovered location IDs for removal detection + // (This requires modifying discoverForCity to return IDs, or query them after) + + } catch (error: any) { + errors.push(`City ${city.citySlug}: ${error.message}`); + } + + if (i < citiesToProcess.length - 1 && delayMs > 0) { + await new Promise((r) => setTimeout(r, delayMs)); + } + } + + // Step 4: Get all current active location IDs for removal detection + const { rows: currentLocations } = await this.pool.query<{ platform_location_id: string }>( + `SELECT platform_location_id FROM dutchie_discovery_locations + WHERE platform = 'dutchie' AND active = TRUE AND last_seen_at > NOW() - INTERVAL '1 day'` + ); + currentLocations.forEach(loc => allLocationIds.add(loc.platform_location_id)); + + // Step 5: Detect removed stores (only if we had a successful discovery) + let removedResult = { retiredCount: 0, retiredIds: [] as string[] }; + if (totalLocationsFound > 0 && !scope) { + // Only detect removals on full (unscoped) runs + removedResult = await this.detectAndMarkRemovedStores(allLocationIds); + } + + const durationMs = Date.now() - startTime; + + console.log('[Alice] Full discovery complete:'); + console.log(` States: ${statesToProcess.length}`); + console.log(` Cities: ${citiesToProcess.length}`); + console.log(` Locations found: ${totalLocationsFound}`); + console.log(` New: ${totalInserted}, Updated: ${totalUpdated}`); + console.log(` Removed: ${removedResult.retiredCount}`); + console.log(` Duration: ${(durationMs / 1000).toFixed(1)}s`); + + return { + statesDiscovered: statesToProcess.length, + citiesDiscovered: citiesToProcess.length, + newStoreCount: totalInserted, + removedStoreCount: removedResult.retiredCount, + updatedStoreCount: totalUpdated, + slugChangedCount, + totalLocationsFound, + errors, + durationMs, + }; + } +} + +export default DtLocationDiscoveryService; diff --git a/backend/src/dutchie-az/discovery/DutchieCityDiscovery.ts b/backend/src/dutchie-az/discovery/DutchieCityDiscovery.ts new file mode 100644 index 00000000..cbde7dd7 --- /dev/null +++ b/backend/src/dutchie-az/discovery/DutchieCityDiscovery.ts @@ -0,0 +1,390 @@ +/** + * DutchieCityDiscovery + * + * Discovers cities from Dutchie's /cities page and upserts to dutchie_discovery_cities. 
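+ *
+ * Usage sketch (assumes a configured pg Pool; see the class below):
+ *   const discovery = new DutchieCityDiscovery(pool);
+ *   const result = await discovery.run();
+ *   console.log(`${result.citiesInserted} inserted, ${result.citiesUpdated} updated`);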
+ *
+ * Responsibilities:
+ * - Fetch all cities available on Dutchie
+ * - For each city derive: city_name, city_slug, state_code, country_code
+ * - Upsert into dutchie_discovery_cities
+ */
+
+import { Pool } from 'pg';
+import axios from 'axios';
+import puppeteer from 'puppeteer-extra';
+import StealthPlugin from 'puppeteer-extra-plugin-stealth';
+import type { Browser, Page } from 'puppeteer';
+
+puppeteer.use(StealthPlugin());
+
+// ============================================================
+// TYPES
+// ============================================================
+
+export interface DutchieCity {
+  name: string;
+  slug: string;
+  stateCode: string | null;
+  countryCode: string;
+  url?: string;
+}
+
+export interface CityDiscoveryResult {
+  citiesFound: number;
+  citiesInserted: number;
+  citiesUpdated: number;
+  errors: string[];
+  durationMs: number;
+}
+
+// ============================================================
+// US STATE CODE MAPPING
+// ============================================================
+
+const US_STATE_MAP: Record<string, string> = {
+  'alabama': 'AL', 'alaska': 'AK', 'arizona': 'AZ', 'arkansas': 'AR',
+  'california': 'CA', 'colorado': 'CO', 'connecticut': 'CT', 'delaware': 'DE',
+  'florida': 'FL', 'georgia': 'GA', 'hawaii': 'HI', 'idaho': 'ID',
+  'illinois': 'IL', 'indiana': 'IN', 'iowa': 'IA', 'kansas': 'KS',
+  'kentucky': 'KY', 'louisiana': 'LA', 'maine': 'ME', 'maryland': 'MD',
+  'massachusetts': 'MA', 'michigan': 'MI', 'minnesota': 'MN', 'mississippi': 'MS',
+  'missouri': 'MO', 'montana': 'MT', 'nebraska': 'NE', 'nevada': 'NV',
+  'new-hampshire': 'NH', 'new-jersey': 'NJ', 'new-mexico': 'NM', 'new-york': 'NY',
+  'north-carolina': 'NC', 'north-dakota': 'ND', 'ohio': 'OH', 'oklahoma': 'OK',
+  'oregon': 'OR', 'pennsylvania': 'PA', 'rhode-island': 'RI', 'south-carolina': 'SC',
+  'south-dakota': 'SD', 'tennessee': 'TN', 'texas': 'TX', 'utah': 'UT',
+  'vermont': 'VT', 'virginia': 'VA', 'washington': 'WA', 'west-virginia': 'WV',
+  'wisconsin': 'WI', 'wyoming': 'WY', 'district-of-columbia': 'DC',
+};
+
+// Canadian province mapping
+const CA_PROVINCE_MAP: Record<string, string> = {
+  'alberta': 'AB', 'british-columbia': 'BC', 'manitoba': 'MB',
+  'new-brunswick': 'NB', 'newfoundland-and-labrador': 'NL',
+  'northwest-territories': 'NT', 'nova-scotia': 'NS', 'nunavut': 'NU',
+  'ontario': 'ON', 'prince-edward-island': 'PE', 'quebec': 'QC',
+  'saskatchewan': 'SK', 'yukon': 'YT',
+};
+
+// ============================================================
+// CITY FETCHING
+// ============================================================
+
+/**
+ * Fetch cities from Dutchie's /cities page using Puppeteer to extract data.
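+ *
+ * Example of a single parsed entry (illustrative values, derived from the
+ * /city/{state}/{city} link pattern handled below):
+ *   { name: 'Phoenix', slug: 'phoenix', stateCode: 'AZ', countryCode: 'US',
+ *     url: 'https://dutchie.com/city/arizona/phoenix' }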
+ */
+async function fetchCitiesFromDutchie(): Promise<DutchieCity[]> {
+  console.log('[DutchieCityDiscovery] Launching browser to fetch cities...');
+
+  const browser = await puppeteer.launch({
+    headless: 'new',
+    args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'],
+  });
+
+  try {
+    const page = await browser.newPage();
+    await page.setUserAgent(
+      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
+    );
+
+    // Navigate to cities page
+    console.log('[DutchieCityDiscovery] Navigating to https://dutchie.com/cities...');
+    await page.goto('https://dutchie.com/cities', {
+      waitUntil: 'networkidle2',
+      timeout: 60000,
+    });
+
+    // Wait for content to load
+    await new Promise((r) => setTimeout(r, 3000));
+
+    // Extract city links from the page
+    const cities = await page.evaluate(() => {
+      const cityLinks: Array<{
+        name: string;
+        slug: string;
+        url: string;
+        stateSlug: string | null;
+      }> = [];
+
+      // Find all city links - they typically follow pattern /city/{state}/{city}
+      const links = document.querySelectorAll('a[href*="/city/"]');
+      links.forEach((link) => {
+        const href = (link as HTMLAnchorElement).href;
+        const text = (link as HTMLElement).innerText?.trim();
+
+        // Parse URL: https://dutchie.com/city/{state}/{city}
+        const match = href.match(/\/city\/([^/]+)\/([^/?]+)/);
+        if (match && text) {
+          cityLinks.push({
+            name: text,
+            slug: match[2],
+            url: href,
+            stateSlug: match[1],
+          });
+        }
+      });
+
+      return cityLinks;
+    });
+
+    console.log(`[DutchieCityDiscovery] Extracted ${cities.length} city links from page`);
+
+    // Convert to DutchieCity format
+    const result: DutchieCity[] = [];
+
+    for (const city of cities) {
+      // Determine country and state code
+      let countryCode = 'US';
+      let stateCode: string | null = null;
+
+      if (city.stateSlug) {
+        // Check if it's a US state
+        if (US_STATE_MAP[city.stateSlug]) {
+          stateCode = US_STATE_MAP[city.stateSlug];
+          countryCode = 'US';
+        }
+        // Check if it's a Canadian province
+        else if (CA_PROVINCE_MAP[city.stateSlug]) {
+          stateCode = CA_PROVINCE_MAP[city.stateSlug];
+          countryCode = 'CA';
+        }
+        // Check if it's already a 2-letter code
+        else if (city.stateSlug.length === 2) {
+          stateCode = city.stateSlug.toUpperCase();
+          // Determine country based on state code
+          if (Object.values(CA_PROVINCE_MAP).includes(stateCode)) {
+            countryCode = 'CA';
+          }
+        }
+      }
+
+      result.push({
+        name: city.name,
+        slug: city.slug,
+        stateCode,
+        countryCode,
+        url: city.url,
+      });
+    }
+
+    return result;
+  } finally {
+    await browser.close();
+  }
+}
+
+/**
+ * Alternative: Fetch cities by making API/GraphQL requests.
+ * Falls back to this if scraping fails.
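+ *
+ * Illustrative mapping of an assumed API item to DutchieCity (field names are
+ * guesses handled defensively below, not a confirmed schema):
+ *   { city: 'Phoenix', slug: 'phoenix', state: 'AZ', country: 'US' }
+ *   -> { name: 'Phoenix', slug: 'phoenix', stateCode: 'AZ', countryCode: 'US' }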
+ */ +async function fetchCitiesFromAPI(): Promise { + console.log('[DutchieCityDiscovery] Attempting API-based city discovery...'); + + // Dutchie may have an API endpoint for cities + // Try common patterns + const apiEndpoints = [ + 'https://dutchie.com/api/cities', + 'https://api.dutchie.com/v1/cities', + ]; + + for (const endpoint of apiEndpoints) { + try { + const response = await axios.get(endpoint, { + headers: { + 'User-Agent': + 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0', + Accept: 'application/json', + }, + timeout: 15000, + }); + + if (response.data && Array.isArray(response.data)) { + console.log(`[DutchieCityDiscovery] API returned ${response.data.length} cities`); + return response.data.map((c: any) => ({ + name: c.name || c.city, + slug: c.slug || c.citySlug, + stateCode: c.stateCode || c.state, + countryCode: c.countryCode || c.country || 'US', + })); + } + } catch (error: any) { + console.log(`[DutchieCityDiscovery] API ${endpoint} failed: ${error.message}`); + } + } + + return []; +} + +// ============================================================ +// DATABASE OPERATIONS +// ============================================================ + +/** + * Upsert a city into dutchie_discovery_cities + */ +async function upsertCity( + pool: Pool, + city: DutchieCity +): Promise<{ inserted: boolean; updated: boolean }> { + const result = await pool.query( + ` + INSERT INTO dutchie_discovery_cities ( + platform, + city_name, + city_slug, + state_code, + country_code, + last_crawled_at, + updated_at + ) VALUES ( + 'dutchie', + $1, + $2, + $3, + $4, + NOW(), + NOW() + ) + ON CONFLICT (platform, country_code, state_code, city_slug) + DO UPDATE SET + city_name = EXCLUDED.city_name, + last_crawled_at = NOW(), + updated_at = NOW() + RETURNING (xmax = 0) AS inserted + `, + [city.name, city.slug, city.stateCode, city.countryCode] + ); + + const inserted = result.rows[0]?.inserted === true; + return { inserted, updated: !inserted }; +} + +// ============================================================ +// MAIN DISCOVERY FUNCTION +// ============================================================ + +export class DutchieCityDiscovery { + private pool: Pool; + + constructor(pool: Pool) { + this.pool = pool; + } + + /** + * Run the city discovery process + */ + async run(): Promise { + const startTime = Date.now(); + const errors: string[] = []; + let citiesFound = 0; + let citiesInserted = 0; + let citiesUpdated = 0; + + console.log('[DutchieCityDiscovery] Starting city discovery...'); + + try { + // Try scraping first, fall back to API + let cities = await fetchCitiesFromDutchie(); + + if (cities.length === 0) { + console.log('[DutchieCityDiscovery] Scraping returned 0 cities, trying API...'); + cities = await fetchCitiesFromAPI(); + } + + citiesFound = cities.length; + console.log(`[DutchieCityDiscovery] Found ${citiesFound} cities`); + + // Upsert each city + for (const city of cities) { + try { + const result = await upsertCity(this.pool, city); + if (result.inserted) { + citiesInserted++; + } else if (result.updated) { + citiesUpdated++; + } + } catch (error: any) { + const msg = `Failed to upsert city ${city.slug}: ${error.message}`; + console.error(`[DutchieCityDiscovery] ${msg}`); + errors.push(msg); + } + } + } catch (error: any) { + const msg = `City discovery failed: ${error.message}`; + console.error(`[DutchieCityDiscovery] ${msg}`); + errors.push(msg); + } + + const durationMs = Date.now() - startTime; + + console.log('[DutchieCityDiscovery] 
Discovery complete:'); + console.log(` Cities found: ${citiesFound}`); + console.log(` Inserted: ${citiesInserted}`); + console.log(` Updated: ${citiesUpdated}`); + console.log(` Errors: ${errors.length}`); + console.log(` Duration: ${(durationMs / 1000).toFixed(1)}s`); + + return { + citiesFound, + citiesInserted, + citiesUpdated, + errors, + durationMs, + }; + } + + /** + * Get statistics about discovered cities + */ + async getStats(): Promise<{ + total: number; + byCountry: Array<{ countryCode: string; count: number }>; + byState: Array<{ stateCode: string; countryCode: string; count: number }>; + crawlEnabled: number; + neverCrawled: number; + }> { + const [totalRes, byCountryRes, byStateRes, enabledRes, neverRes] = await Promise.all([ + this.pool.query('SELECT COUNT(*) as cnt FROM dutchie_discovery_cities'), + this.pool.query(` + SELECT country_code, COUNT(*) as cnt + FROM dutchie_discovery_cities + GROUP BY country_code + ORDER BY cnt DESC + `), + this.pool.query(` + SELECT state_code, country_code, COUNT(*) as cnt + FROM dutchie_discovery_cities + WHERE state_code IS NOT NULL + GROUP BY state_code, country_code + ORDER BY cnt DESC + `), + this.pool.query(` + SELECT COUNT(*) as cnt + FROM dutchie_discovery_cities + WHERE crawl_enabled = TRUE + `), + this.pool.query(` + SELECT COUNT(*) as cnt + FROM dutchie_discovery_cities + WHERE last_crawled_at IS NULL + `), + ]); + + return { + total: parseInt(totalRes.rows[0]?.cnt || '0', 10), + byCountry: byCountryRes.rows.map((r) => ({ + countryCode: r.country_code, + count: parseInt(r.cnt, 10), + })), + byState: byStateRes.rows.map((r) => ({ + stateCode: r.state_code, + countryCode: r.country_code, + count: parseInt(r.cnt, 10), + })), + crawlEnabled: parseInt(enabledRes.rows[0]?.cnt || '0', 10), + neverCrawled: parseInt(neverRes.rows[0]?.cnt || '0', 10), + }; + } +} + +export default DutchieCityDiscovery; diff --git a/backend/src/dutchie-az/discovery/DutchieLocationDiscovery.ts b/backend/src/dutchie-az/discovery/DutchieLocationDiscovery.ts new file mode 100644 index 00000000..2bb27e17 --- /dev/null +++ b/backend/src/dutchie-az/discovery/DutchieLocationDiscovery.ts @@ -0,0 +1,639 @@ +/** + * DutchieLocationDiscovery + * + * Discovers store locations for each city from Dutchie and upserts to dutchie_discovery_locations. 
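+ *
+ * Usage sketch (assumes a configured pg Pool and a previously seeded city slug):
+ *   const discovery = new DutchieLocationDiscovery(pool);
+ *   const city = await discovery.getCityBySlug('phoenix');
+ *   if (city) await discovery.discoverForCity(city);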
+ *
+ * Responsibilities:
+ * - Given a dutchie_discovery_cities row, call Dutchie's location/search endpoint
+ * - For each store: extract platform_location_id, platform_slug, platform_menu_url, name, address, coords
+ * - Upsert into dutchie_discovery_locations
+ * - DO NOT overwrite status if already verified/merged/rejected
+ * - DO NOT overwrite dispensary_id if already set
+ */
+
+import { Pool } from 'pg';
+import axios from 'axios';
+import puppeteer from 'puppeteer-extra';
+import StealthPlugin from 'puppeteer-extra-plugin-stealth';
+
+puppeteer.use(StealthPlugin());
+
+// ============================================================
+// TYPES
+// ============================================================
+
+export interface DiscoveryCity {
+  id: number;
+  platform: string;
+  cityName: string;
+  citySlug: string;
+  stateCode: string | null;
+  countryCode: string;
+  crawlEnabled: boolean;
+}
+
+export interface DutchieLocation {
+  platformLocationId: string;
+  platformSlug: string;
+  platformMenuUrl: string;
+  name: string;
+  rawAddress: string | null;
+  addressLine1: string | null;
+  addressLine2: string | null;
+  city: string | null;
+  stateCode: string | null;
+  postalCode: string | null;
+  countryCode: string | null;
+  latitude: number | null;
+  longitude: number | null;
+  timezone: string | null;
+  offersDelivery: boolean | null;
+  offersPickup: boolean | null;
+  isRecreational: boolean | null;
+  isMedical: boolean | null;
+  metadata: Record<string, any>;
+}
+
+export interface LocationDiscoveryResult {
+  cityId: number;
+  citySlug: string;
+  locationsFound: number;
+  locationsInserted: number;
+  locationsUpdated: number;
+  locationsSkipped: number;
+  errors: string[];
+  durationMs: number;
+}
+
+// ============================================================
+// LOCATION FETCHING
+// ============================================================
+
+/**
+ * Fetch locations for a city using Puppeteer to scrape the city page
+ */
+async function fetchLocationsForCity(city: DiscoveryCity): Promise<DutchieLocation[]> {
+  console.log(`[DutchieLocationDiscovery] Fetching locations for ${city.cityName}, ${city.stateCode}...`);
+
+  const browser = await puppeteer.launch({
+    headless: 'new',
+    args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'],
+  });
+
+  try {
+    const page = await browser.newPage();
+    await page.setUserAgent(
+      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
+    );
+
+    // Navigate to city page - use /us/dispensaries/{city_slug} pattern
+    const cityUrl = `https://dutchie.com/us/dispensaries/${city.citySlug}`;
+    console.log(`[DutchieLocationDiscovery] Navigating to ${cityUrl}...`);
+
+    await page.goto(cityUrl, {
+      waitUntil: 'networkidle2',
+      timeout: 60000,
+    });
+
+    // Wait for content
+    await new Promise((r) => setTimeout(r, 3000));
+
+    // Try to extract __NEXT_DATA__ which often contains store data
+    const nextData = await page.evaluate(() => {
+      const script = document.querySelector('script#__NEXT_DATA__');
+      if (script) {
+        try {
+          return JSON.parse(script.textContent || '{}');
+        } catch {
+          return null;
+        }
+      }
+      return null;
+    });
+
+    let locations: DutchieLocation[] = [];
+
+    if (nextData?.props?.pageProps?.dispensaries) {
+      // Extract from Next.js data
+      const dispensaries = nextData.props.pageProps.dispensaries;
+      console.log(`[DutchieLocationDiscovery] Found ${dispensaries.length} dispensaries in __NEXT_DATA__`);
+
+      locations = dispensaries.map((d: any) => parseDispensaryData(d, city));
+    } else {
+      
// Fall back to DOM scraping + console.log('[DutchieLocationDiscovery] No __NEXT_DATA__, trying DOM scraping...'); + + const scrapedData = await page.evaluate(() => { + const stores: Array<{ + name: string; + href: string; + address: string | null; + }> = []; + + // Look for dispensary cards/links + const cards = document.querySelectorAll('[data-testid="dispensary-card"], .dispensary-card, a[href*="/dispensary/"]'); + cards.forEach((card) => { + const link = card.querySelector('a[href*="/dispensary/"]') || (card as HTMLAnchorElement); + const href = (link as HTMLAnchorElement).href || ''; + const name = + card.querySelector('[data-testid="dispensary-name"]')?.textContent || + card.querySelector('h2, h3, .name')?.textContent || + link.textContent || + ''; + const address = card.querySelector('[data-testid="dispensary-address"], .address')?.textContent || null; + + if (href && name) { + stores.push({ + name: name.trim(), + href, + address: address?.trim() || null, + }); + } + }); + + return stores; + }); + + console.log(`[DutchieLocationDiscovery] DOM scraping found ${scrapedData.length} stores`); + + locations = scrapedData.map((s) => { + // Parse slug from URL + const match = s.href.match(/\/dispensary\/([^/?]+)/); + const slug = match ? match[1] : s.name.toLowerCase().replace(/\s+/g, '-'); + + return { + platformLocationId: slug, // Will be resolved later + platformSlug: slug, + platformMenuUrl: `https://dutchie.com/dispensary/${slug}`, + name: s.name, + rawAddress: s.address, + addressLine1: null, + addressLine2: null, + city: city.cityName, + stateCode: city.stateCode, + postalCode: null, + countryCode: city.countryCode, + latitude: null, + longitude: null, + timezone: null, + offersDelivery: null, + offersPickup: null, + isRecreational: null, + isMedical: null, + metadata: { source: 'dom_scrape', originalUrl: s.href }, + }; + }); + } + + return locations; + } finally { + await browser.close(); + } +} + +/** + * Parse dispensary data from Dutchie's API/JSON response + */ +function parseDispensaryData(d: any, city: DiscoveryCity): DutchieLocation { + const id = d.id || d._id || d.dispensaryId || ''; + const slug = d.slug || d.cName || d.name?.toLowerCase().replace(/\s+/g, '-') || ''; + + // Build menu URL + let menuUrl = `https://dutchie.com/dispensary/${slug}`; + if (d.menuUrl) { + menuUrl = d.menuUrl; + } else if (d.embeddedMenuUrl) { + menuUrl = d.embeddedMenuUrl; + } + + // Parse address + const address = d.address || d.location?.address || {}; + const rawAddress = [ + address.line1 || address.street1 || d.address1, + address.line2 || address.street2 || d.address2, + [ + address.city || d.city, + address.state || address.stateCode || d.state, + address.zip || address.zipCode || address.postalCode || d.zip, + ] + .filter(Boolean) + .join(' '), + ] + .filter(Boolean) + .join(', '); + + return { + platformLocationId: id, + platformSlug: slug, + platformMenuUrl: menuUrl, + name: d.name || d.dispensaryName || '', + rawAddress: rawAddress || null, + addressLine1: address.line1 || address.street1 || d.address1 || null, + addressLine2: address.line2 || address.street2 || d.address2 || null, + city: address.city || d.city || city.cityName, + stateCode: address.state || address.stateCode || d.state || city.stateCode, + postalCode: address.zip || address.zipCode || address.postalCode || d.zip || null, + countryCode: address.country || address.countryCode || d.country || city.countryCode, + latitude: d.latitude ?? d.location?.latitude ?? d.location?.lat ?? null, + longitude: d.longitude ?? 
d.location?.longitude ?? d.location?.lng ?? null, + timezone: d.timezone || d.timeZone || null, + offersDelivery: d.offerDelivery ?? d.offersDelivery ?? d.delivery ?? null, + offersPickup: d.offerPickup ?? d.offersPickup ?? d.pickup ?? null, + isRecreational: d.isRecreational ?? d.recreational ?? (d.retailType === 'recreational' || d.retailType === 'both'), + isMedical: d.isMedical ?? d.medical ?? (d.retailType === 'medical' || d.retailType === 'both'), + metadata: { + source: 'next_data', + retailType: d.retailType, + brand: d.brand, + logo: d.logo || d.logoUrl, + raw: d, + }, + }; +} + +/** + * Alternative: Use GraphQL to discover locations + */ +async function fetchLocationsViaGraphQL(city: DiscoveryCity): Promise { + console.log(`[DutchieLocationDiscovery] Trying GraphQL for ${city.cityName}...`); + + // Try geo-based search + // This would require knowing the city's coordinates + // For now, return empty and rely on page scraping + return []; +} + +// ============================================================ +// DATABASE OPERATIONS +// ============================================================ + +/** + * Upsert a location into dutchie_discovery_locations + * Does NOT overwrite status if already verified/merged/rejected + * Does NOT overwrite dispensary_id if already set + */ +async function upsertLocation( + pool: Pool, + location: DutchieLocation, + cityId: number +): Promise<{ inserted: boolean; updated: boolean; skipped: boolean }> { + // First check if this location exists and has a protected status + const existing = await pool.query( + ` + SELECT id, status, dispensary_id + FROM dutchie_discovery_locations + WHERE platform = 'dutchie' AND platform_location_id = $1 + `, + [location.platformLocationId] + ); + + if (existing.rows.length > 0) { + const row = existing.rows[0]; + const protectedStatuses = ['verified', 'merged', 'rejected']; + + if (protectedStatuses.includes(row.status)) { + // Only update last_seen_at for protected statuses + await pool.query( + ` + UPDATE dutchie_discovery_locations + SET last_seen_at = NOW(), updated_at = NOW() + WHERE id = $1 + `, + [row.id] + ); + return { inserted: false, updated: false, skipped: true }; + } + + // Update existing discovered location (but preserve dispensary_id if set) + await pool.query( + ` + UPDATE dutchie_discovery_locations + SET + platform_slug = $2, + platform_menu_url = $3, + name = $4, + raw_address = COALESCE($5, raw_address), + address_line1 = COALESCE($6, address_line1), + address_line2 = COALESCE($7, address_line2), + city = COALESCE($8, city), + state_code = COALESCE($9, state_code), + postal_code = COALESCE($10, postal_code), + country_code = COALESCE($11, country_code), + latitude = COALESCE($12, latitude), + longitude = COALESCE($13, longitude), + timezone = COALESCE($14, timezone), + offers_delivery = COALESCE($15, offers_delivery), + offers_pickup = COALESCE($16, offers_pickup), + is_recreational = COALESCE($17, is_recreational), + is_medical = COALESCE($18, is_medical), + metadata = COALESCE($19, metadata), + discovery_city_id = $20, + last_seen_at = NOW(), + updated_at = NOW() + WHERE id = $1 + `, + [ + row.id, + location.platformSlug, + location.platformMenuUrl, + location.name, + location.rawAddress, + location.addressLine1, + location.addressLine2, + location.city, + location.stateCode, + location.postalCode, + location.countryCode, + location.latitude, + location.longitude, + location.timezone, + location.offersDelivery, + location.offersPickup, + location.isRecreational, + location.isMedical, + 
JSON.stringify(location.metadata), + cityId, + ] + ); + return { inserted: false, updated: true, skipped: false }; + } + + // Insert new location + await pool.query( + ` + INSERT INTO dutchie_discovery_locations ( + platform, + platform_location_id, + platform_slug, + platform_menu_url, + name, + raw_address, + address_line1, + address_line2, + city, + state_code, + postal_code, + country_code, + latitude, + longitude, + timezone, + status, + offers_delivery, + offers_pickup, + is_recreational, + is_medical, + metadata, + discovery_city_id, + first_seen_at, + last_seen_at, + active, + created_at, + updated_at + ) VALUES ( + 'dutchie', + $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, + 'discovered', + $15, $16, $17, $18, $19, $20, + NOW(), NOW(), TRUE, NOW(), NOW() + ) + `, + [ + location.platformLocationId, + location.platformSlug, + location.platformMenuUrl, + location.name, + location.rawAddress, + location.addressLine1, + location.addressLine2, + location.city, + location.stateCode, + location.postalCode, + location.countryCode, + location.latitude, + location.longitude, + location.timezone, + location.offersDelivery, + location.offersPickup, + location.isRecreational, + location.isMedical, + JSON.stringify(location.metadata), + cityId, + ] + ); + + return { inserted: true, updated: false, skipped: false }; +} + +// ============================================================ +// MAIN DISCOVERY CLASS +// ============================================================ + +export class DutchieLocationDiscovery { + private pool: Pool; + + constructor(pool: Pool) { + this.pool = pool; + } + + /** + * Get a city by slug + */ + async getCityBySlug(citySlug: string): Promise { + const { rows } = await this.pool.query( + ` + SELECT id, platform, city_name, city_slug, state_code, country_code, crawl_enabled + FROM dutchie_discovery_cities + WHERE platform = 'dutchie' AND city_slug = $1 + LIMIT 1 + `, + [citySlug] + ); + + if (rows.length === 0) return null; + + const r = rows[0]; + return { + id: r.id, + platform: r.platform, + cityName: r.city_name, + citySlug: r.city_slug, + stateCode: r.state_code, + countryCode: r.country_code, + crawlEnabled: r.crawl_enabled, + }; + } + + /** + * Get all crawl-enabled cities + */ + async getEnabledCities(limit?: number): Promise { + const { rows } = await this.pool.query( + ` + SELECT id, platform, city_name, city_slug, state_code, country_code, crawl_enabled + FROM dutchie_discovery_cities + WHERE platform = 'dutchie' AND crawl_enabled = TRUE + ORDER BY last_crawled_at ASC NULLS FIRST, city_name ASC + ${limit ? 
`LIMIT ${limit}` : ''} + ` + ); + + return rows.map((r) => ({ + id: r.id, + platform: r.platform, + cityName: r.city_name, + citySlug: r.city_slug, + stateCode: r.state_code, + countryCode: r.country_code, + crawlEnabled: r.crawl_enabled, + })); + } + + /** + * Discover locations for a single city + */ + async discoverForCity(city: DiscoveryCity): Promise { + const startTime = Date.now(); + const errors: string[] = []; + let locationsFound = 0; + let locationsInserted = 0; + let locationsUpdated = 0; + let locationsSkipped = 0; + + console.log(`[DutchieLocationDiscovery] Discovering locations for ${city.cityName}, ${city.stateCode}...`); + + try { + // Fetch locations + let locations = await fetchLocationsForCity(city); + + // If scraping fails, try GraphQL + if (locations.length === 0) { + locations = await fetchLocationsViaGraphQL(city); + } + + locationsFound = locations.length; + console.log(`[DutchieLocationDiscovery] Found ${locationsFound} locations`); + + // Upsert each location + for (const location of locations) { + try { + const result = await upsertLocation(this.pool, location, city.id); + if (result.inserted) locationsInserted++; + else if (result.updated) locationsUpdated++; + else if (result.skipped) locationsSkipped++; + } catch (error: any) { + const msg = `Failed to upsert location ${location.platformSlug}: ${error.message}`; + console.error(`[DutchieLocationDiscovery] ${msg}`); + errors.push(msg); + } + } + + // Update city's last_crawled_at and location_count + await this.pool.query( + ` + UPDATE dutchie_discovery_cities + SET last_crawled_at = NOW(), + location_count = $1, + updated_at = NOW() + WHERE id = $2 + `, + [locationsFound, city.id] + ); + } catch (error: any) { + const msg = `Location discovery failed for ${city.citySlug}: ${error.message}`; + console.error(`[DutchieLocationDiscovery] ${msg}`); + errors.push(msg); + } + + const durationMs = Date.now() - startTime; + + console.log(`[DutchieLocationDiscovery] City ${city.citySlug} complete:`); + console.log(` Locations found: ${locationsFound}`); + console.log(` Inserted: ${locationsInserted}`); + console.log(` Updated: ${locationsUpdated}`); + console.log(` Skipped (protected): ${locationsSkipped}`); + console.log(` Errors: ${errors.length}`); + console.log(` Duration: ${(durationMs / 1000).toFixed(1)}s`); + + return { + cityId: city.id, + citySlug: city.citySlug, + locationsFound, + locationsInserted, + locationsUpdated, + locationsSkipped, + errors, + durationMs, + }; + } + + /** + * Discover locations for all enabled cities + */ + async discoverAllEnabled(options: { + limit?: number; + delayMs?: number; + } = {}): Promise<{ + totalCities: number; + totalLocationsFound: number; + totalInserted: number; + totalUpdated: number; + totalSkipped: number; + errors: string[]; + durationMs: number; + }> { + const { limit, delayMs = 2000 } = options; + const startTime = Date.now(); + let totalLocationsFound = 0; + let totalInserted = 0; + let totalUpdated = 0; + let totalSkipped = 0; + const allErrors: string[] = []; + + const cities = await this.getEnabledCities(limit); + console.log(`[DutchieLocationDiscovery] Discovering locations for ${cities.length} cities...`); + + for (let i = 0; i < cities.length; i++) { + const city = cities[i]; + console.log(`\n[DutchieLocationDiscovery] City ${i + 1}/${cities.length}: ${city.cityName}, ${city.stateCode}`); + + try { + const result = await this.discoverForCity(city); + totalLocationsFound += result.locationsFound; + totalInserted += result.locationsInserted; + totalUpdated 
+= result.locationsUpdated; + totalSkipped += result.locationsSkipped; + allErrors.push(...result.errors); + } catch (error: any) { + allErrors.push(`City ${city.citySlug} failed: ${error.message}`); + } + + // Delay between cities + if (i < cities.length - 1 && delayMs > 0) { + await new Promise((r) => setTimeout(r, delayMs)); + } + } + + const durationMs = Date.now() - startTime; + + console.log('\n[DutchieLocationDiscovery] All cities complete:'); + console.log(` Total cities: ${cities.length}`); + console.log(` Total locations found: ${totalLocationsFound}`); + console.log(` Total inserted: ${totalInserted}`); + console.log(` Total updated: ${totalUpdated}`); + console.log(` Total skipped: ${totalSkipped}`); + console.log(` Total errors: ${allErrors.length}`); + console.log(` Duration: ${(durationMs / 1000).toFixed(1)}s`); + + return { + totalCities: cities.length, + totalLocationsFound, + totalInserted, + totalUpdated, + totalSkipped, + errors: allErrors, + durationMs, + }; + } +} + +export default DutchieLocationDiscovery; diff --git a/backend/src/dutchie-az/discovery/discovery-dt-cities-auto.ts b/backend/src/dutchie-az/discovery/discovery-dt-cities-auto.ts new file mode 100644 index 00000000..7f0b9e48 --- /dev/null +++ b/backend/src/dutchie-az/discovery/discovery-dt-cities-auto.ts @@ -0,0 +1,73 @@ +#!/usr/bin/env npx tsx +/** + * Discovery Entrypoint: Dutchie Cities (Auto) + * + * Attempts browser/API-based /cities discovery. + * Even if currently blocked (403), this runner preserves the auto-discovery path. + * + * Usage: + * npm run discovery:dt:cities:auto + * DATABASE_URL="..." npx tsx src/dutchie-az/discovery/discovery-dt-cities-auto.ts + */ + +import { Pool } from 'pg'; +import { DtCityDiscoveryService } from './DtCityDiscoveryService'; + +const DB_URL = process.env.DATABASE_URL || process.env.CANNAIQ_DB_URL || + 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'; + +async function main() { + console.log('╔══════════════════════════════════════════════════╗'); + console.log('║ Dutchie City Discovery (AUTO) ║'); + console.log('║ Browser + API fallback ║'); + console.log('╚══════════════════════════════════════════════════╝'); + console.log(`\nDatabase: ${DB_URL.replace(/:[^:@]+@/, ':****@')}`); + + const pool = new Pool({ connectionString: DB_URL }); + + try { + const { rows } = await pool.query('SELECT NOW() as time'); + console.log(`Connected at: ${rows[0].time}\n`); + + const service = new DtCityDiscoveryService(pool); + const result = await service.runAutoDiscovery(); + + console.log('\n' + '═'.repeat(50)); + console.log('SUMMARY'); + console.log('═'.repeat(50)); + console.log(`Cities found: ${result.citiesFound}`); + console.log(`Cities inserted: ${result.citiesInserted}`); + console.log(`Cities updated: ${result.citiesUpdated}`); + console.log(`Errors: ${result.errors.length}`); + console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`); + + if (result.errors.length > 0) { + console.log('\nErrors:'); + result.errors.forEach((e, i) => console.log(` ${i + 1}. 
${e}`)); + } + + const stats = await service.getStats(); + console.log('\nCurrent Database Stats:'); + console.log(` Total cities: ${stats.total}`); + console.log(` Crawl enabled: ${stats.crawlEnabled}`); + console.log(` Never crawled: ${stats.neverCrawled}`); + + if (result.citiesFound === 0) { + console.log('\n⚠️ No cities found via auto-discovery.'); + console.log(' This may be due to Dutchie blocking scraping/API access.'); + console.log(' Use manual seeding instead:'); + console.log(' npm run discovery:dt:cities:manual -- --city-slug=ny-hudson --city-name=Hudson --state-code=NY'); + process.exit(1); + } + + console.log('\n✅ Auto city discovery completed'); + process.exit(0); + } catch (error: any) { + console.error('\n❌ Auto city discovery failed:', error.message); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/dutchie-az/discovery/discovery-dt-cities-manual-seed.ts b/backend/src/dutchie-az/discovery/discovery-dt-cities-manual-seed.ts new file mode 100644 index 00000000..b9c422f6 --- /dev/null +++ b/backend/src/dutchie-az/discovery/discovery-dt-cities-manual-seed.ts @@ -0,0 +1,137 @@ +#!/usr/bin/env npx tsx +/** + * Discovery Entrypoint: Dutchie Cities (Manual Seed) + * + * Manually seeds cities into dutchie_discovery_cities via CLI args. + * Use this when auto-discovery is blocked (403). + * + * Usage: + * npm run discovery:dt:cities:manual -- --city-slug=ny-hudson --city-name=Hudson --state-code=NY + * npm run discovery:dt:cities:manual -- --city-slug=ma-boston --city-name=Boston --state-code=MA --country-code=US + * + * Options: + * --city-slug Required. URL slug (e.g., "ny-hudson") + * --city-name Required. Display name (e.g., "Hudson") + * --state-code Required. State/province code (e.g., "NY", "CA", "ON") + * --country-code Optional. 
Country code (default: "US") + * + * After seeding, run location discovery: + * npm run discovery:dt:locations + */ + +import { Pool } from 'pg'; +import { DtCityDiscoveryService, DutchieCity } from './DtCityDiscoveryService'; + +const DB_URL = process.env.DATABASE_URL || process.env.CANNAIQ_DB_URL || + 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'; + +interface Args { + citySlug?: string; + cityName?: string; + stateCode?: string; + countryCode: string; +} + +function parseArgs(): Args { + const args: Args = { countryCode: 'US' }; + + for (const arg of process.argv.slice(2)) { + const citySlugMatch = arg.match(/--city-slug=(.+)/); + if (citySlugMatch) args.citySlug = citySlugMatch[1]; + + const cityNameMatch = arg.match(/--city-name=(.+)/); + if (cityNameMatch) args.cityName = cityNameMatch[1]; + + const stateCodeMatch = arg.match(/--state-code=(.+)/); + if (stateCodeMatch) args.stateCode = stateCodeMatch[1].toUpperCase(); + + const countryCodeMatch = arg.match(/--country-code=(.+)/); + if (countryCodeMatch) args.countryCode = countryCodeMatch[1].toUpperCase(); + } + + return args; +} + +function printUsage() { + console.log(` +Usage: + npm run discovery:dt:cities:manual -- --city-slug= --city-name= --state-code= + +Required arguments: + --city-slug URL slug for the city (e.g., "ny-hudson", "ma-boston") + --city-name Display name (e.g., "Hudson", "Boston") + --state-code State/province code (e.g., "NY", "CA", "ON") + +Optional arguments: + --country-code Country code (default: "US") + +Examples: + npm run discovery:dt:cities:manual -- --city-slug=ny-hudson --city-name=Hudson --state-code=NY + npm run discovery:dt:cities:manual -- --city-slug=ca-los-angeles --city-name="Los Angeles" --state-code=CA + npm run discovery:dt:cities:manual -- --city-slug=on-toronto --city-name=Toronto --state-code=ON --country-code=CA + +After seeding, run location discovery: + npm run discovery:dt:locations +`); +} + +async function main() { + const args = parseArgs(); + + console.log('╔══════════════════════════════════════════════════╗'); + console.log('║ Dutchie City Discovery (MANUAL SEED) ║'); + console.log('╚══════════════════════════════════════════════════╝'); + + if (!args.citySlug || !args.cityName || !args.stateCode) { + console.error('\n❌ Error: Missing required arguments\n'); + printUsage(); + process.exit(1); + } + + console.log(`\nCity Slug: ${args.citySlug}`); + console.log(`City Name: ${args.cityName}`); + console.log(`State Code: ${args.stateCode}`); + console.log(`Country Code: ${args.countryCode}`); + console.log(`Database: ${DB_URL.replace(/:[^:@]+@/, ':****@')}`); + + const pool = new Pool({ connectionString: DB_URL }); + + try { + const { rows } = await pool.query('SELECT NOW() as time'); + console.log(`\nConnected at: ${rows[0].time}`); + + const service = new DtCityDiscoveryService(pool); + + const city: DutchieCity = { + slug: args.citySlug, + name: args.cityName, + stateCode: args.stateCode, + countryCode: args.countryCode, + }; + + const result = await service.seedCity(city); + + const action = result.wasInserted ? 
'INSERTED' : 'UPDATED'; + console.log(`\n✅ City ${action}:`); + console.log(` ID: ${result.id}`); + console.log(` City Slug: ${result.city.slug}`); + console.log(` City Name: ${result.city.name}`); + console.log(` State Code: ${result.city.stateCode}`); + console.log(` Country Code: ${result.city.countryCode}`); + + const stats = await service.getStats(); + console.log(`\nTotal Dutchie cities: ${stats.total} (${stats.crawlEnabled} enabled)`); + + console.log('\n📍 Next step: Run location discovery'); + console.log(' npm run discovery:dt:locations'); + + process.exit(0); + } catch (error: any) { + console.error('\n❌ Failed to seed city:', error.message); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/dutchie-az/discovery/discovery-dt-cities.ts b/backend/src/dutchie-az/discovery/discovery-dt-cities.ts new file mode 100644 index 00000000..3c875274 --- /dev/null +++ b/backend/src/dutchie-az/discovery/discovery-dt-cities.ts @@ -0,0 +1,73 @@ +#!/usr/bin/env npx tsx +/** + * Discovery Runner: Dutchie Cities + * + * Discovers cities from Dutchie's /cities page and upserts to dutchie_discovery_cities. + * + * Usage: + * npm run discovery:platforms:dt:cities + * DATABASE_URL="..." npx tsx src/dutchie-az/discovery/discovery-dt-cities.ts + */ + +import { Pool } from 'pg'; +import { DutchieCityDiscovery } from './DutchieCityDiscovery'; + +const DB_URL = process.env.DATABASE_URL || process.env.CANNAIQ_DB_URL || + 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'; + +async function main() { + console.log('╔══════════════════════════════════════════════════╗'); + console.log('║ Dutchie City Discovery Runner ║'); + console.log('╚══════════════════════════════════════════════════╝'); + console.log(`\nDatabase: ${DB_URL.replace(/:[^:@]+@/, ':****@')}`); + + const pool = new Pool({ connectionString: DB_URL }); + + try { + // Test DB connection + const { rows } = await pool.query('SELECT NOW() as time'); + console.log(`Connected at: ${rows[0].time}\n`); + + // Run city discovery + const discovery = new DutchieCityDiscovery(pool); + const result = await discovery.run(); + + // Print summary + console.log('\n' + '═'.repeat(50)); + console.log('SUMMARY'); + console.log('═'.repeat(50)); + console.log(`Cities found: ${result.citiesFound}`); + console.log(`Cities inserted: ${result.citiesInserted}`); + console.log(`Cities updated: ${result.citiesUpdated}`); + console.log(`Errors: ${result.errors.length}`); + console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`); + + if (result.errors.length > 0) { + console.log('\nErrors:'); + result.errors.forEach((e, i) => console.log(` ${i + 1}. 
${e}`)); + } + + // Get final stats + const stats = await discovery.getStats(); + console.log('\nCurrent Database Stats:'); + console.log(` Total cities: ${stats.total}`); + console.log(` Crawl enabled: ${stats.crawlEnabled}`); + console.log(` Never crawled: ${stats.neverCrawled}`); + console.log(` By country: ${stats.byCountry.map(c => `${c.countryCode}=${c.count}`).join(', ')}`); + + if (result.errors.length > 0) { + console.log('\n⚠️ Completed with errors'); + process.exit(1); + } + + console.log('\n✅ City discovery completed successfully'); + process.exit(0); + } catch (error: any) { + console.error('\n❌ City discovery failed:', error.message); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/dutchie-az/discovery/discovery-dt-locations-from-cities.ts b/backend/src/dutchie-az/discovery/discovery-dt-locations-from-cities.ts new file mode 100644 index 00000000..61d122d7 --- /dev/null +++ b/backend/src/dutchie-az/discovery/discovery-dt-locations-from-cities.ts @@ -0,0 +1,113 @@ +#!/usr/bin/env npx tsx +/** + * Discovery Entrypoint: Dutchie Locations (From Cities) + * + * Reads from dutchie_discovery_cities (crawl_enabled = true) + * and discovers store locations for each city. + * + * Geo coordinates are captured when available from Dutchie's payloads. + * + * Usage: + * npm run discovery:dt:locations + * npm run discovery:dt:locations -- --limit=10 + * npm run discovery:dt:locations -- --delay=3000 + * DATABASE_URL="..." npx tsx src/dutchie-az/discovery/discovery-dt-locations-from-cities.ts + * + * Options: + * --limit=N Only process N cities (default: all) + * --delay=N Delay between cities in ms (default: 2000) + */ + +import { Pool } from 'pg'; +import { DtLocationDiscoveryService } from './DtLocationDiscoveryService'; + +const DB_URL = process.env.DATABASE_URL || process.env.CANNAIQ_DB_URL || + 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'; + +function parseArgs(): { limit?: number; delay?: number } { + const args: { limit?: number; delay?: number } = {}; + + for (const arg of process.argv.slice(2)) { + const limitMatch = arg.match(/--limit=(\d+)/); + if (limitMatch) args.limit = parseInt(limitMatch[1], 10); + + const delayMatch = arg.match(/--delay=(\d+)/); + if (delayMatch) args.delay = parseInt(delayMatch[1], 10); + } + + return args; +} + +async function main() { + const args = parseArgs(); + + console.log('╔══════════════════════════════════════════════════╗'); + console.log('║ Dutchie Location Discovery (From Cities) ║'); + console.log('║ Reads crawl_enabled cities, discovers stores ║'); + console.log('╚══════════════════════════════════════════════════╝'); + console.log(`\nDatabase: ${DB_URL.replace(/:[^:@]+@/, ':****@')}`); + if (args.limit) console.log(`City limit: ${args.limit}`); + if (args.delay) console.log(`Delay: ${args.delay}ms`); + + const pool = new Pool({ connectionString: DB_URL }); + + try { + const { rows } = await pool.query('SELECT NOW() as time'); + console.log(`Connected at: ${rows[0].time}\n`); + + const service = new DtLocationDiscoveryService(pool); + const result = await service.discoverAllEnabled({ + limit: args.limit, + delayMs: args.delay ?? 
2000, + }); + + console.log('\n' + '═'.repeat(50)); + console.log('SUMMARY'); + console.log('═'.repeat(50)); + console.log(`Cities processed: ${result.totalCities}`); + console.log(`Locations found: ${result.totalLocationsFound}`); + console.log(`Locations inserted: ${result.totalInserted}`); + console.log(`Locations updated: ${result.totalUpdated}`); + console.log(`Locations skipped: ${result.totalSkipped} (protected status)`); + console.log(`Errors: ${result.errors.length}`); + console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`); + + if (result.errors.length > 0) { + console.log('\nErrors (first 10):'); + result.errors.slice(0, 10).forEach((e, i) => console.log(` ${i + 1}. ${e}`)); + if (result.errors.length > 10) { + console.log(` ... and ${result.errors.length - 10} more`); + } + } + + // Get location stats including coordinates + const stats = await service.getStats(); + console.log('\nCurrent Database Stats:'); + console.log(` Total locations: ${stats.total}`); + console.log(` With coordinates: ${stats.withCoordinates}`); + console.log(` By status:`); + stats.byStatus.forEach(s => console.log(` ${s.status}: ${s.count}`)); + + if (result.totalCities === 0) { + console.log('\n⚠️ No crawl-enabled cities found.'); + console.log(' Seed cities first:'); + console.log(' npm run discovery:dt:cities:manual -- --city-slug=ny-hudson --city-name=Hudson --state-code=NY'); + process.exit(1); + } + + if (result.errors.length > 0) { + console.log('\n⚠️ Completed with errors'); + process.exit(1); + } + + console.log('\n✅ Location discovery completed successfully'); + process.exit(0); + } catch (error: any) { + console.error('\n❌ Location discovery failed:', error.message); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/dutchie-az/discovery/discovery-dt-locations.ts b/backend/src/dutchie-az/discovery/discovery-dt-locations.ts new file mode 100644 index 00000000..cb7af618 --- /dev/null +++ b/backend/src/dutchie-az/discovery/discovery-dt-locations.ts @@ -0,0 +1,117 @@ +#!/usr/bin/env npx tsx +/** + * Discovery Runner: Dutchie Locations + * + * Discovers store locations for all crawl-enabled cities and upserts to dutchie_discovery_locations. + * + * Usage: + * npm run discovery:platforms:dt:locations + * npm run discovery:platforms:dt:locations -- --limit=10 + * DATABASE_URL="..." 
npx tsx src/dutchie-az/discovery/discovery-dt-locations.ts + * + * Options (via args): + * --limit=N Only process N cities (default: all) + * --delay=N Delay between cities in ms (default: 2000) + */ + +import { Pool } from 'pg'; +import { DutchieLocationDiscovery } from './DutchieLocationDiscovery'; + +const DB_URL = process.env.DATABASE_URL || process.env.CANNAIQ_DB_URL || + 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'; + +// Parse CLI args +function parseArgs(): { limit?: number; delay?: number } { + const args: { limit?: number; delay?: number } = {}; + + for (const arg of process.argv.slice(2)) { + const limitMatch = arg.match(/--limit=(\d+)/); + if (limitMatch) args.limit = parseInt(limitMatch[1], 10); + + const delayMatch = arg.match(/--delay=(\d+)/); + if (delayMatch) args.delay = parseInt(delayMatch[1], 10); + } + + return args; +} + +async function main() { + const args = parseArgs(); + + console.log('╔══════════════════════════════════════════════════╗'); + console.log('║ Dutchie Location Discovery Runner ║'); + console.log('╚══════════════════════════════════════════════════╝'); + console.log(`\nDatabase: ${DB_URL.replace(/:[^:@]+@/, ':****@')}`); + if (args.limit) console.log(`City limit: ${args.limit}`); + if (args.delay) console.log(`Delay: ${args.delay}ms`); + + const pool = new Pool({ connectionString: DB_URL }); + + try { + // Test DB connection + const { rows } = await pool.query('SELECT NOW() as time'); + console.log(`Connected at: ${rows[0].time}\n`); + + // Run location discovery + const discovery = new DutchieLocationDiscovery(pool); + const result = await discovery.discoverAllEnabled({ + limit: args.limit, + delayMs: args.delay ?? 2000, + }); + + // Print summary + console.log('\n' + '═'.repeat(50)); + console.log('SUMMARY'); + console.log('═'.repeat(50)); + console.log(`Cities processed: ${result.totalCities}`); + console.log(`Locations found: ${result.totalLocationsFound}`); + console.log(`Locations inserted: ${result.totalInserted}`); + console.log(`Locations updated: ${result.totalUpdated}`); + console.log(`Locations skipped: ${result.totalSkipped} (protected status)`); + console.log(`Errors: ${result.errors.length}`); + console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`); + + if (result.errors.length > 0) { + console.log('\nErrors (first 10):'); + result.errors.slice(0, 10).forEach((e, i) => console.log(` ${i + 1}. ${e}`)); + if (result.errors.length > 10) { + console.log(` ... 
and ${result.errors.length - 10} more`); + } + } + + // Get DB counts + const { rows: countRows } = await pool.query(` + SELECT + COUNT(*) as total, + COUNT(*) FILTER (WHERE status = 'discovered') as discovered, + COUNT(*) FILTER (WHERE status = 'verified') as verified, + COUNT(*) FILTER (WHERE status = 'merged') as merged, + COUNT(*) FILTER (WHERE status = 'rejected') as rejected + FROM dutchie_discovery_locations + WHERE platform = 'dutchie' AND active = TRUE + `); + + const counts = countRows[0]; + console.log('\nCurrent Database Stats:'); + console.log(` Total locations: ${counts.total}`); + console.log(` Status discovered: ${counts.discovered}`); + console.log(` Status verified: ${counts.verified}`); + console.log(` Status merged: ${counts.merged}`); + console.log(` Status rejected: ${counts.rejected}`); + + if (result.errors.length > 0) { + console.log('\n⚠️ Completed with errors'); + process.exit(1); + } + + console.log('\n✅ Location discovery completed successfully'); + process.exit(0); + } catch (error: any) { + console.error('\n❌ Location discovery failed:', error.message); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/dutchie-az/discovery/index.ts b/backend/src/dutchie-az/discovery/index.ts new file mode 100644 index 00000000..5b10d0b2 --- /dev/null +++ b/backend/src/dutchie-az/discovery/index.ts @@ -0,0 +1,10 @@ +/** + * Dutchie Discovery Module + * + * Store discovery pipeline for Dutchie platform. + */ + +export { DutchieCityDiscovery } from './DutchieCityDiscovery'; +export { DutchieLocationDiscovery } from './DutchieLocationDiscovery'; +export { createDutchieDiscoveryRoutes } from './routes'; +export { promoteDiscoveryLocation } from './promoteDiscoveryLocation'; diff --git a/backend/src/dutchie-az/discovery/promoteDiscoveryLocation.ts b/backend/src/dutchie-az/discovery/promoteDiscoveryLocation.ts new file mode 100644 index 00000000..3311f8e2 --- /dev/null +++ b/backend/src/dutchie-az/discovery/promoteDiscoveryLocation.ts @@ -0,0 +1,248 @@ +/** + * Promote Discovery Location to Crawlable Dispensary + * + * When a discovery location is verified or merged: + * 1. Ensure a crawl profile exists for the dispensary + * 2. Seed/update crawl schedule + * 3. Create initial crawl job + */ + +import { Pool } from 'pg'; + +export interface PromotionResult { + success: boolean; + discoveryId: number; + dispensaryId: number; + crawlProfileId?: number; + scheduleUpdated?: boolean; + crawlJobCreated?: boolean; + error?: string; +} + +/** + * Promote a verified/merged discovery location to a crawlable dispensary. + * + * This function: + * 1. Verifies the discovery location is verified/merged and has a dispensary_id + * 2. Ensures the dispensary has platform info (menu_type, platform_dispensary_id) + * 3. Creates/updates a crawler profile if the profile table exists + * 4. 
Queues an initial crawl job + */ +export async function promoteDiscoveryLocation( + pool: Pool, + discoveryLocationId: number +): Promise { + console.log(`[Promote] Starting promotion for discovery location ${discoveryLocationId}...`); + + // Get the discovery location + const { rows: locRows } = await pool.query( + ` + SELECT + dl.*, + d.id as disp_id, + d.name as disp_name, + d.menu_type as disp_menu_type, + d.platform_dispensary_id as disp_platform_id + FROM dutchie_discovery_locations dl + JOIN dispensaries d ON dl.dispensary_id = d.id + WHERE dl.id = $1 + `, + [discoveryLocationId] + ); + + if (locRows.length === 0) { + return { + success: false, + discoveryId: discoveryLocationId, + dispensaryId: 0, + error: 'Discovery location not found or not linked to a dispensary', + }; + } + + const location = locRows[0]; + + // Verify status + if (!['verified', 'merged'].includes(location.status)) { + return { + success: false, + discoveryId: discoveryLocationId, + dispensaryId: location.dispensary_id || 0, + error: `Cannot promote: location status is '${location.status}', must be 'verified' or 'merged'`, + }; + } + + const dispensaryId = location.dispensary_id; + console.log(`[Promote] Location ${discoveryLocationId} -> Dispensary ${dispensaryId} (${location.disp_name})`); + + // Ensure dispensary has platform info + if (!location.disp_platform_id) { + console.log(`[Promote] Updating dispensary with platform info...`); + await pool.query( + ` + UPDATE dispensaries + SET platform_dispensary_id = COALESCE(platform_dispensary_id, $1), + menu_url = COALESCE(menu_url, $2), + menu_type = COALESCE(menu_type, 'dutchie'), + updated_at = NOW() + WHERE id = $3 + `, + [location.platform_location_id, location.platform_menu_url, dispensaryId] + ); + } + + let crawlProfileId: number | undefined; + let scheduleUpdated = false; + let crawlJobCreated = false; + + // Check if dispensary_crawler_profiles table exists + const { rows: tableCheck } = await pool.query(` + SELECT EXISTS ( + SELECT FROM information_schema.tables + WHERE table_name = 'dispensary_crawler_profiles' + ) as exists + `); + + if (tableCheck[0]?.exists) { + // Create or get crawler profile + console.log(`[Promote] Checking crawler profile...`); + + const { rows: profileRows } = await pool.query( + ` + SELECT id FROM dispensary_crawler_profiles + WHERE dispensary_id = $1 AND platform = 'dutchie' + `, + [dispensaryId] + ); + + if (profileRows.length > 0) { + crawlProfileId = profileRows[0].id; + console.log(`[Promote] Using existing profile ${crawlProfileId}`); + } else { + // Create new profile + const profileKey = `dutchie-${location.platform_slug}`; + const { rows: newProfile } = await pool.query( + ` + INSERT INTO dispensary_crawler_profiles ( + dispensary_id, + profile_key, + profile_name, + platform, + config, + status, + enabled, + created_at, + updated_at + ) VALUES ( + $1, $2, $3, 'dutchie', $4, 'sandbox', TRUE, NOW(), NOW() + ) + ON CONFLICT (dispensary_id, platform) DO UPDATE SET + enabled = TRUE, + updated_at = NOW() + RETURNING id + `, + [ + dispensaryId, + profileKey, + `${location.name} (Dutchie)`, + JSON.stringify({ + platformDispensaryId: location.platform_location_id, + platformSlug: location.platform_slug, + menuUrl: location.platform_menu_url, + pricingType: 'rec', + useBothModes: true, + }), + ] + ); + + crawlProfileId = newProfile[0]?.id; + console.log(`[Promote] Created new profile ${crawlProfileId}`); + } + + // Link profile to dispensary if not already linked + await pool.query( + ` + UPDATE dispensaries + SET 
active_crawler_profile_id = COALESCE(active_crawler_profile_id, $1), + updated_at = NOW() + WHERE id = $2 + `, + [crawlProfileId, dispensaryId] + ); + } + + // Check if crawl_jobs table exists and create initial job + const { rows: jobsTableCheck } = await pool.query(` + SELECT EXISTS ( + SELECT FROM information_schema.tables + WHERE table_name = 'crawl_jobs' + ) as exists + `); + + if (jobsTableCheck[0]?.exists) { + // Check if there's already a pending job + const { rows: existingJobs } = await pool.query( + ` + SELECT id FROM crawl_jobs + WHERE dispensary_id = $1 AND status IN ('pending', 'running') + LIMIT 1 + `, + [dispensaryId] + ); + + if (existingJobs.length === 0) { + // Create initial crawl job + console.log(`[Promote] Creating initial crawl job...`); + await pool.query( + ` + INSERT INTO crawl_jobs ( + dispensary_id, + job_type, + status, + priority, + config, + created_at, + updated_at + ) VALUES ( + $1, 'dutchie_product_crawl', 'pending', 1, $2, NOW(), NOW() + ) + `, + [ + dispensaryId, + JSON.stringify({ + source: 'discovery_promotion', + discoveryLocationId, + pricingType: 'rec', + useBothModes: true, + }), + ] + ); + crawlJobCreated = true; + } else { + console.log(`[Promote] Crawl job already exists for dispensary`); + } + } + + // Update discovery location notes + await pool.query( + ` + UPDATE dutchie_discovery_locations + SET notes = COALESCE(notes || E'\n', '') || $1, + updated_at = NOW() + WHERE id = $2 + `, + [`Promoted to crawlable at ${new Date().toISOString()}`, discoveryLocationId] + ); + + console.log(`[Promote] Promotion complete for discovery location ${discoveryLocationId}`); + + return { + success: true, + discoveryId: discoveryLocationId, + dispensaryId, + crawlProfileId, + scheduleUpdated, + crawlJobCreated, + }; +} + +export default promoteDiscoveryLocation; diff --git a/backend/src/dutchie-az/discovery/routes.ts b/backend/src/dutchie-az/discovery/routes.ts new file mode 100644 index 00000000..34b6b276 --- /dev/null +++ b/backend/src/dutchie-az/discovery/routes.ts @@ -0,0 +1,973 @@ +/** + * Platform Discovery API Routes (DT = Dutchie) + * + * Routes for the platform-specific store discovery pipeline. + * Mount at /api/discovery/platforms/dt + * + * Platform Slug Mapping (for trademark-safe URLs): + * dt = Dutchie + * jn = Jane (future) + * wm = Weedmaps (future) + * lf = Leafly (future) + * tz = Treez (future) + * + * Note: The actual platform value stored in the DB remains 'dutchie'. + * Only the URL paths use neutral slugs. + */ + +import { Router, Request, Response } from 'express'; +import { Pool } from 'pg'; +import { DutchieCityDiscovery } from './DutchieCityDiscovery'; +import { DutchieLocationDiscovery } from './DutchieLocationDiscovery'; +import { DiscoveryGeoService } from '../../services/DiscoveryGeoService'; +import { GeoValidationService } from '../../services/GeoValidationService'; + +export function createDutchieDiscoveryRoutes(pool: Pool): Router { + const router = Router(); + + // ============================================================ + // LOCATIONS + // ============================================================ + + /** + * GET /api/discovery/platforms/dt/locations + * + * List discovered locations with filtering. 
+ * + * Query params: + * - status: 'discovered' | 'verified' | 'rejected' | 'merged' + * - state_code: e.g., 'AZ', 'CA' + * - country_code: 'US' | 'CA' + * - unlinked_only: 'true' to show only locations without dispensary_id + * - search: search by name + * - limit: number (default 50) + * - offset: number (default 0) + */ + router.get('/locations', async (req: Request, res: Response) => { + try { + const { + status, + state_code, + country_code, + unlinked_only, + search, + limit = '50', + offset = '0', + } = req.query; + + let whereClause = "WHERE platform = 'dutchie' AND active = TRUE"; + const params: any[] = []; + let paramIndex = 1; + + if (status) { + whereClause += ` AND status = $${paramIndex}`; + params.push(status); + paramIndex++; + } + + if (state_code) { + whereClause += ` AND state_code = $${paramIndex}`; + params.push(state_code); + paramIndex++; + } + + if (country_code) { + whereClause += ` AND country_code = $${paramIndex}`; + params.push(country_code); + paramIndex++; + } + + if (unlinked_only === 'true') { + whereClause += ' AND dispensary_id IS NULL'; + } + + if (search) { + whereClause += ` AND (name ILIKE $${paramIndex} OR platform_slug ILIKE $${paramIndex})`; + params.push(`%${search}%`); + paramIndex++; + } + + const limitVal = parseInt(limit as string, 10); + const offsetVal = parseInt(offset as string, 10); + params.push(limitVal, offsetVal); + + const { rows } = await pool.query( + ` + SELECT + dl.id, + dl.platform, + dl.platform_location_id, + dl.platform_slug, + dl.platform_menu_url, + dl.name, + dl.raw_address, + dl.address_line1, + dl.city, + dl.state_code, + dl.postal_code, + dl.country_code, + dl.latitude, + dl.longitude, + dl.status, + dl.dispensary_id, + dl.offers_delivery, + dl.offers_pickup, + dl.is_recreational, + dl.is_medical, + dl.first_seen_at, + dl.last_seen_at, + dl.verified_at, + dl.verified_by, + dl.notes, + d.name as dispensary_name + FROM dutchie_discovery_locations dl + LEFT JOIN dispensaries d ON dl.dispensary_id = d.id + ${whereClause} + ORDER BY dl.first_seen_at DESC + LIMIT $${paramIndex} OFFSET $${paramIndex + 1} + `, + params + ); + + // Get total count + const countParams = params.slice(0, -2); + const { rows: countRows } = await pool.query( + `SELECT COUNT(*) as total FROM dutchie_discovery_locations dl ${whereClause}`, + countParams + ); + + res.json({ + success: true, + locations: rows.map((r) => ({ + id: r.id, + platform: r.platform, + platformLocationId: r.platform_location_id, + platformSlug: r.platform_slug, + platformMenuUrl: r.platform_menu_url, + name: r.name, + rawAddress: r.raw_address, + addressLine1: r.address_line1, + city: r.city, + stateCode: r.state_code, + postalCode: r.postal_code, + countryCode: r.country_code, + latitude: r.latitude, + longitude: r.longitude, + status: r.status, + dispensaryId: r.dispensary_id, + dispensaryName: r.dispensary_name, + offersDelivery: r.offers_delivery, + offersPickup: r.offers_pickup, + isRecreational: r.is_recreational, + isMedical: r.is_medical, + firstSeenAt: r.first_seen_at, + lastSeenAt: r.last_seen_at, + verifiedAt: r.verified_at, + verifiedBy: r.verified_by, + notes: r.notes, + })), + total: parseInt(countRows[0]?.total || '0', 10), + limit: limitVal, + offset: offsetVal, + }); + } catch (error: any) { + console.error('[Discovery Routes] Error fetching locations:', error); + res.status(500).json({ success: false, error: error.message }); + } + }); + + /** + * GET /api/discovery/platforms/dt/locations/:id + * + * Get a single location by ID. 
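+   *
+   * Example (illustrative ID): GET /api/discovery/platforms/dt/locations/123
+   * Responds with { success, location }, where location fields are camelCase versions of the row columns.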
+ */ + router.get('/locations/:id', async (req: Request, res: Response) => { + try { + const { id } = req.params; + + const { rows } = await pool.query( + ` + SELECT + dl.*, + d.name as dispensary_name, + d.menu_url as dispensary_menu_url + FROM dutchie_discovery_locations dl + LEFT JOIN dispensaries d ON dl.dispensary_id = d.id + WHERE dl.id = $1 + `, + [parseInt(id, 10)] + ); + + if (rows.length === 0) { + return res.status(404).json({ success: false, error: 'Location not found' }); + } + + const r = rows[0]; + res.json({ + success: true, + location: { + id: r.id, + platform: r.platform, + platformLocationId: r.platform_location_id, + platformSlug: r.platform_slug, + platformMenuUrl: r.platform_menu_url, + name: r.name, + rawAddress: r.raw_address, + addressLine1: r.address_line1, + addressLine2: r.address_line2, + city: r.city, + stateCode: r.state_code, + postalCode: r.postal_code, + countryCode: r.country_code, + latitude: r.latitude, + longitude: r.longitude, + timezone: r.timezone, + status: r.status, + dispensaryId: r.dispensary_id, + dispensaryName: r.dispensary_name, + dispensaryMenuUrl: r.dispensary_menu_url, + offersDelivery: r.offers_delivery, + offersPickup: r.offers_pickup, + isRecreational: r.is_recreational, + isMedical: r.is_medical, + firstSeenAt: r.first_seen_at, + lastSeenAt: r.last_seen_at, + verifiedAt: r.verified_at, + verifiedBy: r.verified_by, + notes: r.notes, + metadata: r.metadata, + }, + }); + } catch (error: any) { + console.error('[Discovery Routes] Error fetching location:', error); + res.status(500).json({ success: false, error: error.message }); + } + }); + + // ============================================================ + // VERIFICATION ACTIONS + // ============================================================ + + /** + * POST /api/discovery/platforms/dt/locations/:id/verify-create + * + * Verify a discovered location and create a new canonical dispensary. 
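+   *
+   * Example (illustrative ID and body):
+   *   POST /api/discovery/platforms/dt/locations/123/verify-create
+   *   { "verifiedBy": "admin" }
+   * On success, a new dispensaries row is created and the location status moves to 'verified'.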
+ */ + router.post('/locations/:id/verify-create', async (req: Request, res: Response) => { + const client = await pool.connect(); + try { + const { id } = req.params; + const { verifiedBy = 'admin' } = req.body; + + await client.query('BEGIN'); + + // Get the discovery location + const { rows: locRows } = await client.query( + `SELECT * FROM dutchie_discovery_locations WHERE id = $1 FOR UPDATE`, + [parseInt(id, 10)] + ); + + if (locRows.length === 0) { + await client.query('ROLLBACK'); + return res.status(404).json({ success: false, error: 'Location not found' }); + } + + const location = locRows[0]; + + if (location.status !== 'discovered') { + await client.query('ROLLBACK'); + return res.status(400).json({ + success: false, + error: `Cannot verify: location status is '${location.status}'`, + }); + } + + // Look up state_id if we have a state_code + let stateId: number | null = null; + if (location.state_code) { + const { rows: stateRows } = await client.query( + `SELECT id FROM states WHERE code = $1`, + [location.state_code] + ); + if (stateRows.length > 0) { + stateId = stateRows[0].id; + } + } + + // Create the canonical dispensary + const { rows: dispRows } = await client.query( + ` + INSERT INTO dispensaries ( + name, + slug, + address, + city, + state, + zip, + latitude, + longitude, + timezone, + menu_type, + menu_url, + platform_dispensary_id, + state_id, + active, + created_at, + updated_at + ) VALUES ( + $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, TRUE, NOW(), NOW() + ) + RETURNING id + `, + [ + location.name, + location.platform_slug, + location.address_line1, + location.city, + location.state_code, + location.postal_code, + location.latitude, + location.longitude, + location.timezone, + 'dutchie', + location.platform_menu_url, + location.platform_location_id, + stateId, + ] + ); + + const dispensaryId = dispRows[0].id; + + // Update the discovery location + await client.query( + ` + UPDATE dutchie_discovery_locations + SET status = 'verified', + dispensary_id = $1, + verified_at = NOW(), + verified_by = $2, + updated_at = NOW() + WHERE id = $3 + `, + [dispensaryId, verifiedBy, id] + ); + + await client.query('COMMIT'); + + res.json({ + success: true, + action: 'created', + discoveryId: parseInt(id, 10), + dispensaryId, + message: `Created new dispensary (ID: ${dispensaryId})`, + }); + } catch (error: any) { + await client.query('ROLLBACK'); + console.error('[Discovery Routes] Error in verify-create:', error); + res.status(500).json({ success: false, error: error.message }); + } finally { + client.release(); + } + }); + + /** + * POST /api/discovery/platforms/dt/locations/:id/verify-link + * + * Link a discovered location to an existing dispensary. 
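+   * For example (illustrative dispensary ID), POSTing { "dispensaryId": 42 } marks the location 'merged'
+   * and backfills the dispensary's platform_dispensary_id / menu_url / menu_type where they are missing.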
+ * + * Body: + * - dispensaryId: number (required) + * - verifiedBy: string (optional) + */ + router.post('/locations/:id/verify-link', async (req: Request, res: Response) => { + const client = await pool.connect(); + try { + const { id } = req.params; + const { dispensaryId, verifiedBy = 'admin' } = req.body; + + if (!dispensaryId) { + return res.status(400).json({ success: false, error: 'dispensaryId is required' }); + } + + await client.query('BEGIN'); + + // Verify dispensary exists + const { rows: dispRows } = await client.query( + `SELECT id, name FROM dispensaries WHERE id = $1`, + [dispensaryId] + ); + + if (dispRows.length === 0) { + await client.query('ROLLBACK'); + return res.status(404).json({ success: false, error: 'Dispensary not found' }); + } + + // Get the discovery location + const { rows: locRows } = await client.query( + `SELECT * FROM dutchie_discovery_locations WHERE id = $1 FOR UPDATE`, + [parseInt(id, 10)] + ); + + if (locRows.length === 0) { + await client.query('ROLLBACK'); + return res.status(404).json({ success: false, error: 'Location not found' }); + } + + const location = locRows[0]; + + if (location.status !== 'discovered') { + await client.query('ROLLBACK'); + return res.status(400).json({ + success: false, + error: `Cannot link: location status is '${location.status}'`, + }); + } + + // Update dispensary with platform info if missing + await client.query( + ` + UPDATE dispensaries + SET platform_dispensary_id = COALESCE(platform_dispensary_id, $1), + menu_url = COALESCE(menu_url, $2), + menu_type = COALESCE(menu_type, 'dutchie'), + updated_at = NOW() + WHERE id = $3 + `, + [location.platform_location_id, location.platform_menu_url, dispensaryId] + ); + + // Update the discovery location + await client.query( + ` + UPDATE dutchie_discovery_locations + SET status = 'merged', + dispensary_id = $1, + verified_at = NOW(), + verified_by = $2, + updated_at = NOW() + WHERE id = $3 + `, + [dispensaryId, verifiedBy, id] + ); + + await client.query('COMMIT'); + + res.json({ + success: true, + action: 'linked', + discoveryId: parseInt(id, 10), + dispensaryId, + dispensaryName: dispRows[0].name, + message: `Linked to existing dispensary: ${dispRows[0].name}`, + }); + } catch (error: any) { + await client.query('ROLLBACK'); + console.error('[Discovery Routes] Error in verify-link:', error); + res.status(500).json({ success: false, error: error.message }); + } finally { + client.release(); + } + }); + + /** + * POST /api/discovery/platforms/dt/locations/:id/reject + * + * Reject a discovered location. 
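+   * Example body (illustrative): { "reason": "Duplicate of an existing listing" }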
+ * + * Body: + * - reason: string (optional) + * - verifiedBy: string (optional) + */ + router.post('/locations/:id/reject', async (req: Request, res: Response) => { + try { + const { id } = req.params; + const { reason, verifiedBy = 'admin' } = req.body; + + // Get current status + const { rows } = await pool.query( + `SELECT status FROM dutchie_discovery_locations WHERE id = $1`, + [parseInt(id, 10)] + ); + + if (rows.length === 0) { + return res.status(404).json({ success: false, error: 'Location not found' }); + } + + if (rows[0].status !== 'discovered') { + return res.status(400).json({ + success: false, + error: `Cannot reject: location status is '${rows[0].status}'`, + }); + } + + await pool.query( + ` + UPDATE dutchie_discovery_locations + SET status = 'rejected', + verified_at = NOW(), + verified_by = $1, + notes = COALESCE($2, notes), + updated_at = NOW() + WHERE id = $3 + `, + [verifiedBy, reason, id] + ); + + res.json({ + success: true, + action: 'rejected', + discoveryId: parseInt(id, 10), + message: 'Location rejected', + }); + } catch (error: any) { + console.error('[Discovery Routes] Error in reject:', error); + res.status(500).json({ success: false, error: error.message }); + } + }); + + /** + * POST /api/discovery/platforms/dt/locations/:id/unreject + * + * Restore a rejected location to discovered status. + */ + router.post('/locations/:id/unreject', async (req: Request, res: Response) => { + try { + const { id } = req.params; + + // Get current status + const { rows } = await pool.query( + `SELECT status FROM dutchie_discovery_locations WHERE id = $1`, + [parseInt(id, 10)] + ); + + if (rows.length === 0) { + return res.status(404).json({ success: false, error: 'Location not found' }); + } + + if (rows[0].status !== 'rejected') { + return res.status(400).json({ + success: false, + error: `Cannot unreject: location status is '${rows[0].status}'`, + }); + } + + await pool.query( + ` + UPDATE dutchie_discovery_locations + SET status = 'discovered', + verified_at = NULL, + verified_by = NULL, + updated_at = NOW() + WHERE id = $1 + `, + [id] + ); + + res.json({ + success: true, + action: 'unrejected', + discoveryId: parseInt(id, 10), + message: 'Location restored to discovered status', + }); + } catch (error: any) { + console.error('[Discovery Routes] Error in unreject:', error); + res.status(500).json({ success: false, error: error.message }); + } + }); + + // ============================================================ + // SUMMARY / REPORTING + // ============================================================ + + /** + * GET /api/discovery/platforms/dt/summary + * + * Get discovery summary statistics. 
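+   *
+   * Example response shape (counts illustrative):
+   *   { success: true,
+   *     summary: { total_locations: 120, discovered: 80, verified: 25, merged: 10, rejected: 5 },
+   *     by_state: [{ state_code: 'AZ', total: 60, verified: 15, unlinked: 30 }, ...] }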
+ */ + router.get('/summary', async (_req: Request, res: Response) => { + try { + // Total counts by status + const { rows: statusRows } = await pool.query(` + SELECT status, COUNT(*) as cnt + FROM dutchie_discovery_locations + WHERE platform = 'dutchie' AND active = TRUE + GROUP BY status + `); + + const statusCounts: Record = {}; + let totalLocations = 0; + for (const row of statusRows) { + statusCounts[row.status] = parseInt(row.cnt, 10); + totalLocations += parseInt(row.cnt, 10); + } + + // By state + const { rows: stateRows } = await pool.query(` + SELECT + state_code, + COUNT(*) as total, + COUNT(*) FILTER (WHERE status = 'verified') as verified, + COUNT(*) FILTER (WHERE dispensary_id IS NULL AND status = 'discovered') as unlinked + FROM dutchie_discovery_locations + WHERE platform = 'dutchie' AND active = TRUE AND state_code IS NOT NULL + GROUP BY state_code + ORDER BY total DESC + `); + + res.json({ + success: true, + summary: { + total_locations: totalLocations, + discovered: statusCounts['discovered'] || 0, + verified: statusCounts['verified'] || 0, + merged: statusCounts['merged'] || 0, + rejected: statusCounts['rejected'] || 0, + }, + by_state: stateRows.map((r) => ({ + state_code: r.state_code, + total: parseInt(r.total, 10), + verified: parseInt(r.verified, 10), + unlinked: parseInt(r.unlinked, 10), + })), + }); + } catch (error: any) { + console.error('[Discovery Routes] Error in summary:', error); + res.status(500).json({ success: false, error: error.message }); + } + }); + + // ============================================================ + // CITIES + // ============================================================ + + /** + * GET /api/discovery/platforms/dt/cities + * + * List discovery cities. + */ + router.get('/cities', async (req: Request, res: Response) => { + try { + const { state_code, country_code, crawl_enabled, limit = '100', offset = '0' } = req.query; + + let whereClause = "WHERE platform = 'dutchie'"; + const params: any[] = []; + let paramIndex = 1; + + if (state_code) { + whereClause += ` AND state_code = $${paramIndex}`; + params.push(state_code); + paramIndex++; + } + + if (country_code) { + whereClause += ` AND country_code = $${paramIndex}`; + params.push(country_code); + paramIndex++; + } + + if (crawl_enabled === 'true') { + whereClause += ' AND crawl_enabled = TRUE'; + } else if (crawl_enabled === 'false') { + whereClause += ' AND crawl_enabled = FALSE'; + } + + params.push(parseInt(limit as string, 10), parseInt(offset as string, 10)); + + const { rows } = await pool.query( + ` + SELECT + id, + platform, + city_name, + city_slug, + state_code, + country_code, + last_crawled_at, + crawl_enabled, + location_count + FROM dutchie_discovery_cities + ${whereClause} + ORDER BY country_code, state_code, city_name + LIMIT $${paramIndex} OFFSET $${paramIndex + 1} + `, + params + ); + + const { rows: countRows } = await pool.query( + `SELECT COUNT(*) as total FROM dutchie_discovery_cities ${whereClause}`, + params.slice(0, -2) + ); + + res.json({ + success: true, + cities: rows.map((r) => ({ + id: r.id, + platform: r.platform, + cityName: r.city_name, + citySlug: r.city_slug, + stateCode: r.state_code, + countryCode: r.country_code, + lastCrawledAt: r.last_crawled_at, + crawlEnabled: r.crawl_enabled, + locationCount: r.location_count, + })), + total: parseInt(countRows[0]?.total || '0', 10), + }); + } catch (error: any) { + console.error('[Discovery Routes] Error fetching cities:', error); + res.status(500).json({ success: false, error: error.message }); + } + 
}); + + // ============================================================ + // MATCH CANDIDATES + // ============================================================ + + /** + * GET /api/discovery/platforms/dt/locations/:id/match-candidates + * + * Find potential dispensary matches for a discovery location. + */ + router.get('/locations/:id/match-candidates', async (req: Request, res: Response) => { + try { + const { id } = req.params; + + // Get the discovery location + const { rows: locRows } = await pool.query( + `SELECT * FROM dutchie_discovery_locations WHERE id = $1`, + [parseInt(id, 10)] + ); + + if (locRows.length === 0) { + return res.status(404).json({ success: false, error: 'Location not found' }); + } + + const location = locRows[0]; + + // Find potential matches + const { rows: candidates } = await pool.query( + ` + SELECT + d.id, + d.name, + d.city, + d.state, + d.address, + d.menu_type, + d.platform_dispensary_id, + d.menu_url, + d.latitude, + d.longitude, + CASE + WHEN d.name ILIKE $1 THEN 'exact_name' + WHEN d.name ILIKE $2 THEN 'partial_name' + WHEN d.city ILIKE $3 AND d.state = $4 THEN 'same_city' + ELSE 'location_match' + END as match_type, + CASE + WHEN d.latitude IS NOT NULL AND d.longitude IS NOT NULL + AND $5::float IS NOT NULL AND $6::float IS NOT NULL + THEN (3959 * acos( + LEAST(1.0, GREATEST(-1.0, + cos(radians($5::float)) * cos(radians(d.latitude)) * + cos(radians(d.longitude) - radians($6::float)) + + sin(radians($5::float)) * sin(radians(d.latitude)) + )) + )) + ELSE NULL + END as distance_miles + FROM dispensaries d + WHERE d.state = $4 + AND ( + d.name ILIKE $1 + OR d.name ILIKE $2 + OR d.city ILIKE $3 + OR ( + d.latitude IS NOT NULL + AND d.longitude IS NOT NULL + AND $5::float IS NOT NULL + AND $6::float IS NOT NULL + ) + ) + ORDER BY + CASE + WHEN d.name ILIKE $1 THEN 1 + WHEN d.name ILIKE $2 THEN 2 + ELSE 3 + END, + distance_miles NULLS LAST + LIMIT 10 + `, + [ + location.name, + `%${location.name.split(' ')[0]}%`, + location.city, + location.state_code, + location.latitude, + location.longitude, + ] + ); + + res.json({ + success: true, + location: { + id: location.id, + name: location.name, + city: location.city, + stateCode: location.state_code, + }, + candidates: candidates.map((c) => ({ + id: c.id, + name: c.name, + city: c.city, + state: c.state, + address: c.address, + menuType: c.menu_type, + platformDispensaryId: c.platform_dispensary_id, + menuUrl: c.menu_url, + matchType: c.match_type, + distanceMiles: c.distance_miles ? Math.round(c.distance_miles * 10) / 10 : null, + })), + }); + } catch (error: any) { + console.error('[Discovery Routes] Error fetching match candidates:', error); + res.status(500).json({ success: false, error: error.message }); + } + }); + + // ============================================================ + // GEO / NEARBY (Admin/Debug Only) + // ============================================================ + + /** + * GET /api/discovery/platforms/dt/nearby + * + * Find discovery locations near a given coordinate. + * This is an internal/debug endpoint for admin use. 
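+   * Example (illustrative coordinates, using the params listed below):
+   *   GET /api/discovery/platforms/dt/nearby?lat=33.4484&lon=-112.0740&radiusKm=25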
+ * + * Query params: + * - lat: number (required) + * - lon: number (required) + * - radiusKm: number (optional, default 50) + * - limit: number (optional, default 20) + * - status: string (optional, filter by status) + */ + router.get('/nearby', async (req: Request, res: Response) => { + try { + const { lat, lon, radiusKm = '50', limit = '20', status } = req.query; + + // Validate required params + if (!lat || !lon) { + return res.status(400).json({ + success: false, + error: 'lat and lon are required query parameters', + }); + } + + const latNum = parseFloat(lat as string); + const lonNum = parseFloat(lon as string); + const radiusNum = parseFloat(radiusKm as string); + const limitNum = parseInt(limit as string, 10); + + if (isNaN(latNum) || isNaN(lonNum)) { + return res.status(400).json({ + success: false, + error: 'lat and lon must be valid numbers', + }); + } + + const geoService = new DiscoveryGeoService(pool); + + const locations = await geoService.findNearbyDiscoveryLocations(latNum, lonNum, { + radiusKm: radiusNum, + limit: limitNum, + platform: 'dutchie', + status: status as string | undefined, + }); + + res.json({ + success: true, + center: { lat: latNum, lon: lonNum }, + radiusKm: radiusNum, + count: locations.length, + locations, + }); + } catch (error: any) { + console.error('[Discovery Routes] Error in nearby:', error); + res.status(500).json({ success: false, error: error.message }); + } + }); + + /** + * GET /api/discovery/platforms/dt/geo-stats + * + * Get coordinate coverage statistics for discovery locations. + * This is an internal/debug endpoint for admin use. + */ + router.get('/geo-stats', async (_req: Request, res: Response) => { + try { + const geoService = new DiscoveryGeoService(pool); + const stats = await geoService.getCoordinateCoverageStats(); + + res.json({ + success: true, + stats, + }); + } catch (error: any) { + console.error('[Discovery Routes] Error in geo-stats:', error); + res.status(500).json({ success: false, error: error.message }); + } + }); + + /** + * GET /api/discovery/platforms/dt/locations/:id/validate-geo + * + * Validate the geographic data for a discovery location. + * This is an internal/debug endpoint for admin use. 
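+   *
+   * Example (illustrative ID): GET /api/discovery/platforms/dt/locations/123/validate-geo
+   * Responds with { success, location, validation }, where validation is produced by
+   * GeoValidationService.validateLocationState() from the stored coordinates and state/country codes.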
+ */ + router.get('/locations/:id/validate-geo', async (req: Request, res: Response) => { + try { + const { id } = req.params; + + // Get the location + const { rows } = await pool.query( + `SELECT latitude, longitude, state_code, country_code, name + FROM dutchie_discovery_locations WHERE id = $1`, + [parseInt(id, 10)] + ); + + if (rows.length === 0) { + return res.status(404).json({ success: false, error: 'Location not found' }); + } + + const location = rows[0]; + const geoValidation = new GeoValidationService(); + const result = geoValidation.validateLocationState({ + latitude: location.latitude, + longitude: location.longitude, + state_code: location.state_code, + country_code: location.country_code, + }); + + res.json({ + success: true, + location: { + id: parseInt(id, 10), + name: location.name, + latitude: location.latitude, + longitude: location.longitude, + stateCode: location.state_code, + countryCode: location.country_code, + }, + validation: result, + }); + } catch (error: any) { + console.error('[Discovery Routes] Error in validate-geo:', error); + res.status(500).json({ success: false, error: error.message }); + } + }); + + return router; +} + +export default createDutchieDiscoveryRoutes; diff --git a/backend/src/dutchie-az/routes/analytics.ts b/backend/src/dutchie-az/routes/analytics.ts new file mode 100644 index 00000000..549e919a --- /dev/null +++ b/backend/src/dutchie-az/routes/analytics.ts @@ -0,0 +1,682 @@ +/** + * Analytics API Routes + * + * Provides REST API endpoints for all analytics services. + * All routes are prefixed with /api/analytics + * + * Phase 3: Analytics Dashboards + */ + +import { Router, Request, Response } from 'express'; +import { Pool } from 'pg'; +import { + AnalyticsCache, + PriceTrendService, + PenetrationService, + CategoryAnalyticsService, + StoreChangeService, + BrandOpportunityService, +} from '../services/analytics'; + +export function createAnalyticsRouter(pool: Pool): Router { + const router = Router(); + + // Initialize services + const cache = new AnalyticsCache(pool, { defaultTtlMinutes: 15 }); + const priceService = new PriceTrendService(pool, cache); + const penetrationService = new PenetrationService(pool, cache); + const categoryService = new CategoryAnalyticsService(pool, cache); + const storeService = new StoreChangeService(pool, cache); + const brandOpportunityService = new BrandOpportunityService(pool, cache); + + // ============================================================ + // PRICE ANALYTICS + // ============================================================ + + /** + * GET /api/analytics/price/product/:id + * Get price trend for a specific product + */ + router.get('/price/product/:id', async (req: Request, res: Response) => { + try { + const productId = parseInt(req.params.id); + const storeId = req.query.storeId ? parseInt(req.query.storeId as string) : undefined; + const days = req.query.days ? parseInt(req.query.days as string) : 30; + + const result = await priceService.getProductPriceTrend(productId, storeId, days); + res.json(result); + } catch (error) { + console.error('[Analytics] Price product error:', error); + res.status(500).json({ error: 'Failed to fetch product price trend' }); + } + }); + + /** + * GET /api/analytics/price/brand/:name + * Get price trend for a brand + */ + router.get('/price/brand/:name', async (req: Request, res: Response) => { + try { + const brandName = decodeURIComponent(req.params.name); + const filters = { + storeId: req.query.storeId ? 
parseInt(req.query.storeId as string) : undefined, + category: req.query.category as string | undefined, + state: req.query.state as string | undefined, + days: req.query.days ? parseInt(req.query.days as string) : 30, + }; + + const result = await priceService.getBrandPriceTrend(brandName, filters); + res.json(result); + } catch (error) { + console.error('[Analytics] Price brand error:', error); + res.status(500).json({ error: 'Failed to fetch brand price trend' }); + } + }); + + /** + * GET /api/analytics/price/category/:name + * Get price trend for a category + */ + router.get('/price/category/:name', async (req: Request, res: Response) => { + try { + const category = decodeURIComponent(req.params.name); + const filters = { + storeId: req.query.storeId ? parseInt(req.query.storeId as string) : undefined, + brandName: req.query.brand as string | undefined, + state: req.query.state as string | undefined, + days: req.query.days ? parseInt(req.query.days as string) : 30, + }; + + const result = await priceService.getCategoryPriceTrend(category, filters); + res.json(result); + } catch (error) { + console.error('[Analytics] Price category error:', error); + res.status(500).json({ error: 'Failed to fetch category price trend' }); + } + }); + + /** + * GET /api/analytics/price/summary + * Get price summary statistics + */ + router.get('/price/summary', async (req: Request, res: Response) => { + try { + const filters = { + storeId: req.query.storeId ? parseInt(req.query.storeId as string) : undefined, + brandName: req.query.brand as string | undefined, + category: req.query.category as string | undefined, + state: req.query.state as string | undefined, + }; + + const result = await priceService.getPriceSummary(filters); + res.json(result); + } catch (error) { + console.error('[Analytics] Price summary error:', error); + res.status(500).json({ error: 'Failed to fetch price summary' }); + } + }); + + /** + * GET /api/analytics/price/compression/:category + * Get price compression analysis for a category + */ + router.get('/price/compression/:category', async (req: Request, res: Response) => { + try { + const category = decodeURIComponent(req.params.category); + const state = req.query.state as string | undefined; + + const result = await priceService.detectPriceCompression(category, state); + res.json(result); + } catch (error) { + console.error('[Analytics] Price compression error:', error); + res.status(500).json({ error: 'Failed to analyze price compression' }); + } + }); + + /** + * GET /api/analytics/price/global + * Get global price statistics + */ + router.get('/price/global', async (_req: Request, res: Response) => { + try { + const result = await priceService.getGlobalPriceStats(); + res.json(result); + } catch (error) { + console.error('[Analytics] Global price error:', error); + res.status(500).json({ error: 'Failed to fetch global price stats' }); + } + }); + + // ============================================================ + // PENETRATION ANALYTICS + // ============================================================ + + /** + * GET /api/analytics/penetration/brand/:name + * Get penetration data for a brand + */ + router.get('/penetration/brand/:name', async (req: Request, res: Response) => { + try { + const brandName = decodeURIComponent(req.params.name); + const filters = { + state: req.query.state as string | undefined, + category: req.query.category as string | undefined, + }; + + const result = await penetrationService.getBrandPenetration(brandName, filters); + res.json(result); + } 
catch (error) { + console.error('[Analytics] Brand penetration error:', error); + res.status(500).json({ error: 'Failed to fetch brand penetration' }); + } + }); + + /** + * GET /api/analytics/penetration/top + * Get top brands by penetration + */ + router.get('/penetration/top', async (req: Request, res: Response) => { + try { + const limit = req.query.limit ? parseInt(req.query.limit as string) : 20; + const filters = { + state: req.query.state as string | undefined, + category: req.query.category as string | undefined, + minStores: req.query.minStores ? parseInt(req.query.minStores as string) : 2, + minSkus: req.query.minSkus ? parseInt(req.query.minSkus as string) : 5, + }; + + const result = await penetrationService.getTopBrandsByPenetration(limit, filters); + res.json(result); + } catch (error) { + console.error('[Analytics] Top penetration error:', error); + res.status(500).json({ error: 'Failed to fetch top brands' }); + } + }); + + /** + * GET /api/analytics/penetration/trend/:brand + * Get penetration trend for a brand + */ + router.get('/penetration/trend/:brand', async (req: Request, res: Response) => { + try { + const brandName = decodeURIComponent(req.params.brand); + const days = req.query.days ? parseInt(req.query.days as string) : 30; + + const result = await penetrationService.getPenetrationTrend(brandName, days); + res.json(result); + } catch (error) { + console.error('[Analytics] Penetration trend error:', error); + res.status(500).json({ error: 'Failed to fetch penetration trend' }); + } + }); + + /** + * GET /api/analytics/penetration/shelf-share/:brand + * Get shelf share by category for a brand + */ + router.get('/penetration/shelf-share/:brand', async (req: Request, res: Response) => { + try { + const brandName = decodeURIComponent(req.params.brand); + const result = await penetrationService.getShelfShareByCategory(brandName); + res.json(result); + } catch (error) { + console.error('[Analytics] Shelf share error:', error); + res.status(500).json({ error: 'Failed to fetch shelf share' }); + } + }); + + /** + * GET /api/analytics/penetration/by-state/:brand + * Get brand presence by state + */ + router.get('/penetration/by-state/:brand', async (req: Request, res: Response) => { + try { + const brandName = decodeURIComponent(req.params.brand); + const result = await penetrationService.getBrandPresenceByState(brandName); + res.json(result); + } catch (error) { + console.error('[Analytics] Brand by state error:', error); + res.status(500).json({ error: 'Failed to fetch brand presence by state' }); + } + }); + + /** + * GET /api/analytics/penetration/stores/:brand + * Get stores carrying a brand + */ + router.get('/penetration/stores/:brand', async (req: Request, res: Response) => { + try { + const brandName = decodeURIComponent(req.params.brand); + const result = await penetrationService.getStoresCarryingBrand(brandName); + res.json(result); + } catch (error) { + console.error('[Analytics] Stores carrying brand error:', error); + res.status(500).json({ error: 'Failed to fetch stores' }); + } + }); + + /** + * GET /api/analytics/penetration/heatmap + * Get penetration heatmap data + */ + router.get('/penetration/heatmap', async (req: Request, res: Response) => { + try { + const brandName = req.query.brand as string | undefined; + const result = await penetrationService.getPenetrationHeatmap(brandName); + res.json(result); + } catch (error) { + console.error('[Analytics] Heatmap error:', error); + res.status(500).json({ error: 'Failed to fetch heatmap data' }); + } + }); 
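+
+  // Illustrative client usage of the penetration endpoints above (paths are those registered in this
+  // router under the /api/analytics prefix; brand name and query values are hypothetical):
+  //
+  //   const res = await fetch('/api/analytics/penetration/top?state=AZ&limit=10&minStores=3');
+  //   const topBrands = await res.json();
+  //
+  //   const trend = await fetch(
+  //     '/api/analytics/penetration/trend/' + encodeURIComponent('Example Brand') + '?days=30'
+  //   );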
+ + // ============================================================ + // CATEGORY ANALYTICS + // ============================================================ + + /** + * GET /api/analytics/category/summary + * Get category summary + */ + router.get('/category/summary', async (req: Request, res: Response) => { + try { + const category = req.query.category as string | undefined; + const filters = { + state: req.query.state as string | undefined, + storeId: req.query.storeId ? parseInt(req.query.storeId as string) : undefined, + }; + + const result = await categoryService.getCategorySummary(category, filters); + res.json(result); + } catch (error) { + console.error('[Analytics] Category summary error:', error); + res.status(500).json({ error: 'Failed to fetch category summary' }); + } + }); + + /** + * GET /api/analytics/category/growth + * Get category growth data + */ + router.get('/category/growth', async (req: Request, res: Response) => { + try { + const days = req.query.days ? parseInt(req.query.days as string) : 7; + const filters = { + state: req.query.state as string | undefined, + storeId: req.query.storeId ? parseInt(req.query.storeId as string) : undefined, + minSkus: req.query.minSkus ? parseInt(req.query.minSkus as string) : 10, + }; + + const result = await categoryService.getCategoryGrowth(days, filters); + res.json(result); + } catch (error) { + console.error('[Analytics] Category growth error:', error); + res.status(500).json({ error: 'Failed to fetch category growth' }); + } + }); + + /** + * GET /api/analytics/category/trend/:category + * Get category growth trend over time + */ + router.get('/category/trend/:category', async (req: Request, res: Response) => { + try { + const category = decodeURIComponent(req.params.category); + const days = req.query.days ? parseInt(req.query.days as string) : 90; + + const result = await categoryService.getCategoryGrowthTrend(category, days); + res.json(result); + } catch (error) { + console.error('[Analytics] Category trend error:', error); + res.status(500).json({ error: 'Failed to fetch category trend' }); + } + }); + + /** + * GET /api/analytics/category/heatmap + * Get category heatmap data + */ + router.get('/category/heatmap', async (req: Request, res: Response) => { + try { + const metric = (req.query.metric as 'skus' | 'growth' | 'price') || 'skus'; + const periods = req.query.periods ? parseInt(req.query.periods as string) : 12; + + const result = await categoryService.getCategoryHeatmap(metric, periods); + res.json(result); + } catch (error) { + console.error('[Analytics] Category heatmap error:', error); + res.status(500).json({ error: 'Failed to fetch heatmap' }); + } + }); + + /** + * GET /api/analytics/category/top-movers + * Get top growing and declining categories + */ + router.get('/category/top-movers', async (req: Request, res: Response) => { + try { + const limit = req.query.limit ? parseInt(req.query.limit as string) : 5; + const days = req.query.days ? 
parseInt(req.query.days as string) : 30; + + const result = await categoryService.getTopMovers(limit, days); + res.json(result); + } catch (error) { + console.error('[Analytics] Top movers error:', error); + res.status(500).json({ error: 'Failed to fetch top movers' }); + } + }); + + /** + * GET /api/analytics/category/:category/subcategories + * Get subcategory breakdown + */ + router.get('/category/:category/subcategories', async (req: Request, res: Response) => { + try { + const category = decodeURIComponent(req.params.category); + const result = await categoryService.getSubcategoryBreakdown(category); + res.json(result); + } catch (error) { + console.error('[Analytics] Subcategory error:', error); + res.status(500).json({ error: 'Failed to fetch subcategories' }); + } + }); + + // ============================================================ + // STORE CHANGE TRACKING + // ============================================================ + + /** + * GET /api/analytics/store/:id/summary + * Get change summary for a store + */ + router.get('/store/:id/summary', async (req: Request, res: Response) => { + try { + const storeId = parseInt(req.params.id); + const result = await storeService.getStoreChangeSummary(storeId); + + if (!result) { + return res.status(404).json({ error: 'Store not found' }); + } + + res.json(result); + } catch (error) { + console.error('[Analytics] Store summary error:', error); + res.status(500).json({ error: 'Failed to fetch store summary' }); + } + }); + + /** + * GET /api/analytics/store/:id/events + * Get recent change events for a store + */ + router.get('/store/:id/events', async (req: Request, res: Response) => { + try { + const storeId = parseInt(req.params.id); + const filters = { + eventType: req.query.type as string | undefined, + days: req.query.days ? parseInt(req.query.days as string) : 30, + limit: req.query.limit ? parseInt(req.query.limit as string) : 100, + }; + + const result = await storeService.getStoreChangeEvents(storeId, filters); + res.json(result); + } catch (error) { + console.error('[Analytics] Store events error:', error); + res.status(500).json({ error: 'Failed to fetch store events' }); + } + }); + + /** + * GET /api/analytics/store/:id/brands/new + * Get new brands added to a store + */ + router.get('/store/:id/brands/new', async (req: Request, res: Response) => { + try { + const storeId = parseInt(req.params.id); + const days = req.query.days ? parseInt(req.query.days as string) : 30; + + const result = await storeService.getNewBrands(storeId, days); + res.json(result); + } catch (error) { + console.error('[Analytics] New brands error:', error); + res.status(500).json({ error: 'Failed to fetch new brands' }); + } + }); + + /** + * GET /api/analytics/store/:id/brands/lost + * Get brands lost from a store + */ + router.get('/store/:id/brands/lost', async (req: Request, res: Response) => { + try { + const storeId = parseInt(req.params.id); + const days = req.query.days ? 
parseInt(req.query.days as string) : 30; + + const result = await storeService.getLostBrands(storeId, days); + res.json(result); + } catch (error) { + console.error('[Analytics] Lost brands error:', error); + res.status(500).json({ error: 'Failed to fetch lost brands' }); + } + }); + + /** + * GET /api/analytics/store/:id/products/changes + * Get product changes for a store + */ + router.get('/store/:id/products/changes', async (req: Request, res: Response) => { + try { + const storeId = parseInt(req.params.id); + const changeType = req.query.type as 'added' | 'discontinued' | 'price_drop' | 'price_increase' | 'restocked' | 'out_of_stock' | undefined; + const days = req.query.days ? parseInt(req.query.days as string) : 7; + + const result = await storeService.getProductChanges(storeId, changeType, days); + res.json(result); + } catch (error) { + console.error('[Analytics] Product changes error:', error); + res.status(500).json({ error: 'Failed to fetch product changes' }); + } + }); + + /** + * GET /api/analytics/store/leaderboard/:category + * Get category leaderboard across stores + */ + router.get('/store/leaderboard/:category', async (req: Request, res: Response) => { + try { + const category = decodeURIComponent(req.params.category); + const limit = req.query.limit ? parseInt(req.query.limit as string) : 20; + + const result = await storeService.getCategoryLeaderboard(category, limit); + res.json(result); + } catch (error) { + console.error('[Analytics] Leaderboard error:', error); + res.status(500).json({ error: 'Failed to fetch leaderboard' }); + } + }); + + /** + * GET /api/analytics/store/most-active + * Get most active stores (by changes) + */ + router.get('/store/most-active', async (req: Request, res: Response) => { + try { + const days = req.query.days ? parseInt(req.query.days as string) : 7; + const limit = req.query.limit ? 
parseInt(req.query.limit as string) : 10; + + const result = await storeService.getMostActiveStores(days, limit); + res.json(result); + } catch (error) { + console.error('[Analytics] Most active error:', error); + res.status(500).json({ error: 'Failed to fetch active stores' }); + } + }); + + /** + * GET /api/analytics/store/compare + * Compare two stores + */ + router.get('/store/compare', async (req: Request, res: Response) => { + try { + const store1 = parseInt(req.query.store1 as string); + const store2 = parseInt(req.query.store2 as string); + + if (!store1 || !store2) { + return res.status(400).json({ error: 'Both store1 and store2 are required' }); + } + + const result = await storeService.compareStores(store1, store2); + res.json(result); + } catch (error) { + console.error('[Analytics] Compare stores error:', error); + res.status(500).json({ error: 'Failed to compare stores' }); + } + }); + + // ============================================================ + // BRAND OPPORTUNITY / RISK + // ============================================================ + + /** + * GET /api/analytics/brand/:name/opportunity + * Get full opportunity analysis for a brand + */ + router.get('/brand/:name/opportunity', async (req: Request, res: Response) => { + try { + const brandName = decodeURIComponent(req.params.name); + const result = await brandOpportunityService.getBrandOpportunity(brandName); + res.json(result); + } catch (error) { + console.error('[Analytics] Brand opportunity error:', error); + res.status(500).json({ error: 'Failed to fetch brand opportunity' }); + } + }); + + /** + * GET /api/analytics/brand/:name/position + * Get market position summary for a brand + */ + router.get('/brand/:name/position', async (req: Request, res: Response) => { + try { + const brandName = decodeURIComponent(req.params.name); + const result = await brandOpportunityService.getMarketPositionSummary(brandName); + res.json(result); + } catch (error) { + console.error('[Analytics] Brand position error:', error); + res.status(500).json({ error: 'Failed to fetch brand position' }); + } + }); + + // ============================================================ + // ALERTS + // ============================================================ + + /** + * GET /api/analytics/alerts + * Get analytics alerts + */ + router.get('/alerts', async (req: Request, res: Response) => { + try { + const filters = { + brandName: req.query.brand as string | undefined, + storeId: req.query.storeId ? parseInt(req.query.storeId as string) : undefined, + alertType: req.query.type as string | undefined, + unreadOnly: req.query.unreadOnly === 'true', + limit: req.query.limit ? 
parseInt(req.query.limit as string) : 50, + }; + + const result = await brandOpportunityService.getAlerts(filters); + res.json(result); + } catch (error) { + console.error('[Analytics] Alerts error:', error); + res.status(500).json({ error: 'Failed to fetch alerts' }); + } + }); + + /** + * POST /api/analytics/alerts/mark-read + * Mark alerts as read + */ + router.post('/alerts/mark-read', async (req: Request, res: Response) => { + try { + const { alertIds } = req.body; + + if (!Array.isArray(alertIds)) { + return res.status(400).json({ error: 'alertIds must be an array' }); + } + + await brandOpportunityService.markAlertsRead(alertIds); + res.json({ success: true }); + } catch (error) { + console.error('[Analytics] Mark read error:', error); + res.status(500).json({ error: 'Failed to mark alerts as read' }); + } + }); + + // ============================================================ + // CACHE MANAGEMENT + // ============================================================ + + /** + * GET /api/analytics/cache/stats + * Get cache statistics + */ + router.get('/cache/stats', async (_req: Request, res: Response) => { + try { + const stats = await cache.getStats(); + res.json(stats); + } catch (error) { + console.error('[Analytics] Cache stats error:', error); + res.status(500).json({ error: 'Failed to get cache stats' }); + } + }); + + /** + * POST /api/analytics/cache/clear + * Clear cache (admin only) + */ + router.post('/cache/clear', async (req: Request, res: Response) => { + try { + const pattern = req.query.pattern as string | undefined; + + if (pattern) { + const cleared = await cache.invalidatePattern(pattern); + res.json({ success: true, clearedCount: cleared }); + } else { + await cache.cleanExpired(); + res.json({ success: true, message: 'Expired entries cleaned' }); + } + } catch (error) { + console.error('[Analytics] Cache clear error:', error); + res.status(500).json({ error: 'Failed to clear cache' }); + } + }); + + // ============================================================ + // SNAPSHOT CAPTURE (for cron/scheduled jobs) + // ============================================================ + + /** + * POST /api/analytics/snapshots/capture + * Capture daily snapshots (run by scheduler) + */ + router.post('/snapshots/capture', async (_req: Request, res: Response) => { + try { + const [brandResult, categoryResult] = await Promise.all([ + pool.query('SELECT capture_brand_snapshots() as count'), + pool.query('SELECT capture_category_snapshots() as count'), + ]); + + res.json({ + success: true, + brandSnapshots: parseInt(brandResult.rows[0]?.count || '0'), + categorySnapshots: parseInt(categoryResult.rows[0]?.count || '0'), + }); + } catch (error) { + console.error('[Analytics] Snapshot capture error:', error); + res.status(500).json({ error: 'Failed to capture snapshots' }); + } + }); + + return router; +} diff --git a/backend/src/dutchie-az/routes/index.ts b/backend/src/dutchie-az/routes/index.ts index 375799d5..bf0fa79e 100644 --- a/backend/src/dutchie-az/routes/index.ts +++ b/backend/src/dutchie-az/routes/index.ts @@ -21,12 +21,8 @@ import { } from '../services/discovery'; import { crawlDispensaryProducts } from '../services/product-crawler'; -// Explicit column list for dispensaries table (avoids SELECT * issues with schema differences) -const DISPENSARY_COLUMNS = ` - id, name, dba_name, slug, city, state, zip, address, latitude, longitude, - menu_type, menu_url, platform_dispensary_id, website, - provider_detection_data, created_at, updated_at -`; +// Use shared dispensary 
columns (handles optional columns like provider_detection_data) +import { DISPENSARY_COLUMNS_WITH_PROFILE as DISPENSARY_COLUMNS } from '../db/dispensary-columns'; import { startScheduler, stopScheduler, @@ -43,6 +39,7 @@ import { getRunLogs, } from '../services/scheduler'; import { StockStatus } from '../types'; +import { getProviderDisplayName } from '../../utils/provider-display'; const router = Router(); @@ -113,9 +110,17 @@ router.get('/stores', async (req: Request, res: Response) => { const { rows, rowCount } = await query( ` - SELECT ${DISPENSARY_COLUMNS} FROM dispensaries + SELECT ${DISPENSARY_COLUMNS}, + (SELECT COUNT(*) FROM dutchie_products WHERE dispensary_id = dispensaries.id) as product_count, + dcp.status as crawler_status, + dcp.profile_key as crawler_profile_key, + dcp.next_retry_at, + dcp.sandbox_attempt_count + FROM dispensaries + LEFT JOIN dispensary_crawler_profiles dcp + ON dcp.dispensary_id = dispensaries.id AND dcp.enabled = true ${whereClause} - ORDER BY name + ORDER BY dispensaries.name LIMIT $${paramIndex} OFFSET $${paramIndex + 1} `, params @@ -127,8 +132,15 @@ router.get('/stores', async (req: Request, res: Response) => { params.slice(0, -2) ); + // Transform stores to include provider_display + const transformedStores = rows.map((store: any) => ({ + ...store, + provider_raw: store.menu_type, + provider_display: getProviderDisplayName(store.menu_type), + })); + res.json({ - stores: rows, + stores: transformedStores, total: parseInt(countRows[0]?.total || '0', 10), limit: parseInt(limit as string, 10), offset: parseInt(offset as string, 10), @@ -780,7 +792,7 @@ router.get('/products/:id/availability', async (req: Request, res: Response) => ) SELECT d.id as dispensary_id, - COALESCE(d.dba_name, d.name) as dispensary_name, + d.name as dispensary_name, d.city, d.state, d.address, @@ -1042,8 +1054,12 @@ router.post('/admin/scheduler/trigger', async (_req: Request, res: Response) => }); /** - * POST /api/dutchie-az/admin/crawl/:id + * POST /api/az/admin/crawl/:id * Crawl a single dispensary with job tracking + * + * @deprecated Use POST /api/admin/crawl/:dispensaryId instead. + * This route is kept for backward compatibility only. 
+ * The canonical crawl endpoint is now /api/admin/crawl/:dispensaryId */ router.post('/admin/crawl/:id', async (req: Request, res: Response) => { try { @@ -1075,7 +1091,6 @@ router.get('/admin/dutchie-stores', async (_req: Request, res: Response) => { SELECT d.id, d.name, - d.dba_name, d.city, d.state, d.menu_type, @@ -1113,7 +1128,7 @@ router.get('/admin/dutchie-stores', async (_req: Request, res: Response) => { failed: failed.length, stores: rows.map((r: any) => ({ id: r.id, - name: r.dba_name || r.name, + name: r.name, city: r.city, state: r.state, menuType: r.menu_type, @@ -1688,6 +1703,7 @@ import { router.get('/monitor/active-jobs', async (_req: Request, res: Response) => { try { // Get running jobs from job_run_logs (scheduled jobs like "enqueue all") + // Includes worker_name and run_role for named workforce display const { rows: runningScheduledJobs } = await query(` SELECT jrl.id, @@ -1699,7 +1715,11 @@ router.get('/monitor/active-jobs', async (_req: Request, res: Response) => { jrl.items_succeeded, jrl.items_failed, jrl.metadata, + jrl.worker_name, + jrl.run_role, js.description as job_description, + js.worker_name as schedule_worker_name, + js.worker_role as schedule_worker_role, EXTRACT(EPOCH FROM (NOW() - jrl.started_at)) as duration_seconds FROM job_run_logs jrl LEFT JOIN job_schedules js ON jrl.schedule_id = js.id @@ -1708,7 +1728,7 @@ router.get('/monitor/active-jobs', async (_req: Request, res: Response) => { `); // Get running crawl jobs (individual store crawls with worker info) - // Note: Use COALESCE for optional columns that may not exist in older schemas + // Includes enqueued_by_worker for tracking which named worker enqueued the job const { rows: runningCrawlJobs } = await query(` SELECT cj.id, @@ -1722,6 +1742,7 @@ router.get('/monitor/active-jobs', async (_req: Request, res: Response) => { cj.claimed_by as worker_id, cj.worker_hostname, cj.claimed_at, + cj.enqueued_by_worker, cj.products_found, cj.products_upserted, cj.snapshots_created, @@ -1792,14 +1813,18 @@ router.get('/monitor/recent-jobs', async (req: Request, res: Response) => { jrl.items_succeeded, jrl.items_failed, jrl.metadata, - js.description as job_description + jrl.worker_name, + jrl.run_role, + js.description as job_description, + js.worker_name as schedule_worker_name, + js.worker_role as schedule_worker_role FROM job_run_logs jrl LEFT JOIN job_schedules js ON jrl.schedule_id = js.id ORDER BY jrl.created_at DESC LIMIT $1 `, [limitNum]); - // Recent crawl jobs + // Recent crawl jobs (includes enqueued_by_worker for named workforce tracking) const { rows: recentCrawlJobs } = await query(` SELECT cj.id, @@ -1814,6 +1839,7 @@ router.get('/monitor/recent-jobs', async (req: Request, res: Response) => { cj.products_found, cj.snapshots_created, cj.metadata, + cj.enqueued_by_worker, EXTRACT(EPOCH FROM (COALESCE(cj.completed_at, NOW()) - cj.started_at)) * 1000 as duration_ms FROM dispensary_crawl_jobs cj LEFT JOIN dispensaries d ON cj.dispensary_id = d.id @@ -1912,12 +1938,14 @@ router.get('/monitor/summary', async (_req: Request, res: Response) => { (SELECT MAX(completed_at) FROM job_run_logs WHERE status = 'success') as last_job_completed `); - // Get next scheduled runs + // Get next scheduled runs (with worker names) const { rows: nextRuns } = await query(` SELECT id, job_name, description, + worker_name, + worker_role, enabled, next_run_at, last_status, @@ -2034,6 +2062,189 @@ router.post('/admin/detection/trigger', async (_req: Request, res: Response) => } }); +// 
============================================================ +// CRAWLER RELIABILITY / HEALTH ENDPOINTS (Phase 1) +// ============================================================ + +/** + * GET /api/dutchie-az/admin/crawler/health + * Get overall crawler health metrics + */ +router.get('/admin/crawler/health', async (_req: Request, res: Response) => { + try { + const { rows } = await query(`SELECT * FROM v_crawl_health`); + res.json(rows[0] || { + active_crawlers: 0, + degraded_crawlers: 0, + paused_crawlers: 0, + failed_crawlers: 0, + due_now: 0, + stores_with_failures: 0, + avg_consecutive_failures: 0, + successful_last_24h: 0, + }); + } catch (error: any) { + // View might not exist yet + res.json({ + active_crawlers: 0, + degraded_crawlers: 0, + paused_crawlers: 0, + failed_crawlers: 0, + due_now: 0, + error: 'View not available - run migration 046', + }); + } +}); + +/** + * GET /api/dutchie-az/admin/crawler/error-summary + * Get error summary by code over last 7 days + */ +router.get('/admin/crawler/error-summary', async (_req: Request, res: Response) => { + try { + const { rows } = await query(`SELECT * FROM v_crawl_error_summary`); + res.json({ errors: rows }); + } catch (error: any) { + res.json({ errors: [], error: 'View not available - run migration 046' }); + } +}); + +/** + * GET /api/dutchie-az/admin/crawler/status + * Get detailed status for all crawlers + */ +router.get('/admin/crawler/status', async (req: Request, res: Response) => { + try { + const { status, limit = '100', offset = '0' } = req.query; + + let whereClause = ''; + const params: any[] = []; + let paramIndex = 1; + + if (status) { + whereClause = `WHERE crawl_status = $${paramIndex}`; + params.push(status); + paramIndex++; + } + + params.push(parseInt(limit as string, 10), parseInt(offset as string, 10)); + + const { rows } = await query( + `SELECT * FROM v_crawler_status + ${whereClause} + ORDER BY consecutive_failures DESC, name ASC + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + params + ); + + const { rows: countRows } = await query( + `SELECT COUNT(*) as total FROM v_crawler_status ${whereClause}`, + params.slice(0, -2) + ); + + res.json({ + stores: rows, + total: parseInt(countRows[0]?.total || '0', 10), + limit: parseInt(limit as string, 10), + offset: parseInt(offset as string, 10), + }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } +}); + +/** + * GET /api/dutchie-az/admin/crawler/attempts + * Get recent crawl attempts (for debugging) + */ +router.get('/admin/crawler/attempts', async (req: Request, res: Response) => { + try { + const { dispensaryId, errorCode, limit = '50', offset = '0' } = req.query; + + let whereClause = 'WHERE 1=1'; + const params: any[] = []; + let paramIndex = 1; + + if (dispensaryId) { + whereClause += ` AND ca.dispensary_id = $${paramIndex}`; + params.push(parseInt(dispensaryId as string, 10)); + paramIndex++; + } + + if (errorCode) { + whereClause += ` AND ca.error_code = $${paramIndex}`; + params.push(errorCode); + paramIndex++; + } + + params.push(parseInt(limit as string, 10), parseInt(offset as string, 10)); + + const { rows } = await query( + `SELECT + ca.*, + d.name as dispensary_name, + d.city + FROM crawl_attempts ca + LEFT JOIN dispensaries d ON ca.dispensary_id = d.id + ${whereClause} + ORDER BY ca.started_at DESC + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + params + ); + + res.json({ attempts: rows }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } +}); + +/** + * POST 
/api/dutchie-az/admin/dispensaries/:id/pause + * Pause crawling for a dispensary + */ +router.post('/admin/dispensaries/:id/pause', async (req: Request, res: Response) => { + try { + const { id } = req.params; + + await query(` + UPDATE dispensaries + SET crawl_status = 'paused', + next_crawl_at = NULL, + updated_at = NOW() + WHERE id = $1 + `, [id]); + + res.json({ success: true, message: `Crawling paused for dispensary ${id}` }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } +}); + +/** + * POST /api/dutchie-az/admin/dispensaries/:id/resume + * Resume crawling for a paused/degraded dispensary + */ +router.post('/admin/dispensaries/:id/resume', async (req: Request, res: Response) => { + try { + const { id } = req.params; + + // Reset to active and schedule next crawl + await query(` + UPDATE dispensaries + SET crawl_status = 'active', + consecutive_failures = 0, + backoff_multiplier = 1.0, + next_crawl_at = NOW() + INTERVAL '5 minutes', + updated_at = NOW() + WHERE id = $1 + `, [id]); + + res.json({ success: true, message: `Crawling resumed for dispensary ${id}` }); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } +}); + // ============================================================ // FAILED DISPENSARIES ROUTES // ============================================================ @@ -2183,4 +2394,251 @@ router.get('/admin/dispensaries/health-summary', async (_req: Request, res: Resp } }); +// ============================================================ +// ORCHESTRATOR TRACE ROUTES +// ============================================================ + +import { + getLatestTrace, + getTraceById, + getTracesForDispensary, + getTraceByRunId, +} from '../../services/orchestrator-trace'; + +/** + * GET /api/dutchie-az/admin/dispensaries/:id/crawl-trace/latest + * Get the latest orchestrator trace for a dispensary + */ +router.get('/admin/dispensaries/:id/crawl-trace/latest', async (req: Request, res: Response) => { + try { + const { id } = req.params; + const trace = await getLatestTrace(parseInt(id, 10)); + + if (!trace) { + return res.status(404).json({ error: 'No trace found for this dispensary' }); + } + + res.json(trace); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } +}); + +/** + * GET /api/dutchie-az/admin/dispensaries/:id/crawl-traces + * Get paginated list of orchestrator traces for a dispensary + */ +router.get('/admin/dispensaries/:id/crawl-traces', async (req: Request, res: Response) => { + try { + const { id } = req.params; + const { limit = '20', offset = '0' } = req.query; + + const result = await getTracesForDispensary( + parseInt(id, 10), + parseInt(limit as string, 10), + parseInt(offset as string, 10) + ); + + res.json(result); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } +}); + +/** + * GET /api/dutchie-az/admin/crawl-traces/:traceId + * Get a specific orchestrator trace by ID + */ +router.get('/admin/crawl-traces/:traceId', async (req: Request, res: Response) => { + try { + const { traceId } = req.params; + const trace = await getTraceById(parseInt(traceId, 10)); + + if (!trace) { + return res.status(404).json({ error: 'Trace not found' }); + } + + res.json(trace); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } +}); + +/** + * GET /api/dutchie-az/admin/crawl-traces/run/:runId + * Get a specific orchestrator trace by run ID + */ +router.get('/admin/crawl-traces/run/:runId', async (req: Request, res: Response) => { + try { 
+ const { runId } = req.params; + const trace = await getTraceByRunId(runId); + + if (!trace) { + return res.status(404).json({ error: 'Trace not found for this run ID' }); + } + + res.json(trace); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } +}); + +// ============================================================ +// SCRAPER OVERVIEW DASHBOARD ENDPOINTS +// ============================================================ + +/** + * GET /api/dutchie-az/scraper/overview + * Comprehensive scraper overview for the new dashboard + */ +router.get('/scraper/overview', async (_req: Request, res: Response) => { + try { + // 1. Core KPI metrics + const { rows: kpiRows } = await query(` + SELECT + -- Total products + (SELECT COUNT(*) FROM dutchie_products) AS total_products, + (SELECT COUNT(*) FROM dutchie_products WHERE stock_status = 'in_stock') AS in_stock_products, + -- Total dispensaries + (SELECT COUNT(*) FROM dispensaries WHERE menu_type = 'dutchie' AND state = 'AZ') AS total_dispensaries, + (SELECT COUNT(*) FROM dispensaries WHERE menu_type = 'dutchie' AND state = 'AZ' AND platform_dispensary_id IS NOT NULL) AS crawlable_dispensaries, + -- Visibility stats (24h) + (SELECT COUNT(*) FROM dutchie_products WHERE visibility_lost = true AND visibility_lost_at > NOW() - INTERVAL '24 hours') AS visibility_lost_24h, + (SELECT COUNT(*) FROM dutchie_products WHERE visibility_restored_at > NOW() - INTERVAL '24 hours') AS visibility_restored_24h, + (SELECT COUNT(*) FROM dutchie_products WHERE visibility_lost = true) AS total_visibility_lost, + -- Job stats (24h) + (SELECT COUNT(*) FROM job_run_logs WHERE status IN ('error', 'partial') AND created_at > NOW() - INTERVAL '24 hours') AS errors_24h, + (SELECT COUNT(*) FROM job_run_logs WHERE status = 'success' AND created_at > NOW() - INTERVAL '24 hours') AS successful_jobs_24h, + -- Active workers + (SELECT COUNT(*) FROM job_schedules WHERE enabled = true) AS active_workers + `); + + // 2. Get active worker names + const { rows: workerRows } = await query(` + SELECT worker_name, worker_role, enabled, last_status, last_run_at, next_run_at + FROM job_schedules + WHERE enabled = true + ORDER BY next_run_at ASC NULLS LAST + `); + + // 3. Scrape activity by hour (last 24h) + const { rows: activityRows } = await query(` + SELECT + date_trunc('hour', started_at) AS hour, + COUNT(*) FILTER (WHERE status = 'success') AS successful, + COUNT(*) FILTER (WHERE status IN ('error', 'partial')) AS failed, + COUNT(*) AS total + FROM job_run_logs + WHERE started_at > NOW() - INTERVAL '24 hours' + GROUP BY date_trunc('hour', started_at) + ORDER BY hour ASC + `); + + // 4. Product growth / coverage (last 7 days) + const { rows: growthRows } = await query(` + SELECT + date_trunc('day', created_at) AS day, + COUNT(*) AS new_products + FROM dutchie_products + WHERE created_at > NOW() - INTERVAL '7 days' + GROUP BY date_trunc('day', created_at) + ORDER BY day ASC + `); + + // 5. Recent worker runs (last 20) + const { rows: recentRuns } = await query(` + SELECT + jrl.id, + jrl.job_name, + jrl.status, + jrl.started_at, + jrl.completed_at, + jrl.items_processed, + jrl.items_succeeded, + jrl.items_failed, + jrl.metadata, + js.worker_name, + js.worker_role + FROM job_run_logs jrl + LEFT JOIN job_schedules js ON jrl.schedule_id = js.id + ORDER BY jrl.started_at DESC + LIMIT 20 + `); + + // 6. 
Recent visibility changes by store + const { rows: visibilityChanges } = await query(` + SELECT + d.id AS dispensary_id, + d.name AS dispensary_name, + d.state, + COUNT(dp.id) FILTER (WHERE dp.visibility_lost = true AND dp.visibility_lost_at > NOW() - INTERVAL '24 hours') AS lost_24h, + COUNT(dp.id) FILTER (WHERE dp.visibility_restored_at > NOW() - INTERVAL '24 hours') AS restored_24h, + MAX(dp.visibility_lost_at) AS latest_loss, + MAX(dp.visibility_restored_at) AS latest_restore + FROM dispensaries d + LEFT JOIN dutchie_products dp ON d.id = dp.dispensary_id + WHERE d.menu_type = 'dutchie' + GROUP BY d.id, d.name, d.state + HAVING COUNT(dp.id) FILTER (WHERE dp.visibility_lost = true AND dp.visibility_lost_at > NOW() - INTERVAL '24 hours') > 0 + OR COUNT(dp.id) FILTER (WHERE dp.visibility_restored_at > NOW() - INTERVAL '24 hours') > 0 + ORDER BY lost_24h DESC, restored_24h DESC + LIMIT 15 + `); + + const kpi = kpiRows[0] || {}; + + res.json({ + kpi: { + totalProducts: parseInt(kpi.total_products || '0'), + inStockProducts: parseInt(kpi.in_stock_products || '0'), + totalDispensaries: parseInt(kpi.total_dispensaries || '0'), + crawlableDispensaries: parseInt(kpi.crawlable_dispensaries || '0'), + visibilityLost24h: parseInt(kpi.visibility_lost_24h || '0'), + visibilityRestored24h: parseInt(kpi.visibility_restored_24h || '0'), + totalVisibilityLost: parseInt(kpi.total_visibility_lost || '0'), + errors24h: parseInt(kpi.errors_24h || '0'), + successfulJobs24h: parseInt(kpi.successful_jobs_24h || '0'), + activeWorkers: parseInt(kpi.active_workers || '0'), + }, + workers: workerRows, + activityByHour: activityRows.map((row: any) => ({ + hour: row.hour, + successful: parseInt(row.successful || '0'), + failed: parseInt(row.failed || '0'), + total: parseInt(row.total || '0'), + })), + productGrowth: growthRows.map((row: any) => ({ + day: row.day, + newProducts: parseInt(row.new_products || '0'), + })), + recentRuns: recentRuns.map((row: any) => ({ + id: row.id, + jobName: row.job_name, + status: row.status, + startedAt: row.started_at, + completedAt: row.completed_at, + itemsProcessed: row.items_processed, + itemsSucceeded: row.items_succeeded, + itemsFailed: row.items_failed, + workerName: row.worker_name, + workerRole: row.worker_role, + visibilityLost: row.metadata?.visibilityLostCount || 0, + visibilityRestored: row.metadata?.visibilityRestoredCount || 0, + })), + visibilityChanges: visibilityChanges.map((row: any) => ({ + dispensaryId: row.dispensary_id, + dispensaryName: row.dispensary_name, + state: row.state, + lost24h: parseInt(row.lost_24h || '0'), + restored24h: parseInt(row.restored_24h || '0'), + latestLoss: row.latest_loss, + latestRestore: row.latest_restore, + })), + }); + } catch (error: any) { + console.error('Error fetching scraper overview:', error); + res.status(500).json({ error: error.message }); + } +}); + export default router; diff --git a/backend/src/dutchie-az/scripts/stress-test.ts b/backend/src/dutchie-az/scripts/stress-test.ts new file mode 100644 index 00000000..ad82b208 --- /dev/null +++ b/backend/src/dutchie-az/scripts/stress-test.ts @@ -0,0 +1,486 @@ +#!/usr/bin/env npx tsx +/** + * Crawler Reliability Stress Test + * + * Simulates various failure scenarios to test: + * - Retry logic with exponential backoff + * - Error taxonomy classification + * - Self-healing (proxy/UA rotation) + * - Status transitions (active -> degraded -> failed) + * - Minimum crawl gap enforcement + * + * Phase 1: Crawler Reliability & Stabilization + * + * Usage: + * 
DATABASE_URL="postgresql://..." npx tsx src/dutchie-az/scripts/stress-test.ts [test-name] + * + * Available tests: + * retry - Test retry manager with various error types + * backoff - Test exponential backoff calculation + * status - Test status transitions + * gap - Test minimum crawl gap enforcement + * rotation - Test proxy/UA rotation + * all - Run all tests + */ + +import { + CrawlErrorCode, + classifyError, + isRetryable, + shouldRotateProxy, + shouldRotateUserAgent, + getBackoffMultiplier, + getErrorMetadata, +} from '../services/error-taxonomy'; + +import { + RetryManager, + withRetry, + calculateNextCrawlDelay, + calculateNextCrawlAt, + determineCrawlStatus, + shouldAttemptRecovery, + sleep, +} from '../services/retry-manager'; + +import { + UserAgentRotator, + USER_AGENTS, +} from '../services/proxy-rotator'; + +import { + validateStoreConfig, + isCrawlable, + DEFAULT_CONFIG, + RawStoreConfig, +} from '../services/store-validator'; + +// ============================================================ +// TEST UTILITIES +// ============================================================ + +let testsPassed = 0; +let testsFailed = 0; + +function assert(condition: boolean, message: string): void { + if (condition) { + console.log(` ✓ ${message}`); + testsPassed++; + } else { + console.log(` ✗ ${message}`); + testsFailed++; + } +} + +function section(name: string): void { + console.log(`\n${'='.repeat(60)}`); + console.log(`TEST: ${name}`); + console.log('='.repeat(60)); +} + +// ============================================================ +// TEST: Error Classification +// ============================================================ + +function testErrorClassification(): void { + section('Error Classification'); + + // HTTP status codes + assert(classifyError(null, 429) === CrawlErrorCode.RATE_LIMITED, '429 -> RATE_LIMITED'); + assert(classifyError(null, 407) === CrawlErrorCode.BLOCKED_PROXY, '407 -> BLOCKED_PROXY'); + assert(classifyError(null, 401) === CrawlErrorCode.AUTH_FAILED, '401 -> AUTH_FAILED'); + assert(classifyError(null, 403) === CrawlErrorCode.AUTH_FAILED, '403 -> AUTH_FAILED'); + assert(classifyError(null, 503) === CrawlErrorCode.SERVICE_UNAVAILABLE, '503 -> SERVICE_UNAVAILABLE'); + assert(classifyError(null, 500) === CrawlErrorCode.SERVER_ERROR, '500 -> SERVER_ERROR'); + + // Error messages + assert(classifyError('rate limit exceeded') === CrawlErrorCode.RATE_LIMITED, 'rate limit message -> RATE_LIMITED'); + assert(classifyError('request timed out') === CrawlErrorCode.TIMEOUT, 'timeout message -> TIMEOUT'); + assert(classifyError('proxy blocked') === CrawlErrorCode.BLOCKED_PROXY, 'proxy blocked -> BLOCKED_PROXY'); + assert(classifyError('ECONNREFUSED') === CrawlErrorCode.NETWORK_ERROR, 'ECONNREFUSED -> NETWORK_ERROR'); + assert(classifyError('ENOTFOUND') === CrawlErrorCode.DNS_ERROR, 'ENOTFOUND -> DNS_ERROR'); + assert(classifyError('selector not found') === CrawlErrorCode.HTML_CHANGED, 'selector error -> HTML_CHANGED'); + assert(classifyError('JSON parse error') === CrawlErrorCode.PARSE_ERROR, 'parse error -> PARSE_ERROR'); + assert(classifyError('0 products found') === CrawlErrorCode.NO_PRODUCTS, 'no products -> NO_PRODUCTS'); + + // Retryability + assert(isRetryable(CrawlErrorCode.RATE_LIMITED) === true, 'RATE_LIMITED is retryable'); + assert(isRetryable(CrawlErrorCode.TIMEOUT) === true, 'TIMEOUT is retryable'); + assert(isRetryable(CrawlErrorCode.HTML_CHANGED) === false, 'HTML_CHANGED is NOT retryable'); + assert(isRetryable(CrawlErrorCode.INVALID_CONFIG) === false, 
'INVALID_CONFIG is NOT retryable'); + + // Rotation decisions + assert(shouldRotateProxy(CrawlErrorCode.BLOCKED_PROXY) === true, 'BLOCKED_PROXY -> rotate proxy'); + assert(shouldRotateProxy(CrawlErrorCode.RATE_LIMITED) === true, 'RATE_LIMITED -> rotate proxy'); + assert(shouldRotateUserAgent(CrawlErrorCode.AUTH_FAILED) === true, 'AUTH_FAILED -> rotate UA'); +} + +// ============================================================ +// TEST: Retry Manager +// ============================================================ + +function testRetryManager(): void { + section('Retry Manager'); + + const manager = new RetryManager({ maxRetries: 3, baseBackoffMs: 100 }); + + // Initial state + assert(manager.shouldAttempt() === true, 'Should attempt initially'); + assert(manager.getAttemptNumber() === 1, 'Attempt number starts at 1'); + + // First attempt + manager.recordAttempt(); + assert(manager.getAttemptNumber() === 2, 'Attempt number increments'); + + // Evaluate retryable error + const decision1 = manager.evaluateError(new Error('rate limit exceeded'), 429); + assert(decision1.shouldRetry === true, 'Should retry on rate limit'); + assert(decision1.errorCode === CrawlErrorCode.RATE_LIMITED, 'Error code is RATE_LIMITED'); + assert(decision1.rotateProxy === true, 'Should rotate proxy'); + assert(decision1.backoffMs > 0, 'Backoff is positive'); + + // More attempts + manager.recordAttempt(); + manager.recordAttempt(); + + // Now at max retries + const decision2 = manager.evaluateError(new Error('timeout'), 504); + assert(decision2.shouldRetry === true, 'Should still retry (at limit but not exceeded)'); + + manager.recordAttempt(); + const decision3 = manager.evaluateError(new Error('timeout')); + assert(decision3.shouldRetry === false, 'Should NOT retry after max'); + assert(decision3.reason.includes('exhausted'), 'Reason mentions exhausted'); + + // Reset + manager.reset(); + assert(manager.shouldAttempt() === true, 'Should attempt after reset'); + assert(manager.getAttemptNumber() === 1, 'Attempt number resets'); + + // Non-retryable error + const manager2 = new RetryManager({ maxRetries: 3 }); + manager2.recordAttempt(); + const nonRetryable = manager2.evaluateError(new Error('HTML structure changed')); + assert(nonRetryable.shouldRetry === false, 'Non-retryable error stops immediately'); + assert(nonRetryable.errorCode === CrawlErrorCode.HTML_CHANGED, 'Error code is HTML_CHANGED'); +} + +// ============================================================ +// TEST: Exponential Backoff +// ============================================================ + +function testExponentialBackoff(): void { + section('Exponential Backoff'); + + // Calculate next crawl delay + const delay0 = calculateNextCrawlDelay(0, 240); // No failures + const delay1 = calculateNextCrawlDelay(1, 240); // 1 failure + const delay2 = calculateNextCrawlDelay(2, 240); // 2 failures + const delay3 = calculateNextCrawlDelay(3, 240); // 3 failures + const delay5 = calculateNextCrawlDelay(5, 240); // 5 failures (should cap) + + console.log(` Delay with 0 failures: ${delay0} minutes`); + console.log(` Delay with 1 failure: ${delay1} minutes`); + console.log(` Delay with 2 failures: ${delay2} minutes`); + console.log(` Delay with 3 failures: ${delay3} minutes`); + console.log(` Delay with 5 failures: ${delay5} minutes`); + + assert(delay1 > delay0, 'Delay increases with failures'); + assert(delay2 > delay1, 'Delay keeps increasing'); + assert(delay3 > delay2, 'More delay with more failures'); + // With jitter, exact values vary but ratio should 
be close to 2x + assert(delay5 <= 240 * 4 * 1.2, 'Delay is capped at max multiplier'); + + // Next crawl time calculation + const now = new Date(); + const nextAt = calculateNextCrawlAt(2, 240); + assert(nextAt > now, 'Next crawl is in future'); + assert(nextAt.getTime() - now.getTime() > 240 * 60 * 1000, 'Includes backoff'); +} + +// ============================================================ +// TEST: Status Transitions +// ============================================================ + +function testStatusTransitions(): void { + section('Status Transitions'); + + // Active status + assert(determineCrawlStatus(0) === 'active', '0 failures -> active'); + assert(determineCrawlStatus(1) === 'active', '1 failure -> active'); + assert(determineCrawlStatus(2) === 'active', '2 failures -> active'); + + // Degraded status + assert(determineCrawlStatus(3) === 'degraded', '3 failures -> degraded'); + assert(determineCrawlStatus(5) === 'degraded', '5 failures -> degraded'); + assert(determineCrawlStatus(9) === 'degraded', '9 failures -> degraded'); + + // Failed status + assert(determineCrawlStatus(10) === 'failed', '10 failures -> failed'); + assert(determineCrawlStatus(15) === 'failed', '15 failures -> failed'); + + // Custom thresholds + const customStatus = determineCrawlStatus(5, { degraded: 5, failed: 8 }); + assert(customStatus === 'degraded', 'Custom threshold: 5 -> degraded'); + + // Recovery check + const recentFailure = new Date(Date.now() - 1 * 60 * 60 * 1000); // 1 hour ago + const oldFailure = new Date(Date.now() - 48 * 60 * 60 * 1000); // 48 hours ago + + assert(shouldAttemptRecovery(recentFailure, 1) === false, 'No recovery for recent failure'); + assert(shouldAttemptRecovery(oldFailure, 1) === true, 'Recovery allowed for old failure'); + assert(shouldAttemptRecovery(null, 0) === true, 'Recovery allowed if no previous failure'); +} + +// ============================================================ +// TEST: Store Validation +// ============================================================ + +function testStoreValidation(): void { + section('Store Validation'); + + // Valid config + const validConfig: RawStoreConfig = { + id: 1, + name: 'Test Store', + platformDispensaryId: '123abc', + menuType: 'dutchie', + }; + const validResult = validateStoreConfig(validConfig); + assert(validResult.isValid === true, 'Valid config passes'); + assert(validResult.config !== null, 'Valid config returns config'); + assert(validResult.config?.slug === 'test-store', 'Slug is generated'); + + // Missing required fields + const missingId: RawStoreConfig = { + id: 0, + name: 'Test', + platformDispensaryId: '123', + menuType: 'dutchie', + }; + const missingIdResult = validateStoreConfig(missingId); + assert(missingIdResult.isValid === false, 'Missing ID fails'); + + // Missing platform ID + const missingPlatform: RawStoreConfig = { + id: 1, + name: 'Test', + menuType: 'dutchie', + }; + const missingPlatformResult = validateStoreConfig(missingPlatform); + assert(missingPlatformResult.isValid === false, 'Missing platform ID fails'); + + // Unknown menu type + const unknownMenu: RawStoreConfig = { + id: 1, + name: 'Test', + platformDispensaryId: '123', + menuType: 'unknown', + }; + const unknownMenuResult = validateStoreConfig(unknownMenu); + assert(unknownMenuResult.isValid === false, 'Unknown menu type fails'); + + // Crawlable check + assert(isCrawlable(validConfig) === true, 'Valid config is crawlable'); + assert(isCrawlable(missingPlatform) === false, 'Missing platform not crawlable'); + 
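+  // The checks below assume isCrawlable() roughly combines config validation with
+  // the store's crawl status (a sketch only; the real logic lives in store-validator.ts):
+  //   validateStoreConfig(cfg).isValid &&
+  //     !['failed', 'paused'].includes(cfg.crawlStatus ?? 'active')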
assert(isCrawlable({ ...validConfig, crawlStatus: 'failed' }) === false, 'Failed status not crawlable'); + assert(isCrawlable({ ...validConfig, crawlStatus: 'paused' }) === false, 'Paused status not crawlable'); +} + +// ============================================================ +// TEST: User Agent Rotation +// ============================================================ + +function testUserAgentRotation(): void { + section('User Agent Rotation'); + + const rotator = new UserAgentRotator(); + + const first = rotator.getCurrent(); + const second = rotator.getNext(); + const third = rotator.getNext(); + + assert(first !== second, 'User agents rotate'); + assert(second !== third, 'User agents keep rotating'); + assert(USER_AGENTS.includes(first), 'Returns valid UA'); + assert(USER_AGENTS.includes(second), 'Returns valid UA'); + + // Random UA + const random = rotator.getRandom(); + assert(USER_AGENTS.includes(random), 'Random returns valid UA'); + + // Count + assert(rotator.getCount() === USER_AGENTS.length, 'Reports correct count'); +} + +// ============================================================ +// TEST: WithRetry Helper +// ============================================================ + +async function testWithRetryHelper(): Promise { + section('WithRetry Helper'); + + // Successful on first try + let attempts = 0; + const successResult = await withRetry(async () => { + attempts++; + return 'success'; + }, { maxRetries: 3 }); + assert(attempts === 1, 'Succeeds on first try'); + assert(successResult.result === 'success', 'Returns result'); + + // Fails then succeeds + let failThenSucceedAttempts = 0; + const failThenSuccessResult = await withRetry(async () => { + failThenSucceedAttempts++; + if (failThenSucceedAttempts < 3) { + throw new Error('temporary error'); + } + return 'finally succeeded'; + }, { maxRetries: 5, baseBackoffMs: 10 }); + assert(failThenSucceedAttempts === 3, 'Retries until success'); + assert(failThenSuccessResult.result === 'finally succeeded', 'Returns final result'); + assert(failThenSuccessResult.summary.attemptsMade === 3, 'Summary tracks attempts'); + + // Exhausts retries + let alwaysFailAttempts = 0; + try { + await withRetry(async () => { + alwaysFailAttempts++; + throw new Error('always fails'); + }, { maxRetries: 2, baseBackoffMs: 10 }); + assert(false, 'Should have thrown'); + } catch (error: any) { + assert(alwaysFailAttempts === 3, 'Attempts all retries'); // 1 initial + 2 retries + assert(error.name === 'RetryExhaustedError', 'Throws RetryExhaustedError'); + } + + // Non-retryable error stops immediately + let nonRetryableAttempts = 0; + try { + await withRetry(async () => { + nonRetryableAttempts++; + const err = new Error('HTML structure changed - selector not found'); + throw err; + }, { maxRetries: 3, baseBackoffMs: 10 }); + assert(false, 'Should have thrown'); + } catch { + assert(nonRetryableAttempts === 1, 'Non-retryable stops immediately'); + } +} + +// ============================================================ +// TEST: Minimum Crawl Gap +// ============================================================ + +function testMinimumCrawlGap(): void { + section('Minimum Crawl Gap'); + + // Default config + assert(DEFAULT_CONFIG.minCrawlGapMinutes === 2, 'Default gap is 2 minutes'); + assert(DEFAULT_CONFIG.crawlFrequencyMinutes === 240, 'Default frequency is 4 hours'); + + // Gap calculation + const gapMs = DEFAULT_CONFIG.minCrawlGapMinutes * 60 * 1000; + assert(gapMs === 120000, 'Gap is 2 minutes in ms'); + + console.log(' Note: Gap enforcement 
is tested at DB level (trigger) and application level'); +} + +// ============================================================ +// TEST: Error Metadata +// ============================================================ + +function testErrorMetadata(): void { + section('Error Metadata'); + + // RATE_LIMITED + const rateLimited = getErrorMetadata(CrawlErrorCode.RATE_LIMITED); + assert(rateLimited.retryable === true, 'RATE_LIMITED is retryable'); + assert(rateLimited.rotateProxy === true, 'RATE_LIMITED rotates proxy'); + assert(rateLimited.backoffMultiplier === 2.0, 'RATE_LIMITED has 2x backoff'); + assert(rateLimited.severity === 'medium', 'RATE_LIMITED is medium severity'); + + // HTML_CHANGED + const htmlChanged = getErrorMetadata(CrawlErrorCode.HTML_CHANGED); + assert(htmlChanged.retryable === false, 'HTML_CHANGED is NOT retryable'); + assert(htmlChanged.severity === 'high', 'HTML_CHANGED is high severity'); + + // INVALID_CONFIG + const invalidConfig = getErrorMetadata(CrawlErrorCode.INVALID_CONFIG); + assert(invalidConfig.retryable === false, 'INVALID_CONFIG is NOT retryable'); + assert(invalidConfig.severity === 'critical', 'INVALID_CONFIG is critical'); +} + +// ============================================================ +// MAIN +// ============================================================ + +async function runTests(testName?: string): Promise { + console.log('\n'); + console.log('╔══════════════════════════════════════════════════════════╗'); + console.log('║ CRAWLER RELIABILITY STRESS TEST - PHASE 1 ║'); + console.log('╚══════════════════════════════════════════════════════════╝'); + + const allTests = !testName || testName === 'all'; + + if (allTests || testName === 'error' || testName === 'classification') { + testErrorClassification(); + } + + if (allTests || testName === 'retry') { + testRetryManager(); + } + + if (allTests || testName === 'backoff') { + testExponentialBackoff(); + } + + if (allTests || testName === 'status') { + testStatusTransitions(); + } + + if (allTests || testName === 'validation' || testName === 'store') { + testStoreValidation(); + } + + if (allTests || testName === 'rotation' || testName === 'ua') { + testUserAgentRotation(); + } + + if (allTests || testName === 'withRetry' || testName === 'helper') { + await testWithRetryHelper(); + } + + if (allTests || testName === 'gap') { + testMinimumCrawlGap(); + } + + if (allTests || testName === 'metadata') { + testErrorMetadata(); + } + + // Summary + console.log('\n'); + console.log('═'.repeat(60)); + console.log('SUMMARY'); + console.log('═'.repeat(60)); + console.log(` Passed: ${testsPassed}`); + console.log(` Failed: ${testsFailed}`); + console.log(` Total: ${testsPassed + testsFailed}`); + + if (testsFailed > 0) { + console.log('\n❌ SOME TESTS FAILED\n'); + process.exit(1); + } else { + console.log('\n✅ ALL TESTS PASSED\n'); + process.exit(0); + } +} + +// Run tests +const testName = process.argv[2]; +runTests(testName).catch((error) => { + console.error('Fatal error:', error); + process.exit(1); +}); diff --git a/backend/src/dutchie-az/services/analytics/brand-opportunity.ts b/backend/src/dutchie-az/services/analytics/brand-opportunity.ts new file mode 100644 index 00000000..b23817e9 --- /dev/null +++ b/backend/src/dutchie-az/services/analytics/brand-opportunity.ts @@ -0,0 +1,659 @@ +/** + * Brand Opportunity / Risk Analytics Service + * + * Provides brand-level opportunity and risk analysis including: + * - Under/overpriced vs market + * - Missing SKU opportunities + * - Stores with declining/growing 
shelf share + * - Competitor intrusion alerts + * + * Phase 3: Analytics Dashboards + */ + +import { Pool } from 'pg'; +import { AnalyticsCache, cacheKey } from './cache'; + +export interface BrandOpportunity { + brandName: string; + underpricedVsMarket: PricePosition[]; + overpricedVsMarket: PricePosition[]; + missingSkuOpportunities: MissingSkuOpportunity[]; + storesWithDecliningShelfShare: StoreShelfShareChange[]; + storesWithGrowingShelfShare: StoreShelfShareChange[]; + competitorIntrusionAlerts: CompetitorAlert[]; + overallScore: number; // 0-100, higher = more opportunity + riskScore: number; // 0-100, higher = more risk +} + +export interface PricePosition { + category: string; + brandAvgPrice: number; + marketAvgPrice: number; + priceDifferencePercent: number; + skuCount: number; + suggestion: string; +} + +export interface MissingSkuOpportunity { + category: string; + subcategory: string | null; + marketSkuCount: number; + brandSkuCount: number; + gapPercent: number; + topCompetitors: string[]; + opportunityScore: number; // 0-100 +} + +export interface StoreShelfShareChange { + storeId: number; + storeName: string; + city: string; + state: string; + currentShelfShare: number; + previousShelfShare: number; + changePercent: number; + currentSkus: number; + competitors: string[]; +} + +export interface CompetitorAlert { + competitorBrand: string; + storeId: number; + storeName: string; + alertType: 'new_entry' | 'expanding' | 'price_undercut'; + details: string; + severity: 'low' | 'medium' | 'high'; + date: string; +} + +export interface MarketPositionSummary { + brandName: string; + marketSharePercent: number; + avgPriceVsMarket: number; // -X% to +X% + categoryStrengths: Array<{ category: string; shelfSharePercent: number }>; + categoryWeaknesses: Array<{ category: string; shelfSharePercent: number; marketLeader: string }>; + growthTrend: 'growing' | 'stable' | 'declining'; + competitorThreats: string[]; +} + +export class BrandOpportunityService { + private pool: Pool; + private cache: AnalyticsCache; + + constructor(pool: Pool, cache: AnalyticsCache) { + this.pool = pool; + this.cache = cache; + } + + /** + * Get full opportunity analysis for a brand + */ + async getBrandOpportunity(brandName: string): Promise { + const key = cacheKey('brand_opportunity', { brandName }); + + return (await this.cache.getOrCompute(key, async () => { + const [ + underpriced, + overpriced, + missingSkus, + decliningStores, + growingStores, + alerts, + ] = await Promise.all([ + this.getUnderpricedPositions(brandName), + this.getOverpricedPositions(brandName), + this.getMissingSkuOpportunities(brandName), + this.getStoresWithDecliningShare(brandName), + this.getStoresWithGrowingShare(brandName), + this.getCompetitorAlerts(brandName), + ]); + + // Calculate opportunity score (higher = more opportunity) + const opportunityFactors = [ + missingSkus.length > 0 ? 20 : 0, + underpriced.length > 0 ? 15 : 0, + growingStores.length > 5 ? 20 : growingStores.length * 3, + missingSkus.reduce((sum, m) => sum + m.opportunityScore, 0) / Math.max(1, missingSkus.length) * 0.3, + ]; + const opportunityScore = Math.min(100, opportunityFactors.reduce((a, b) => a + b, 0)); + + // Calculate risk score (higher = more risk) + const riskFactors = [ + decliningStores.length > 5 ? 30 : decliningStores.length * 5, + alerts.filter(a => a.severity === 'high').length * 15, + alerts.filter(a => a.severity === 'medium').length * 8, + overpriced.length > 3 ? 
15 : overpriced.length * 3, + ]; + const riskScore = Math.min(100, riskFactors.reduce((a, b) => a + b, 0)); + + return { + brandName, + underpricedVsMarket: underpriced, + overpricedVsMarket: overpriced, + missingSkuOpportunities: missingSkus, + storesWithDecliningShelfShare: decliningStores, + storesWithGrowingShelfShare: growingStores, + competitorIntrusionAlerts: alerts, + overallScore: Math.round(opportunityScore), + riskScore: Math.round(riskScore), + }; + }, 30)).data; + } + + /** + * Get categories where brand is underpriced vs market + */ + async getUnderpricedPositions(brandName: string): Promise { + const result = await this.pool.query(` + WITH brand_prices AS ( + SELECT + type as category, + AVG(extract_min_price(latest_raw_payload)) as brand_avg, + COUNT(*) as sku_count + FROM dutchie_products + WHERE brand_name = $1 AND type IS NOT NULL + GROUP BY type + HAVING COUNT(*) >= 3 + ), + market_prices AS ( + SELECT + type as category, + AVG(extract_min_price(latest_raw_payload)) as market_avg + FROM dutchie_products + WHERE type IS NOT NULL AND brand_name != $1 + GROUP BY type + ) + SELECT + bp.category, + bp.brand_avg, + mp.market_avg, + bp.sku_count, + ((bp.brand_avg - mp.market_avg) / NULLIF(mp.market_avg, 0)) * 100 as diff_pct + FROM brand_prices bp + JOIN market_prices mp ON bp.category = mp.category + WHERE bp.brand_avg < mp.market_avg * 0.9 -- 10% or more below market + AND bp.brand_avg IS NOT NULL + AND mp.market_avg IS NOT NULL + ORDER BY diff_pct + `, [brandName]); + + return result.rows.map(row => ({ + category: row.category, + brandAvgPrice: Math.round(parseFloat(row.brand_avg) * 100) / 100, + marketAvgPrice: Math.round(parseFloat(row.market_avg) * 100) / 100, + priceDifferencePercent: Math.round(parseFloat(row.diff_pct) * 10) / 10, + skuCount: parseInt(row.sku_count) || 0, + suggestion: `Consider price increase - ${Math.abs(Math.round(parseFloat(row.diff_pct)))}% below market average`, + })); + } + + /** + * Get categories where brand is overpriced vs market + */ + async getOverpricedPositions(brandName: string): Promise { + const result = await this.pool.query(` + WITH brand_prices AS ( + SELECT + type as category, + AVG(extract_min_price(latest_raw_payload)) as brand_avg, + COUNT(*) as sku_count + FROM dutchie_products + WHERE brand_name = $1 AND type IS NOT NULL + GROUP BY type + HAVING COUNT(*) >= 3 + ), + market_prices AS ( + SELECT + type as category, + AVG(extract_min_price(latest_raw_payload)) as market_avg + FROM dutchie_products + WHERE type IS NOT NULL AND brand_name != $1 + GROUP BY type + ) + SELECT + bp.category, + bp.brand_avg, + mp.market_avg, + bp.sku_count, + ((bp.brand_avg - mp.market_avg) / NULLIF(mp.market_avg, 0)) * 100 as diff_pct + FROM brand_prices bp + JOIN market_prices mp ON bp.category = mp.category + WHERE bp.brand_avg > mp.market_avg * 1.15 -- 15% or more above market + AND bp.brand_avg IS NOT NULL + AND mp.market_avg IS NOT NULL + ORDER BY diff_pct DESC + `, [brandName]); + + return result.rows.map(row => ({ + category: row.category, + brandAvgPrice: Math.round(parseFloat(row.brand_avg) * 100) / 100, + marketAvgPrice: Math.round(parseFloat(row.market_avg) * 100) / 100, + priceDifferencePercent: Math.round(parseFloat(row.diff_pct) * 10) / 10, + skuCount: parseInt(row.sku_count) || 0, + suggestion: `Price sensitivity risk - ${Math.round(parseFloat(row.diff_pct))}% above market average`, + })); + } + + /** + * Get missing SKU opportunities (category gaps) + */ + async getMissingSkuOpportunities(brandName: string): Promise { + const result = 
await this.pool.query(` + WITH market_categories AS ( + SELECT + type as category, + subcategory, + COUNT(*) as market_skus, + ARRAY_AGG(DISTINCT brand_name ORDER BY brand_name) FILTER (WHERE brand_name IS NOT NULL) as top_brands + FROM dutchie_products + WHERE type IS NOT NULL + GROUP BY type, subcategory + HAVING COUNT(*) >= 20 + ), + brand_presence AS ( + SELECT + type as category, + subcategory, + COUNT(*) as brand_skus + FROM dutchie_products + WHERE brand_name = $1 AND type IS NOT NULL + GROUP BY type, subcategory + ) + SELECT + mc.category, + mc.subcategory, + mc.market_skus, + COALESCE(bp.brand_skus, 0) as brand_skus, + mc.top_brands[1:5] as competitors + FROM market_categories mc + LEFT JOIN brand_presence bp ON mc.category = bp.category + AND (mc.subcategory = bp.subcategory OR (mc.subcategory IS NULL AND bp.subcategory IS NULL)) + WHERE COALESCE(bp.brand_skus, 0) < mc.market_skus * 0.05 -- Brand has <5% of market presence + ORDER BY mc.market_skus DESC + LIMIT 10 + `, [brandName]); + + return result.rows.map(row => { + const marketSkus = parseInt(row.market_skus) || 0; + const brandSkus = parseInt(row.brand_skus) || 0; + const gapPercent = marketSkus > 0 ? ((marketSkus - brandSkus) / marketSkus) * 100 : 100; + const opportunityScore = Math.min(100, Math.round((marketSkus / 100) * (gapPercent / 100) * 100)); + + return { + category: row.category, + subcategory: row.subcategory, + marketSkuCount: marketSkus, + brandSkuCount: brandSkus, + gapPercent: Math.round(gapPercent), + topCompetitors: (row.competitors || []).filter((c: string) => c !== brandName).slice(0, 5), + opportunityScore, + }; + }); + } + + /** + * Get stores where brand's shelf share is declining + */ + async getStoresWithDecliningShare(brandName: string): Promise { + // Use brand_snapshots for historical comparison + const result = await this.pool.query(` + WITH current_share AS ( + SELECT + dp.dispensary_id as store_id, + d.name as store_name, + d.city, + d.state, + COUNT(*) FILTER (WHERE dp.brand_name = $1) as brand_skus, + COUNT(*) as total_skus, + ARRAY_AGG(DISTINCT dp.brand_name) FILTER (WHERE dp.brand_name != $1 AND dp.brand_name IS NOT NULL) as competitors + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + GROUP BY dp.dispensary_id, d.name, d.city, d.state + HAVING COUNT(*) FILTER (WHERE dp.brand_name = $1) > 0 + ) + SELECT + cs.store_id, + cs.store_name, + cs.city, + cs.state, + cs.brand_skus as current_skus, + cs.total_skus, + ROUND((cs.brand_skus::NUMERIC / cs.total_skus) * 100, 2) as current_share, + cs.competitors[1:5] as top_competitors + FROM current_share cs + WHERE cs.brand_skus < 10 -- Low presence + ORDER BY cs.brand_skus + LIMIT 10 + `, [brandName]); + + return result.rows.map(row => ({ + storeId: row.store_id, + storeName: row.store_name, + city: row.city, + state: row.state, + currentShelfShare: parseFloat(row.current_share) || 0, + previousShelfShare: parseFloat(row.current_share) || 0, // Would need historical data + changePercent: 0, + currentSkus: parseInt(row.current_skus) || 0, + competitors: row.top_competitors || [], + })); + } + + /** + * Get stores where brand's shelf share is growing + */ + async getStoresWithGrowingShare(brandName: string): Promise { + const result = await this.pool.query(` + WITH store_share AS ( + SELECT + dp.dispensary_id as store_id, + d.name as store_name, + d.city, + d.state, + COUNT(*) FILTER (WHERE dp.brand_name = $1) as brand_skus, + COUNT(*) as total_skus, + ARRAY_AGG(DISTINCT dp.brand_name) FILTER (WHERE dp.brand_name != $1 AND 
dp.brand_name IS NOT NULL) as competitors + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + GROUP BY dp.dispensary_id, d.name, d.city, d.state + HAVING COUNT(*) FILTER (WHERE dp.brand_name = $1) > 0 + ) + SELECT + ss.store_id, + ss.store_name, + ss.city, + ss.state, + ss.brand_skus as current_skus, + ss.total_skus, + ROUND((ss.brand_skus::NUMERIC / ss.total_skus) * 100, 2) as current_share, + ss.competitors[1:5] as top_competitors + FROM store_share ss + ORDER BY current_share DESC + LIMIT 10 + `, [brandName]); + + return result.rows.map(row => ({ + storeId: row.store_id, + storeName: row.store_name, + city: row.city, + state: row.state, + currentShelfShare: parseFloat(row.current_share) || 0, + previousShelfShare: parseFloat(row.current_share) || 0, + changePercent: 0, + currentSkus: parseInt(row.current_skus) || 0, + competitors: row.top_competitors || [], + })); + } + + /** + * Get competitor intrusion alerts + */ + async getCompetitorAlerts(brandName: string): Promise { + // Check for competitor entries in stores where this brand has presence + const result = await this.pool.query(` + WITH brand_stores AS ( + SELECT DISTINCT dispensary_id + FROM dutchie_products + WHERE brand_name = $1 + ), + competitor_presence AS ( + SELECT + dp.brand_name as competitor, + dp.dispensary_id as store_id, + d.name as store_name, + COUNT(*) as sku_count, + MAX(dp.created_at) as latest_add + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE dp.dispensary_id IN (SELECT dispensary_id FROM brand_stores) + AND dp.brand_name != $1 + AND dp.brand_name IS NOT NULL + AND dp.created_at >= NOW() - INTERVAL '30 days' + GROUP BY dp.brand_name, dp.dispensary_id, d.name + HAVING COUNT(*) >= 5 + ) + SELECT + competitor, + store_id, + store_name, + sku_count, + latest_add + FROM competitor_presence + ORDER BY sku_count DESC + LIMIT 10 + `, [brandName]); + + return result.rows.map(row => { + const skuCount = parseInt(row.sku_count) || 0; + let severity: 'low' | 'medium' | 'high' = 'low'; + if (skuCount >= 20) severity = 'high'; + else if (skuCount >= 10) severity = 'medium'; + + return { + competitorBrand: row.competitor, + storeId: row.store_id, + storeName: row.store_name, + alertType: 'expanding' as const, + details: `${row.competitor} has ${skuCount} SKUs in ${row.store_name}`, + severity, + date: new Date(row.latest_add).toISOString().split('T')[0], + }; + }); + } + + /** + * Get market position summary for a brand + */ + async getMarketPositionSummary(brandName: string): Promise { + const key = cacheKey('market_position', { brandName }); + + return (await this.cache.getOrCompute(key, async () => { + const [shareResult, priceResult, categoryResult, threatResult] = await Promise.all([ + // Market share + this.pool.query(` + SELECT + (SELECT COUNT(*) FROM dutchie_products WHERE brand_name = $1) as brand_count, + (SELECT COUNT(*) FROM dutchie_products) as total_count + `, [brandName]), + + // Price vs market + this.pool.query(` + SELECT + (SELECT AVG(extract_min_price(latest_raw_payload)) FROM dutchie_products WHERE brand_name = $1) as brand_avg, + (SELECT AVG(extract_min_price(latest_raw_payload)) FROM dutchie_products WHERE brand_name != $1) as market_avg + `, [brandName]), + + // Category strengths/weaknesses + this.pool.query(` + WITH brand_by_cat AS ( + SELECT type as category, COUNT(*) as brand_count + FROM dutchie_products + WHERE brand_name = $1 AND type IS NOT NULL + GROUP BY type + ), + market_by_cat AS ( + SELECT type as category, COUNT(*) as 
total_count + FROM dutchie_products WHERE type IS NOT NULL + GROUP BY type + ), + leaders AS ( + SELECT type as category, brand_name, COUNT(*) as cnt, + RANK() OVER (PARTITION BY type ORDER BY COUNT(*) DESC) as rnk + FROM dutchie_products WHERE type IS NOT NULL AND brand_name IS NOT NULL + GROUP BY type, brand_name + ) + SELECT + mc.category, + COALESCE(bc.brand_count, 0) as brand_count, + mc.total_count, + ROUND((COALESCE(bc.brand_count, 0)::NUMERIC / mc.total_count) * 100, 2) as share_pct, + (SELECT brand_name FROM leaders WHERE category = mc.category AND rnk = 1) as leader + FROM market_by_cat mc + LEFT JOIN brand_by_cat bc ON mc.category = bc.category + ORDER BY share_pct DESC + `, [brandName]), + + // Top competitors + this.pool.query(` + SELECT brand_name, COUNT(*) as cnt + FROM dutchie_products + WHERE brand_name IS NOT NULL AND brand_name != $1 + GROUP BY brand_name + ORDER BY cnt DESC + LIMIT 5 + `, [brandName]), + ]); + + const brandCount = parseInt(shareResult.rows[0]?.brand_count) || 0; + const totalCount = parseInt(shareResult.rows[0]?.total_count) || 1; + const marketSharePercent = Math.round((brandCount / totalCount) * 1000) / 10; + + const brandAvg = parseFloat(priceResult.rows[0]?.brand_avg) || 0; + const marketAvg = parseFloat(priceResult.rows[0]?.market_avg) || 1; + const avgPriceVsMarket = Math.round(((brandAvg - marketAvg) / marketAvg) * 1000) / 10; + + const categories = categoryResult.rows; + const strengths = categories + .filter(c => parseFloat(c.share_pct) > 5) + .map(c => ({ category: c.category, shelfSharePercent: parseFloat(c.share_pct) })); + + const weaknesses = categories + .filter(c => parseFloat(c.share_pct) < 2 && c.leader !== brandName) + .map(c => ({ + category: c.category, + shelfSharePercent: parseFloat(c.share_pct), + marketLeader: c.leader || 'Unknown', + })); + + return { + brandName, + marketSharePercent, + avgPriceVsMarket, + categoryStrengths: strengths.slice(0, 5), + categoryWeaknesses: weaknesses.slice(0, 5), + growthTrend: 'stable' as const, // Would need historical data + competitorThreats: threatResult.rows.map(r => r.brand_name), + }; + }, 30)).data; + } + + /** + * Create an analytics alert + */ + async createAlert(alert: { + alertType: string; + severity: 'info' | 'warning' | 'critical'; + title: string; + description?: string; + storeId?: number; + brandName?: string; + productId?: number; + category?: string; + metadata?: Record; + }): Promise { + await this.pool.query(` + INSERT INTO analytics_alerts + (alert_type, severity, title, description, store_id, brand_name, product_id, category, metadata) + VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) + `, [ + alert.alertType, + alert.severity, + alert.title, + alert.description || null, + alert.storeId || null, + alert.brandName || null, + alert.productId || null, + alert.category || null, + alert.metadata ? 
JSON.stringify(alert.metadata) : null,
+    ]);
+  }
+
+  /**
+   * Get recent alerts
+   */
+  async getAlerts(filters: {
+    brandName?: string;
+    storeId?: number;
+    alertType?: string;
+    unreadOnly?: boolean;
+    limit?: number;
+  } = {}): Promise<Array<{
+    id: number;
+    alertType: string;
+    severity: string;
+    title: string;
+    description: string | null;
+    storeName: string | null;
+    brandName: string | null;
+    createdAt: string;
+    isRead: boolean;
+  }>> {
+    const { brandName, storeId, alertType, unreadOnly = false, limit = 50 } = filters;
+    const params: (string | number | boolean)[] = [limit];
+    const conditions: string[] = [];
+    let paramIndex = 2;
+
+    if (brandName) {
+      conditions.push(`a.brand_name = $${paramIndex++}`);
+      params.push(brandName);
+    }
+    if (storeId) {
+      conditions.push(`a.store_id = $${paramIndex++}`);
+      params.push(storeId);
+    }
+    if (alertType) {
+      conditions.push(`a.alert_type = $${paramIndex++}`);
+      params.push(alertType);
+    }
+    if (unreadOnly) {
+      conditions.push('a.is_read = false');
+    }
+
+    const whereClause = conditions.length > 0
+      ? 'WHERE ' + conditions.join(' AND ')
+      : '';
+
+    const result = await this.pool.query(`
+      SELECT
+        a.id,
+        a.alert_type,
+        a.severity,
+        a.title,
+        a.description,
+        d.name as store_name,
+        a.brand_name,
+        a.created_at,
+        a.is_read
+      FROM analytics_alerts a
+      LEFT JOIN dispensaries d ON a.store_id = d.id
+      ${whereClause}
+      ORDER BY a.created_at DESC
+      LIMIT $1
+    `, params);
+
+    return result.rows.map(row => ({
+      id: row.id,
+      alertType: row.alert_type,
+      severity: row.severity,
+      title: row.title,
+      description: row.description,
+      storeName: row.store_name,
+      brandName: row.brand_name,
+      createdAt: row.created_at.toISOString(),
+      isRead: row.is_read,
+    }));
+  }
+
+  /**
+   * Mark alerts as read
+   */
+  async markAlertsRead(alertIds: number[]): Promise<void> {
+    if (alertIds.length === 0) return;
+
+    await this.pool.query(`
+      UPDATE analytics_alerts
+      SET is_read = true
+      WHERE id = ANY($1)
+    `, [alertIds]);
+  }
+}
diff --git a/backend/src/dutchie-az/services/analytics/cache.ts b/backend/src/dutchie-az/services/analytics/cache.ts
new file mode 100644
index 00000000..75d15a03
--- /dev/null
+++ b/backend/src/dutchie-az/services/analytics/cache.ts
@@ -0,0 +1,227 @@
+/**
+ * Analytics Cache Service
+ *
+ * Provides caching layer for expensive analytics queries.
+ * Uses PostgreSQL for persistence with configurable TTLs.
+ *
+ * Phase 3: Analytics Dashboards
+ */
+
+import { Pool } from 'pg';
+
+export interface CacheEntry<T = unknown> {
+  key: string;
+  data: T;
+  computedAt: Date;
+  expiresAt: Date;
+  queryTimeMs?: number;
+}
+
+export interface CacheConfig {
+  defaultTtlMinutes: number;
+}
+
+const DEFAULT_CONFIG: CacheConfig = {
+  defaultTtlMinutes: 15,
+};
+
+export class AnalyticsCache {
+  private pool: Pool;
+  private config: CacheConfig;
+  private memoryCache: Map<string, CacheEntry> = new Map();
+
+  constructor(pool: Pool, config: Partial<CacheConfig> = {}) {
+    this.pool = pool;
+    this.config = { ...DEFAULT_CONFIG, ...config };
+  }
+
+  /**
+   * Get cached data or compute and cache it
+   */
+  async getOrCompute<T>(
+    key: string,
+    computeFn: () => Promise<T>,
+    ttlMinutes?: number
+  ): Promise<{ data: T; fromCache: boolean; queryTimeMs: number }> {
+    const ttl = ttlMinutes ?? this.config.defaultTtlMinutes;
+
+    // Check memory cache first
+    const memEntry = this.memoryCache.get(key);
+    if (memEntry && new Date() < memEntry.expiresAt) {
+      return { data: memEntry.data as T, fromCache: true, queryTimeMs: memEntry.queryTimeMs || 0 };
+    }
+
+    // Check database cache
+    const dbEntry = await this.getFromDb<T>(key);
+    if (dbEntry && new Date() < dbEntry.expiresAt) {
+      this.memoryCache.set(key, dbEntry);
+      return { data: dbEntry.data, fromCache: true, queryTimeMs: dbEntry.queryTimeMs || 0 };
+    }
+
+    // Compute fresh data
+    const startTime = Date.now();
+    const data = await computeFn();
+    const queryTimeMs = Date.now() - startTime;
+
+    // Cache result
+    const entry: CacheEntry<T> = {
+      key,
+      data,
+      computedAt: new Date(),
+      expiresAt: new Date(Date.now() + ttl * 60 * 1000),
+      queryTimeMs,
+    };
+
+    await this.saveToDb(entry);
+    this.memoryCache.set(key, entry);
+
+    return { data, fromCache: false, queryTimeMs };
+  }
+
+  /**
+   * Get from database cache
+   */
+  private async getFromDb<T>(key: string): Promise<CacheEntry<T> | null> {
+    try {
+      const result = await this.pool.query(`
+        SELECT cache_data, computed_at, expires_at, query_time_ms
+        FROM analytics_cache
+        WHERE cache_key = $1
+          AND expires_at > NOW()
+      `, [key]);
+
+      if (result.rows.length === 0) return null;
+
+      const row = result.rows[0];
+      return {
+        key,
+        data: row.cache_data as T,
+        computedAt: row.computed_at,
+        expiresAt: row.expires_at,
+        queryTimeMs: row.query_time_ms,
+      };
+    } catch (error) {
+      console.warn(`[AnalyticsCache] Failed to get from DB: ${error}`);
+      return null;
+    }
+  }
+
+  /**
+   * Save to database cache
+   */
+  private async saveToDb<T>(entry: CacheEntry<T>): Promise<void> {
+    try {
+      await this.pool.query(`
+        INSERT INTO analytics_cache (cache_key, cache_data, computed_at, expires_at, query_time_ms)
+        VALUES ($1, $2, $3, $4, $5)
+        ON CONFLICT (cache_key)
+        DO UPDATE SET
+          cache_data = EXCLUDED.cache_data,
+          computed_at = EXCLUDED.computed_at,
+          expires_at = EXCLUDED.expires_at,
+          query_time_ms = EXCLUDED.query_time_ms
+      `, [entry.key, JSON.stringify(entry.data), entry.computedAt, entry.expiresAt, entry.queryTimeMs]);
+    } catch (error) {
+      console.warn(`[AnalyticsCache] Failed to save to DB: ${error}`);
+    }
+  }
+
+  /**
+   * Invalidate a cache entry
+   */
+  async invalidate(key: string): Promise<void> {
+    this.memoryCache.delete(key);
+    try {
+      await this.pool.query('DELETE FROM analytics_cache WHERE cache_key = $1', [key]);
+    } catch (error) {
+      console.warn(`[AnalyticsCache] Failed to invalidate: ${error}`);
+    }
+  }
+
+  /**
+   * Invalidate all entries matching a pattern
+   */
+  async invalidatePattern(pattern: string): Promise<number> {
+    // Clear memory cache
+    for (const key of this.memoryCache.keys()) {
+      if (key.includes(pattern)) {
+        this.memoryCache.delete(key);
+      }
+    }
+
+    try {
+      const result = await this.pool.query(
+        'DELETE FROM analytics_cache WHERE cache_key LIKE $1',
+        [`%${pattern}%`]
+      );
+      return result.rowCount || 0;
+    } catch (error) {
+      console.warn(`[AnalyticsCache] Failed to invalidate pattern: ${error}`);
+      return 0;
+    }
+  }
+
+  /**
+   * Clean expired entries
+   */
+  async cleanExpired(): Promise<number> {
+    // Clean memory cache
+    const now = new Date();
+    for (const [key, entry] of this.memoryCache.entries()) {
+      if (now >= entry.expiresAt) {
+        this.memoryCache.delete(key);
+      }
+    }
+
+    try {
+      const result = await this.pool.query('DELETE FROM analytics_cache WHERE expires_at < NOW()');
+      return result.rowCount || 0;
+    } catch (error) {
+      console.warn(`[AnalyticsCache] Failed to clean expired: ${error}`);
+      return 0;
+    }
+  }
+
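+  // Usage sketch for getOrCompute (illustrative only; assumes a configured pg Pool
+  // named `pool` in scope and uses a made-up query):
+  //
+  //   const cache = new AnalyticsCache(pool, { defaultTtlMinutes: 15 });
+  //   const key = cacheKey('category_summary', { state: 'AZ' });
+  //   const { data, fromCache, queryTimeMs } = await cache.getOrCompute(key, async () => {
+  //     const { rows } = await pool.query('SELECT type, COUNT(*) FROM dutchie_products GROUP BY type');
+  //     return rows;
+  //   }, 30);
+  //   // fromCache is false on the first call, then true until the 30-minute TTL lapses.
+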
/** + * Get cache statistics + */ + async getStats(): Promise<{ + memoryCacheSize: number; + dbCacheSize: number; + expiredCount: number; + }> { + try { + const result = await this.pool.query(` + SELECT + COUNT(*) FILTER (WHERE expires_at > NOW()) as active, + COUNT(*) FILTER (WHERE expires_at <= NOW()) as expired + FROM analytics_cache + `); + + return { + memoryCacheSize: this.memoryCache.size, + dbCacheSize: parseInt(result.rows[0]?.active || '0'), + expiredCount: parseInt(result.rows[0]?.expired || '0'), + }; + } catch (error) { + return { + memoryCacheSize: this.memoryCache.size, + dbCacheSize: 0, + expiredCount: 0, + }; + } + } +} + +/** + * Generate cache key with parameters + */ +export function cacheKey(prefix: string, params: Record = {}): string { + const sortedParams = Object.keys(params) + .sort() + .filter(k => params[k] !== undefined && params[k] !== null) + .map(k => `${k}=${params[k]}`) + .join('&'); + + return sortedParams ? `${prefix}:${sortedParams}` : prefix; +} diff --git a/backend/src/dutchie-az/services/analytics/category-analytics.ts b/backend/src/dutchie-az/services/analytics/category-analytics.ts new file mode 100644 index 00000000..429bae8d --- /dev/null +++ b/backend/src/dutchie-az/services/analytics/category-analytics.ts @@ -0,0 +1,530 @@ +/** + * Category Growth Analytics Service + * + * Provides category-level analytics including: + * - SKU count growth + * - Price growth trends + * - New product additions + * - Category shrinkage + * - Seasonality patterns + * + * Phase 3: Analytics Dashboards + */ + +import { Pool } from 'pg'; +import { AnalyticsCache, cacheKey } from './cache'; + +export interface CategoryGrowth { + category: string; + currentSkuCount: number; + previousSkuCount: number; + skuGrowthPercent: number; + currentBrandCount: number; + previousBrandCount: number; + brandGrowthPercent: number; + currentAvgPrice: number | null; + previousAvgPrice: number | null; + priceChangePercent: number | null; + newProducts: number; + discontinuedProducts: number; + trend: 'growing' | 'declining' | 'stable'; +} + +export interface CategorySummary { + category: string; + totalSkus: number; + brandCount: number; + storeCount: number; + avgPrice: number | null; + minPrice: number | null; + maxPrice: number | null; + inStockSkus: number; + outOfStockSkus: number; + stockHealthPercent: number; +} + +export interface CategoryGrowthTrend { + category: string; + dataPoints: Array<{ + date: string; + skuCount: number; + brandCount: number; + avgPrice: number | null; + storeCount: number; + }>; + growth7d: number | null; + growth30d: number | null; + growth90d: number | null; +} + +export interface CategoryHeatmapData { + categories: string[]; + periods: string[]; + data: Array<{ + category: string; + period: string; + value: number; // SKU count, growth %, or price + changeFromPrevious: number | null; + }>; +} + +export interface SeasonalityPattern { + category: string; + monthlyPattern: Array<{ + month: number; + monthName: string; + avgSkuCount: number; + avgPrice: number | null; + seasonalityIndex: number; // 100 = average, >100 = above, <100 = below + }>; + peakMonth: number; + troughMonth: number; +} + +export interface CategoryFilters { + state?: string; + storeId?: number; + minSkus?: number; +} + +export class CategoryAnalyticsService { + private pool: Pool; + private cache: AnalyticsCache; + + constructor(pool: Pool, cache: AnalyticsCache) { + this.pool = pool; + this.cache = cache; + } + + /** + * Get current category summary + */ + async getCategorySummary( 
+    category?: string,
+    filters: CategoryFilters = {}
+  ): Promise<CategorySummary[]> {
+    const { state, storeId } = filters;
+    const key = cacheKey('category_summary', { category, state, storeId });
+
+    return (await this.cache.getOrCompute(key, async () => {
+      const params: (string | number)[] = [];
+      const conditions: string[] = [];
+      let paramIndex = 1;
+
+      if (category) {
+        conditions.push(`dp.type = $${paramIndex++}`);
+        params.push(category);
+      }
+      if (state) {
+        conditions.push(`d.state = $${paramIndex++}`);
+        params.push(state);
+      }
+      if (storeId) {
+        conditions.push(`dp.dispensary_id = $${paramIndex++}`);
+        params.push(storeId);
+      }
+
+      const whereClause = conditions.length > 0
+        ? 'WHERE dp.type IS NOT NULL AND ' + conditions.join(' AND ')
+        : 'WHERE dp.type IS NOT NULL';
+
+      const result = await this.pool.query(`
+        SELECT
+          dp.type as category,
+          COUNT(*) as total_skus,
+          COUNT(DISTINCT dp.brand_name) as brand_count,
+          COUNT(DISTINCT dp.dispensary_id) as store_count,
+          AVG(extract_min_price(dp.latest_raw_payload)) as avg_price,
+          MIN(extract_min_price(dp.latest_raw_payload)) as min_price,
+          MAX(extract_max_price(dp.latest_raw_payload)) as max_price,
+          SUM(CASE WHEN dp.stock_status = 'in_stock' THEN 1 ELSE 0 END) as in_stock,
+          SUM(CASE WHEN dp.stock_status != 'in_stock' OR dp.stock_status IS NULL THEN 1 ELSE 0 END) as out_of_stock
+        FROM dutchie_products dp
+        JOIN dispensaries d ON dp.dispensary_id = d.id
+        ${whereClause}
+        GROUP BY dp.type
+        ORDER BY total_skus DESC
+      `, params);
+
+      return result.rows.map(row => {
+        const totalSkus = parseInt(row.total_skus) || 0;
+        const inStock = parseInt(row.in_stock) || 0;
+
+        return {
+          category: row.category,
+          totalSkus,
+          brandCount: parseInt(row.brand_count) || 0,
+          storeCount: parseInt(row.store_count) || 0,
+          avgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null,
+          minPrice: row.min_price ? Math.round(parseFloat(row.min_price) * 100) / 100 : null,
+          maxPrice: row.max_price ? Math.round(parseFloat(row.max_price) * 100) / 100 : null,
+          inStockSkus: inStock,
+          outOfStockSkus: parseInt(row.out_of_stock) || 0,
+          stockHealthPercent: totalSkus > 0
+            ?
Math.round((inStock / totalSkus) * 100) + : 0, + }; + }); + }, 15)).data; + } + + /** + * Get category growth (comparing periods) + */ + async getCategoryGrowth( + days: number = 7, + filters: CategoryFilters = {} + ): Promise { + const { state, storeId, minSkus = 10 } = filters; + const key = cacheKey('category_growth', { days, state, storeId, minSkus }); + + return (await this.cache.getOrCompute(key, async () => { + // Use category_snapshots for historical comparison + const result = await this.pool.query(` + WITH current_data AS ( + SELECT + category, + total_skus, + brand_count, + avg_price, + store_count + FROM category_snapshots + WHERE snapshot_date = (SELECT MAX(snapshot_date) FROM category_snapshots) + ), + previous_data AS ( + SELECT + category, + total_skus, + brand_count, + avg_price, + store_count + FROM category_snapshots + WHERE snapshot_date = ( + SELECT MAX(snapshot_date) + FROM category_snapshots + WHERE snapshot_date < (SELECT MAX(snapshot_date) FROM category_snapshots) - ($1 || ' days')::INTERVAL + ) + ) + SELECT + c.category, + c.total_skus as current_skus, + COALESCE(p.total_skus, c.total_skus) as previous_skus, + c.brand_count as current_brands, + COALESCE(p.brand_count, c.brand_count) as previous_brands, + c.avg_price as current_price, + p.avg_price as previous_price + FROM current_data c + LEFT JOIN previous_data p ON c.category = p.category + WHERE c.total_skus >= $2 + ORDER BY c.total_skus DESC + `, [days, minSkus]); + + // If no snapshots exist, use current data + if (result.rows.length === 0) { + const fallbackResult = await this.pool.query(` + SELECT + type as category, + COUNT(*) as total_skus, + COUNT(DISTINCT brand_name) as brand_count, + AVG(extract_min_price(latest_raw_payload)) as avg_price + FROM dutchie_products + WHERE type IS NOT NULL + GROUP BY type + HAVING COUNT(*) >= $1 + ORDER BY total_skus DESC + `, [minSkus]); + + return fallbackResult.rows.map(row => ({ + category: row.category, + currentSkuCount: parseInt(row.total_skus) || 0, + previousSkuCount: parseInt(row.total_skus) || 0, + skuGrowthPercent: 0, + currentBrandCount: parseInt(row.brand_count) || 0, + previousBrandCount: parseInt(row.brand_count) || 0, + brandGrowthPercent: 0, + currentAvgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null, + previousAvgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null, + priceChangePercent: null, + newProducts: 0, + discontinuedProducts: 0, + trend: 'stable' as const, + })); + } + + return result.rows.map(row => { + const currentSkus = parseInt(row.current_skus) || 0; + const previousSkus = parseInt(row.previous_skus) || currentSkus; + const currentBrands = parseInt(row.current_brands) || 0; + const previousBrands = parseInt(row.previous_brands) || currentBrands; + const currentPrice = row.current_price ? parseFloat(row.current_price) : null; + const previousPrice = row.previous_price ? parseFloat(row.previous_price) : null; + + const skuGrowth = previousSkus > 0 + ? ((currentSkus - previousSkus) / previousSkus) * 100 + : 0; + const brandGrowth = previousBrands > 0 + ? ((currentBrands - previousBrands) / previousBrands) * 100 + : 0; + const priceChange = previousPrice && currentPrice + ? 
((currentPrice - previousPrice) / previousPrice) * 100 + : null; + + let trend: 'growing' | 'declining' | 'stable' = 'stable'; + if (skuGrowth > 5) trend = 'growing'; + else if (skuGrowth < -5) trend = 'declining'; + + return { + category: row.category, + currentSkuCount: currentSkus, + previousSkuCount: previousSkus, + skuGrowthPercent: Math.round(skuGrowth * 10) / 10, + currentBrandCount: currentBrands, + previousBrandCount: previousBrands, + brandGrowthPercent: Math.round(brandGrowth * 10) / 10, + currentAvgPrice: currentPrice ? Math.round(currentPrice * 100) / 100 : null, + previousAvgPrice: previousPrice ? Math.round(previousPrice * 100) / 100 : null, + priceChangePercent: priceChange !== null ? Math.round(priceChange * 10) / 10 : null, + newProducts: Math.max(0, currentSkus - previousSkus), + discontinuedProducts: Math.max(0, previousSkus - currentSkus), + trend, + }; + }); + }, 15)).data; + } + + /** + * Get category growth trend over time + */ + async getCategoryGrowthTrend( + category: string, + days: number = 90 + ): Promise { + const key = cacheKey('category_growth_trend', { category, days }); + + return (await this.cache.getOrCompute(key, async () => { + const result = await this.pool.query(` + SELECT + snapshot_date as date, + total_skus as sku_count, + brand_count, + avg_price, + store_count + FROM category_snapshots + WHERE category = $1 + AND snapshot_date >= CURRENT_DATE - ($2 || ' days')::INTERVAL + ORDER BY snapshot_date + `, [category, days]); + + const dataPoints = result.rows.map(row => ({ + date: row.date.toISOString().split('T')[0], + skuCount: parseInt(row.sku_count) || 0, + brandCount: parseInt(row.brand_count) || 0, + avgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null, + storeCount: parseInt(row.store_count) || 0, + })); + + // Calculate growth rates + const calculateGrowth = (daysBack: number): number | null => { + if (dataPoints.length < 2) return null; + const targetDate = new Date(); + targetDate.setDate(targetDate.getDate() - daysBack); + const targetDateStr = targetDate.toISOString().split('T')[0]; + + const recent = dataPoints[dataPoints.length - 1]; + const older = dataPoints.find(d => d.date <= targetDateStr) || dataPoints[0]; + + if (older.skuCount === 0) return null; + return Math.round(((recent.skuCount - older.skuCount) / older.skuCount) * 1000) / 10; + }; + + return { + category, + dataPoints, + growth7d: calculateGrowth(7), + growth30d: calculateGrowth(30), + growth90d: calculateGrowth(90), + }; + }, 15)).data; + } + + /** + * Get category heatmap data + */ + async getCategoryHeatmap( + metric: 'skus' | 'growth' | 'price' = 'skus', + periods: number = 12 // weeks + ): Promise { + const key = cacheKey('category_heatmap', { metric, periods }); + + return (await this.cache.getOrCompute(key, async () => { + const result = await this.pool.query(` + SELECT + category, + snapshot_date, + total_skus, + avg_price + FROM category_snapshots + WHERE snapshot_date >= CURRENT_DATE - ($1 * 7 || ' days')::INTERVAL + ORDER BY category, snapshot_date + `, [periods]); + + // Get unique categories and generate weekly periods + const categoriesSet = new Set(); + const periodsSet = new Set(); + + result.rows.forEach(row => { + categoriesSet.add(row.category); + // Group by week + const date = new Date(row.snapshot_date); + const weekStart = new Date(date); + weekStart.setDate(date.getDate() - date.getDay()); + periodsSet.add(weekStart.toISOString().split('T')[0]); + }); + + const categories = Array.from(categoriesSet).sort(); + const 
periodsList = Array.from(periodsSet).sort(); + + // Aggregate data by category and week + const dataMap = new Map>(); + + result.rows.forEach(row => { + const date = new Date(row.snapshot_date); + const weekStart = new Date(date); + weekStart.setDate(date.getDate() - date.getDay()); + const period = weekStart.toISOString().split('T')[0]; + + if (!dataMap.has(row.category)) { + dataMap.set(row.category, new Map()); + } + const categoryData = dataMap.get(row.category)!; + + if (!categoryData.has(period)) { + categoryData.set(period, { skus: 0, price: null }); + } + const existing = categoryData.get(period)!; + existing.skus = Math.max(existing.skus, parseInt(row.total_skus) || 0); + if (row.avg_price) { + existing.price = parseFloat(row.avg_price); + } + }); + + // Build heatmap data + const data: CategoryHeatmapData['data'] = []; + + categories.forEach(category => { + let previousValue: number | null = null; + + periodsList.forEach(period => { + const categoryData = dataMap.get(category)?.get(period); + let value = 0; + + if (categoryData) { + switch (metric) { + case 'skus': + value = categoryData.skus; + break; + case 'price': + value = categoryData.price || 0; + break; + case 'growth': + value = previousValue !== null && previousValue > 0 + ? ((categoryData.skus - previousValue) / previousValue) * 100 + : 0; + break; + } + } + + const changeFromPrevious = previousValue !== null && previousValue > 0 + ? ((value - previousValue) / previousValue) * 100 + : null; + + data.push({ + category, + period, + value: Math.round(value * 100) / 100, + changeFromPrevious: changeFromPrevious !== null + ? Math.round(changeFromPrevious * 10) / 10 + : null, + }); + + if (metric !== 'growth') { + previousValue = value; + } else if (categoryData) { + previousValue = categoryData.skus; + } + }); + }); + + return { + categories, + periods: periodsList, + data, + }; + }, 30)).data; + } + + /** + * Get top growing/declining categories + */ + async getTopMovers( + limit: number = 5, + days: number = 30 + ): Promise<{ + growing: CategoryGrowth[]; + declining: CategoryGrowth[]; + }> { + const key = cacheKey('top_movers', { limit, days }); + + return (await this.cache.getOrCompute(key, async () => { + const allGrowth = await this.getCategoryGrowth(days); + + const sorted = [...allGrowth].sort((a, b) => b.skuGrowthPercent - a.skuGrowthPercent); + + return { + growing: sorted.filter(c => c.skuGrowthPercent > 0).slice(0, limit), + declining: sorted.filter(c => c.skuGrowthPercent < 0).slice(-limit).reverse(), + }; + }, 15)).data; + } + + /** + * Get category subcategory breakdown + */ + async getSubcategoryBreakdown(category: string): Promise> { + const key = cacheKey('subcategory_breakdown', { category }); + + return (await this.cache.getOrCompute(key, async () => { + const result = await this.pool.query(` + WITH category_total AS ( + SELECT COUNT(*) as total FROM dutchie_products WHERE type = $1 + ) + SELECT + COALESCE(dp.subcategory, 'Other') as subcategory, + COUNT(*) as sku_count, + COUNT(DISTINCT dp.brand_name) as brand_count, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price, + ct.total as category_total + FROM dutchie_products dp, category_total ct + WHERE dp.type = $1 + GROUP BY dp.subcategory, ct.total + ORDER BY sku_count DESC + `, [category]); + + return result.rows.map(row => ({ + subcategory: row.subcategory, + skuCount: parseInt(row.sku_count) || 0, + brandCount: parseInt(row.brand_count) || 0, + avgPrice: row.avg_price ? 
Math.round(parseFloat(row.avg_price) * 100) / 100 : null, + percentOfCategory: parseInt(row.category_total) > 0 + ? Math.round((parseInt(row.sku_count) / parseInt(row.category_total)) * 1000) / 10 + : 0, + })); + }, 15)).data; + } +} diff --git a/backend/src/dutchie-az/services/analytics/index.ts b/backend/src/dutchie-az/services/analytics/index.ts new file mode 100644 index 00000000..ac73bed0 --- /dev/null +++ b/backend/src/dutchie-az/services/analytics/index.ts @@ -0,0 +1,57 @@ +/** + * Analytics Module Index + * + * Exports all analytics services for CannaiQ dashboards. + * + * Phase 3: Analytics Dashboards + */ + +export { AnalyticsCache, cacheKey, type CacheEntry, type CacheConfig } from './cache'; + +export { + PriceTrendService, + type PricePoint, + type PriceTrend, + type PriceSummary, + type PriceCompressionResult, + type PriceFilters, +} from './price-trends'; + +export { + PenetrationService, + type BrandPenetration, + type PenetrationTrend, + type ShelfShare, + type BrandPresenceByState, + type PenetrationFilters, +} from './penetration'; + +export { + CategoryAnalyticsService, + type CategoryGrowth, + type CategorySummary, + type CategoryGrowthTrend, + type CategoryHeatmapData, + type SeasonalityPattern, + type CategoryFilters, +} from './category-analytics'; + +export { + StoreChangeService, + type StoreChangeSummary, + type StoreChangeEvent, + type BrandChange, + type ProductChange, + type CategoryLeaderboard, + type StoreFilters, +} from './store-changes'; + +export { + BrandOpportunityService, + type BrandOpportunity, + type PricePosition, + type MissingSkuOpportunity, + type StoreShelfShareChange, + type CompetitorAlert, + type MarketPositionSummary, +} from './brand-opportunity'; diff --git a/backend/src/dutchie-az/services/analytics/penetration.ts b/backend/src/dutchie-az/services/analytics/penetration.ts new file mode 100644 index 00000000..92baad60 --- /dev/null +++ b/backend/src/dutchie-az/services/analytics/penetration.ts @@ -0,0 +1,556 @@ +/** + * Brand Penetration Analytics Service + * + * Provides analytics for brand market penetration including: + * - Stores carrying brand + * - SKU counts per brand + * - Percentage of stores carrying + * - Shelf share calculations + * - Penetration trends and momentum + * + * Phase 3: Analytics Dashboards + */ + +import { Pool } from 'pg'; +import { AnalyticsCache, cacheKey } from './cache'; + +export interface BrandPenetration { + brandName: string; + brandId: string | null; + totalStores: number; + storesCarrying: number; + penetrationPercent: number; + totalSkus: number; + avgSkusPerStore: number; + shelfSharePercent: number; + categories: string[]; + avgPrice: number | null; + inStockSkus: number; +} + +export interface PenetrationTrend { + brandName: string; + dataPoints: Array<{ + date: string; + storeCount: number; + skuCount: number; + penetrationPercent: number; + }>; + momentumScore: number; // -100 to +100 + riskScore: number; // 0 to 100, higher = more risk + trend: 'growing' | 'declining' | 'stable'; +} + +export interface ShelfShare { + brandName: string; + category: string; + skuCount: number; + categoryTotalSkus: number; + shelfSharePercent: number; + rank: number; +} + +export interface BrandPresenceByState { + state: string; + storeCount: number; + skuCount: number; + avgPrice: number | null; +} + +export interface PenetrationFilters { + state?: string; + category?: string; + minStores?: number; + minSkus?: number; +} + +export class PenetrationService { + private pool: Pool; + private cache: AnalyticsCache; + 
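+
+  /**
+   * Usage sketch (illustrative only; nothing below is wired up in this diff).
+   * All of the analytics services follow the same pattern: construct them with
+   * the runtime pool and a shared AnalyticsCache, then await the query methods.
+   * The pool import path and the AnalyticsCache constructor arguments shown
+   * here are assumptions, not confirmed by this changeset:
+   *
+   *   import { pool } from '../../../db/pool';
+   *   import { AnalyticsCache, PenetrationService } from './index';
+   *
+   *   const cache = new AnalyticsCache(pool);
+   *   const penetration = new PenetrationService(pool, cache);
+   *   const top = await penetration.getTopBrandsByPenetration(20, { state: 'AZ' });
+   *
+   * Cache keys are deterministic: cacheKey('brand_penetration', { brandName: 'Acme', state: 'AZ' })
+   * sorts the defined params and produces 'brand_penetration:brandName=Acme&state=AZ'.
+   */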
+ constructor(pool: Pool, cache: AnalyticsCache) { + this.pool = pool; + this.cache = cache; + } + + /** + * Get penetration data for a specific brand + */ + async getBrandPenetration( + brandName: string, + filters: PenetrationFilters = {} + ): Promise { + const { state, category } = filters; + const key = cacheKey('brand_penetration', { brandName, state, category }); + + return (await this.cache.getOrCompute(key, async () => { + // Build where clauses + const conditions: string[] = []; + const params: (string | number)[] = [brandName]; + let paramIndex = 2; + + if (state) { + conditions.push(`d.state = $${paramIndex++}`); + params.push(state); + } + if (category) { + conditions.push(`dp.type = $${paramIndex++}`); + params.push(category); + } + + const stateCondition = state ? `AND d.state = $${params.indexOf(state) + 1}` : ''; + const categoryCondition = category ? `AND dp.type = $${params.indexOf(category) + 1}` : ''; + + const result = await this.pool.query(` + WITH total_stores AS ( + SELECT COUNT(DISTINCT id) as total + FROM dispensaries + WHERE 1=1 ${state ? `AND state = $2` : ''} + ), + brand_data AS ( + SELECT + dp.brand_name, + dp.brand_id, + COUNT(DISTINCT dp.dispensary_id) as stores_carrying, + COUNT(*) as total_skus, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price, + SUM(CASE WHEN dp.stock_status = 'in_stock' THEN 1 ELSE 0 END) as in_stock, + ARRAY_AGG(DISTINCT dp.type) FILTER (WHERE dp.type IS NOT NULL) as categories + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE dp.brand_name = $1 + ${stateCondition} + ${categoryCondition} + GROUP BY dp.brand_name, dp.brand_id + ), + total_skus AS ( + SELECT COUNT(*) as total + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE 1=1 ${stateCondition} ${categoryCondition} + ) + SELECT + bd.brand_name, + bd.brand_id, + ts.total as total_stores, + bd.stores_carrying, + bd.total_skus, + bd.avg_price, + bd.in_stock, + bd.categories, + tsk.total as market_total_skus + FROM brand_data bd, total_stores ts, total_skus tsk + `, params); + + if (result.rows.length === 0) { + return { + brandName, + brandId: null, + totalStores: 0, + storesCarrying: 0, + penetrationPercent: 0, + totalSkus: 0, + avgSkusPerStore: 0, + shelfSharePercent: 0, + categories: [], + avgPrice: null, + inStockSkus: 0, + }; + } + + const row = result.rows[0]; + const totalStores = parseInt(row.total_stores) || 1; + const storesCarrying = parseInt(row.stores_carrying) || 0; + const totalSkus = parseInt(row.total_skus) || 0; + const marketTotalSkus = parseInt(row.market_total_skus) || 1; + + return { + brandName: row.brand_name, + brandId: row.brand_id, + totalStores, + storesCarrying, + penetrationPercent: Math.round((storesCarrying / totalStores) * 1000) / 10, + totalSkus, + avgSkusPerStore: storesCarrying > 0 + ? Math.round((totalSkus / storesCarrying) * 10) / 10 + : 0, + shelfSharePercent: Math.round((totalSkus / marketTotalSkus) * 1000) / 10, + categories: row.categories || [], + avgPrice: row.avg_price ? 
Math.round(parseFloat(row.avg_price) * 100) / 100 : null, + inStockSkus: parseInt(row.in_stock) || 0, + }; + }, 15)).data; + } + + /** + * Get top brands by penetration + */ + async getTopBrandsByPenetration( + limit: number = 20, + filters: PenetrationFilters = {} + ): Promise { + const { state, category, minStores = 2, minSkus = 5 } = filters; + const key = cacheKey('top_brands_penetration', { limit, state, category, minStores, minSkus }); + + return (await this.cache.getOrCompute(key, async () => { + const params: (string | number)[] = [limit, minStores, minSkus]; + let paramIndex = 4; + + let stateCondition = ''; + let categoryCondition = ''; + + if (state) { + stateCondition = `AND d.state = $${paramIndex++}`; + params.push(state); + } + if (category) { + categoryCondition = `AND dp.type = $${paramIndex++}`; + params.push(category); + } + + const result = await this.pool.query(` + WITH total_stores AS ( + SELECT COUNT(DISTINCT id) as total + FROM dispensaries + WHERE 1=1 ${state ? `AND state = $${params.indexOf(state) + 1}` : ''} + ), + total_skus AS ( + SELECT COUNT(*) as total + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE 1=1 ${stateCondition} ${categoryCondition} + ), + brand_data AS ( + SELECT + dp.brand_name, + dp.brand_id, + COUNT(DISTINCT dp.dispensary_id) as stores_carrying, + COUNT(*) as total_skus, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price, + SUM(CASE WHEN dp.stock_status = 'in_stock' THEN 1 ELSE 0 END) as in_stock, + ARRAY_AGG(DISTINCT dp.type) FILTER (WHERE dp.type IS NOT NULL) as categories + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE dp.brand_name IS NOT NULL + ${stateCondition} + ${categoryCondition} + GROUP BY dp.brand_name, dp.brand_id + HAVING COUNT(DISTINCT dp.dispensary_id) >= $2 + AND COUNT(*) >= $3 + ) + SELECT + bd.*, + ts.total as total_stores, + tsk.total as market_total_skus + FROM brand_data bd, total_stores ts, total_skus tsk + ORDER BY bd.stores_carrying DESC, bd.total_skus DESC + LIMIT $1 + `, params); + + return result.rows.map(row => { + const totalStores = parseInt(row.total_stores) || 1; + const storesCarrying = parseInt(row.stores_carrying) || 0; + const totalSkus = parseInt(row.total_skus) || 0; + const marketTotalSkus = parseInt(row.market_total_skus) || 1; + + return { + brandName: row.brand_name, + brandId: row.brand_id, + totalStores, + storesCarrying, + penetrationPercent: Math.round((storesCarrying / totalStores) * 1000) / 10, + totalSkus, + avgSkusPerStore: storesCarrying > 0 + ? Math.round((totalSkus / storesCarrying) * 10) / 10 + : 0, + shelfSharePercent: Math.round((totalSkus / marketTotalSkus) * 1000) / 10, + categories: row.categories || [], + avgPrice: row.avg_price ? 
Math.round(parseFloat(row.avg_price) * 100) / 100 : null, + inStockSkus: parseInt(row.in_stock) || 0, + }; + }); + }, 15)).data; + } + + /** + * Get penetration trend for a brand (requires historical snapshots) + */ + async getPenetrationTrend( + brandName: string, + days: number = 30 + ): Promise { + const key = cacheKey('penetration_trend', { brandName, days }); + + return (await this.cache.getOrCompute(key, async () => { + // Use brand_snapshots table for historical data + const result = await this.pool.query(` + SELECT + snapshot_date as date, + store_count, + total_skus + FROM brand_snapshots + WHERE brand_name = $1 + AND snapshot_date >= CURRENT_DATE - ($2 || ' days')::INTERVAL + ORDER BY snapshot_date + `, [brandName, days]); + + // Get total stores for penetration calculation + const totalResult = await this.pool.query( + 'SELECT COUNT(*) as total FROM dispensaries' + ); + const totalStores = parseInt(totalResult.rows[0]?.total) || 1; + + const dataPoints = result.rows.map(row => ({ + date: row.date.toISOString().split('T')[0], + storeCount: parseInt(row.store_count) || 0, + skuCount: parseInt(row.total_skus) || 0, + penetrationPercent: Math.round((parseInt(row.store_count) / totalStores) * 1000) / 10, + })); + + // Calculate momentum and risk scores + let momentumScore = 0; + let riskScore = 0; + let trend: 'growing' | 'declining' | 'stable' = 'stable'; + + if (dataPoints.length >= 2) { + const first = dataPoints[0]; + const last = dataPoints[dataPoints.length - 1]; + + // Momentum: change in store count + const storeChange = last.storeCount - first.storeCount; + const storeChangePercent = first.storeCount > 0 + ? (storeChange / first.storeCount) * 100 + : 0; + + // Momentum score: -100 to +100 + momentumScore = Math.max(-100, Math.min(100, storeChangePercent * 10)); + + // Risk score: higher if losing stores + if (storeChange < 0) { + riskScore = Math.min(100, Math.abs(storeChangePercent) * 5); + } + + // Determine trend + if (storeChangePercent > 5) trend = 'growing'; + else if (storeChangePercent < -5) trend = 'declining'; + } + + return { + brandName, + dataPoints, + momentumScore: Math.round(momentumScore), + riskScore: Math.round(riskScore), + trend, + }; + }, 15)).data; + } + + /** + * Get shelf share by category for a brand + */ + async getShelfShareByCategory(brandName: string): Promise { + const key = cacheKey('shelf_share_category', { brandName }); + + return (await this.cache.getOrCompute(key, async () => { + const result = await this.pool.query(` + WITH category_totals AS ( + SELECT + type as category, + COUNT(*) as total_skus + FROM dutchie_products + WHERE type IS NOT NULL + GROUP BY type + ), + brand_by_category AS ( + SELECT + type as category, + COUNT(*) as sku_count + FROM dutchie_products + WHERE brand_name = $1 + AND type IS NOT NULL + GROUP BY type + ), + ranked AS ( + SELECT + ct.category, + COALESCE(bc.sku_count, 0) as sku_count, + ct.total_skus, + RANK() OVER (PARTITION BY ct.category ORDER BY bc.sku_count DESC NULLS LAST) as rank + FROM category_totals ct + LEFT JOIN brand_by_category bc ON ct.category = bc.category + ) + SELECT + r.category, + r.sku_count, + r.total_skus as category_total_skus, + ROUND((r.sku_count::NUMERIC / r.total_skus) * 100, 2) as shelf_share_pct, + (SELECT COUNT(*) + 1 FROM ( + SELECT brand_name, COUNT(*) as cnt + FROM dutchie_products + WHERE type = r.category AND brand_name IS NOT NULL + GROUP BY brand_name + HAVING COUNT(*) > r.sku_count + ) t) as rank + FROM ranked r + WHERE r.sku_count > 0 + ORDER BY r.shelf_share_pct DESC + 
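+        -- Note: the rank column returned above comes from the correlated
+        -- COUNT(*) + 1 subquery; the window RANK() in the "ranked" CTE is not
+        -- part of the output.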
`, [brandName]); + + return result.rows.map(row => ({ + brandName, + category: row.category, + skuCount: parseInt(row.sku_count) || 0, + categoryTotalSkus: parseInt(row.category_total_skus) || 0, + shelfSharePercent: parseFloat(row.shelf_share_pct) || 0, + rank: parseInt(row.rank) || 0, + })); + }, 15)).data; + } + + /** + * Get brand presence by state/region + */ + async getBrandPresenceByState(brandName: string): Promise { + const key = cacheKey('brand_presence_state', { brandName }); + + return (await this.cache.getOrCompute(key, async () => { + const result = await this.pool.query(` + SELECT + d.state, + COUNT(DISTINCT dp.dispensary_id) as store_count, + COUNT(*) as sku_count, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE dp.brand_name = $1 + GROUP BY d.state + ORDER BY store_count DESC + `, [brandName]); + + return result.rows.map(row => ({ + state: row.state, + storeCount: parseInt(row.store_count) || 0, + skuCount: parseInt(row.sku_count) || 0, + avgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null, + })); + }, 15)).data; + } + + /** + * Get stores carrying a brand + */ + async getStoresCarryingBrand(brandName: string): Promise> { + const key = cacheKey('stores_carrying_brand', { brandName }); + + return (await this.cache.getOrCompute(key, async () => { + const result = await this.pool.query(` + SELECT + d.id as store_id, + d.name as store_name, + d.city, + d.state, + COUNT(*) as sku_count, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price, + ARRAY_AGG(DISTINCT dp.type) FILTER (WHERE dp.type IS NOT NULL) as categories + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE dp.brand_name = $1 + GROUP BY d.id, d.name, d.city, d.state + ORDER BY sku_count DESC + `, [brandName]); + + return result.rows.map(row => ({ + storeId: row.store_id, + storeName: row.store_name, + city: row.city, + state: row.state, + skuCount: parseInt(row.sku_count) || 0, + avgPrice: row.avg_price ? 
Math.round(parseFloat(row.avg_price) * 100) / 100 : null, + categories: row.categories || [], + })); + }, 15)).data; + } + + /** + * Get penetration heatmap data (state-based) + */ + async getPenetrationHeatmap( + brandName?: string + ): Promise> { + const key = cacheKey('penetration_heatmap', { brandName }); + + return (await this.cache.getOrCompute(key, async () => { + if (brandName) { + const result = await this.pool.query(` + WITH state_totals AS ( + SELECT state, COUNT(*) as total_stores + FROM dispensaries + GROUP BY state + ), + brand_by_state AS ( + SELECT + d.state, + COUNT(DISTINCT dp.dispensary_id) as stores_with_brand, + COUNT(*) as total_skus + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE dp.brand_name = $1 + GROUP BY d.state + ) + SELECT + st.state, + st.total_stores, + COALESCE(bs.stores_with_brand, 0) as stores_with_brand, + ROUND(COALESCE(bs.stores_with_brand, 0)::NUMERIC / st.total_stores * 100, 1) as penetration_pct, + COALESCE(bs.total_skus, 0) as total_skus + FROM state_totals st + LEFT JOIN brand_by_state bs ON st.state = bs.state + ORDER BY penetration_pct DESC + `, [brandName]); + + return result.rows.map(row => ({ + state: row.state, + totalStores: parseInt(row.total_stores) || 0, + storesWithBrand: parseInt(row.stores_with_brand) || 0, + penetrationPercent: parseFloat(row.penetration_pct) || 0, + totalSkus: parseInt(row.total_skus) || 0, + })); + } else { + // Overall market data by state + const result = await this.pool.query(` + SELECT + d.state, + COUNT(DISTINCT d.id) as total_stores, + COUNT(DISTINCT dp.brand_name) as brand_count, + COUNT(*) as total_skus + FROM dispensaries d + LEFT JOIN dutchie_products dp ON d.id = dp.dispensary_id + GROUP BY d.state + ORDER BY total_stores DESC + `); + + return result.rows.map(row => ({ + state: row.state, + totalStores: parseInt(row.total_stores) || 0, + storesWithBrand: parseInt(row.brand_count) || 0, // Using brand count here + penetrationPercent: 100, // Full penetration for overall view + totalSkus: parseInt(row.total_skus) || 0, + })); + } + }, 30)).data; + } +} diff --git a/backend/src/dutchie-az/services/analytics/price-trends.ts b/backend/src/dutchie-az/services/analytics/price-trends.ts new file mode 100644 index 00000000..8c4e31bf --- /dev/null +++ b/backend/src/dutchie-az/services/analytics/price-trends.ts @@ -0,0 +1,534 @@ +/** + * Price Trend Analytics Service + * + * Provides time-series price analytics including: + * - Price over time for products + * - Average MSRP/Wholesale by period + * - Price volatility scoring + * - Price compression detection + * + * Phase 3: Analytics Dashboards + */ + +import { Pool } from 'pg'; +import { AnalyticsCache, cacheKey } from './cache'; + +export interface PricePoint { + date: string; + minPrice: number | null; + maxPrice: number | null; + avgPrice: number | null; + wholesalePrice: number | null; + sampleSize: number; +} + +export interface PriceTrend { + productId?: number; + storeId?: number; + brandName?: string; + category?: string; + dataPoints: PricePoint[]; + summary: { + currentAvg: number | null; + previousAvg: number | null; + changePercent: number | null; + trend: 'up' | 'down' | 'stable'; + volatilityScore: number | null; + }; +} + +export interface PriceSummary { + avg7d: number | null; + avg30d: number | null; + avg90d: number | null; + wholesaleAvg7d: number | null; + wholesaleAvg30d: number | null; + wholesaleAvg90d: number | null; + minPrice: number | null; + maxPrice: number | null; + priceRange: number | null; + 
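+  /**
+   * Coefficient of variation expressed as a percent: (stddev / mean price) * 100.
+   * Worked example (illustrative): min prices of 10, 12 and 14 have a mean of 12
+   * and a sample stddev of 2, so the score comes out to roughly 16.7.
+   */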
volatilityScore: number | null; +} + +export interface PriceCompressionResult { + category: string; + brands: Array<{ + brandName: string; + avgPrice: number; + priceDistance: number; // distance from category mean + }>; + compressionScore: number; // 0-100, higher = more compressed + standardDeviation: number; +} + +export interface PriceFilters { + storeId?: number; + brandName?: string; + category?: string; + state?: string; + days?: number; +} + +export class PriceTrendService { + private pool: Pool; + private cache: AnalyticsCache; + + constructor(pool: Pool, cache: AnalyticsCache) { + this.pool = pool; + this.cache = cache; + } + + /** + * Get price trend for a specific product + */ + async getProductPriceTrend( + productId: number, + storeId?: number, + days: number = 30 + ): Promise { + const key = cacheKey('price_trend_product', { productId, storeId, days }); + + return (await this.cache.getOrCompute(key, async () => { + // Try to get from snapshots first + const snapshotResult = await this.pool.query(` + SELECT + DATE(crawled_at) as date, + MIN(rec_min_price_cents) / 100.0 as min_price, + MAX(rec_max_price_cents) / 100.0 as max_price, + AVG(rec_min_price_cents) / 100.0 as avg_price, + AVG(wholesale_min_price_cents) / 100.0 as wholesale_price, + COUNT(*) as sample_size + FROM dutchie_product_snapshots + WHERE dutchie_product_id = $1 + AND crawled_at >= NOW() - ($2 || ' days')::INTERVAL + ${storeId ? 'AND dispensary_id = $3' : ''} + GROUP BY DATE(crawled_at) + ORDER BY date + `, storeId ? [productId, days, storeId] : [productId, days]); + + let dataPoints: PricePoint[] = snapshotResult.rows.map(row => ({ + date: row.date.toISOString().split('T')[0], + minPrice: parseFloat(row.min_price) || null, + maxPrice: parseFloat(row.max_price) || null, + avgPrice: parseFloat(row.avg_price) || null, + wholesalePrice: parseFloat(row.wholesale_price) || null, + sampleSize: parseInt(row.sample_size), + })); + + // If no snapshots, get current price from product + if (dataPoints.length === 0) { + const productResult = await this.pool.query(` + SELECT + extract_min_price(latest_raw_payload) as min_price, + extract_max_price(latest_raw_payload) as max_price, + extract_wholesale_price(latest_raw_payload) as wholesale_price + FROM dutchie_products + WHERE id = $1 + `, [productId]); + + if (productResult.rows.length > 0) { + const row = productResult.rows[0]; + dataPoints = [{ + date: new Date().toISOString().split('T')[0], + minPrice: parseFloat(row.min_price) || null, + maxPrice: parseFloat(row.max_price) || null, + avgPrice: parseFloat(row.min_price) || null, + wholesalePrice: parseFloat(row.wholesale_price) || null, + sampleSize: 1, + }]; + } + } + + const summary = this.calculatePriceSummary(dataPoints); + + return { + productId, + storeId, + dataPoints, + summary, + }; + }, 15)).data; + } + + /** + * Get price trends by brand + */ + async getBrandPriceTrend( + brandName: string, + filters: PriceFilters = {} + ): Promise { + const { storeId, category, state, days = 30 } = filters; + const key = cacheKey('price_trend_brand', { brandName, storeId, category, state, days }); + + return (await this.cache.getOrCompute(key, async () => { + // Use current product data aggregated by date + const result = await this.pool.query(` + SELECT + DATE(dp.updated_at) as date, + MIN(extract_min_price(dp.latest_raw_payload)) as min_price, + MAX(extract_max_price(dp.latest_raw_payload)) as max_price, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price, + AVG(extract_wholesale_price(dp.latest_raw_payload)) as 
wholesale_price, + COUNT(*) as sample_size + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE dp.brand_name = $1 + AND dp.updated_at >= NOW() - ($2 || ' days')::INTERVAL + ${storeId ? 'AND dp.dispensary_id = $3' : ''} + ${category ? `AND dp.type = $${storeId ? 4 : 3}` : ''} + ${state ? `AND d.state = $${storeId ? (category ? 5 : 4) : (category ? 4 : 3)}` : ''} + GROUP BY DATE(dp.updated_at) + ORDER BY date + `, this.buildParams([brandName, days], { storeId, category, state })); + + const dataPoints: PricePoint[] = result.rows.map(row => ({ + date: row.date.toISOString().split('T')[0], + minPrice: parseFloat(row.min_price) || null, + maxPrice: parseFloat(row.max_price) || null, + avgPrice: parseFloat(row.avg_price) || null, + wholesalePrice: parseFloat(row.wholesale_price) || null, + sampleSize: parseInt(row.sample_size), + })); + + return { + brandName, + storeId, + category, + dataPoints, + summary: this.calculatePriceSummary(dataPoints), + }; + }, 15)).data; + } + + /** + * Get price trends by category + */ + async getCategoryPriceTrend( + category: string, + filters: PriceFilters = {} + ): Promise { + const { storeId, brandName, state, days = 30 } = filters; + const key = cacheKey('price_trend_category', { category, storeId, brandName, state, days }); + + return (await this.cache.getOrCompute(key, async () => { + const result = await this.pool.query(` + SELECT + DATE(dp.updated_at) as date, + MIN(extract_min_price(dp.latest_raw_payload)) as min_price, + MAX(extract_max_price(dp.latest_raw_payload)) as max_price, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price, + AVG(extract_wholesale_price(dp.latest_raw_payload)) as wholesale_price, + COUNT(*) as sample_size + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE dp.type = $1 + AND dp.updated_at >= NOW() - ($2 || ' days')::INTERVAL + ${storeId ? 'AND dp.dispensary_id = $3' : ''} + ${brandName ? `AND dp.brand_name = $${storeId ? 4 : 3}` : ''} + ${state ? `AND d.state = $${storeId ? (brandName ? 5 : 4) : (brandName ? 
4 : 3)}` : ''} + GROUP BY DATE(dp.updated_at) + ORDER BY date + `, this.buildParams([category, days], { storeId, brandName, state })); + + const dataPoints: PricePoint[] = result.rows.map(row => ({ + date: row.date.toISOString().split('T')[0], + minPrice: parseFloat(row.min_price) || null, + maxPrice: parseFloat(row.max_price) || null, + avgPrice: parseFloat(row.avg_price) || null, + wholesalePrice: parseFloat(row.wholesale_price) || null, + sampleSize: parseInt(row.sample_size), + })); + + return { + category, + storeId, + brandName, + dataPoints, + summary: this.calculatePriceSummary(dataPoints), + }; + }, 15)).data; + } + + /** + * Get price summary statistics + */ + async getPriceSummary(filters: PriceFilters = {}): Promise { + const { storeId, brandName, category, state } = filters; + const key = cacheKey('price_summary', filters as Record); + + return (await this.cache.getOrCompute(key, async () => { + const whereConditions: string[] = []; + const params: (string | number)[] = []; + let paramIndex = 1; + + if (storeId) { + whereConditions.push(`dp.dispensary_id = $${paramIndex++}`); + params.push(storeId); + } + if (brandName) { + whereConditions.push(`dp.brand_name = $${paramIndex++}`); + params.push(brandName); + } + if (category) { + whereConditions.push(`dp.type = $${paramIndex++}`); + params.push(category); + } + if (state) { + whereConditions.push(`d.state = $${paramIndex++}`); + params.push(state); + } + + const whereClause = whereConditions.length > 0 + ? 'WHERE ' + whereConditions.join(' AND ') + : ''; + + const result = await this.pool.query(` + WITH prices AS ( + SELECT + extract_min_price(dp.latest_raw_payload) as min_price, + extract_max_price(dp.latest_raw_payload) as max_price, + extract_wholesale_price(dp.latest_raw_payload) as wholesale_price + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + ${whereClause} + ) + SELECT + AVG(min_price) as avg_price, + AVG(wholesale_price) as avg_wholesale, + MIN(min_price) as min_price, + MAX(max_price) as max_price, + STDDEV(min_price) as std_dev + FROM prices + WHERE min_price IS NOT NULL + `, params); + + const row = result.rows[0]; + const avgPrice = parseFloat(row.avg_price) || null; + const stdDev = parseFloat(row.std_dev) || null; + const volatility = avgPrice && stdDev ? (stdDev / avgPrice) * 100 : null; + + return { + avg7d: avgPrice, // Using current data as proxy + avg30d: avgPrice, + avg90d: avgPrice, + wholesaleAvg7d: parseFloat(row.avg_wholesale) || null, + wholesaleAvg30d: parseFloat(row.avg_wholesale) || null, + wholesaleAvg90d: parseFloat(row.avg_wholesale) || null, + minPrice: parseFloat(row.min_price) || null, + maxPrice: parseFloat(row.max_price) || null, + priceRange: row.max_price && row.min_price + ? parseFloat(row.max_price) - parseFloat(row.min_price) + : null, + volatilityScore: volatility ? Math.round(volatility * 10) / 10 : null, + }; + }, 30)).data; + } + + /** + * Detect price compression in a category + */ + async detectPriceCompression( + category: string, + state?: string + ): Promise { + const key = cacheKey('price_compression', { category, state }); + + return (await this.cache.getOrCompute(key, async () => { + const result = await this.pool.query(` + WITH brand_prices AS ( + SELECT + dp.brand_name, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price, + COUNT(*) as sku_count + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE dp.type = $1 + AND dp.brand_name IS NOT NULL + ${state ? 
'AND d.state = $2' : ''} + GROUP BY dp.brand_name + HAVING COUNT(*) >= 3 + ), + stats AS ( + SELECT + AVG(avg_price) as category_avg, + STDDEV(avg_price) as std_dev + FROM brand_prices + WHERE avg_price IS NOT NULL + ) + SELECT + bp.brand_name, + bp.avg_price, + ABS(bp.avg_price - s.category_avg) as price_distance, + s.category_avg, + s.std_dev + FROM brand_prices bp, stats s + WHERE bp.avg_price IS NOT NULL + ORDER BY bp.avg_price + `, state ? [category, state] : [category]); + + if (result.rows.length === 0) { + return { + category, + brands: [], + compressionScore: 0, + standardDeviation: 0, + }; + } + + const categoryAvg = parseFloat(result.rows[0].category_avg) || 0; + const stdDev = parseFloat(result.rows[0].std_dev) || 0; + + // Compression score: lower std dev relative to mean = more compression + // Scale to 0-100 where 100 = very compressed + const cv = categoryAvg > 0 ? (stdDev / categoryAvg) * 100 : 0; + const compressionScore = Math.max(0, Math.min(100, 100 - cv)); + + const brands = result.rows.map(row => ({ + brandName: row.brand_name, + avgPrice: parseFloat(row.avg_price) || 0, + priceDistance: parseFloat(row.price_distance) || 0, + })); + + return { + category, + brands, + compressionScore: Math.round(compressionScore), + standardDeviation: Math.round(stdDev * 100) / 100, + }; + }, 30)).data; + } + + /** + * Get global price statistics + */ + async getGlobalPriceStats(): Promise<{ + totalProductsWithPrice: number; + avgPrice: number | null; + medianPrice: number | null; + priceByCategory: Array<{ category: string; avgPrice: number; count: number }>; + priceByState: Array<{ state: string; avgPrice: number; count: number }>; + }> { + const key = 'global_price_stats'; + + return (await this.cache.getOrCompute(key, async () => { + const [countResult, categoryResult, stateResult] = await Promise.all([ + this.pool.query(` + SELECT + COUNT(*) FILTER (WHERE extract_min_price(latest_raw_payload) IS NOT NULL) as with_price, + AVG(extract_min_price(latest_raw_payload)) as avg_price, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY extract_min_price(latest_raw_payload)) as median + FROM dutchie_products + `), + this.pool.query(` + SELECT + type as category, + AVG(extract_min_price(latest_raw_payload)) as avg_price, + COUNT(*) as count + FROM dutchie_products + WHERE type IS NOT NULL + AND extract_min_price(latest_raw_payload) IS NOT NULL + GROUP BY type + ORDER BY avg_price DESC + `), + this.pool.query(` + SELECT + d.state, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price, + COUNT(*) as count + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE extract_min_price(dp.latest_raw_payload) IS NOT NULL + GROUP BY d.state + ORDER BY avg_price DESC + `), + ]); + + return { + totalProductsWithPrice: parseInt(countResult.rows[0]?.with_price || '0'), + avgPrice: parseFloat(countResult.rows[0]?.avg_price) || null, + medianPrice: parseFloat(countResult.rows[0]?.median) || null, + priceByCategory: categoryResult.rows.map(r => ({ + category: r.category, + avgPrice: parseFloat(r.avg_price) || 0, + count: parseInt(r.count), + })), + priceByState: stateResult.rows.map(r => ({ + state: r.state, + avgPrice: parseFloat(r.avg_price) || 0, + count: parseInt(r.count), + })), + }; + }, 30)).data; + } + + // ============================================================ + // HELPER METHODS + // ============================================================ + + private calculatePriceSummary(dataPoints: PricePoint[]): PriceTrend['summary'] { + if (dataPoints.length === 0) { + 
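+      // No price observations in the window: return a null-filled, 'stable'
+      // summary so callers can render an empty trend instead of erroring.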
return { + currentAvg: null, + previousAvg: null, + changePercent: null, + trend: 'stable', + volatilityScore: null, + }; + } + + const prices = dataPoints + .map(d => d.avgPrice) + .filter((p): p is number => p !== null); + + if (prices.length === 0) { + return { + currentAvg: null, + previousAvg: null, + changePercent: null, + trend: 'stable', + volatilityScore: null, + }; + } + + const currentAvg = prices[prices.length - 1]; + const midpoint = Math.floor(prices.length / 2); + const previousAvg = prices.length > 1 ? prices[midpoint] : currentAvg; + + const changePercent = previousAvg > 0 + ? ((currentAvg - previousAvg) / previousAvg) * 100 + : null; + + // Calculate volatility (coefficient of variation) + const mean = prices.reduce((a, b) => a + b, 0) / prices.length; + const variance = prices.reduce((sum, p) => sum + Math.pow(p - mean, 2), 0) / prices.length; + const stdDev = Math.sqrt(variance); + const volatilityScore = mean > 0 ? (stdDev / mean) * 100 : null; + + let trend: 'up' | 'down' | 'stable' = 'stable'; + if (changePercent !== null) { + if (changePercent > 5) trend = 'up'; + else if (changePercent < -5) trend = 'down'; + } + + return { + currentAvg: Math.round(currentAvg * 100) / 100, + previousAvg: Math.round(previousAvg * 100) / 100, + changePercent: changePercent !== null ? Math.round(changePercent * 10) / 10 : null, + trend, + volatilityScore: volatilityScore !== null ? Math.round(volatilityScore * 10) / 10 : null, + }; + } + + private buildParams( + baseParams: (string | number)[], + optionalParams: Record + ): (string | number)[] { + const params = [...baseParams]; + for (const value of Object.values(optionalParams)) { + if (value !== undefined) { + params.push(value); + } + } + return params; + } +} diff --git a/backend/src/dutchie-az/services/analytics/store-changes.ts b/backend/src/dutchie-az/services/analytics/store-changes.ts new file mode 100644 index 00000000..5a744070 --- /dev/null +++ b/backend/src/dutchie-az/services/analytics/store-changes.ts @@ -0,0 +1,587 @@ +/** + * Store Change Tracking Service + * + * Tracks changes at the store level including: + * - New/lost brands + * - New/discontinued products + * - Stock status transitions + * - Price changes + * - Category movement leaderboards + * + * Phase 3: Analytics Dashboards + */ + +import { Pool } from 'pg'; +import { AnalyticsCache, cacheKey } from './cache'; + +export interface StoreChangeSummary { + storeId: number; + storeName: string; + city: string; + state: string; + brandsAdded7d: number; + brandsAdded30d: number; + brandsLost7d: number; + brandsLost30d: number; + productsAdded7d: number; + productsAdded30d: number; + productsDiscontinued7d: number; + productsDiscontinued30d: number; + priceDrops7d: number; + priceIncreases7d: number; + restocks7d: number; + stockOuts7d: number; +} + +export interface StoreChangeEvent { + id: number; + storeId: number; + storeName: string; + eventType: string; + eventDate: string; + brandName: string | null; + productName: string | null; + category: string | null; + oldValue: string | null; + newValue: string | null; + metadata: Record | null; +} + +export interface BrandChange { + brandName: string; + changeType: 'added' | 'removed'; + date: string; + skuCount: number; + categories: string[]; +} + +export interface ProductChange { + productId: number; + productName: string; + brandName: string | null; + category: string | null; + changeType: 'added' | 'discontinued' | 'price_drop' | 'price_increase' | 'restocked' | 'out_of_stock'; + date: string; + oldValue?: 
string; + newValue?: string; +} + +export interface CategoryLeaderboard { + category: string; + storeId: number; + storeName: string; + skuCount: number; + brandCount: number; + avgPrice: number | null; + changePercent7d: number; + rank: number; +} + +export interface StoreFilters { + storeId?: number; + state?: string; + days?: number; + eventType?: string; +} + +export class StoreChangeService { + private pool: Pool; + private cache: AnalyticsCache; + + constructor(pool: Pool, cache: AnalyticsCache) { + this.pool = pool; + this.cache = cache; + } + + /** + * Get change summary for a store + */ + async getStoreChangeSummary( + storeId: number + ): Promise { + const key = cacheKey('store_change_summary', { storeId }); + + return (await this.cache.getOrCompute(key, async () => { + // Get store info + const storeResult = await this.pool.query(` + SELECT id, name, city, state FROM dispensaries WHERE id = $1 + `, [storeId]); + + if (storeResult.rows.length === 0) return null; + const store = storeResult.rows[0]; + + // Get change events counts + const eventsResult = await this.pool.query(` + SELECT + event_type, + COUNT(*) FILTER (WHERE event_date >= CURRENT_DATE - INTERVAL '7 days') as count_7d, + COUNT(*) FILTER (WHERE event_date >= CURRENT_DATE - INTERVAL '30 days') as count_30d + FROM store_change_events + WHERE store_id = $1 + GROUP BY event_type + `, [storeId]); + + const counts: Record = {}; + eventsResult.rows.forEach(row => { + counts[row.event_type] = { + count_7d: parseInt(row.count_7d) || 0, + count_30d: parseInt(row.count_30d) || 0, + }; + }); + + return { + storeId: store.id, + storeName: store.name, + city: store.city, + state: store.state, + brandsAdded7d: counts['brand_added']?.count_7d || 0, + brandsAdded30d: counts['brand_added']?.count_30d || 0, + brandsLost7d: counts['brand_removed']?.count_7d || 0, + brandsLost30d: counts['brand_removed']?.count_30d || 0, + productsAdded7d: counts['product_added']?.count_7d || 0, + productsAdded30d: counts['product_added']?.count_30d || 0, + productsDiscontinued7d: counts['product_removed']?.count_7d || 0, + productsDiscontinued30d: counts['product_removed']?.count_30d || 0, + priceDrops7d: counts['price_drop']?.count_7d || 0, + priceIncreases7d: counts['price_increase']?.count_7d || 0, + restocks7d: counts['restocked']?.count_7d || 0, + stockOuts7d: counts['out_of_stock']?.count_7d || 0, + }; + }, 15)).data; + } + + /** + * Get recent change events for a store + */ + async getStoreChangeEvents( + storeId: number, + filters: { eventType?: string; days?: number; limit?: number } = {} + ): Promise { + const { eventType, days = 30, limit = 100 } = filters; + const key = cacheKey('store_change_events', { storeId, eventType, days, limit }); + + return (await this.cache.getOrCompute(key, async () => { + const params: (string | number)[] = [storeId, days, limit]; + let eventTypeCondition = ''; + + if (eventType) { + eventTypeCondition = 'AND event_type = $4'; + params.push(eventType); + } + + const result = await this.pool.query(` + SELECT + sce.id, + sce.store_id, + d.name as store_name, + sce.event_type, + sce.event_date, + sce.brand_name, + sce.product_name, + sce.category, + sce.old_value, + sce.new_value, + sce.metadata + FROM store_change_events sce + JOIN dispensaries d ON sce.store_id = d.id + WHERE sce.store_id = $1 + AND sce.event_date >= CURRENT_DATE - ($2 || ' days')::INTERVAL + ${eventTypeCondition} + ORDER BY sce.event_date DESC, sce.id DESC + LIMIT $3 + `, params); + + return result.rows.map(row => ({ + id: row.id, + storeId: 
row.store_id, + storeName: row.store_name, + eventType: row.event_type, + eventDate: row.event_date.toISOString().split('T')[0], + brandName: row.brand_name, + productName: row.product_name, + category: row.category, + oldValue: row.old_value, + newValue: row.new_value, + metadata: row.metadata, + })); + }, 5)).data; + } + + /** + * Get new brands added to a store + */ + async getNewBrands( + storeId: number, + days: number = 30 + ): Promise { + const key = cacheKey('new_brands', { storeId, days }); + + return (await this.cache.getOrCompute(key, async () => { + const result = await this.pool.query(` + SELECT + brand_name, + event_date, + metadata + FROM store_change_events + WHERE store_id = $1 + AND event_type = 'brand_added' + AND event_date >= CURRENT_DATE - ($2 || ' days')::INTERVAL + ORDER BY event_date DESC + `, [storeId, days]); + + return result.rows.map(row => ({ + brandName: row.brand_name, + changeType: 'added' as const, + date: row.event_date.toISOString().split('T')[0], + skuCount: row.metadata?.sku_count || 0, + categories: row.metadata?.categories || [], + })); + }, 15)).data; + } + + /** + * Get brands lost from a store + */ + async getLostBrands( + storeId: number, + days: number = 30 + ): Promise { + const key = cacheKey('lost_brands', { storeId, days }); + + return (await this.cache.getOrCompute(key, async () => { + const result = await this.pool.query(` + SELECT + brand_name, + event_date, + metadata + FROM store_change_events + WHERE store_id = $1 + AND event_type = 'brand_removed' + AND event_date >= CURRENT_DATE - ($2 || ' days')::INTERVAL + ORDER BY event_date DESC + `, [storeId, days]); + + return result.rows.map(row => ({ + brandName: row.brand_name, + changeType: 'removed' as const, + date: row.event_date.toISOString().split('T')[0], + skuCount: row.metadata?.sku_count || 0, + categories: row.metadata?.categories || [], + })); + }, 15)).data; + } + + /** + * Get product changes for a store + */ + async getProductChanges( + storeId: number, + changeType?: 'added' | 'discontinued' | 'price_drop' | 'price_increase' | 'restocked' | 'out_of_stock', + days: number = 7 + ): Promise { + const key = cacheKey('product_changes', { storeId, changeType, days }); + + return (await this.cache.getOrCompute(key, async () => { + const eventTypeMap: Record = { + 'added': 'product_added', + 'discontinued': 'product_removed', + 'price_drop': 'price_drop', + 'price_increase': 'price_increase', + 'restocked': 'restocked', + 'out_of_stock': 'out_of_stock', + }; + + const params: (string | number)[] = [storeId, days]; + let eventCondition = ''; + + if (changeType) { + eventCondition = 'AND event_type = $3'; + params.push(eventTypeMap[changeType]); + } + + const result = await this.pool.query(` + SELECT + product_id, + product_name, + brand_name, + category, + event_type, + event_date, + old_value, + new_value + FROM store_change_events + WHERE store_id = $1 + AND event_date >= CURRENT_DATE - ($2 || ' days')::INTERVAL + AND product_id IS NOT NULL + ${eventCondition} + ORDER BY event_date DESC + LIMIT 100 + `, params); + + const reverseMap: Record = { + 'product_added': 'added', + 'product_removed': 'discontinued', + 'price_drop': 'price_drop', + 'price_increase': 'price_increase', + 'restocked': 'restocked', + 'out_of_stock': 'out_of_stock', + }; + + return result.rows.map(row => ({ + productId: row.product_id, + productName: row.product_name, + brandName: row.brand_name, + category: row.category, + changeType: reverseMap[row.event_type] || 'added', + date: 
row.event_date.toISOString().split('T')[0], + oldValue: row.old_value, + newValue: row.new_value, + })); + }, 5)).data; + } + + /** + * Get category leaderboard across stores + */ + async getCategoryLeaderboard( + category: string, + limit: number = 20 + ): Promise { + const key = cacheKey('category_leaderboard', { category, limit }); + + return (await this.cache.getOrCompute(key, async () => { + const result = await this.pool.query(` + WITH store_category_stats AS ( + SELECT + dp.dispensary_id as store_id, + d.name as store_name, + COUNT(*) as sku_count, + COUNT(DISTINCT dp.brand_name) as brand_count, + AVG(extract_min_price(dp.latest_raw_payload)) as avg_price + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE dp.type = $1 + GROUP BY dp.dispensary_id, d.name + ) + SELECT + scs.*, + RANK() OVER (ORDER BY scs.sku_count DESC) as rank + FROM store_category_stats scs + ORDER BY scs.sku_count DESC + LIMIT $2 + `, [category, limit]); + + return result.rows.map(row => ({ + category, + storeId: row.store_id, + storeName: row.store_name, + skuCount: parseInt(row.sku_count) || 0, + brandCount: parseInt(row.brand_count) || 0, + avgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null, + changePercent7d: 0, // Would need historical data + rank: parseInt(row.rank) || 0, + })); + }, 15)).data; + } + + /** + * Get stores with most activity (changes) + */ + async getMostActiveStores( + days: number = 7, + limit: number = 10 + ): Promise> { + const key = cacheKey('most_active_stores', { days, limit }); + + return (await this.cache.getOrCompute(key, async () => { + const result = await this.pool.query(` + SELECT + d.id as store_id, + d.name as store_name, + d.city, + d.state, + COUNT(*) as total_changes, + COUNT(*) FILTER (WHERE sce.event_type IN ('brand_added', 'brand_removed')) as brands_changed, + COUNT(*) FILTER (WHERE sce.event_type IN ('product_added', 'product_removed')) as products_changed, + COUNT(*) FILTER (WHERE sce.event_type IN ('price_drop', 'price_increase')) as price_changes, + COUNT(*) FILTER (WHERE sce.event_type IN ('restocked', 'out_of_stock')) as stock_changes + FROM store_change_events sce + JOIN dispensaries d ON sce.store_id = d.id + WHERE sce.event_date >= CURRENT_DATE - ($1 || ' days')::INTERVAL + GROUP BY d.id, d.name, d.city, d.state + ORDER BY total_changes DESC + LIMIT $2 + `, [days, limit]); + + return result.rows.map(row => ({ + storeId: row.store_id, + storeName: row.store_name, + city: row.city, + state: row.state, + totalChanges: parseInt(row.total_changes) || 0, + brandsChanged: parseInt(row.brands_changed) || 0, + productsChanged: parseInt(row.products_changed) || 0, + priceChanges: parseInt(row.price_changes) || 0, + stockChanges: parseInt(row.stock_changes) || 0, + })); + }, 15)).data; + } + + /** + * Compare two stores + */ + async compareStores( + storeId1: number, + storeId2: number + ): Promise<{ + store1: { id: number; name: string; brands: string[]; categories: string[]; skuCount: number }; + store2: { id: number; name: string; brands: string[]; categories: string[]; skuCount: number }; + sharedBrands: string[]; + uniqueToStore1: string[]; + uniqueToStore2: string[]; + categoryComparison: Array<{ + category: string; + store1Skus: number; + store2Skus: number; + difference: number; + }>; + }> { + const key = cacheKey('compare_stores', { storeId1, storeId2 }); + + return (await this.cache.getOrCompute(key, async () => { + const [store1Data, store2Data] = await Promise.all([ + this.pool.query(` + SELECT + 
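+          -- The FILTER (WHERE ... IS NOT NULL) clauses below keep NULL brand and
+          -- type values out of the arrays used for the set comparisons later on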
d.id, d.name, + ARRAY_AGG(DISTINCT dp.brand_name) FILTER (WHERE dp.brand_name IS NOT NULL) as brands, + ARRAY_AGG(DISTINCT dp.type) FILTER (WHERE dp.type IS NOT NULL) as categories, + COUNT(*) as sku_count + FROM dispensaries d + LEFT JOIN dutchie_products dp ON d.id = dp.dispensary_id + WHERE d.id = $1 + GROUP BY d.id, d.name + `, [storeId1]), + this.pool.query(` + SELECT + d.id, d.name, + ARRAY_AGG(DISTINCT dp.brand_name) FILTER (WHERE dp.brand_name IS NOT NULL) as brands, + ARRAY_AGG(DISTINCT dp.type) FILTER (WHERE dp.type IS NOT NULL) as categories, + COUNT(*) as sku_count + FROM dispensaries d + LEFT JOIN dutchie_products dp ON d.id = dp.dispensary_id + WHERE d.id = $1 + GROUP BY d.id, d.name + `, [storeId2]), + ]); + + const s1 = store1Data.rows[0]; + const s2 = store2Data.rows[0]; + + const brands1Array: string[] = (s1?.brands || []).filter((b: string | null): b is string => b !== null); + const brands2Array: string[] = (s2?.brands || []).filter((b: string | null): b is string => b !== null); + const brands1 = new Set(brands1Array); + const brands2 = new Set(brands2Array); + + const sharedBrands: string[] = brands1Array.filter(b => brands2.has(b)); + const uniqueToStore1: string[] = brands1Array.filter(b => !brands2.has(b)); + const uniqueToStore2: string[] = brands2Array.filter(b => !brands1.has(b)); + + // Category comparison + const categoryResult = await this.pool.query(` + WITH store1_cats AS ( + SELECT type as category, COUNT(*) as sku_count + FROM dutchie_products WHERE dispensary_id = $1 AND type IS NOT NULL + GROUP BY type + ), + store2_cats AS ( + SELECT type as category, COUNT(*) as sku_count + FROM dutchie_products WHERE dispensary_id = $2 AND type IS NOT NULL + GROUP BY type + ), + all_cats AS ( + SELECT category FROM store1_cats + UNION + SELECT category FROM store2_cats + ) + SELECT + ac.category, + COALESCE(s1.sku_count, 0) as store1_skus, + COALESCE(s2.sku_count, 0) as store2_skus + FROM all_cats ac + LEFT JOIN store1_cats s1 ON ac.category = s1.category + LEFT JOIN store2_cats s2 ON ac.category = s2.category + ORDER BY (COALESCE(s1.sku_count, 0) + COALESCE(s2.sku_count, 0)) DESC + `, [storeId1, storeId2]); + + return { + store1: { + id: s1?.id || storeId1, + name: s1?.name || 'Unknown', + brands: s1?.brands || [], + categories: s1?.categories || [], + skuCount: parseInt(s1?.sku_count) || 0, + }, + store2: { + id: s2?.id || storeId2, + name: s2?.name || 'Unknown', + brands: s2?.brands || [], + categories: s2?.categories || [], + skuCount: parseInt(s2?.sku_count) || 0, + }, + sharedBrands, + uniqueToStore1, + uniqueToStore2, + categoryComparison: categoryResult.rows.map(row => ({ + category: row.category, + store1Skus: parseInt(row.store1_skus) || 0, + store2Skus: parseInt(row.store2_skus) || 0, + difference: (parseInt(row.store1_skus) || 0) - (parseInt(row.store2_skus) || 0), + })), + }; + }, 15)).data; + } + + /** + * Record a change event (used by crawler/worker) + */ + async recordChangeEvent(event: { + storeId: number; + eventType: string; + brandName?: string; + productId?: number; + productName?: string; + category?: string; + oldValue?: string; + newValue?: string; + metadata?: Record; + }): Promise { + await this.pool.query(` + INSERT INTO store_change_events + (store_id, event_type, event_date, brand_name, product_id, product_name, category, old_value, new_value, metadata) + VALUES ($1, $2, CURRENT_DATE, $3, $4, $5, $6, $7, $8, $9) + `, [ + event.storeId, + event.eventType, + event.brandName || null, + event.productId || null, + event.productName || null, + 
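+      // Params $6-$9 below: category, old/new values, and JSON-encoded metadata (see column list above).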
event.category || null, + event.oldValue || null, + event.newValue || null, + event.metadata ? JSON.stringify(event.metadata) : null, + ]); + + // Invalidate cache + await this.cache.invalidatePattern(`store_change_summary:storeId=${event.storeId}`); + } +} diff --git a/backend/src/dutchie-az/services/azdhs-import.ts b/backend/src/dutchie-az/services/azdhs-import.ts index a0b16af7..9f944518 100644 --- a/backend/src/dutchie-az/services/azdhs-import.ts +++ b/backend/src/dutchie-az/services/azdhs-import.ts @@ -1,20 +1,27 @@ /** - * AZDHS Import Service + * LEGACY SERVICE - AZDHS Import + * + * DEPRECATED: This service creates its own database pool. + * Future implementations should use the canonical CannaiQ connection. * * Imports Arizona dispensaries from the main database's dispensaries table * (which was populated from AZDHS data) into the isolated Dutchie AZ database. * * This establishes the canonical list of AZ dispensaries to match against Dutchie. + * + * DO NOT: + * - Run this in automated jobs + * - Use DATABASE_URL directly */ import { Pool } from 'pg'; import { query as dutchieQuery } from '../db/connection'; import { Dispensary } from '../types'; -// Main database connection (source of AZDHS data) -const MAIN_DATABASE_URL = - process.env.DATABASE_URL || - 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'; +// Single database connection (cannaiq in cannaiq-postgres container) +// Use CANNAIQ_DB_* env vars or defaults +const MAIN_DB_CONNECTION = process.env.CANNAIQ_DB_URL || + `postgresql://${process.env.CANNAIQ_DB_USER || 'dutchie'}:${process.env.CANNAIQ_DB_PASS || 'dutchie_local_pass'}@${process.env.CANNAIQ_DB_HOST || 'localhost'}:${process.env.CANNAIQ_DB_PORT || '54320'}/${process.env.CANNAIQ_DB_NAME || 'cannaiq'}`; /** * AZDHS dispensary record from the main database @@ -57,8 +64,9 @@ interface ImportResult { * Create a temporary connection to the main database */ function getMainDBPool(): Pool { + console.warn('[AZDHS Import] LEGACY: Using separate pool. 
Should use canonical CannaiQ connection.'); return new Pool({ - connectionString: MAIN_DATABASE_URL, + connectionString: MAIN_DB_CONNECTION, max: 5, idleTimeoutMillis: 30000, connectionTimeoutMillis: 5000, diff --git a/backend/src/dutchie-az/services/discovery.ts b/backend/src/dutchie-az/services/discovery.ts index 1c0169ce..de2f3ba1 100644 --- a/backend/src/dutchie-az/services/discovery.ts +++ b/backend/src/dutchie-az/services/discovery.ts @@ -344,15 +344,12 @@ export async function resolvePlatformDispensaryIds(): Promise<{ resolved: number return { resolved, failed, skipped, notCrawlable }; } +// Use shared dispensary columns (handles optional columns like provider_detection_data) +import { DISPENSARY_COLUMNS } from '../db/dispensary-columns'; + /** * Get all dispensaries */ -// Explicit column list for dispensaries table (avoids SELECT * issues with schema differences) -const DISPENSARY_COLUMNS = ` - id, name, slug, city, state, zip, address, latitude, longitude, - menu_type, menu_url, platform_dispensary_id, website, - provider_detection_data, created_at, updated_at -`; export async function getAllDispensaries(): Promise { const { rows } = await query( @@ -386,7 +383,7 @@ export function mapDbRowToDispensary(row: any): Dispensary { id: row.id, platform: row.platform || 'dutchie', // keep platform as-is, default to 'dutchie' name: row.name, - dbaName: row.dbaName || row.dba_name, + dbaName: row.dbaName || row.dba_name || undefined, // dba_name column is optional slug: row.slug, city: row.city, state: row.state, @@ -421,7 +418,6 @@ export async function getDispensaryById(id: number): Promise SELECT id, name, - dba_name AS "dbaName", slug, city, state, diff --git a/backend/src/dutchie-az/services/error-taxonomy.ts b/backend/src/dutchie-az/services/error-taxonomy.ts new file mode 100644 index 00000000..d2cb2929 --- /dev/null +++ b/backend/src/dutchie-az/services/error-taxonomy.ts @@ -0,0 +1,491 @@ +/** + * Error Taxonomy Module + * + * Standardized error codes and classification for crawler reliability. + * All crawl results must use these codes for consistent error handling. + * + * Phase 1: Crawler Reliability & Stabilization + */ + +// ============================================================ +// ERROR CODES +// ============================================================ + +/** + * Standardized error codes for all crawl operations. + * These codes are stored in the database for analytics and debugging. 
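+ *
+ * @example
+ * // Hedged sketch: classify a raw failure into a taxonomy code before persisting it
+ * // (classifyError is defined later in this file).
+ * const code = classifyError(new Error('Request timed out')); // -> CrawlErrorCode.TIMEOUT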
+ */ +export const CrawlErrorCode = { + // Success states + SUCCESS: 'SUCCESS', + + // Rate limiting + RATE_LIMITED: 'RATE_LIMITED', // 429 responses + + // Proxy issues + BLOCKED_PROXY: 'BLOCKED_PROXY', // 407 or proxy-related blocks + PROXY_TIMEOUT: 'PROXY_TIMEOUT', // Proxy connection timeout + + // Content issues + HTML_CHANGED: 'HTML_CHANGED', // Page structure changed + NO_PRODUCTS: 'NO_PRODUCTS', // Empty response (valid but no data) + PARSE_ERROR: 'PARSE_ERROR', // Failed to parse response + + // Network issues + TIMEOUT: 'TIMEOUT', // Request timeout + NETWORK_ERROR: 'NETWORK_ERROR', // Connection failed + DNS_ERROR: 'DNS_ERROR', // DNS resolution failed + + // Authentication + AUTH_FAILED: 'AUTH_FAILED', // Authentication/session issues + + // Server errors + SERVER_ERROR: 'SERVER_ERROR', // 5xx responses + SERVICE_UNAVAILABLE: 'SERVICE_UNAVAILABLE', // 503 + + // Configuration issues + INVALID_CONFIG: 'INVALID_CONFIG', // Bad store configuration + MISSING_PLATFORM_ID: 'MISSING_PLATFORM_ID', // No platform_dispensary_id + + // Unknown + UNKNOWN_ERROR: 'UNKNOWN_ERROR', // Catch-all for unclassified errors +} as const; + +export type CrawlErrorCodeType = typeof CrawlErrorCode[keyof typeof CrawlErrorCode]; + +// ============================================================ +// ERROR CLASSIFICATION +// ============================================================ + +/** + * Error metadata for each error code + */ +interface ErrorMetadata { + code: CrawlErrorCodeType; + retryable: boolean; + rotateProxy: boolean; + rotateUserAgent: boolean; + backoffMultiplier: number; + severity: 'low' | 'medium' | 'high' | 'critical'; + description: string; +} + +/** + * Metadata for each error code - defines retry behavior + */ +export const ERROR_METADATA: Record = { + [CrawlErrorCode.SUCCESS]: { + code: CrawlErrorCode.SUCCESS, + retryable: false, + rotateProxy: false, + rotateUserAgent: false, + backoffMultiplier: 0, + severity: 'low', + description: 'Crawl completed successfully', + }, + + [CrawlErrorCode.RATE_LIMITED]: { + code: CrawlErrorCode.RATE_LIMITED, + retryable: true, + rotateProxy: true, + rotateUserAgent: true, + backoffMultiplier: 2.0, + severity: 'medium', + description: 'Rate limited by target (429)', + }, + + [CrawlErrorCode.BLOCKED_PROXY]: { + code: CrawlErrorCode.BLOCKED_PROXY, + retryable: true, + rotateProxy: true, + rotateUserAgent: true, + backoffMultiplier: 1.5, + severity: 'medium', + description: 'Proxy blocked or rejected (407)', + }, + + [CrawlErrorCode.PROXY_TIMEOUT]: { + code: CrawlErrorCode.PROXY_TIMEOUT, + retryable: true, + rotateProxy: true, + rotateUserAgent: false, + backoffMultiplier: 1.0, + severity: 'low', + description: 'Proxy connection timed out', + }, + + [CrawlErrorCode.HTML_CHANGED]: { + code: CrawlErrorCode.HTML_CHANGED, + retryable: false, + rotateProxy: false, + rotateUserAgent: false, + backoffMultiplier: 1.0, + severity: 'high', + description: 'Page structure changed - needs selector update', + }, + + [CrawlErrorCode.NO_PRODUCTS]: { + code: CrawlErrorCode.NO_PRODUCTS, + retryable: true, + rotateProxy: false, + rotateUserAgent: false, + backoffMultiplier: 1.0, + severity: 'low', + description: 'No products returned (may be temporary)', + }, + + [CrawlErrorCode.PARSE_ERROR]: { + code: CrawlErrorCode.PARSE_ERROR, + retryable: true, + rotateProxy: false, + rotateUserAgent: false, + backoffMultiplier: 1.0, + severity: 'medium', + description: 'Failed to parse response data', + }, + + [CrawlErrorCode.TIMEOUT]: { + code: CrawlErrorCode.TIMEOUT, + retryable: 
true, + rotateProxy: true, + rotateUserAgent: false, + backoffMultiplier: 1.5, + severity: 'medium', + description: 'Request timed out', + }, + + [CrawlErrorCode.NETWORK_ERROR]: { + code: CrawlErrorCode.NETWORK_ERROR, + retryable: true, + rotateProxy: true, + rotateUserAgent: false, + backoffMultiplier: 1.0, + severity: 'medium', + description: 'Network connection failed', + }, + + [CrawlErrorCode.DNS_ERROR]: { + code: CrawlErrorCode.DNS_ERROR, + retryable: true, + rotateProxy: true, + rotateUserAgent: false, + backoffMultiplier: 1.0, + severity: 'medium', + description: 'DNS resolution failed', + }, + + [CrawlErrorCode.AUTH_FAILED]: { + code: CrawlErrorCode.AUTH_FAILED, + retryable: true, + rotateProxy: false, + rotateUserAgent: true, + backoffMultiplier: 2.0, + severity: 'high', + description: 'Authentication or session failed', + }, + + [CrawlErrorCode.SERVER_ERROR]: { + code: CrawlErrorCode.SERVER_ERROR, + retryable: true, + rotateProxy: false, + rotateUserAgent: false, + backoffMultiplier: 1.5, + severity: 'medium', + description: 'Server error (5xx)', + }, + + [CrawlErrorCode.SERVICE_UNAVAILABLE]: { + code: CrawlErrorCode.SERVICE_UNAVAILABLE, + retryable: true, + rotateProxy: false, + rotateUserAgent: false, + backoffMultiplier: 2.0, + severity: 'high', + description: 'Service temporarily unavailable (503)', + }, + + [CrawlErrorCode.INVALID_CONFIG]: { + code: CrawlErrorCode.INVALID_CONFIG, + retryable: false, + rotateProxy: false, + rotateUserAgent: false, + backoffMultiplier: 0, + severity: 'critical', + description: 'Invalid store configuration', + }, + + [CrawlErrorCode.MISSING_PLATFORM_ID]: { + code: CrawlErrorCode.MISSING_PLATFORM_ID, + retryable: false, + rotateProxy: false, + rotateUserAgent: false, + backoffMultiplier: 0, + severity: 'critical', + description: 'Missing platform_dispensary_id', + }, + + [CrawlErrorCode.UNKNOWN_ERROR]: { + code: CrawlErrorCode.UNKNOWN_ERROR, + retryable: true, + rotateProxy: false, + rotateUserAgent: false, + backoffMultiplier: 1.0, + severity: 'high', + description: 'Unknown/unclassified error', + }, +}; + +// ============================================================ +// ERROR CLASSIFICATION FUNCTIONS +// ============================================================ + +/** + * Classify an error into a standardized error code. + * + * @param error - The error to classify (Error object, string, or HTTP status) + * @param httpStatus - Optional HTTP status code + * @returns Standardized error code + */ +export function classifyError( + error: Error | string | null, + httpStatus?: number +): CrawlErrorCodeType { + // Check HTTP status first + if (httpStatus) { + if (httpStatus === 429) return CrawlErrorCode.RATE_LIMITED; + if (httpStatus === 407) return CrawlErrorCode.BLOCKED_PROXY; + if (httpStatus === 401 || httpStatus === 403) return CrawlErrorCode.AUTH_FAILED; + if (httpStatus === 503) return CrawlErrorCode.SERVICE_UNAVAILABLE; + if (httpStatus >= 500) return CrawlErrorCode.SERVER_ERROR; + } + + if (!error) return CrawlErrorCode.UNKNOWN_ERROR; + + const message = typeof error === 'string' ? 
error.toLowerCase() : error.message.toLowerCase(); + + // Rate limiting patterns + if (message.includes('rate limit') || message.includes('too many requests') || message.includes('429')) { + return CrawlErrorCode.RATE_LIMITED; + } + + // Proxy patterns + if (message.includes('proxy') && (message.includes('block') || message.includes('reject') || message.includes('407'))) { + return CrawlErrorCode.BLOCKED_PROXY; + } + + // Timeout patterns + if (message.includes('timeout') || message.includes('timed out') || message.includes('etimedout')) { + if (message.includes('proxy')) { + return CrawlErrorCode.PROXY_TIMEOUT; + } + return CrawlErrorCode.TIMEOUT; + } + + // Network patterns + if (message.includes('econnrefused') || message.includes('econnreset') || message.includes('network')) { + return CrawlErrorCode.NETWORK_ERROR; + } + + // DNS patterns + if (message.includes('enotfound') || message.includes('dns') || message.includes('getaddrinfo')) { + return CrawlErrorCode.DNS_ERROR; + } + + // Auth patterns + if (message.includes('auth') || message.includes('unauthorized') || message.includes('forbidden') || message.includes('401') || message.includes('403')) { + return CrawlErrorCode.AUTH_FAILED; + } + + // HTML change patterns + if (message.includes('selector') || message.includes('element not found') || message.includes('structure changed')) { + return CrawlErrorCode.HTML_CHANGED; + } + + // Parse patterns + if (message.includes('parse') || message.includes('json') || message.includes('syntax')) { + return CrawlErrorCode.PARSE_ERROR; + } + + // No products patterns + if (message.includes('no products') || message.includes('empty') || message.includes('0 products')) { + return CrawlErrorCode.NO_PRODUCTS; + } + + // Server error patterns + if (message.includes('500') || message.includes('502') || message.includes('503') || message.includes('504')) { + return CrawlErrorCode.SERVER_ERROR; + } + + // Config patterns + if (message.includes('config') || message.includes('invalid') || message.includes('missing')) { + if (message.includes('platform') || message.includes('dispensary_id')) { + return CrawlErrorCode.MISSING_PLATFORM_ID; + } + return CrawlErrorCode.INVALID_CONFIG; + } + + return CrawlErrorCode.UNKNOWN_ERROR; +} + +/** + * Get metadata for an error code + */ +export function getErrorMetadata(code: CrawlErrorCodeType): ErrorMetadata { + return ERROR_METADATA[code] || ERROR_METADATA[CrawlErrorCode.UNKNOWN_ERROR]; +} + +/** + * Check if an error is retryable + */ +export function isRetryable(code: CrawlErrorCodeType): boolean { + return getErrorMetadata(code).retryable; +} + +/** + * Check if proxy should be rotated for this error + */ +export function shouldRotateProxy(code: CrawlErrorCodeType): boolean { + return getErrorMetadata(code).rotateProxy; +} + +/** + * Check if user agent should be rotated for this error + */ +export function shouldRotateUserAgent(code: CrawlErrorCodeType): boolean { + return getErrorMetadata(code).rotateUserAgent; +} + +/** + * Get backoff multiplier for this error + */ +export function getBackoffMultiplier(code: CrawlErrorCodeType): number { + return getErrorMetadata(code).backoffMultiplier; +} + +// ============================================================ +// CRAWL RESULT TYPE +// ============================================================ + +/** + * Standardized crawl result with error taxonomy + */ +export interface CrawlResult { + success: boolean; + dispensaryId: number; + + // Error info + errorCode: CrawlErrorCodeType; + errorMessage?: string; + 
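+  // HTTP status captured from the failed request, when one was received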
httpStatus?: number; + + // Timing + startedAt: Date; + finishedAt: Date; + durationMs: number; + + // Context + attemptNumber: number; + proxyUsed?: string; + userAgentUsed?: string; + + // Metrics (on success) + productsFound?: number; + productsUpserted?: number; + snapshotsCreated?: number; + imagesDownloaded?: number; + + // Metadata + metadata?: Record; +} + +/** + * Create a success result + */ +export function createSuccessResult( + dispensaryId: number, + startedAt: Date, + metrics: { + productsFound: number; + productsUpserted: number; + snapshotsCreated: number; + imagesDownloaded?: number; + }, + context?: { + attemptNumber?: number; + proxyUsed?: string; + userAgentUsed?: string; + } +): CrawlResult { + const finishedAt = new Date(); + return { + success: true, + dispensaryId, + errorCode: CrawlErrorCode.SUCCESS, + startedAt, + finishedAt, + durationMs: finishedAt.getTime() - startedAt.getTime(), + attemptNumber: context?.attemptNumber || 1, + proxyUsed: context?.proxyUsed, + userAgentUsed: context?.userAgentUsed, + ...metrics, + }; +} + +/** + * Create a failure result + */ +export function createFailureResult( + dispensaryId: number, + startedAt: Date, + error: Error | string, + httpStatus?: number, + context?: { + attemptNumber?: number; + proxyUsed?: string; + userAgentUsed?: string; + } +): CrawlResult { + const finishedAt = new Date(); + const errorCode = classifyError(error, httpStatus); + const errorMessage = typeof error === 'string' ? error : error.message; + + return { + success: false, + dispensaryId, + errorCode, + errorMessage, + httpStatus, + startedAt, + finishedAt, + durationMs: finishedAt.getTime() - startedAt.getTime(), + attemptNumber: context?.attemptNumber || 1, + proxyUsed: context?.proxyUsed, + userAgentUsed: context?.userAgentUsed, + }; +} + +// ============================================================ +// LOGGING HELPERS +// ============================================================ + +/** + * Format error code for logging + */ +export function formatErrorForLog(result: CrawlResult): string { + const metadata = getErrorMetadata(result.errorCode); + const retryInfo = metadata.retryable ? '(retryable)' : '(non-retryable)'; + const proxyInfo = result.proxyUsed ? 
` via ${result.proxyUsed}` : ''; + + if (result.success) { + return `[${result.errorCode}] Crawl successful: ${result.productsFound} products${proxyInfo}`; + } + + return `[${result.errorCode}] ${result.errorMessage}${proxyInfo} ${retryInfo}`; +} + +/** + * Get user-friendly error description + */ +export function getErrorDescription(code: CrawlErrorCodeType): string { + return getErrorMetadata(code).description; +} diff --git a/backend/src/dutchie-az/services/menu-detection.ts b/backend/src/dutchie-az/services/menu-detection.ts index bc86ffc7..a8b094c9 100644 --- a/backend/src/dutchie-az/services/menu-detection.ts +++ b/backend/src/dutchie-az/services/menu-detection.ts @@ -16,12 +16,8 @@ import { extractCNameFromMenuUrl, extractFromMenuUrl, mapDbRowToDispensary } fro import { resolveDispensaryId } from './graphql-client'; import { Dispensary, JobStatus } from '../types'; -// Explicit column list for dispensaries table (avoids SELECT * issues with schema differences) -const DISPENSARY_COLUMNS = ` - id, name, slug, city, state, zip, address, latitude, longitude, - menu_type, menu_url, platform_dispensary_id, website, - provider_detection_data, created_at, updated_at -`; +// Use shared dispensary columns (handles optional columns like provider_detection_data) +import { DISPENSARY_COLUMNS } from '../db/dispensary-columns'; // ============================================================ // TYPES @@ -647,6 +643,9 @@ export async function detectAndResolveDispensary(dispensaryId: number): Promise< ` UPDATE dispensaries SET menu_type = 'dutchie', + last_id_resolution_at = NOW(), + id_resolution_attempts = COALESCE(id_resolution_attempts, 0) + 1, + id_resolution_error = $1, provider_detection_data = COALESCE(provider_detection_data, '{}'::jsonb) || jsonb_build_object( 'detected_provider', 'dutchie'::text, @@ -660,7 +659,7 @@ export async function detectAndResolveDispensary(dispensaryId: number): Promise< `, [result.error, dispensaryId] ); - console.log(`[MenuDetection] ${dispensary.name}: ${result.error}`); + console.log(`[Henry - Entry Point Finder] ${dispensary.name}: ${result.error}`); return result; } @@ -675,6 +674,9 @@ export async function detectAndResolveDispensary(dispensaryId: number): Promise< UPDATE dispensaries SET menu_type = 'dutchie', platform_dispensary_id = $1, + last_id_resolution_at = NOW(), + id_resolution_attempts = COALESCE(id_resolution_attempts, 0) + 1, + id_resolution_error = NULL, provider_detection_data = COALESCE(provider_detection_data, '{}'::jsonb) || jsonb_build_object( 'detected_provider', 'dutchie'::text, @@ -691,7 +693,7 @@ export async function detectAndResolveDispensary(dispensaryId: number): Promise< `, [platformId, dispensaryId] ); - console.log(`[MenuDetection] ${dispensary.name}: Platform ID extracted directly from URL = ${platformId}`); + console.log(`[Henry - Entry Point Finder] ${dispensary.name}: Platform ID extracted directly from URL = ${platformId}`); return result; } @@ -714,6 +716,9 @@ export async function detectAndResolveDispensary(dispensaryId: number): Promise< UPDATE dispensaries SET menu_type = 'dutchie', platform_dispensary_id = $1, + last_id_resolution_at = NOW(), + id_resolution_attempts = COALESCE(id_resolution_attempts, 0) + 1, + id_resolution_error = NULL, provider_detection_data = COALESCE(provider_detection_data, '{}'::jsonb) || jsonb_build_object( 'detected_provider', 'dutchie'::text, @@ -730,10 +735,10 @@ export async function detectAndResolveDispensary(dispensaryId: number): Promise< `, [platformId, cName, dispensaryId] ); - 
console.log(`[MenuDetection] ${dispensary.name}: Resolved platform ID = ${platformId}`); + console.log(`[Henry - Entry Point Finder] ${dispensary.name}: Resolved platform ID = ${platformId}`); } else { // cName resolution failed - try crawling website as fallback - console.log(`[MenuDetection] ${dispensary.name}: cName "${cName}" not found on Dutchie, trying website crawl fallback...`); + console.log(`[Henry - Entry Point Finder] ${dispensary.name}: cName "${cName}" not found on Dutchie, trying website crawl fallback...`); if (website && website.trim() !== '') { const fallbackCrawl = await crawlWebsiteForMenuLinks(website); @@ -796,6 +801,9 @@ export async function detectAndResolveDispensary(dispensaryId: number): Promise< UPDATE dispensaries SET menu_type = 'dutchie', platform_dispensary_id = NULL, + last_id_resolution_at = NOW(), + id_resolution_attempts = COALESCE(id_resolution_attempts, 0) + 1, + id_resolution_error = $2, provider_detection_data = COALESCE(provider_detection_data, '{}'::jsonb) || jsonb_build_object( 'detected_provider', 'dutchie'::text, @@ -812,7 +820,7 @@ export async function detectAndResolveDispensary(dispensaryId: number): Promise< `, [cName, result.error, dispensaryId] ); - console.log(`[MenuDetection] ${dispensary.name}: ${result.error}`); + console.log(`[Henry - Entry Point Finder] ${dispensary.name}: ${result.error}`); } } catch (error: any) { result.error = `Resolution failed: ${error.message}`; @@ -820,6 +828,9 @@ export async function detectAndResolveDispensary(dispensaryId: number): Promise< ` UPDATE dispensaries SET menu_type = 'dutchie', + last_id_resolution_at = NOW(), + id_resolution_attempts = COALESCE(id_resolution_attempts, 0) + 1, + id_resolution_error = $2, provider_detection_data = COALESCE(provider_detection_data, '{}'::jsonb) || jsonb_build_object( 'detected_provider', 'dutchie'::text, @@ -835,7 +846,7 @@ export async function detectAndResolveDispensary(dispensaryId: number): Promise< `, [cName, result.error, dispensaryId] ); - console.error(`[MenuDetection] ${dispensary.name}: ${result.error}`); + console.error(`[Henry - Entry Point Finder] ${dispensary.name}: ${result.error}`); } return result; @@ -844,6 +855,11 @@ export async function detectAndResolveDispensary(dispensaryId: number): Promise< /** * Run bulk detection on all dispensaries with unknown/missing menu_type or platform_dispensary_id * Also includes dispensaries with no menu_url but with a website (for website crawl discovery) + * + * Enhanced for Henry (Entry Point Finder) to also process: + * - Stores with slug changes that need re-resolution + * - Recently added stores from Alice's discovery + * - Stores that failed resolution and need retry */ export async function runBulkDetection(options: { state?: string; @@ -851,6 +867,9 @@ export async function runBulkDetection(options: { onlyMissingPlatformId?: boolean; includeWebsiteCrawl?: boolean; // Include dispensaries with website but no menu_url includeDutchieMissingPlatformId?: boolean; // include menu_type='dutchie' with null platform_id + includeSlugChanges?: boolean; // Include stores where Alice detected slug changes + includeRecentlyAdded?: boolean; // Include stores recently added by Alice + scope?: { states?: string[]; storeIds?: number[] }; // Scope filtering for sharding limit?: number; } = {}): Promise { const { @@ -859,14 +878,23 @@ export async function runBulkDetection(options: { onlyMissingPlatformId = false, includeWebsiteCrawl = true, includeDutchieMissingPlatformId = true, + includeSlugChanges = true, + 
includeRecentlyAdded = true, + scope, limit, } = options; - console.log('[MenuDetection] Starting bulk detection...'); + const scopeDesc = scope?.states?.length + ? ` (states: ${scope.states.join(', ')})` + : scope?.storeIds?.length + ? ` (${scope.storeIds.length} specific stores)` + : state ? ` (state: ${state})` : ''; + + console.log(`[Henry - Entry Point Finder] Starting bulk detection${scopeDesc}...`); // Build query to find dispensaries needing detection // Includes: dispensaries with menu_url OR (no menu_url but has website and not already marked not_crawlable) - // Optionally includes dutchie stores missing platform ID + // Optionally includes dutchie stores missing platform ID, slug changes, and recently added stores let whereClause = `WHERE ( menu_url IS NOT NULL ${includeWebsiteCrawl ? `OR ( @@ -882,7 +910,14 @@ export async function runBulkDetection(options: { const params: any[] = []; let paramIndex = 1; - if (state) { + // Apply scope filtering (takes precedence over single state filter) + if (scope?.storeIds?.length) { + whereClause += ` AND id = ANY($${paramIndex++})`; + params.push(scope.storeIds); + } else if (scope?.states?.length) { + whereClause += ` AND state = ANY($${paramIndex++})`; + params.push(scope.states); + } else if (state) { whereClause += ` AND state = $${paramIndex++}`; params.push(state); } @@ -962,6 +997,19 @@ export async function runBulkDetection(options: { /** * Execute the menu detection job (called by scheduler) + * + * Worker: Henry (Entry Point Finder) + * Uses METHOD 1 (reactEnv extraction) as primary method per user requirements. + * + * Scope filtering: + * - config.scope.states: Array of state codes to limit detection (e.g., ["AZ", "CA"]) + * - config.scope.storeIds: Array of specific store IDs to process + * + * Processes: + * - Stores with unknown/missing menu_type + * - Stores with missing platform_dispensary_id + * - Stores with slug changes that need re-resolution (from Alice) + * - Recently added stores (discovered by Alice) */ export async function executeMenuDetectionJob(config: Record = {}): Promise<{ status: JobStatus; @@ -972,19 +1020,31 @@ export async function executeMenuDetectionJob(config: Record = {}): metadata?: any; }> { const state = config.state || 'AZ'; + const scope = config.scope as { states?: string[]; storeIds?: number[] } | undefined; const onlyUnknown = config.onlyUnknown !== false; // Default to true - always try to resolve platform IDs for dutchie stores const onlyMissingPlatformId = config.onlyMissingPlatformId !== false; const includeDutchieMissingPlatformId = config.includeDutchieMissingPlatformId !== false; + const includeSlugChanges = config.includeSlugChanges !== false; + const includeRecentlyAdded = config.includeRecentlyAdded !== false; - console.log(`[MenuDetection] Executing scheduled job for state=${state}...`); + const scopeDesc = scope?.states?.length + ? ` (states: ${scope.states.join(', ')})` + : scope?.storeIds?.length + ? ` (${scope.storeIds.length} specific stores)` + : ` (state: ${state})`; + + console.log(`[Henry - Entry Point Finder] Executing scheduled job${scopeDesc}...`); try { const result = await runBulkDetection({ - state, + state: scope ? 
undefined : state, // Use scope if provided, otherwise fall back to state + scope, onlyUnknown, onlyMissingPlatformId, includeDutchieMissingPlatformId, + includeSlugChanges, + includeRecentlyAdded, }); const status: JobStatus = @@ -998,9 +1058,11 @@ export async function executeMenuDetectionJob(config: Record = {}): itemsFailed: result.totalFailed, errorMessage: result.errors.length > 0 ? result.errors.slice(0, 5).join('; ') : undefined, metadata: { - state, + scope: scope || { states: [state] }, onlyUnknown, onlyMissingPlatformId, + includeSlugChanges, + includeRecentlyAdded, providerCounts: countByProvider(result.results), }, }; @@ -1011,6 +1073,7 @@ export async function executeMenuDetectionJob(config: Record = {}): itemsSucceeded: 0, itemsFailed: 0, errorMessage: error.message, + metadata: { scope: scope || { states: [state] } }, }; } } diff --git a/backend/src/dutchie-az/services/proxy-rotator.ts b/backend/src/dutchie-az/services/proxy-rotator.ts new file mode 100644 index 00000000..ffdf8538 --- /dev/null +++ b/backend/src/dutchie-az/services/proxy-rotator.ts @@ -0,0 +1,455 @@ +/** + * Proxy & User Agent Rotator + * + * Manages rotation of proxies and user agents to avoid blocks. + * Integrates with error taxonomy for intelligent rotation decisions. + * + * Phase 1: Crawler Reliability & Stabilization + */ + +import { Pool } from 'pg'; + +// ============================================================ +// USER AGENT CONFIGURATION +// ============================================================ + +/** + * Modern browser user agents (Chrome, Firefox, Safari, Edge on various platforms) + * Updated: 2024 + */ +export const USER_AGENTS = [ + // Chrome on Windows + 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36', + 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36', + 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36', + + // Chrome on macOS + 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36', + 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36', + + // Firefox on Windows + 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0', + 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0', + + // Firefox on macOS + 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:121.0) Gecko/20100101 Firefox/121.0', + + // Safari on macOS + 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15', + 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15', + + // Edge on Windows + 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0', + + // Chrome on Linux + 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36', +]; + +// ============================================================ +// PROXY TYPES +// ============================================================ + +export interface Proxy { + id: number; + host: string; + port: number; + username?: string; + password?: string; + protocol: 'http' | 'https' | 'socks5'; + isActive: boolean; + lastUsedAt: Date | null; + failureCount: 
number; + successCount: number; + avgResponseTimeMs: number | null; +} + +export interface ProxyStats { + totalProxies: number; + activeProxies: number; + blockedProxies: number; + avgSuccessRate: number; +} + +// ============================================================ +// PROXY ROTATOR CLASS +// ============================================================ + +export class ProxyRotator { + private pool: Pool | null = null; + private proxies: Proxy[] = []; + private currentIndex: number = 0; + private lastRotation: Date = new Date(); + + constructor(pool?: Pool) { + this.pool = pool || null; + } + + /** + * Initialize with database pool + */ + setPool(pool: Pool): void { + this.pool = pool; + } + + /** + * Load proxies from database + */ + async loadProxies(): Promise { + if (!this.pool) { + console.warn('[ProxyRotator] No database pool configured'); + return; + } + + try { + const result = await this.pool.query(` + SELECT + id, + host, + port, + username, + password, + protocol, + is_active as "isActive", + last_used_at as "lastUsedAt", + failure_count as "failureCount", + success_count as "successCount", + avg_response_time_ms as "avgResponseTimeMs" + FROM proxies + WHERE is_active = true + ORDER BY failure_count ASC, last_used_at ASC NULLS FIRST + `); + + this.proxies = result.rows; + console.log(`[ProxyRotator] Loaded ${this.proxies.length} active proxies`); + } catch (error) { + // Table might not exist - that's okay + console.warn(`[ProxyRotator] Could not load proxies: ${error}`); + this.proxies = []; + } + } + + /** + * Get next proxy in rotation + */ + getNext(): Proxy | null { + if (this.proxies.length === 0) return null; + + // Round-robin rotation + this.currentIndex = (this.currentIndex + 1) % this.proxies.length; + this.lastRotation = new Date(); + + return this.proxies[this.currentIndex]; + } + + /** + * Get current proxy without rotating + */ + getCurrent(): Proxy | null { + if (this.proxies.length === 0) return null; + return this.proxies[this.currentIndex]; + } + + /** + * Get proxy by ID + */ + getById(id: number): Proxy | null { + return this.proxies.find(p => p.id === id) || null; + } + + /** + * Rotate to a specific proxy + */ + setProxy(id: number): boolean { + const index = this.proxies.findIndex(p => p.id === id); + if (index === -1) return false; + + this.currentIndex = index; + this.lastRotation = new Date(); + return true; + } + + /** + * Mark proxy as failed (temporarily remove from rotation) + */ + async markFailed(proxyId: number, error?: string): Promise { + // Update in-memory + const proxy = this.proxies.find(p => p.id === proxyId); + if (proxy) { + proxy.failureCount++; + + // Deactivate if too many failures + if (proxy.failureCount >= 5) { + proxy.isActive = false; + this.proxies = this.proxies.filter(p => p.id !== proxyId); + console.log(`[ProxyRotator] Proxy ${proxyId} deactivated after ${proxy.failureCount} failures`); + } + } + + // Update database + if (this.pool) { + try { + await this.pool.query(` + UPDATE proxies + SET + failure_count = failure_count + 1, + last_failure_at = NOW(), + last_error = $2, + is_active = CASE WHEN failure_count >= 4 THEN false ELSE is_active END + WHERE id = $1 + `, [proxyId, error || null]); + } catch (err) { + console.error(`[ProxyRotator] Failed to update proxy ${proxyId}:`, err); + } + } + } + + /** + * Mark proxy as successful + */ + async markSuccess(proxyId: number, responseTimeMs?: number): Promise { + // Update in-memory + const proxy = this.proxies.find(p => p.id === proxyId); + if (proxy) { + 
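+      // In-memory bookkeeping only; the authoritative counters are updated in the proxies table below.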
proxy.successCount++; + proxy.lastUsedAt = new Date(); + if (responseTimeMs !== undefined) { + // Rolling average + proxy.avgResponseTimeMs = proxy.avgResponseTimeMs + ? (proxy.avgResponseTimeMs * 0.8) + (responseTimeMs * 0.2) + : responseTimeMs; + } + } + + // Update database + if (this.pool) { + try { + await this.pool.query(` + UPDATE proxies + SET + success_count = success_count + 1, + last_used_at = NOW(), + avg_response_time_ms = CASE + WHEN avg_response_time_ms IS NULL THEN $2 + ELSE (avg_response_time_ms * 0.8) + ($2 * 0.2) + END + WHERE id = $1 + `, [proxyId, responseTimeMs || null]); + } catch (err) { + console.error(`[ProxyRotator] Failed to update proxy ${proxyId}:`, err); + } + } + } + + /** + * Get proxy URL for HTTP client + */ + getProxyUrl(proxy: Proxy): string { + const auth = proxy.username && proxy.password + ? `${proxy.username}:${proxy.password}@` + : ''; + return `${proxy.protocol}://${auth}${proxy.host}:${proxy.port}`; + } + + /** + * Get stats about proxy pool + */ + getStats(): ProxyStats { + const totalProxies = this.proxies.length; + const activeProxies = this.proxies.filter(p => p.isActive).length; + const blockedProxies = this.proxies.filter(p => p.failureCount >= 5).length; + + const successRates = this.proxies + .filter(p => p.successCount + p.failureCount > 0) + .map(p => p.successCount / (p.successCount + p.failureCount)); + + const avgSuccessRate = successRates.length > 0 + ? successRates.reduce((a, b) => a + b, 0) / successRates.length + : 0; + + return { + totalProxies, + activeProxies, + blockedProxies, + avgSuccessRate, + }; + } + + /** + * Check if proxy pool has available proxies + */ + hasAvailableProxies(): boolean { + return this.proxies.length > 0; + } +} + +// ============================================================ +// USER AGENT ROTATOR CLASS +// ============================================================ + +export class UserAgentRotator { + private userAgents: string[]; + private currentIndex: number = 0; + private lastRotation: Date = new Date(); + + constructor(userAgents: string[] = USER_AGENTS) { + this.userAgents = userAgents; + // Start at random index to avoid patterns + this.currentIndex = Math.floor(Math.random() * userAgents.length); + } + + /** + * Get next user agent in rotation + */ + getNext(): string { + this.currentIndex = (this.currentIndex + 1) % this.userAgents.length; + this.lastRotation = new Date(); + return this.userAgents[this.currentIndex]; + } + + /** + * Get current user agent without rotating + */ + getCurrent(): string { + return this.userAgents[this.currentIndex]; + } + + /** + * Get a random user agent + */ + getRandom(): string { + const index = Math.floor(Math.random() * this.userAgents.length); + return this.userAgents[index]; + } + + /** + * Get total available user agents + */ + getCount(): number { + return this.userAgents.length; + } +} + +// ============================================================ +// COMBINED ROTATOR (for convenience) +// ============================================================ + +export class CrawlRotator { + public proxy: ProxyRotator; + public userAgent: UserAgentRotator; + + constructor(pool?: Pool) { + this.proxy = new ProxyRotator(pool); + this.userAgent = new UserAgentRotator(); + } + + /** + * Initialize rotator (load proxies from DB) + */ + async initialize(): Promise { + await this.proxy.loadProxies(); + } + + /** + * Rotate proxy only + */ + rotateProxy(): Proxy | null { + return this.proxy.getNext(); + } + + /** + * Rotate user agent only + */ + 
rotateUserAgent(): string { + return this.userAgent.getNext(); + } + + /** + * Rotate both proxy and user agent + */ + rotateBoth(): { proxy: Proxy | null; userAgent: string } { + return { + proxy: this.proxy.getNext(), + userAgent: this.userAgent.getNext(), + }; + } + + /** + * Get current proxy and user agent without rotating + */ + getCurrent(): { proxy: Proxy | null; userAgent: string } { + return { + proxy: this.proxy.getCurrent(), + userAgent: this.userAgent.getCurrent(), + }; + } + + /** + * Record success for current proxy + */ + async recordSuccess(responseTimeMs?: number): Promise { + const current = this.proxy.getCurrent(); + if (current) { + await this.proxy.markSuccess(current.id, responseTimeMs); + } + } + + /** + * Record failure for current proxy + */ + async recordFailure(error?: string): Promise { + const current = this.proxy.getCurrent(); + if (current) { + await this.proxy.markFailed(current.id, error); + } + } +} + +// ============================================================ +// DATABASE OPERATIONS +// ============================================================ + +/** + * Update dispensary's current proxy and user agent + */ +export async function updateDispensaryRotation( + pool: Pool, + dispensaryId: number, + proxyId: number | null, + userAgent: string | null +): Promise { + await pool.query(` + UPDATE dispensaries + SET + current_proxy_id = $2, + current_user_agent = $3 + WHERE id = $1 + `, [dispensaryId, proxyId, userAgent]); +} + +/** + * Get dispensary's current proxy and user agent + */ +export async function getDispensaryRotation( + pool: Pool, + dispensaryId: number +): Promise<{ proxyId: number | null; userAgent: string | null }> { + const result = await pool.query(` + SELECT current_proxy_id as "proxyId", current_user_agent as "userAgent" + FROM dispensaries + WHERE id = $1 + `, [dispensaryId]); + + if (result.rows.length === 0) { + return { proxyId: null, userAgent: null }; + } + + return result.rows[0]; +} + +// ============================================================ +// SINGLETON INSTANCES +// ============================================================ + +export const proxyRotator = new ProxyRotator(); +export const userAgentRotator = new UserAgentRotator(); +export const crawlRotator = new CrawlRotator(); diff --git a/backend/src/dutchie-az/services/retry-manager.ts b/backend/src/dutchie-az/services/retry-manager.ts new file mode 100644 index 00000000..95bad61b --- /dev/null +++ b/backend/src/dutchie-az/services/retry-manager.ts @@ -0,0 +1,435 @@ +/** + * Unified Retry Manager + * + * Handles retry logic with exponential backoff, jitter, and + * intelligent error-based decisions (rotate proxy, rotate UA, etc.) 
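+ *
+ * Backoff, as implemented below:
+ *   delay = min(baseBackoffMs * backoffMultiplier^(attempt - 1) * errorMultiplier, maxBackoffMs)
+ * with +/- jitterFactor randomization applied before sleeping.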
+ * + * Phase 1: Crawler Reliability & Stabilization + */ + +import { + CrawlErrorCodeType, + CrawlErrorCode, + classifyError, + getErrorMetadata, + isRetryable, + shouldRotateProxy, + shouldRotateUserAgent, + getBackoffMultiplier, +} from './error-taxonomy'; +import { DEFAULT_CONFIG } from './store-validator'; + +// ============================================================ +// RETRY CONFIGURATION +// ============================================================ + +export interface RetryConfig { + maxRetries: number; + baseBackoffMs: number; + maxBackoffMs: number; + backoffMultiplier: number; + jitterFactor: number; // 0.0 - 1.0 (percentage of backoff to randomize) +} + +export const DEFAULT_RETRY_CONFIG: RetryConfig = { + maxRetries: DEFAULT_CONFIG.maxRetries, + baseBackoffMs: DEFAULT_CONFIG.baseBackoffMs, + maxBackoffMs: DEFAULT_CONFIG.maxBackoffMs, + backoffMultiplier: DEFAULT_CONFIG.backoffMultiplier, + jitterFactor: 0.25, // +/- 25% jitter +}; + +// ============================================================ +// RETRY CONTEXT +// ============================================================ + +/** + * Context for tracking retry state across attempts + */ +export interface RetryContext { + attemptNumber: number; + maxAttempts: number; + lastErrorCode: CrawlErrorCodeType | null; + lastHttpStatus: number | null; + totalBackoffMs: number; + proxyRotated: boolean; + userAgentRotated: boolean; + startedAt: Date; +} + +/** + * Decision about what to do after an error + */ +export interface RetryDecision { + shouldRetry: boolean; + reason: string; + backoffMs: number; + rotateProxy: boolean; + rotateUserAgent: boolean; + errorCode: CrawlErrorCodeType; + attemptNumber: number; +} + +// ============================================================ +// RETRY MANAGER CLASS +// ============================================================ + +export class RetryManager { + private config: RetryConfig; + private context: RetryContext; + + constructor(config: Partial = {}) { + this.config = { ...DEFAULT_RETRY_CONFIG, ...config }; + this.context = this.createInitialContext(); + } + + /** + * Create initial retry context + */ + private createInitialContext(): RetryContext { + return { + attemptNumber: 0, + maxAttempts: this.config.maxRetries + 1, // +1 for initial attempt + lastErrorCode: null, + lastHttpStatus: null, + totalBackoffMs: 0, + proxyRotated: false, + userAgentRotated: false, + startedAt: new Date(), + }; + } + + /** + * Reset retry state for a new operation + */ + reset(): void { + this.context = this.createInitialContext(); + } + + /** + * Get current attempt number (1-based) + */ + getAttemptNumber(): number { + return this.context.attemptNumber + 1; + } + + /** + * Check if we should attempt (call before each attempt) + */ + shouldAttempt(): boolean { + return this.context.attemptNumber < this.context.maxAttempts; + } + + /** + * Record an attempt (call at start of each attempt) + */ + recordAttempt(): void { + this.context.attemptNumber++; + } + + /** + * Evaluate an error and decide what to do + */ + evaluateError( + error: Error | string | null, + httpStatus?: number + ): RetryDecision { + const errorCode = classifyError(error, httpStatus); + const metadata = getErrorMetadata(errorCode); + const attemptNumber = this.context.attemptNumber; + + // Update context + this.context.lastErrorCode = errorCode; + this.context.lastHttpStatus = httpStatus || null; + + // Check if error is retryable + if (!isRetryable(errorCode)) { + return { + shouldRetry: false, + reason: `Error ${errorCode} is 
not retryable: ${metadata.description}`, + backoffMs: 0, + rotateProxy: false, + rotateUserAgent: false, + errorCode, + attemptNumber, + }; + } + + // Check if we've exhausted retries + if (!this.shouldAttempt()) { + return { + shouldRetry: false, + reason: `Max retries (${this.config.maxRetries}) exhausted`, + backoffMs: 0, + rotateProxy: false, + rotateUserAgent: false, + errorCode, + attemptNumber, + }; + } + + // Calculate backoff with exponential increase and jitter + const baseBackoff = this.calculateBackoff(attemptNumber, errorCode); + const backoffWithJitter = this.addJitter(baseBackoff); + + // Track total backoff + this.context.totalBackoffMs += backoffWithJitter; + + // Determine rotation needs + const rotateProxy = shouldRotateProxy(errorCode); + const rotateUserAgent = shouldRotateUserAgent(errorCode); + + if (rotateProxy) this.context.proxyRotated = true; + if (rotateUserAgent) this.context.userAgentRotated = true; + + const rotationInfo = []; + if (rotateProxy) rotationInfo.push('rotate proxy'); + if (rotateUserAgent) rotationInfo.push('rotate UA'); + const rotationStr = rotationInfo.length > 0 ? ` (${rotationInfo.join(', ')})` : ''; + + return { + shouldRetry: true, + reason: `Retrying after ${errorCode}${rotationStr}, backoff ${backoffWithJitter}ms`, + backoffMs: backoffWithJitter, + rotateProxy, + rotateUserAgent, + errorCode, + attemptNumber, + }; + } + + /** + * Calculate exponential backoff for an attempt + */ + private calculateBackoff(attemptNumber: number, errorCode: CrawlErrorCodeType): number { + // Base exponential: baseBackoff * multiplier^(attempt-1) + const exponential = this.config.baseBackoffMs * + Math.pow(this.config.backoffMultiplier, attemptNumber - 1); + + // Apply error-specific multiplier + const errorMultiplier = getBackoffMultiplier(errorCode); + const adjusted = exponential * errorMultiplier; + + // Cap at max backoff + return Math.min(adjusted, this.config.maxBackoffMs); + } + + /** + * Add jitter to backoff to prevent thundering herd + */ + private addJitter(backoffMs: number): number { + const jitterRange = backoffMs * this.config.jitterFactor; + // Random between -jitterRange and +jitterRange + const jitter = (Math.random() * 2 - 1) * jitterRange; + return Math.max(0, Math.round(backoffMs + jitter)); + } + + /** + * Get retry context summary + */ + getSummary(): RetryContextSummary { + const elapsedMs = Date.now() - this.context.startedAt.getTime(); + return { + attemptsMade: this.context.attemptNumber, + maxAttempts: this.context.maxAttempts, + lastErrorCode: this.context.lastErrorCode, + lastHttpStatus: this.context.lastHttpStatus, + totalBackoffMs: this.context.totalBackoffMs, + totalElapsedMs: elapsedMs, + proxyWasRotated: this.context.proxyRotated, + userAgentWasRotated: this.context.userAgentRotated, + }; + } +} + +export interface RetryContextSummary { + attemptsMade: number; + maxAttempts: number; + lastErrorCode: CrawlErrorCodeType | null; + lastHttpStatus: number | null; + totalBackoffMs: number; + totalElapsedMs: number; + proxyWasRotated: boolean; + userAgentWasRotated: boolean; +} + +// ============================================================ +// CONVENIENCE FUNCTIONS +// ============================================================ + +/** + * Sleep for specified milliseconds + */ +export function sleep(ms: number): Promise { + return new Promise(resolve => setTimeout(resolve, ms)); +} + +/** + * Execute a function with automatic retry logic + */ +export async function withRetry( + fn: (attemptNumber: number) => Promise, + 
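+  // Partial config; unspecified fields fall back to DEFAULT_RETRY_CONFIG (merged in the RetryManager constructor)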
config: Partial = {}, + callbacks?: { + onRetry?: (decision: RetryDecision) => void | Promise; + onRotateProxy?: () => void | Promise; + onRotateUserAgent?: () => void | Promise; + } +): Promise<{ result: T; summary: RetryContextSummary }> { + const manager = new RetryManager(config); + + while (manager.shouldAttempt()) { + manager.recordAttempt(); + const attemptNumber = manager.getAttemptNumber(); + + try { + const result = await fn(attemptNumber); + return { result, summary: manager.getSummary() }; + } catch (error) { + const err = error instanceof Error ? error : new Error(String(error)); + const httpStatus = (error as any)?.status || (error as any)?.statusCode; + + const decision = manager.evaluateError(err, httpStatus); + + if (!decision.shouldRetry) { + // Re-throw with enhanced context + const enhancedError = new RetryExhaustedError( + `${err.message} (${decision.reason})`, + err, + manager.getSummary() + ); + throw enhancedError; + } + + // Notify callbacks + if (callbacks?.onRetry) { + await callbacks.onRetry(decision); + } + if (decision.rotateProxy && callbacks?.onRotateProxy) { + await callbacks.onRotateProxy(); + } + if (decision.rotateUserAgent && callbacks?.onRotateUserAgent) { + await callbacks.onRotateUserAgent(); + } + + // Log retry decision + console.log( + `[RetryManager] Attempt ${attemptNumber} failed: ${decision.errorCode}. ` + + `${decision.reason}. Waiting ${decision.backoffMs}ms before retry.` + ); + + // Wait before retry + await sleep(decision.backoffMs); + } + } + + // Should not reach here, but handle edge case + throw new RetryExhaustedError( + 'Max retries exhausted', + null, + manager.getSummary() + ); +} + +// ============================================================ +// CUSTOM ERROR CLASS +// ============================================================ + +export class RetryExhaustedError extends Error { + public readonly originalError: Error | null; + public readonly summary: RetryContextSummary; + public readonly errorCode: CrawlErrorCodeType; + + constructor( + message: string, + originalError: Error | null, + summary: RetryContextSummary + ) { + super(message); + this.name = 'RetryExhaustedError'; + this.originalError = originalError; + this.summary = summary; + this.errorCode = summary.lastErrorCode || CrawlErrorCode.UNKNOWN_ERROR; + } +} + +// ============================================================ +// BACKOFF CALCULATOR (for external use) +// ============================================================ + +/** + * Calculate next crawl time based on consecutive failures + */ +export function calculateNextCrawlDelay( + consecutiveFailures: number, + baseFrequencyMinutes: number, + maxBackoffMultiplier: number = 4.0 +): number { + // Each failure doubles the delay, up to max multiplier + const multiplier = Math.min( + Math.pow(2, consecutiveFailures), + maxBackoffMultiplier + ); + + const delayMinutes = baseFrequencyMinutes * multiplier; + + // Add jitter (0-10% of delay) + const jitterMinutes = delayMinutes * Math.random() * 0.1; + + return Math.round(delayMinutes + jitterMinutes); +} + +/** + * Calculate next crawl timestamp + */ +export function calculateNextCrawlAt( + consecutiveFailures: number, + baseFrequencyMinutes: number +): Date { + const delayMinutes = calculateNextCrawlDelay(consecutiveFailures, baseFrequencyMinutes); + return new Date(Date.now() + delayMinutes * 60 * 1000); +} + +// ============================================================ +// STATUS DETERMINATION +// 
============================================================ + +/** + * Determine crawl status based on failure count + */ +export function determineCrawlStatus( + consecutiveFailures: number, + thresholds: { degraded: number; failed: number } = { degraded: 3, failed: 10 } +): 'active' | 'degraded' | 'failed' { + if (consecutiveFailures >= thresholds.failed) { + return 'failed'; + } + if (consecutiveFailures >= thresholds.degraded) { + return 'degraded'; + } + return 'active'; +} + +/** + * Determine if store should be auto-recovered + * (Called periodically to check if failed stores can be retried) + */ +export function shouldAttemptRecovery( + lastFailureAt: Date | null, + consecutiveFailures: number, + recoveryIntervalHours: number = 24 +): boolean { + if (!lastFailureAt) return true; + + // Wait longer for more failures + const waitHours = recoveryIntervalHours * Math.min(consecutiveFailures, 5); + const recoveryTime = new Date(lastFailureAt.getTime() + waitHours * 60 * 60 * 1000); + + return new Date() >= recoveryTime; +} + +// ============================================================ +// SINGLETON INSTANCE +// ============================================================ + +export const retryManager = new RetryManager(); diff --git a/backend/src/dutchie-az/services/scheduler.ts b/backend/src/dutchie-az/services/scheduler.ts index 84a93e58..25ce5f42 100644 --- a/backend/src/dutchie-az/services/scheduler.ts +++ b/backend/src/dutchie-az/services/scheduler.ts @@ -11,12 +11,14 @@ * Example: 4-hour base with ±30min jitter = runs anywhere from 3h30m to 4h30m apart */ -import { query, getClient } from '../db/connection'; +import { query, getClient, getPool } from '../db/connection'; import { crawlDispensaryProducts, CrawlResult } from './product-crawler'; import { mapDbRowToDispensary } from './discovery'; import { executeMenuDetectionJob } from './menu-detection'; import { bulkEnqueueJobs, enqueueJob, getQueueStats } from './job-queue'; import { JobSchedule, JobStatus, Dispensary } from '../types'; +import { DtLocationDiscoveryService } from '../discovery/DtLocationDiscoveryService'; +import { StateQueryService } from '../../multi-state/state-query-service'; // Scheduler poll interval (how often we check for due jobs) const SCHEDULER_POLL_INTERVAL_MS = 60 * 1000; // 1 minute @@ -65,6 +67,7 @@ export async function getAllSchedules(): Promise { SELECT id, job_name, description, enabled, base_interval_minutes, jitter_minutes, + worker_name, worker_role, last_run_at, last_status, last_error_message, last_duration_ms, next_run_at, job_config, created_at, updated_at FROM job_schedules @@ -78,6 +81,8 @@ export async function getAllSchedules(): Promise { enabled: row.enabled, baseIntervalMinutes: row.base_interval_minutes, jitterMinutes: row.jitter_minutes, + workerName: row.worker_name, + workerRole: row.worker_role, lastRunAt: row.last_run_at, lastStatus: row.last_status, lastErrorMessage: row.last_error_message, @@ -108,6 +113,8 @@ export async function getScheduleById(id: number): Promise { enabled: row.enabled, baseIntervalMinutes: row.base_interval_minutes, jitterMinutes: row.jitter_minutes, + workerName: row.worker_name, + workerRole: row.worker_role, lastRunAt: row.last_run_at, lastStatus: row.last_status, lastErrorMessage: row.last_error_message, @@ -128,6 +135,8 @@ export async function createSchedule(schedule: { enabled?: boolean; baseIntervalMinutes: number; jitterMinutes: number; + workerName?: string; + workerRole?: string; jobConfig?: Record; startImmediately?: boolean; }): Promise 
{ @@ -141,8 +150,9 @@ export async function createSchedule(schedule: { INSERT INTO job_schedules ( job_name, description, enabled, base_interval_minutes, jitter_minutes, + worker_name, worker_role, next_run_at, job_config - ) VALUES ($1, $2, $3, $4, $5, $6, $7) + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) RETURNING * `, [ @@ -151,13 +161,16 @@ export async function createSchedule(schedule: { schedule.enabled ?? true, schedule.baseIntervalMinutes, schedule.jitterMinutes, + schedule.workerName || null, + schedule.workerRole || null, nextRunAt, schedule.jobConfig ? JSON.stringify(schedule.jobConfig) : null, ] ); const row = rows[0]; - console.log(`[Scheduler] Created schedule "${schedule.jobName}" - next run at ${nextRunAt.toISOString()}`); + const workerInfo = schedule.workerName ? ` (Worker: ${schedule.workerName})` : ''; + console.log(`[Scheduler] Created schedule "${schedule.jobName}"${workerInfo} - next run at ${nextRunAt.toISOString()}`); return { id: row.id, @@ -166,6 +179,8 @@ export async function createSchedule(schedule: { enabled: row.enabled, baseIntervalMinutes: row.base_interval_minutes, jitterMinutes: row.jitter_minutes, + workerName: row.worker_name, + workerRole: row.worker_role, lastRunAt: row.last_run_at, lastStatus: row.last_status, lastErrorMessage: row.last_error_message, @@ -304,20 +319,22 @@ async function updateScheduleAfterRun( } /** - * Create a job run log entry + * Create a job run log entry with worker metadata propagated from schedule */ async function createRunLog( scheduleId: number, jobName: string, - status: 'pending' | 'running' + status: 'pending' | 'running', + workerName?: string, + workerRole?: string ): Promise { const { rows } = await query<{ id: number }>( ` - INSERT INTO job_run_logs (schedule_id, job_name, status, started_at) - VALUES ($1, $2, $3, NOW()) + INSERT INTO job_run_logs (schedule_id, job_name, status, worker_name, run_role, started_at) + VALUES ($1, $2, $3, $4, $5, NOW()) RETURNING id `, - [scheduleId, jobName, status] + [scheduleId, jobName, status, workerName || null, workerRole || null] ); return rows[0].id; } @@ -434,22 +451,31 @@ async function executeJob(schedule: JobSchedule): Promise<{ return executeDiscovery(config); case 'dutchie_az_menu_detection': return executeMenuDetectionJob(config); + case 'dutchie_store_discovery': + return executeStoreDiscovery(config); + case 'analytics_refresh': + return executeAnalyticsRefresh(config); default: throw new Error(`Unknown job type: ${schedule.jobName}`); } } /** - * Execute the AZ Dutchie product crawl job + * Execute the AZ Dutchie product crawl job (Worker: Bella) * * NEW BEHAVIOR: Instead of running crawls directly, this now ENQUEUES jobs * into the crawl_jobs queue. Workers (running as separate replicas) will * pick up and process these jobs. 
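// Illustrative sketch (not part of this diff): registering a schedule that
// carries the new worker identity columns. Field names follow the
// createSchedule() signature above; the concrete values are examples only.
async function registerScopedProductCrawl(): Promise<void> {
  await createSchedule({
    jobName: 'dutchie_az_product_crawl',
    description: 'Crawl Dutchie product menus for the scoped states',
    enabled: true,
    baseIntervalMinutes: 240,
    jitterMinutes: 30,
    workerName: 'Bella',
    workerRole: 'GraphQL Product Sync',
    jobConfig: { pricingType: 'rec', useBothModes: true, scope: { states: ['AZ'] } },
    startImmediately: false,
  });
  // worker_name / worker_role are stored on job_schedules and copied into each
  // job_run_logs row by createRunLog(), so run history can be filtered per worker.
}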
* + * Scope filtering: + * - config.scope.states: Array of state codes to limit crawl (e.g., ["AZ", "CA"]) + * - config.scope.storeIds: Array of specific store IDs to crawl + * * This allows: * - Multiple workers to process jobs in parallel * - No double-crawls (DB-level locking per dispensary) * - Better scalability (add more worker replicas) + * - Sharding by state or store for parallel execution * - Live monitoring of individual job progress */ async function executeProductCrawl(config: Record): Promise<{ @@ -462,18 +488,45 @@ async function executeProductCrawl(config: Record): Promise<{ }> { const pricingType = config.pricingType || 'rec'; const useBothModes = config.useBothModes !== false; + const scope = config.scope as { states?: string[]; storeIds?: number[] } | undefined; - // Get all "ready" dispensaries (menu_type='dutchie' AND platform_dispensary_id IS NOT NULL AND not failed) - // Note: Menu detection is handled separately by the dutchie_az_menu_detection schedule + const scopeDesc = scope?.states?.length + ? ` (states: ${scope.states.join(', ')})` + : scope?.storeIds?.length + ? ` (${scope.storeIds.length} specific stores)` + : ' (all AZ stores)'; + + console.log(`[Bella - Product Sync] Starting product crawl job${scopeDesc}...`); + + // Build query based on scope + let whereClause = ` + WHERE menu_type = 'dutchie' + AND platform_dispensary_id IS NOT NULL + AND failed_at IS NULL + `; + const params: any[] = []; + let paramIndex = 1; + + // Apply scope filtering + if (scope?.storeIds?.length) { + whereClause += ` AND id = ANY($${paramIndex++})`; + params.push(scope.storeIds); + } else if (scope?.states?.length) { + whereClause += ` AND state = ANY($${paramIndex++})`; + params.push(scope.states); + } else { + // Default to AZ if no scope specified + whereClause += ` AND state = 'AZ'`; + } + + // Get all "ready" dispensaries matching scope const { rows: rawRows } = await query( ` SELECT id FROM dispensaries - WHERE state = 'AZ' - AND menu_type = 'dutchie' - AND platform_dispensary_id IS NOT NULL - AND failed_at IS NULL + ${whereClause} ORDER BY last_crawl_at ASC NULLS FIRST - ` + `, + params ); const dispensaryIds = rawRows.map((r: any) => r.id); @@ -483,11 +536,14 @@ async function executeProductCrawl(config: Record): Promise<{ itemsProcessed: 0, itemsSucceeded: 0, itemsFailed: 0, - metadata: { message: 'No ready dispensaries to crawl. Run menu detection to discover more.' }, + metadata: { + message: 'No ready dispensaries to crawl. Run menu detection to discover more.', + scope: scope || 'all', + }, }; } - console.log(`[Scheduler] Enqueueing crawl jobs for ${dispensaryIds.length} dispensaries...`); + console.log(`[Bella - Product Sync] Enqueueing crawl jobs for ${dispensaryIds.length} dispensaries...`); // Bulk enqueue jobs (skips dispensaries that already have pending/running jobs) const { enqueued, skipped } = await bulkEnqueueJobs( @@ -499,7 +555,7 @@ async function executeProductCrawl(config: Record): Promise<{ } ); - console.log(`[Scheduler] Enqueued ${enqueued} jobs, skipped ${skipped} (already queued)`); + console.log(`[Bella - Product Sync] Enqueued ${enqueued} jobs, skipped ${skipped} (already queued)`); // Get current queue stats const queueStats = await getQueueStats(); @@ -515,6 +571,7 @@ async function executeProductCrawl(config: Record): Promise<{ queueStats, pricingType, useBothModes, + scope: scope || 'all', message: `Enqueued ${enqueued} jobs. Workers will process them. 
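// Illustrative sketch (hypothetical helper, not part of this diff): the scope
// filtering performed inline in executeProductCrawl above, extracted for
// clarity. storeIds takes precedence over states; omitting scope keeps the
// legacy AZ-only behavior.
function buildScopeFilter(scope?: { states?: string[]; storeIds?: number[] }): { clause: string; params: any[] } {
  if (scope?.storeIds?.length) return { clause: 'AND id = ANY($1)', params: [scope.storeIds] };
  if (scope?.states?.length) return { clause: 'AND state = ANY($1)', params: [scope.states] };
  return { clause: "AND state = 'AZ'", params: [] };
}

// buildScopeFilter({ states: ['AZ', 'NM'] })
//   -> { clause: "AND state = ANY($1)", params: [['AZ', 'NM']] }
// buildScopeFilter(undefined)
//   -> { clause: "AND state = 'AZ'", params: [] }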
Check /scraper-monitor for progress.`, }, }; @@ -541,6 +598,181 @@ async function executeDiscovery(_config: Record): Promise<{ }; } +/** + * Execute the Store Discovery job (Worker: Alice) + * + * Full discovery workflow: + * 1. Fetch master cities page from https://dutchie.com/cities + * 2. Upsert discovered states/cities into dutchie_discovery_cities + * 3. Crawl each city page to discover all stores + * 4. Detect new stores, slug changes, and removed stores + * 5. Mark retired stores (never delete) + * + * Scope filtering: + * - config.scope.states: Array of state codes to limit discovery (e.g., ["AZ", "CA"]) + * - config.scope.storeIds: Array of specific store IDs to process + */ +async function executeStoreDiscovery(config: Record): Promise<{ + status: JobStatus; + itemsProcessed: number; + itemsSucceeded: number; + itemsFailed: number; + errorMessage?: string; + metadata?: any; +}> { + const delayMs = config.delayMs || 2000; // Delay between cities + const scope = config.scope as { states?: string[]; storeIds?: number[] } | undefined; + + const scopeDesc = scope?.states?.length + ? ` (states: ${scope.states.join(', ')})` + : scope?.storeIds?.length + ? ` (${scope.storeIds.length} specific stores)` + : ' (all states)'; + + console.log(`[Alice - Store Discovery] Starting store discovery job${scopeDesc}...`); + + try { + const pool = getPool(); + const discoveryService = new DtLocationDiscoveryService(pool); + + // Get stats before + const statsBefore = await discoveryService.getStats(); + console.log(`[Alice - Store Discovery] Current stats: ${statsBefore.total} total locations, ${statsBefore.withCoordinates} with coordinates`); + + // Run full discovery with change detection + const result = await discoveryService.runFullDiscoveryWithChangeDetection({ + scope, + delayMs, + }); + + console.log(`[Alice - Store Discovery] Completed: ${result.statesDiscovered} states, ${result.citiesDiscovered} cities`); + console.log(`[Alice - Store Discovery] Stores found: ${result.totalLocationsFound} total`); + console.log(`[Alice - Store Discovery] Changes: +${result.newStoreCount} new, ~${result.updatedStoreCount} updated, =${result.slugChangedCount} slug changes, -${result.removedStoreCount} retired`); + + const totalChanges = result.newStoreCount + result.updatedStoreCount + result.slugChangedCount; + + return { + status: result.errors.length > 0 ? 'partial' : 'success', + itemsProcessed: result.totalLocationsFound, + itemsSucceeded: totalChanges, + itemsFailed: result.errors.length, + errorMessage: result.errors.length > 0 ? 
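// Sketch of the discovery result shape as consumed above, inferred solely from
// the fields executeStoreDiscovery reads off
// DtLocationDiscoveryService.runFullDiscoveryWithChangeDetection(). This is an
// inferred summary for readers of this hunk, not the service's declared type.
interface DiscoveryRunSummary {
  statesDiscovered: number;
  citiesDiscovered: number;
  totalLocationsFound: number;
  newStoreCount: number;
  updatedStoreCount: number;
  slugChangedCount: number;
  removedStoreCount: number;
  durationMs: number;
  errors: string[];
}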
result.errors.slice(0, 5).join('; ') : undefined, + metadata: { + statesDiscovered: result.statesDiscovered, + citiesDiscovered: result.citiesDiscovered, + totalLocationsFound: result.totalLocationsFound, + newStoreCount: result.newStoreCount, + updatedStoreCount: result.updatedStoreCount, + slugChangedCount: result.slugChangedCount, + removedStoreCount: result.removedStoreCount, + durationMs: result.durationMs, + errorCount: result.errors.length, + scope: scope || 'all', + statsBefore: { + total: statsBefore.total, + withCoordinates: statsBefore.withCoordinates, + }, + }, + }; + } catch (error: any) { + console.error('[Alice - Store Discovery] Job failed:', error.message); + return { + status: 'error', + itemsProcessed: 0, + itemsSucceeded: 0, + itemsFailed: 1, + errorMessage: error.message, + metadata: { error: error.message, scope: scope || 'all' }, + }; + } +} + +/** + * Execute the Analytics Refresh job (Worker: Oscar) + * + * Refreshes materialized views and analytics data. + * Uses StateQueryService to refresh mv_state_metrics and other views. + */ +async function executeAnalyticsRefresh(config: Record): Promise<{ + status: JobStatus; + itemsProcessed: number; + itemsSucceeded: number; + itemsFailed: number; + errorMessage?: string; + metadata?: any; +}> { + console.log('[Oscar - Analytics Refresh] Starting analytics refresh job...'); + + const startTime = Date.now(); + const refreshedViews: string[] = []; + const errors: string[] = []; + + try { + const pool = getPool(); + const stateService = new StateQueryService(pool); + + // Refresh state metrics materialized view + console.log('[Oscar - Analytics Refresh] Refreshing mv_state_metrics...'); + try { + await stateService.refreshMetrics(); + refreshedViews.push('mv_state_metrics'); + console.log('[Oscar - Analytics Refresh] mv_state_metrics refreshed successfully'); + } catch (error: any) { + console.error('[Oscar - Analytics Refresh] Failed to refresh mv_state_metrics:', error.message); + errors.push(`mv_state_metrics: ${error.message}`); + } + + // Refresh other analytics views if configured + if (config.refreshBrandViews !== false) { + console.log('[Oscar - Analytics Refresh] Refreshing brand analytics views...'); + try { + // Check if v_brand_state_presence exists and refresh if needed + await pool.query(` + SELECT 1 FROM pg_matviews WHERE matviewname = 'v_brand_state_presence' LIMIT 1 + `).then(async (result) => { + if (result.rows.length > 0) { + await pool.query('REFRESH MATERIALIZED VIEW CONCURRENTLY v_brand_state_presence'); + refreshedViews.push('v_brand_state_presence'); + console.log('[Oscar - Analytics Refresh] v_brand_state_presence refreshed'); + } + }).catch(() => { + // View doesn't exist, skip + }); + } catch (error: any) { + errors.push(`v_brand_state_presence: ${error.message}`); + } + } + + const durationMs = Date.now() - startTime; + + console.log(`[Oscar - Analytics Refresh] Completed: ${refreshedViews.length} views refreshed in ${Math.round(durationMs / 1000)}s`); + + return { + status: errors.length > 0 ? (refreshedViews.length > 0 ? 'partial' : 'error') : 'success', + itemsProcessed: refreshedViews.length + errors.length, + itemsSucceeded: refreshedViews.length, + itemsFailed: errors.length, + errorMessage: errors.length > 0 ? errors.join('; ') : undefined, + metadata: { + refreshedViews, + errorCount: errors.length, + errors: errors.length > 0 ? 
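// Illustrative sketch (hypothetical helper, not part of this diff): the
// "refresh only if the materialized view exists" pattern used above for
// v_brand_state_presence. Note that REFRESH MATERIALIZED VIEW CONCURRENTLY
// requires a unique index on the view, and viewName should come from a fixed
// allowlist rather than user input.
import { Pool } from 'pg';

async function refreshIfExists(pool: Pool, viewName: string): Promise<boolean> {
  const check = await pool.query(
    'SELECT 1 FROM pg_matviews WHERE matviewname = $1 LIMIT 1',
    [viewName]
  );
  if (check.rows.length === 0) return false; // view not deployed in this environment
  await pool.query(`REFRESH MATERIALIZED VIEW CONCURRENTLY ${viewName}`);
  return true;
}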
errors : undefined, + durationMs, + }, + }; + } catch (error: any) { + console.error('[Oscar - Analytics Refresh] Job failed:', error.message); + return { + status: 'error', + itemsProcessed: 0, + itemsSucceeded: 0, + itemsFailed: 1, + errorMessage: error.message, + metadata: { error: error.message }, + }; + } +} + // ============================================================ // SCHEDULER RUNNER // ============================================================ @@ -596,14 +828,21 @@ async function checkAndRunDueJobs(): Promise { */ async function runScheduledJob(schedule: JobSchedule): Promise { const startTime = Date.now(); + const workerInfo = schedule.workerName ? ` [Worker: ${schedule.workerName}]` : ''; - console.log(`[Scheduler] Starting job "${schedule.jobName}"...`); + console.log(`[Scheduler]${workerInfo} Starting job "${schedule.jobName}"...`); // Mark as running await markScheduleRunning(schedule.id); - // Create run log entry - const runLogId = await createRunLog(schedule.id, schedule.jobName, 'running'); + // Create run log entry with worker metadata propagated from schedule + const runLogId = await createRunLog( + schedule.id, + schedule.jobName, + 'running', + schedule.workerName, + schedule.workerRole + ); try { // Execute the job @@ -735,11 +974,17 @@ export async function triggerScheduleNow(scheduleId: number): Promise<{ /** * Initialize default schedules if they don't exist + * + * Named Workers: + * - Bella: GraphQL Product Sync (crawls products from Dutchie) - 4hr + * - Henry: Entry Point Finder (detects menu providers and resolves platform IDs) - 24hr + * - Alice: Store Discovery (discovers new locations from city pages) - 24hr + * - Oscar: Analytics Refresh (refreshes materialized views) - 1hr */ export async function initializeDefaultSchedules(): Promise { const schedules = await getAllSchedules(); - // Check if product crawl schedule exists + // Check if product crawl schedule exists (Worker: Bella) const productCrawlExists = schedules.some(s => s.jobName === 'dutchie_az_product_crawl'); if (!productCrawlExists) { await createSchedule({ @@ -748,13 +993,15 @@ export async function initializeDefaultSchedules(): Promise { enabled: true, baseIntervalMinutes: 240, // 4 hours jitterMinutes: 30, // ±30 minutes + workerName: 'Bella', + workerRole: 'GraphQL Product Sync', jobConfig: { pricingType: 'rec', useBothModes: true }, startImmediately: false, }); - console.log('[Scheduler] Created default product crawl schedule'); + console.log('[Scheduler] Created default product crawl schedule (Worker: Bella)'); } - // Check if menu detection schedule exists + // Check if menu detection schedule exists (Worker: Henry) const menuDetectionExists = schedules.some(s => s.jobName === 'dutchie_az_menu_detection'); if (!menuDetectionExists) { await createSchedule({ @@ -763,10 +1010,46 @@ export async function initializeDefaultSchedules(): Promise { enabled: true, baseIntervalMinutes: 1440, // 24 hours jitterMinutes: 60, // ±1 hour + workerName: 'Henry', + workerRole: 'Entry Point Finder', jobConfig: { state: 'AZ', onlyUnknown: true }, startImmediately: false, }); - console.log('[Scheduler] Created default menu detection schedule'); + console.log('[Scheduler] Created default menu detection schedule (Worker: Henry)'); + } + + // Check if store discovery schedule exists (Worker: Alice) + const storeDiscoveryExists = schedules.some(s => s.jobName === 'dutchie_store_discovery'); + if (!storeDiscoveryExists) { + await createSchedule({ + jobName: 'dutchie_store_discovery', + description: 
'Discover new Dutchie dispensary locations from city pages', + enabled: true, + baseIntervalMinutes: 1440, // 24 hours + jitterMinutes: 120, // ±2 hours + workerName: 'Alice', + workerRole: 'Store Discovery', + jobConfig: { delayMs: 2000 }, + startImmediately: false, + }); + console.log('[Scheduler] Created default store discovery schedule (Worker: Alice)'); + } + + // Check if analytics refresh schedule exists (Worker: Oscar) + const analyticsRefreshExists = schedules.some(s => s.jobName === 'analytics_refresh'); + if (!analyticsRefreshExists) { + await createSchedule({ + jobName: 'analytics_refresh', + description: 'Refresh analytics materialized views (mv_state_metrics, etc.)', + enabled: true, + baseIntervalMinutes: 60, // 1 hour + jitterMinutes: 10, // ±10 minutes + workerName: 'Oscar', + workerRole: 'Analytics Refresh', + jobConfig: { refreshBrandViews: true }, + startImmediately: false, + }); + console.log('[Scheduler] Created default analytics refresh schedule (Worker: Oscar)'); } } diff --git a/backend/src/dutchie-az/services/store-validator.ts b/backend/src/dutchie-az/services/store-validator.ts new file mode 100644 index 00000000..e3dbd878 --- /dev/null +++ b/backend/src/dutchie-az/services/store-validator.ts @@ -0,0 +1,465 @@ +/** + * Store Configuration Validator + * + * Validates and sanitizes store configurations before crawling. + * Applies defaults for missing values and logs warnings. + * + * Phase 1: Crawler Reliability & Stabilization + */ + +import { CrawlErrorCode, CrawlErrorCodeType } from './error-taxonomy'; + +// ============================================================ +// DEFAULT CONFIGURATION +// ============================================================ + +/** + * Default crawl configuration values + */ +export const DEFAULT_CONFIG = { + // Scheduling + crawlFrequencyMinutes: 240, // 4 hours + minCrawlGapMinutes: 2, // Minimum 2 minutes between crawls + + // Retries + maxRetries: 3, + baseBackoffMs: 1000, // 1 second + maxBackoffMs: 60000, // 1 minute + backoffMultiplier: 2.0, // Exponential backoff + + // Timeouts + requestTimeoutMs: 30000, // 30 seconds + pageLoadTimeoutMs: 60000, // 60 seconds + + // Limits + maxProductsPerPage: 100, + maxPages: 50, + + // Proxy + proxyRotationEnabled: true, + proxyRotationOnFailure: true, + + // User Agent + userAgentRotationEnabled: true, + userAgentRotationOnFailure: true, +} as const; + +// ============================================================ +// STORE CONFIG INTERFACE +// ============================================================ + +/** + * Raw store configuration from database + */ +export interface RawStoreConfig { + id: number; + name: string; + slug?: string; + platform?: string; + menuType?: string; + platformDispensaryId?: string; + menuUrl?: string; + website?: string; + + // Crawl config + crawlFrequencyMinutes?: number; + maxRetries?: number; + currentProxyId?: number; + currentUserAgent?: string; + + // Status + crawlStatus?: string; + consecutiveFailures?: number; + backoffMultiplier?: number; + lastCrawlAt?: Date; + lastSuccessAt?: Date; + lastFailureAt?: Date; + lastErrorCode?: string; + nextCrawlAt?: Date; +} + +/** + * Validated and sanitized store configuration + */ +export interface ValidatedStoreConfig { + id: number; + name: string; + slug: string; + platform: string; + menuType: string; + platformDispensaryId: string; + menuUrl: string; + + // Crawl config (with defaults applied) + crawlFrequencyMinutes: number; + maxRetries: number; + currentProxyId: number | null; + currentUserAgent: 
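// Illustrative sketch, assuming a conventional exponential-backoff formula on
// top of DEFAULT_CONFIG above. The validator only supplies these constants;
// the retry loop that applies them lives elsewhere and may differ.
function backoffDelayMs(attempt: number): number {
  const { baseBackoffMs, backoffMultiplier, maxBackoffMs } = DEFAULT_CONFIG;
  return Math.min(baseBackoffMs * Math.pow(backoffMultiplier, attempt), maxBackoffMs);
}
// attempt 0 -> 1000ms, 1 -> 2000ms, 2 -> 4000ms, ... capped at 60000ms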
string | null; + + // Status + crawlStatus: 'active' | 'degraded' | 'paused' | 'failed'; + consecutiveFailures: number; + backoffMultiplier: number; + lastCrawlAt: Date | null; + lastSuccessAt: Date | null; + lastFailureAt: Date | null; + lastErrorCode: CrawlErrorCodeType | null; + nextCrawlAt: Date | null; + + // Validation metadata + isValid: boolean; + validationErrors: ValidationError[]; + validationWarnings: ValidationWarning[]; +} + +// ============================================================ +// VALIDATION TYPES +// ============================================================ + +export interface ValidationError { + field: string; + message: string; + code: CrawlErrorCodeType; +} + +export interface ValidationWarning { + field: string; + message: string; + appliedDefault?: any; +} + +export interface ValidationResult { + isValid: boolean; + config: ValidatedStoreConfig | null; + errors: ValidationError[]; + warnings: ValidationWarning[]; +} + +// ============================================================ +// VALIDATOR CLASS +// ============================================================ + +export class StoreValidator { + private errors: ValidationError[] = []; + private warnings: ValidationWarning[] = []; + + /** + * Validate and sanitize a store configuration + */ + validate(raw: RawStoreConfig): ValidationResult { + this.errors = []; + this.warnings = []; + + // Required field validation + this.validateRequired(raw); + + // If critical errors, return early + if (this.errors.length > 0) { + return { + isValid: false, + config: null, + errors: this.errors, + warnings: this.warnings, + }; + } + + // Build validated config with defaults + const config = this.buildValidatedConfig(raw); + + return { + isValid: this.errors.length === 0, + config, + errors: this.errors, + warnings: this.warnings, + }; + } + + /** + * Validate required fields + */ + private validateRequired(raw: RawStoreConfig): void { + if (!raw.id) { + this.addError('id', 'Store ID is required', CrawlErrorCode.INVALID_CONFIG); + } + + if (!raw.name) { + this.addError('name', 'Store name is required', CrawlErrorCode.INVALID_CONFIG); + } + + if (!raw.platformDispensaryId) { + this.addError( + 'platformDispensaryId', + 'Platform dispensary ID is required for crawling', + CrawlErrorCode.MISSING_PLATFORM_ID + ); + } + + if (!raw.menuType || raw.menuType === 'unknown') { + this.addError( + 'menuType', + 'Menu type must be detected before crawling', + CrawlErrorCode.INVALID_CONFIG + ); + } + } + + /** + * Build validated config with defaults applied + */ + private buildValidatedConfig(raw: RawStoreConfig): ValidatedStoreConfig { + // Slug + const slug = raw.slug || this.generateSlug(raw.name); + if (!raw.slug) { + this.addWarning('slug', 'Slug was missing, generated from name', slug); + } + + // Platform + const platform = raw.platform || 'dutchie'; + if (!raw.platform) { + this.addWarning('platform', 'Platform was missing, defaulting to dutchie', platform); + } + + // Menu URL + const menuUrl = raw.menuUrl || this.generateMenuUrl(raw.platformDispensaryId!, platform); + if (!raw.menuUrl) { + this.addWarning('menuUrl', 'Menu URL was missing, generated from platform ID', menuUrl); + } + + // Crawl frequency + const crawlFrequencyMinutes = this.validateNumeric( + raw.crawlFrequencyMinutes, + 'crawlFrequencyMinutes', + DEFAULT_CONFIG.crawlFrequencyMinutes, + 60, // min: 1 hour + 1440 // max: 24 hours + ); + + // Max retries + const maxRetries = this.validateNumeric( + raw.maxRetries, + 'maxRetries', + 
DEFAULT_CONFIG.maxRetries, + 1, // min + 10 // max + ); + + // Backoff multiplier + const backoffMultiplier = this.validateNumeric( + raw.backoffMultiplier, + 'backoffMultiplier', + 1.0, + 1.0, // min + 10.0 // max + ); + + // Crawl status + const crawlStatus = this.validateCrawlStatus(raw.crawlStatus); + + // Consecutive failures + const consecutiveFailures = Math.max(0, raw.consecutiveFailures || 0); + + // Last error code + const lastErrorCode = this.validateErrorCode(raw.lastErrorCode); + + return { + id: raw.id, + name: raw.name, + slug, + platform, + menuType: raw.menuType!, + platformDispensaryId: raw.platformDispensaryId!, + menuUrl, + + crawlFrequencyMinutes, + maxRetries, + currentProxyId: raw.currentProxyId || null, + currentUserAgent: raw.currentUserAgent || null, + + crawlStatus, + consecutiveFailures, + backoffMultiplier, + lastCrawlAt: raw.lastCrawlAt || null, + lastSuccessAt: raw.lastSuccessAt || null, + lastFailureAt: raw.lastFailureAt || null, + lastErrorCode, + nextCrawlAt: raw.nextCrawlAt || null, + + isValid: true, + validationErrors: [], + validationWarnings: this.warnings, + }; + } + + /** + * Validate numeric value with bounds + */ + private validateNumeric( + value: number | undefined, + field: string, + defaultValue: number, + min: number, + max: number + ): number { + if (value === undefined || value === null) { + this.addWarning(field, `Missing, defaulting to ${defaultValue}`, defaultValue); + return defaultValue; + } + + if (value < min) { + this.addWarning(field, `Value ${value} below minimum ${min}, using minimum`, min); + return min; + } + + if (value > max) { + this.addWarning(field, `Value ${value} above maximum ${max}, using maximum`, max); + return max; + } + + return value; + } + + /** + * Validate crawl status + */ + private validateCrawlStatus(status?: string): 'active' | 'degraded' | 'paused' | 'failed' { + const validStatuses = ['active', 'degraded', 'paused', 'failed']; + if (!status || !validStatuses.includes(status)) { + if (status) { + this.addWarning('crawlStatus', `Invalid status "${status}", defaulting to active`, 'active'); + } + return 'active'; + } + return status as 'active' | 'degraded' | 'paused' | 'failed'; + } + + /** + * Validate error code + */ + private validateErrorCode(code?: string): CrawlErrorCodeType | null { + if (!code) return null; + const validCodes = Object.values(CrawlErrorCode); + if (!validCodes.includes(code as CrawlErrorCodeType)) { + this.addWarning('lastErrorCode', `Invalid error code "${code}"`, null); + return CrawlErrorCode.UNKNOWN_ERROR; + } + return code as CrawlErrorCodeType; + } + + /** + * Generate slug from name + */ + private generateSlug(name: string): string { + return name + .toLowerCase() + .replace(/[^a-z0-9]+/g, '-') + .replace(/^-+|-+$/g, '') + .substring(0, 100); + } + + /** + * Generate menu URL from platform ID + */ + private generateMenuUrl(platformId: string, platform: string): string { + if (platform === 'dutchie') { + return `https://dutchie.com/embedded-menu/${platformId}`; + } + return `https://${platform}.com/menu/${platformId}`; + } + + /** + * Add validation error + */ + private addError(field: string, message: string, code: CrawlErrorCodeType): void { + this.errors.push({ field, message, code }); + console.warn(`[StoreValidator] ERROR ${field}: ${message}`); + } + + /** + * Add validation warning + */ + private addWarning(field: string, message: string, appliedDefault?: any): void { + this.warnings.push({ field, message, appliedDefault }); + // Log at debug level - warnings are 
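// Worked examples for the helpers above (values illustrative only):
//   validateNumeric(undefined, 'crawlFrequencyMinutes', 240, 60, 1440) -> 240  (default applied, warning recorded)
//   validateNumeric(5000, 'crawlFrequencyMinutes', 240, 60, 1440)      -> 1440 (clamped to max, warning recorded)
//   validateCrawlStatus('bogus')                                        -> 'active' (invalid value, warning recorded)
//   generateSlug("Joe's Dispensary #1")                                 -> 'joe-s-dispensary-1'
//   generateMenuUrl('abc123', 'dutchie')                                -> 'https://dutchie.com/embedded-menu/abc123'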
expected for incomplete configs + console.debug(`[StoreValidator] WARNING ${field}: ${message}`); + } +} + +// ============================================================ +// CONVENIENCE FUNCTIONS +// ============================================================ + +/** + * Validate a single store config + */ +export function validateStoreConfig(raw: RawStoreConfig): ValidationResult { + const validator = new StoreValidator(); + return validator.validate(raw); +} + +/** + * Validate multiple store configs + */ +export function validateStoreConfigs(raws: RawStoreConfig[]): { + valid: ValidatedStoreConfig[]; + invalid: { raw: RawStoreConfig; errors: ValidationError[] }[]; + warnings: { storeId: number; warnings: ValidationWarning[] }[]; +} { + const valid: ValidatedStoreConfig[] = []; + const invalid: { raw: RawStoreConfig; errors: ValidationError[] }[] = []; + const warnings: { storeId: number; warnings: ValidationWarning[] }[] = []; + + for (const raw of raws) { + const result = validateStoreConfig(raw); + + if (result.isValid && result.config) { + valid.push(result.config); + if (result.warnings.length > 0) { + warnings.push({ storeId: raw.id, warnings: result.warnings }); + } + } else { + invalid.push({ raw, errors: result.errors }); + } + } + + return { valid, invalid, warnings }; +} + +/** + * Quick check if a store is crawlable + */ +export function isCrawlable(raw: RawStoreConfig): boolean { + return !!( + raw.id && + raw.name && + raw.platformDispensaryId && + raw.menuType && + raw.menuType !== 'unknown' && + raw.crawlStatus !== 'failed' && + raw.crawlStatus !== 'paused' + ); +} + +/** + * Get reason why store is not crawlable + */ +export function getNotCrawlableReason(raw: RawStoreConfig): string | null { + if (!raw.platformDispensaryId) { + return 'Missing platform_dispensary_id'; + } + if (!raw.menuType || raw.menuType === 'unknown') { + return 'Menu type not detected'; + } + if (raw.crawlStatus === 'failed') { + return 'Store is marked as failed'; + } + if (raw.crawlStatus === 'paused') { + return 'Crawling is paused'; + } + return null; +} + +// ============================================================ +// SINGLETON INSTANCE +// ============================================================ + +export const storeValidator = new StoreValidator(); diff --git a/backend/src/dutchie-az/types/index.ts b/backend/src/dutchie-az/types/index.ts index daa768c0..df7f68b4 100644 --- a/backend/src/dutchie-az/types/index.ts +++ b/backend/src/dutchie-az/types/index.ts @@ -564,6 +564,10 @@ export interface JobSchedule { baseIntervalMinutes: number; // e.g., 240 (4 hours) jitterMinutes: number; // e.g., 30 (±30 minutes) + // Worker identity + workerName?: string; // e.g., "Alice", "Henry", "Bella", "Oscar" + workerRole?: string; // e.g., "Store Discovery Worker", "GraphQL Product Sync" + // Last run tracking lastRunAt?: Date; lastStatus?: JobStatus; @@ -593,6 +597,10 @@ export interface JobRunLog { durationMs?: number; errorMessage?: string; + // Worker identity (propagated from schedule) + workerName?: string; // e.g., "Alice", "Henry", "Bella", "Oscar" + runRole?: string; // e.g., "Store Discovery Worker" + // Results summary itemsProcessed?: number; itemsSucceeded?: number; @@ -672,3 +680,72 @@ export interface BrandSummary { productCount: number; dispensaryCount: number; } + +// ============================================================ +// CRAWLER PROFILE TYPES +// ============================================================ + +/** + * DispensaryCrawlerProfile - per-store crawler 
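// Illustrative sketch (not part of this diff): screening a batch of store rows
// before enqueueing crawl jobs, using the convenience helpers above.
// `rawStores` is a stand-in for dispensary rows mapped into RawStoreConfig;
// that mapping is not part of this diff.
declare const rawStores: RawStoreConfig[];

const { valid, invalid, warnings } = validateStoreConfigs(rawStores);
console.log(`[Crawler] ${valid.length} stores validated, ${invalid.length} skipped`);

for (const { raw } of invalid) {
  console.warn(`[Crawler] ${raw.name ?? `store #${raw.id}`}: ${getNotCrawlableReason(raw) ?? 'invalid config'}`);
}
for (const { storeId, warnings: applied } of warnings) {
  console.debug(`[Crawler] Store ${storeId}: ${applied.length} default(s) applied`);
}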
configuration + * + * Allows each dispensary to have customized crawler settings without + * affecting shared crawler logic. A dispensary can have multiple profiles + * but only one is active at a time (via dispensaries.active_crawler_profile_id). + */ +export interface DispensaryCrawlerProfile { + id: number; + dispensaryId: number; + profileName: string; + crawlerType: string; // 'dutchie', 'treez', 'jane', 'sandbox', 'custom' + profileKey: string | null; // Optional key for per-store module mapping + config: Record; // Crawler-specific configuration + timeoutMs: number | null; + downloadImages: boolean; + trackStock: boolean; + version: number; + enabled: boolean; + createdAt: Date; + updatedAt: Date; +} + +/** + * DispensaryCrawlerProfileCreate - input type for creating a new profile + */ +export interface DispensaryCrawlerProfileCreate { + dispensaryId: number; + profileName: string; + crawlerType: string; + profileKey?: string | null; + config?: Record; + timeoutMs?: number | null; + downloadImages?: boolean; + trackStock?: boolean; + version?: number; + enabled?: boolean; +} + +/** + * DispensaryCrawlerProfileUpdate - input type for updating an existing profile + */ +export interface DispensaryCrawlerProfileUpdate { + profileName?: string; + crawlerType?: string; + profileKey?: string | null; + config?: Record; + timeoutMs?: number | null; + downloadImages?: boolean; + trackStock?: boolean; + version?: number; + enabled?: boolean; +} + +/** + * CrawlerProfileOptions - runtime options derived from a profile + * Used when invoking the actual crawler + */ +export interface CrawlerProfileOptions { + timeoutMs: number; + downloadImages: boolean; + trackStock: boolean; + config: Record; +} diff --git a/backend/src/hydration/__tests__/hydration.test.ts b/backend/src/hydration/__tests__/hydration.test.ts new file mode 100644 index 00000000..26ce1bf2 --- /dev/null +++ b/backend/src/hydration/__tests__/hydration.test.ts @@ -0,0 +1,250 @@ +/** + * Hydration Pipeline Unit Tests + */ + +import { HydrationWorker } from '../worker'; +import { HydrationLockManager, LOCK_NAMES } from '../locking'; +import { RawPayload, HydrationOptions } from '../types'; + +// Mock the pool +const mockQuery = jest.fn(); +const mockConnect = jest.fn(); +const mockPool = { + query: mockQuery, + connect: mockConnect, +} as any; + +describe('HydrationLockManager', () => { + beforeEach(() => { + jest.clearAllMocks(); + mockQuery.mockResolvedValue({ rows: [] }); + }); + + describe('acquireLock', () => { + it('should acquire lock when not held', async () => { + mockQuery + .mockResolvedValueOnce({ rows: [] }) // DELETE expired + .mockResolvedValueOnce({ rows: [{ id: 1 }] }); // INSERT + + const manager = new HydrationLockManager(mockPool, 'test-worker'); + const acquired = await manager.acquireLock('test-lock'); + + expect(acquired).toBe(true); + }); + + it('should return false when lock held by another worker', async () => { + mockQuery + .mockResolvedValueOnce({ rows: [] }) // DELETE expired + .mockResolvedValueOnce({ rows: [] }) // INSERT failed + .mockResolvedValueOnce({ rows: [{ worker_id: 'other-worker' }] }); // SELECT + + const manager = new HydrationLockManager(mockPool, 'test-worker'); + const acquired = await manager.acquireLock('test-lock'); + + expect(acquired).toBe(false); + }); + + it('should return true when lock held by same worker', async () => { + mockQuery + .mockResolvedValueOnce({ rows: [] }) // DELETE expired + .mockResolvedValueOnce({ rows: [] }) // INSERT failed + .mockResolvedValueOnce({ rows: [{ 
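// Illustrative sketch (hypothetical helper, not part of this diff): deriving
// runtime CrawlerProfileOptions from an active DispensaryCrawlerProfile. The
// fallback timeout is an assumption; the real default may live in the crawler.
function toCrawlerOptions(profile: DispensaryCrawlerProfile): CrawlerProfileOptions {
  return {
    timeoutMs: profile.timeoutMs ?? 30000,
    downloadImages: profile.downloadImages,
    trackStock: profile.trackStock,
    config: profile.config,
  };
}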
worker_id: 'test-worker' }] }) // SELECT + .mockResolvedValueOnce({ rows: [] }); // UPDATE refresh + + const manager = new HydrationLockManager(mockPool, 'test-worker'); + const acquired = await manager.acquireLock('test-lock'); + + expect(acquired).toBe(true); + }); + }); + + describe('releaseLock', () => { + it('should release lock owned by worker', async () => { + const manager = new HydrationLockManager(mockPool, 'test-worker'); + await manager.releaseLock('test-lock'); + + expect(mockQuery).toHaveBeenCalledWith( + expect.stringContaining('DELETE FROM hydration_locks'), + ['test-lock', 'test-worker'] + ); + }); + }); +}); + +describe('HydrationWorker', () => { + beforeEach(() => { + jest.clearAllMocks(); + }); + + describe('processPayload', () => { + it('should process valid payload in dry-run mode', async () => { + const mockPayload: RawPayload = { + id: 'test-uuid', + dispensary_id: 123, + crawl_run_id: 1, + platform: 'dutchie', + payload_version: 1, + raw_json: { + products: [ + { _id: 'p1', Name: 'Product 1', Status: 'Active' }, + ], + }, + product_count: 1, + pricing_type: 'rec', + crawl_mode: 'dual', + fetched_at: new Date(), + processed: false, + normalized_at: null, + hydration_error: null, + hydration_attempts: 0, + created_at: new Date(), + }; + + const worker = new HydrationWorker(mockPool, { dryRun: true }); + const result = await worker.processPayload(mockPayload); + + expect(result.success).toBe(true); + expect(result.payloadId).toBe('test-uuid'); + expect(result.dispensaryId).toBe(123); + // In dry-run, DB should not be updated + expect(mockQuery).not.toHaveBeenCalled(); + }); + + it('should handle missing normalizer', async () => { + const mockPayload: RawPayload = { + id: 'test-uuid', + dispensary_id: 123, + crawl_run_id: null, + platform: 'unknown-platform', + payload_version: 1, + raw_json: { products: [] }, + product_count: 0, + pricing_type: null, + crawl_mode: null, + fetched_at: new Date(), + processed: false, + normalized_at: null, + hydration_error: null, + hydration_attempts: 0, + created_at: new Date(), + }; + + mockQuery.mockResolvedValueOnce({ rows: [] }); // markPayloadFailed + + const worker = new HydrationWorker(mockPool, { dryRun: false }); + const result = await worker.processPayload(mockPayload); + + expect(result.success).toBe(false); + expect(result.errors).toContain('No normalizer found for platform: unknown-platform'); + }); + + it('should handle empty products', async () => { + const mockPayload: RawPayload = { + id: 'test-uuid', + dispensary_id: 123, + crawl_run_id: null, + platform: 'dutchie', + payload_version: 1, + raw_json: { products: [] }, + product_count: 0, + pricing_type: null, + crawl_mode: null, + fetched_at: new Date(), + processed: false, + normalized_at: null, + hydration_error: null, + hydration_attempts: 0, + created_at: new Date(), + }; + + const worker = new HydrationWorker(mockPool, { dryRun: true }); + const result = await worker.processPayload(mockPayload); + + // Should succeed but with 0 products + expect(result.success).toBe(true); + expect(result.productsUpserted).toBe(0); + }); + }); + + describe('dry-run mode', () => { + it('should not modify database in dry-run mode', async () => { + const mockPayload: RawPayload = { + id: 'test-uuid', + dispensary_id: 123, + crawl_run_id: null, + platform: 'dutchie', + payload_version: 1, + raw_json: { + products: [ + { + _id: 'p1', + Name: 'Product 1', + Status: 'Active', + brandName: 'Test Brand', + type: 'Flower', + recPrices: [50], + }, + ], + }, + product_count: 1, + 
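// Illustrative sketch of the locking contract the tests above exercise:
// acquire a named lock, do the work, always release in finally. The lock name
// and the wrapper itself are hypothetical; only HydrationLockManager's API
// (constructor, acquireLock, releaseLock) comes from this diff.
import { Pool } from 'pg';

async function withHydrationLock(pool: Pool, workerId: string, work: () => Promise<void>): Promise<void> {
  const locks = new HydrationLockManager(pool, workerId);
  if (!(await locks.acquireLock('hydration-run'))) {
    console.log('[Hydration] Lock held by another worker, skipping this cycle');
    return;
  }
  try {
    await work();
  } finally {
    await locks.releaseLock('hydration-run');
  }
}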
pricing_type: 'rec', + crawl_mode: 'dual', + fetched_at: new Date(), + processed: false, + normalized_at: null, + hydration_error: null, + hydration_attempts: 0, + created_at: new Date(), + }; + + const consoleSpy = jest.spyOn(console, 'log').mockImplementation(); + + const worker = new HydrationWorker(mockPool, { dryRun: true }); + await worker.processPayload(mockPayload); + + // Verify dry-run log messages + expect(consoleSpy).toHaveBeenCalledWith( + expect.stringContaining('[DryRun]') + ); + + // Verify no database writes + expect(mockQuery).not.toHaveBeenCalled(); + + consoleSpy.mockRestore(); + }); + }); +}); + +describe('Discontinued products handling', () => { + it('should identify missing products correctly', () => { + const currentProducts = new Set(['p1', 'p2', 'p3']); + const previousProducts = ['p1', 'p2', 'p4', 'p5']; + + const discontinued = previousProducts.filter((id) => !currentProducts.has(id)); + + expect(discontinued).toEqual(['p4', 'p5']); + }); +}); + +describe('OOS transition handling', () => { + it('should detect OOS from Active to Inactive', () => { + const previousStatus = 'Active'; + const currentStatus = 'Inactive'; + + const wasActive = previousStatus === 'Active'; + const nowInactive = currentStatus === 'Inactive'; + const transitionedToOOS = wasActive && nowInactive; + + expect(transitionedToOOS).toBe(true); + }); + + it('should not flag OOS when already inactive', () => { + const previousStatus = 'Inactive'; + const currentStatus = 'Inactive'; + + const wasActive = previousStatus === 'Active'; + const transitionedToOOS = wasActive && currentStatus === 'Inactive'; + + expect(transitionedToOOS).toBe(false); + }); +}); diff --git a/backend/src/hydration/__tests__/normalizer.test.ts b/backend/src/hydration/__tests__/normalizer.test.ts new file mode 100644 index 00000000..2f8800c7 --- /dev/null +++ b/backend/src/hydration/__tests__/normalizer.test.ts @@ -0,0 +1,311 @@ +/** + * Normalizer Unit Tests + */ + +import { DutchieNormalizer } from '../normalizers/dutchie'; +import { RawPayload } from '../types'; + +describe('DutchieNormalizer', () => { + const normalizer = new DutchieNormalizer(); + + describe('extractProducts', () => { + it('should extract products from GraphQL response format', () => { + const rawJson = { + data: { + filteredProducts: { + products: [ + { _id: '1', Name: 'Product 1' }, + { _id: '2', Name: 'Product 2' }, + ], + }, + }, + }; + + const products = normalizer.extractProducts(rawJson); + expect(products).toHaveLength(2); + expect(products[0]._id).toBe('1'); + }); + + it('should extract products from direct array', () => { + const products = normalizer.extractProducts([ + { _id: '1', Name: 'Product 1' }, + ]); + expect(products).toHaveLength(1); + }); + + it('should extract products from merged mode format', () => { + const rawJson = { + merged: [ + { _id: '1', Name: 'Product 1' }, + ], + products_a: [{ _id: '2' }], + products_b: [{ _id: '3' }], + }; + + const products = normalizer.extractProducts(rawJson); + expect(products).toHaveLength(1); + expect(products[0]._id).toBe('1'); + }); + + it('should merge products from mode A and B when no merged array', () => { + const rawJson = { + products_a: [ + { _id: '1', Name: 'Product 1' }, + ], + products_b: [ + { _id: '2', Name: 'Product 2' }, + { _id: '1', Name: 'Product 1 from B' }, // Duplicate + ], + }; + + const products = normalizer.extractProducts(rawJson); + expect(products).toHaveLength(2); + // Mode A takes precedence for duplicates + expect(products.find((p: any) => p._id === 
'1').Name).toBe('Product 1'); + }); + + it('should return empty array for invalid payload', () => { + expect(normalizer.extractProducts(null)).toEqual([]); + expect(normalizer.extractProducts({})).toEqual([]); + expect(normalizer.extractProducts({ data: {} })).toEqual([]); + }); + }); + + describe('validatePayload', () => { + it('should validate valid payload', () => { + const rawJson = { + products: [{ _id: '1', Name: 'Product' }], + }; + + const result = normalizer.validatePayload(rawJson); + expect(result.valid).toBe(true); + expect(result.errors).toHaveLength(0); + }); + + it('should reject empty payload', () => { + const result = normalizer.validatePayload({}); + expect(result.valid).toBe(false); + expect(result.errors).toContain('No products found in payload'); + }); + + it('should capture GraphQL errors', () => { + const rawJson = { + products: [{ _id: '1' }], + errors: [{ message: 'Rate limit exceeded' }], + }; + + const result = normalizer.validatePayload(rawJson); + expect(result.errors.length).toBeGreaterThan(0); + expect(result.errors.some((e) => e.includes('Rate limit'))).toBe(true); + }); + }); + + describe('normalize', () => { + const mockPayload: RawPayload = { + id: 'test-uuid', + dispensary_id: 123, + crawl_run_id: 1, + platform: 'dutchie', + payload_version: 1, + raw_json: { + products: [ + { + _id: 'prod-1', + Name: 'Blue Dream', + brandName: 'Top Shelf', + brandId: 'brand-1', + type: 'Flower', + subcategory: 'Hybrid', + strainType: 'Hybrid', + THC: 25.5, + CBD: 0.5, + Status: 'Active', + Image: 'https://example.com/image.jpg', + recPrices: [35, 60, 100], + medicalPrices: [30, 55, 90], + POSMetaData: { + children: [ + { option: '1g', recPrice: 35, quantityAvailable: 10 }, + { option: '3.5g', recPrice: 60, quantityAvailable: 5 }, + ], + }, + }, + ], + }, + product_count: 1, + pricing_type: 'rec', + crawl_mode: 'dual', + fetched_at: new Date(), + processed: false, + normalized_at: null, + hydration_error: null, + hydration_attempts: 0, + created_at: new Date(), + }; + + it('should normalize products correctly', () => { + const result = normalizer.normalize(mockPayload); + + expect(result.products).toHaveLength(1); + expect(result.productCount).toBe(1); + + const product = result.products[0]; + expect(product.externalProductId).toBe('prod-1'); + expect(product.name).toBe('Blue Dream'); + expect(product.brandName).toBe('Top Shelf'); + expect(product.category).toBe('Flower'); + expect(product.thcPercent).toBe(25.5); + expect(product.isActive).toBe(true); + }); + + it('should normalize pricing correctly', () => { + const result = normalizer.normalize(mockPayload); + + const pricing = result.pricing.get('prod-1'); + expect(pricing).toBeDefined(); + expect(pricing!.priceRecMin).toBe(3500); // cents + expect(pricing!.priceRecMax).toBe(10000); + expect(pricing!.priceMedMin).toBe(3000); + }); + + it('should normalize availability correctly', () => { + const result = normalizer.normalize(mockPayload); + + const availability = result.availability.get('prod-1'); + expect(availability).toBeDefined(); + expect(availability!.inStock).toBe(true); + expect(availability!.stockStatus).toBe('in_stock'); + expect(availability!.quantity).toBe(15); // 10 + 5 + }); + + it('should extract brands', () => { + const result = normalizer.normalize(mockPayload); + + expect(result.brands).toHaveLength(1); + expect(result.brands[0].name).toBe('Top Shelf'); + expect(result.brands[0].slug).toBe('top-shelf'); + }); + + it('should extract categories', () => { + const result = 
normalizer.normalize(mockPayload); + + expect(result.categories).toHaveLength(1); + expect(result.categories[0].name).toBe('Flower'); + expect(result.categories[0].slug).toBe('flower'); + }); + + it('should handle products without required fields', () => { + const badPayload: RawPayload = { + ...mockPayload, + raw_json: { + products: [ + { _id: 'no-name' }, // Missing Name + { Name: 'No ID' }, // Missing _id + { _id: 'valid', Name: 'Valid Product' }, + ], + }, + }; + + const result = normalizer.normalize(badPayload); + // Only the valid product should be included + expect(result.products).toHaveLength(1); + expect(result.products[0].name).toBe('Valid Product'); + }); + + it('should mark inactive products correctly', () => { + const inactivePayload: RawPayload = { + ...mockPayload, + raw_json: { + products: [ + { + _id: 'inactive-1', + Name: 'Inactive Product', + Status: 'Inactive', + }, + ], + }, + }; + + const result = normalizer.normalize(inactivePayload); + const availability = result.availability.get('inactive-1'); + + expect(availability).toBeDefined(); + expect(availability!.inStock).toBe(false); + expect(availability!.stockStatus).toBe('out_of_stock'); + }); + }); +}); + +describe('Normalizer edge cases', () => { + const normalizer = new DutchieNormalizer(); + + it('should handle null/undefined values gracefully', () => { + const payload: RawPayload = { + id: 'test', + dispensary_id: 1, + crawl_run_id: null, + platform: 'dutchie', + payload_version: 1, + raw_json: { + products: [ + { + _id: 'prod-1', + Name: 'Test', + brandName: null, + THC: undefined, + POSMetaData: null, + }, + ], + }, + product_count: 1, + pricing_type: null, + crawl_mode: null, + fetched_at: new Date(), + processed: false, + normalized_at: null, + hydration_error: null, + hydration_attempts: 0, + created_at: new Date(), + }; + + const result = normalizer.normalize(payload); + expect(result.products).toHaveLength(1); + expect(result.products[0].brandName).toBeNull(); + }); + + it('should handle special price scenarios', () => { + const payload: RawPayload = { + id: 'test', + dispensary_id: 1, + crawl_run_id: null, + platform: 'dutchie', + payload_version: 1, + raw_json: { + products: [ + { + _id: 'special-prod', + Name: 'Special Product', + recPrices: [50], + recSpecialPrices: [40], + }, + ], + }, + product_count: 1, + pricing_type: null, + crawl_mode: null, + fetched_at: new Date(), + processed: false, + normalized_at: null, + hydration_error: null, + hydration_attempts: 0, + created_at: new Date(), + }; + + const result = normalizer.normalize(payload); + const pricing = result.pricing.get('special-prod'); + + expect(pricing!.isOnSpecial).toBe(true); + expect(pricing!.priceRecSpecial).toBe(4000); + expect(pricing!.discountPercent).toBe(20); + }); +}); diff --git a/backend/src/hydration/backfill.ts b/backend/src/hydration/backfill.ts new file mode 100644 index 00000000..517421ac --- /dev/null +++ b/backend/src/hydration/backfill.ts @@ -0,0 +1,431 @@ +/** + * Backfill Script + * + * Imports historical payloads from existing data sources: + * - dutchie_products.latest_raw_payload + * - dutchie_product_snapshots.raw_data + * - Any cached files on disk + */ + +import { Pool } from 'pg'; +import * as fs from 'fs'; +import * as path from 'path'; +import { storeRawPayload } from './payload-store'; +import { HydrationLockManager, LOCK_NAMES } from './locking'; + +const BATCH_SIZE = 100; + +export interface BackfillOptions { + dryRun?: boolean; + source: 'dutchie_products' | 'snapshots' | 'cache_files' | 'all'; + 
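// Worked example of the pricing conversions asserted by the tests above
// (dollars -> integer cents; discount computed from list vs. special price):
//   recPrices: [35, 60, 100]            -> priceRecMin = 3500, priceRecMax = 10000
//   recPrices: [50], recSpecialPrices: [40]
//     -> priceRecSpecial = 4000, isOnSpecial = true
//     -> discountPercent = ((50 - 40) / 50) * 100 = 20
// The normalizer's exact rounding logic is not shown in this diff; the
// arithmetic above simply matches the expected values in the tests.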
dispensaryId?: number; + limit?: number; + cachePath?: string; +} + +export interface BackfillResult { + source: string; + payloadsCreated: number; + skipped: number; + errors: string[]; + durationMs: number; +} + +// ============================================================ +// BACKFILL FROM DUTCHIE_PRODUCTS +// ============================================================ + +/** + * Backfill from dutchie_products.latest_raw_payload + * This captures the most recent raw data for each product + */ +export async function backfillFromDutchieProducts( + pool: Pool, + options: BackfillOptions +): Promise { + const startTime = Date.now(); + const errors: string[] = []; + let payloadsCreated = 0; + let skipped = 0; + + console.log('[Backfill] Starting backfill from dutchie_products...'); + + // Get distinct dispensaries with raw payloads + let query = ` + SELECT DISTINCT dispensary_id + FROM dutchie_products + WHERE latest_raw_payload IS NOT NULL + `; + const params: any[] = []; + + if (options.dispensaryId) { + query += ` AND dispensary_id = $1`; + params.push(options.dispensaryId); + } + + const dispensaries = await pool.query(query, params); + + console.log(`[Backfill] Found ${dispensaries.rows.length} dispensaries with raw payloads`); + + for (const row of dispensaries.rows) { + const dispensaryId = row.dispensary_id; + + try { + // Check if we already have a payload for this dispensary + const existing = await pool.query( + `SELECT 1 FROM raw_payloads + WHERE dispensary_id = $1 AND platform = 'dutchie' + LIMIT 1`, + [dispensaryId] + ); + + if (existing.rows.length > 0) { + skipped++; + continue; + } + + // Aggregate all products for this dispensary into one payload + const products = await pool.query( + `SELECT + external_product_id, + latest_raw_payload, + updated_at + FROM dutchie_products + WHERE dispensary_id = $1 + AND latest_raw_payload IS NOT NULL + ORDER BY updated_at DESC + LIMIT $2`, + [dispensaryId, options.limit || 10000] + ); + + if (products.rows.length === 0) { + skipped++; + continue; + } + + // Create aggregated payload + const aggregatedPayload = { + products: products.rows.map((p: any) => p.latest_raw_payload), + backfilled: true, + backfill_source: 'dutchie_products', + backfill_date: new Date().toISOString(), + }; + + // Get the latest update time + const latestUpdate = products.rows[0]?.updated_at || new Date(); + + if (options.dryRun) { + console.log( + `[Backfill][DryRun] Would create payload for dispensary ${dispensaryId} ` + + `with ${products.rows.length} products` + ); + payloadsCreated++; + } else { + await storeRawPayload(pool, { + dispensaryId, + platform: 'dutchie', + payloadVersion: 1, + rawJson: aggregatedPayload, + productCount: products.rows.length, + pricingType: 'rec', + crawlMode: 'backfill', + fetchedAt: latestUpdate, + }); + payloadsCreated++; + console.log( + `[Backfill] Created payload for dispensary ${dispensaryId} ` + + `with ${products.rows.length} products` + ); + } + } catch (error: any) { + errors.push(`Dispensary ${dispensaryId}: ${error.message}`); + } + } + + return { + source: 'dutchie_products', + payloadsCreated, + skipped, + errors, + durationMs: Date.now() - startTime, + }; +} + +// ============================================================ +// BACKFILL FROM SNAPSHOTS +// ============================================================ + +/** + * Backfill from dutchie_product_snapshots.raw_data + * Creates payloads from historical snapshot data + */ +export async function backfillFromSnapshots( + pool: Pool, + options: BackfillOptions 
+): Promise { + const startTime = Date.now(); + const errors: string[] = []; + let payloadsCreated = 0; + let skipped = 0; + + console.log('[Backfill] Starting backfill from snapshots...'); + + // Get distinct crawl timestamps per dispensary + let query = ` + SELECT DISTINCT + dispensary_id, + DATE_TRUNC('hour', captured_at) as crawl_hour, + COUNT(*) as product_count + FROM dutchie_product_snapshots + WHERE raw_data IS NOT NULL + `; + const params: any[] = []; + let paramIndex = 1; + + if (options.dispensaryId) { + query += ` AND dispensary_id = $${paramIndex}`; + params.push(options.dispensaryId); + paramIndex++; + } + + query += ` GROUP BY dispensary_id, DATE_TRUNC('hour', captured_at) + ORDER BY crawl_hour DESC`; + + if (options.limit) { + query += ` LIMIT $${paramIndex}`; + params.push(options.limit); + } + + const crawlHours = await pool.query(query, params); + + console.log(`[Backfill] Found ${crawlHours.rows.length} distinct crawl hours`); + + for (const row of crawlHours.rows) { + const { dispensary_id, crawl_hour, product_count } = row; + + try { + // Check if we already have this payload + const existing = await pool.query( + `SELECT 1 FROM raw_payloads + WHERE dispensary_id = $1 + AND platform = 'dutchie' + AND fetched_at >= $2 + AND fetched_at < $2 + INTERVAL '1 hour' + LIMIT 1`, + [dispensary_id, crawl_hour] + ); + + if (existing.rows.length > 0) { + skipped++; + continue; + } + + // Get all snapshots for this hour + const snapshots = await pool.query( + `SELECT raw_data + FROM dutchie_product_snapshots + WHERE dispensary_id = $1 + AND captured_at >= $2 + AND captured_at < $2 + INTERVAL '1 hour' + AND raw_data IS NOT NULL`, + [dispensary_id, crawl_hour] + ); + + if (snapshots.rows.length === 0) { + skipped++; + continue; + } + + const aggregatedPayload = { + products: snapshots.rows.map((s: any) => s.raw_data), + backfilled: true, + backfill_source: 'snapshots', + backfill_date: new Date().toISOString(), + original_crawl_hour: crawl_hour, + }; + + if (options.dryRun) { + console.log( + `[Backfill][DryRun] Would create payload for dispensary ${dispensary_id} ` + + `at ${crawl_hour} with ${snapshots.rows.length} products` + ); + payloadsCreated++; + } else { + await storeRawPayload(pool, { + dispensaryId: dispensary_id, + platform: 'dutchie', + payloadVersion: 1, + rawJson: aggregatedPayload, + productCount: snapshots.rows.length, + pricingType: 'rec', + crawlMode: 'backfill', + fetchedAt: crawl_hour, + }); + payloadsCreated++; + } + } catch (error: any) { + errors.push(`Dispensary ${dispensary_id} at ${crawl_hour}: ${error.message}`); + } + } + + return { + source: 'snapshots', + payloadsCreated, + skipped, + errors, + durationMs: Date.now() - startTime, + }; +} + +// ============================================================ +// BACKFILL FROM CACHE FILES +// ============================================================ + +/** + * Backfill from cached JSON files on disk + */ +export async function backfillFromCacheFiles( + pool: Pool, + options: BackfillOptions +): Promise { + const startTime = Date.now(); + const errors: string[] = []; + let payloadsCreated = 0; + let skipped = 0; + + const cachePath = options.cachePath || './cache/payloads'; + + console.log(`[Backfill] Starting backfill from cache files at ${cachePath}...`); + + if (!fs.existsSync(cachePath)) { + console.log('[Backfill] Cache directory does not exist'); + return { + source: 'cache_files', + payloadsCreated: 0, + skipped: 0, + errors: ['Cache directory does not exist'], + durationMs: Date.now() - startTime, + 
}; + } + + // Expected structure: cache/payloads//.json + const dispensaryDirs = fs.readdirSync(cachePath); + + for (const dispensaryDir of dispensaryDirs) { + const dispensaryPath = path.join(cachePath, dispensaryDir); + if (!fs.statSync(dispensaryPath).isDirectory()) continue; + + const dispensaryId = parseInt(dispensaryDir, 10); + if (isNaN(dispensaryId)) continue; + + if (options.dispensaryId && options.dispensaryId !== dispensaryId) { + continue; + } + + const files = fs.readdirSync(dispensaryPath) + .filter((f) => f.endsWith('.json')) + .sort() + .reverse(); + + let processed = 0; + for (const file of files) { + if (options.limit && processed >= options.limit) break; + + const filePath = path.join(dispensaryPath, file); + try { + const content = fs.readFileSync(filePath, 'utf-8'); + const payload = JSON.parse(content); + + // Extract timestamp from filename (e.g., 2024-01-15T10-30-00.json) + const timestamp = file.replace('.json', '').replace(/-/g, ':').replace('T', ' '); + const fetchedAt = new Date(timestamp); + + if (options.dryRun) { + console.log( + `[Backfill][DryRun] Would import ${file} for dispensary ${dispensaryId}` + ); + payloadsCreated++; + } else { + await storeRawPayload(pool, { + dispensaryId, + platform: 'dutchie', + payloadVersion: 1, + rawJson: { + ...payload, + backfilled: true, + backfill_source: 'cache_files', + backfill_file: file, + }, + productCount: payload.products?.length || 0, + pricingType: 'rec', + crawlMode: 'backfill', + fetchedAt: isNaN(fetchedAt.getTime()) ? new Date() : fetchedAt, + }); + payloadsCreated++; + } + + processed++; + } catch (error: any) { + errors.push(`File ${filePath}: ${error.message}`); + skipped++; + } + } + } + + return { + source: 'cache_files', + payloadsCreated, + skipped, + errors, + durationMs: Date.now() - startTime, + }; +} + +// ============================================================ +// MAIN BACKFILL FUNCTION +// ============================================================ + +/** + * Run full backfill + */ +export async function runBackfill( + pool: Pool, + options: BackfillOptions +): Promise { + const lockManager = new HydrationLockManager(pool); + const results: BackfillResult[] = []; + + // Acquire lock + const lockAcquired = await lockManager.acquireLock(LOCK_NAMES.BACKFILL, 60 * 60 * 1000); + if (!lockAcquired) { + console.log('[Backfill] Could not acquire lock, another backfill may be running'); + return []; + } + + try { + console.log('[Backfill] Starting backfill process...'); + + if (options.source === 'all' || options.source === 'dutchie_products') { + const result = await backfillFromDutchieProducts(pool, options); + results.push(result); + console.log(`[Backfill] dutchie_products: ${result.payloadsCreated} created, ${result.skipped} skipped`); + } + + if (options.source === 'all' || options.source === 'snapshots') { + const result = await backfillFromSnapshots(pool, options); + results.push(result); + console.log(`[Backfill] snapshots: ${result.payloadsCreated} created, ${result.skipped} skipped`); + } + + if (options.source === 'all' || options.source === 'cache_files') { + const result = await backfillFromCacheFiles(pool, options); + results.push(result); + console.log(`[Backfill] cache_files: ${result.payloadsCreated} created, ${result.skipped} skipped`); + } + + console.log('[Backfill] Backfill complete'); + return results; + } finally { + await lockManager.releaseLock(LOCK_NAMES.BACKFILL); + } +} diff --git a/backend/src/hydration/canonical-upsert.ts b/backend/src/hydration/canonical-upsert.ts new 
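// Illustrative sketch (hypothetical one-off script, not part of this diff):
// running a dry-run backfill across all sources and printing the summaries.
// The pool helper import mirrors the runtime connection module imported by the
// scheduler above; adjust the relative path if the script lives elsewhere.
import { getPool } from '../dutchie-az/db/connection';
import { runBackfill } from './backfill';

async function main(): Promise<void> {
  const results = await runBackfill(getPool(), { source: 'all', dryRun: true });
  for (const r of results) {
    console.log(
      `${r.source}: ${r.payloadsCreated} created, ${r.skipped} skipped, ` +
      `${r.errors.length} error(s) in ${Math.round(r.durationMs / 1000)}s`
    );
  }
}

main().catch((err) => { console.error('[Backfill] Failed:', err); process.exit(1); });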
file mode 100644 index 00000000..fd020878 --- /dev/null +++ b/backend/src/hydration/canonical-upsert.ts @@ -0,0 +1,435 @@ +/** + * Canonical Upsert Functions + * + * Upserts normalized data into canonical tables: + * - store_products + * - store_product_snapshots + * - brands + * - categories (future) + */ + +import { Pool, PoolClient } from 'pg'; +import { + NormalizedProduct, + NormalizedPricing, + NormalizedAvailability, + NormalizedBrand, + NormalizationResult, +} from './types'; + +const BATCH_SIZE = 100; + +// ============================================================ +// PRODUCT UPSERTS +// ============================================================ + +export interface UpsertProductsResult { + upserted: number; + new: number; + updated: number; +} + +/** + * Upsert products to store_products table + * Returns counts of new vs updated products + */ +export async function upsertStoreProducts( + pool: Pool, + products: NormalizedProduct[], + pricing: Map, + availability: Map, + options: { dryRun?: boolean } = {} +): Promise { + if (products.length === 0) { + return { upserted: 0, new: 0, updated: 0 }; + } + + const { dryRun = false } = options; + let newCount = 0; + let updatedCount = 0; + + // Process in batches + for (let i = 0; i < products.length; i += BATCH_SIZE) { + const batch = products.slice(i, i + BATCH_SIZE); + + if (dryRun) { + console.log(`[DryRun] Would upsert ${batch.length} products`); + continue; + } + + const client = await pool.connect(); + try { + await client.query('BEGIN'); + + for (const product of batch) { + const productPricing = pricing.get(product.externalProductId); + const productAvailability = availability.get(product.externalProductId); + + const result = await client.query( + `INSERT INTO store_products ( + dispensary_id, provider, provider_product_id, provider_brand_id, + name, brand_name, category, subcategory, + price_rec, price_med, price_rec_special, price_med_special, + is_on_special, discount_percent, + is_in_stock, stock_status, + thc_percent, cbd_percent, + image_url, + first_seen_at, last_seen_at, updated_at + ) VALUES ( + $1, $2, $3, $4, + $5, $6, $7, $8, + $9, $10, $11, $12, + $13, $14, + $15, $16, + $17, $18, + $19, + NOW(), NOW(), NOW() + ) + ON CONFLICT (dispensary_id, provider, provider_product_id) + DO UPDATE SET + name = EXCLUDED.name, + brand_name = EXCLUDED.brand_name, + category = EXCLUDED.category, + subcategory = EXCLUDED.subcategory, + price_rec = EXCLUDED.price_rec, + price_med = EXCLUDED.price_med, + price_rec_special = EXCLUDED.price_rec_special, + price_med_special = EXCLUDED.price_med_special, + is_on_special = EXCLUDED.is_on_special, + discount_percent = EXCLUDED.discount_percent, + is_in_stock = EXCLUDED.is_in_stock, + stock_status = EXCLUDED.stock_status, + thc_percent = EXCLUDED.thc_percent, + cbd_percent = EXCLUDED.cbd_percent, + image_url = EXCLUDED.image_url, + last_seen_at = NOW(), + updated_at = NOW() + RETURNING (xmax = 0) as is_new`, + [ + product.dispensaryId, + product.platform, + product.externalProductId, + product.brandId, + product.name, + product.brandName, + product.category, + product.subcategory, + productPricing?.priceRec ? productPricing.priceRec / 100 : null, + productPricing?.priceMed ? productPricing.priceMed / 100 : null, + productPricing?.priceRecSpecial ? productPricing.priceRecSpecial / 100 : null, + productPricing?.priceMedSpecial ? productPricing.priceMedSpecial / 100 : null, + productPricing?.isOnSpecial || false, + productPricing?.discountPercent, + productAvailability?.inStock ?? 
true, + productAvailability?.stockStatus || 'unknown', + product.thcPercent, + product.cbdPercent, + product.primaryImageUrl, + ] + ); + + if (result.rows[0]?.is_new) { + newCount++; + } else { + updatedCount++; + } + } + + await client.query('COMMIT'); + } catch (error) { + await client.query('ROLLBACK'); + throw error; + } finally { + client.release(); + } + } + + return { + upserted: newCount + updatedCount, + new: newCount, + updated: updatedCount, + }; +} + +// ============================================================ +// SNAPSHOT CREATION +// ============================================================ + +export interface CreateSnapshotsResult { + created: number; +} + +/** + * Create snapshots for all products in a crawl + */ +export async function createStoreProductSnapshots( + pool: Pool, + dispensaryId: number, + products: NormalizedProduct[], + pricing: Map, + availability: Map, + crawlRunId: number | null, + options: { dryRun?: boolean } = {} +): Promise { + if (products.length === 0) { + return { created: 0 }; + } + + const { dryRun = false } = options; + + if (dryRun) { + console.log(`[DryRun] Would create ${products.length} snapshots`); + return { created: products.length }; + } + + let created = 0; + + // Process in batches + for (let i = 0; i < products.length; i += BATCH_SIZE) { + const batch = products.slice(i, i + BATCH_SIZE); + + const values: any[][] = []; + for (const product of batch) { + const productPricing = pricing.get(product.externalProductId); + const productAvailability = availability.get(product.externalProductId); + + values.push([ + dispensaryId, + product.platform, + product.externalProductId, + crawlRunId, + new Date(), // captured_at + product.name, + product.brandName, + product.category, + product.subcategory, + productPricing?.priceRec ? productPricing.priceRec / 100 : null, + productPricing?.priceMed ? productPricing.priceMed / 100 : null, + productPricing?.priceRecSpecial ? productPricing.priceRecSpecial / 100 : null, + productPricing?.priceMedSpecial ? productPricing.priceMedSpecial / 100 : null, + productPricing?.isOnSpecial || false, + productPricing?.discountPercent, + productAvailability?.inStock ?? 
true, + productAvailability?.quantity, + productAvailability?.stockStatus || 'unknown', + product.thcPercent, + product.cbdPercent, + product.primaryImageUrl, + JSON.stringify(product.rawProduct), + ]); + } + + // Build bulk insert query + const placeholders = values.map((_, idx) => { + const offset = idx * 22; + return `(${Array.from({ length: 22 }, (_, j) => `$${offset + j + 1}`).join(', ')})`; + }).join(', '); + + await pool.query( + `INSERT INTO store_product_snapshots ( + dispensary_id, provider, provider_product_id, crawl_run_id, + captured_at, + name, brand_name, category, subcategory, + price_rec, price_med, price_rec_special, price_med_special, + is_on_special, discount_percent, + is_in_stock, stock_quantity, stock_status, + thc_percent, cbd_percent, + image_url, raw_data + ) VALUES ${placeholders}`, + values.flat() + ); + + created += batch.length; + } + + return { created }; +} + +// ============================================================ +// DISCONTINUED PRODUCTS +// ============================================================ + +/** + * Mark products as discontinued if they weren't in the current crawl + */ +export async function markDiscontinuedProducts( + pool: Pool, + dispensaryId: number, + currentProductIds: Set, + platform: string, + crawlRunId: number | null, + options: { dryRun?: boolean } = {} +): Promise { + const { dryRun = false } = options; + + // Get all products for this dispensary/platform + const result = await pool.query( + `SELECT provider_product_id FROM store_products + WHERE dispensary_id = $1 AND provider = $2 AND is_in_stock = TRUE`, + [dispensaryId, platform] + ); + + const existingIds = result.rows.map((r: any) => r.provider_product_id); + const discontinuedIds = existingIds.filter((id: string) => !currentProductIds.has(id)); + + if (discontinuedIds.length === 0) { + return 0; + } + + if (dryRun) { + console.log(`[DryRun] Would mark ${discontinuedIds.length} products as discontinued`); + return discontinuedIds.length; + } + + // Update store_products to mark as out of stock + await pool.query( + `UPDATE store_products + SET is_in_stock = FALSE, + stock_status = 'discontinued', + updated_at = NOW() + WHERE dispensary_id = $1 + AND provider = $2 + AND provider_product_id = ANY($3)`, + [dispensaryId, platform, discontinuedIds] + ); + + // Create snapshots for discontinued products + for (const productId of discontinuedIds) { + await pool.query( + `INSERT INTO store_product_snapshots ( + dispensary_id, provider, provider_product_id, crawl_run_id, + captured_at, is_in_stock, stock_status + ) + SELECT + dispensary_id, provider, provider_product_id, $4, + NOW(), FALSE, 'discontinued' + FROM store_products + WHERE dispensary_id = $1 AND provider = $2 AND provider_product_id = $3`, + [dispensaryId, platform, productId, crawlRunId] + ); + } + + return discontinuedIds.length; +} + +// ============================================================ +// BRAND UPSERTS +// ============================================================ + +export interface UpsertBrandsResult { + upserted: number; + new: number; +} + +/** + * Upsert brands to brands table + */ +export async function upsertBrands( + pool: Pool, + brands: NormalizedBrand[], + options: { dryRun?: boolean; skipIfExists?: boolean } = {} +): Promise { + if (brands.length === 0) { + return { upserted: 0, new: 0 }; + } + + const { dryRun = false, skipIfExists = true } = options; + + if (dryRun) { + console.log(`[DryRun] Would upsert ${brands.length} brands`); + return { upserted: brands.length, new: 0 }; + } + 
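+    // Note: `RETURNING (xmax = 0) AS is_new` is used here (and in the product
+    // upserts above) to tell inserts apart from updates: xmax is 0 for a freshly
+    // inserted row and, in practice, non-zero when ON CONFLICT ... DO UPDATE
+    // rewrote an existing row. With DO NOTHING, conflicting rows return no row
+    // at all, so only genuinely new brands increment newCount below.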
+ let newCount = 0; + + for (const brand of brands) { + const result = await pool.query( + `INSERT INTO brands (name, slug, external_id, logo_url, created_at, updated_at) + VALUES ($1, $2, $3, $4, NOW(), NOW()) + ON CONFLICT (slug) DO ${skipIfExists ? 'NOTHING' : 'UPDATE SET logo_url = COALESCE(EXCLUDED.logo_url, brands.logo_url), updated_at = NOW()'} + RETURNING (xmax = 0) as is_new`, + [brand.name, brand.slug, brand.externalBrandId, brand.logoUrl] + ); + + if (result.rows[0]?.is_new) { + newCount++; + } + } + + return { + upserted: brands.length, + new: newCount, + }; +} + +// ============================================================ +// FULL HYDRATION +// ============================================================ + +export interface HydratePayloadResult { + productsUpserted: number; + productsNew: number; + productsUpdated: number; + productsDiscontinued: number; + snapshotsCreated: number; + brandsCreated: number; +} + +/** + * Hydrate a complete normalization result into canonical tables + */ +export async function hydrateToCanonical( + pool: Pool, + dispensaryId: number, + normResult: NormalizationResult, + crawlRunId: number | null, + options: { dryRun?: boolean } = {} +): Promise { + const { dryRun = false } = options; + + // 1. Upsert brands + const brandResult = await upsertBrands(pool, normResult.brands, { dryRun }); + + // 2. Upsert products + const productResult = await upsertStoreProducts( + pool, + normResult.products, + normResult.pricing, + normResult.availability, + { dryRun } + ); + + // 3. Create snapshots + const snapshotResult = await createStoreProductSnapshots( + pool, + dispensaryId, + normResult.products, + normResult.pricing, + normResult.availability, + crawlRunId, + { dryRun } + ); + + // 4. Mark discontinued products + const currentProductIds = new Set( + normResult.products.map((p) => p.externalProductId) + ); + const platform = normResult.products[0]?.platform || 'dutchie'; + const discontinuedCount = await markDiscontinuedProducts( + pool, + dispensaryId, + currentProductIds, + platform, + crawlRunId, + { dryRun } + ); + + return { + productsUpserted: productResult.upserted, + productsNew: productResult.new, + productsUpdated: productResult.updated, + productsDiscontinued: discontinuedCount, + snapshotsCreated: snapshotResult.created, + brandsCreated: brandResult.new, + }; +} diff --git a/backend/src/hydration/incremental-sync.ts b/backend/src/hydration/incremental-sync.ts new file mode 100644 index 00000000..d8db1045 --- /dev/null +++ b/backend/src/hydration/incremental-sync.ts @@ -0,0 +1,680 @@ +/** + * Incremental Sync + * + * Hooks into the crawler to automatically write to canonical tables + * after each crawl completes. This ensures store_products and + * store_product_snapshots stay in sync with new data. + * + * Two modes: + * 1. Inline - Called directly from crawler after saving to legacy tables + * 2. 
Async - Called from a background worker that processes recent crawls + * + * Usage: + * // Inline mode (in crawler) + * import { syncCrawlToCanonical } from './hydration/incremental-sync'; + * await syncCrawlToCanonical(pool, crawlResult); + * + * // Async mode (background worker) + * import { syncRecentCrawls } from './hydration/incremental-sync'; + * await syncRecentCrawls(pool, { since: '1 hour' }); + */ + +import { Pool } from 'pg'; + +const BATCH_SIZE = 100; + +// ============================================================ +// TYPES +// ============================================================ + +export interface CrawlResult { + dispensaryId: number; + stateId?: number; + platformDispensaryId?: string; + crawlJobId?: number; // legacy dispensary_crawl_jobs.id + startedAt: Date; + finishedAt?: Date; + status: 'success' | 'failed' | 'running'; + errorMessage?: string; + productsFound: number; + productsCreated: number; + productsUpdated: number; + productsMissing?: number; + brandsFound?: number; +} + +export interface SyncOptions { + dryRun?: boolean; + verbose?: boolean; + skipSnapshots?: boolean; +} + +export interface SyncResult { + crawlRunId: number | null; + productsUpserted: number; + productsNew: number; + productsUpdated: number; + snapshotsCreated: number; + durationMs: number; + errors: string[]; +} + +// ============================================================ +// CREATE OR GET CRAWL RUN +// ============================================================ + +/** + * Create a crawl_run record for a completed crawl. + * Returns existing if already synced (idempotent). + */ +export async function getOrCreateCrawlRun( + pool: Pool, + crawlResult: CrawlResult, + options: SyncOptions = {} +): Promise { + const { dryRun = false, verbose = false } = options; + + // Check if already exists (by legacy job ID) + if (crawlResult.crawlJobId) { + const existing = await pool.query( + `SELECT id FROM crawl_runs WHERE legacy_dispensary_crawl_job_id = $1`, + [crawlResult.crawlJobId] + ); + + if (existing.rows.length > 0) { + if (verbose) { + console.log(`[IncrSync] Found existing crawl_run ${existing.rows[0].id} for job ${crawlResult.crawlJobId}`); + } + return existing.rows[0].id; + } + } + + if (dryRun) { + console.log(`[IncrSync][DryRun] Would create crawl_run for dispensary ${crawlResult.dispensaryId}`); + return null; + } + + const durationMs = crawlResult.finishedAt && crawlResult.startedAt + ? 
crawlResult.finishedAt.getTime() - crawlResult.startedAt.getTime() + : null; + + const result = await pool.query( + `INSERT INTO crawl_runs ( + dispensary_id, state_id, provider, + legacy_dispensary_crawl_job_id, + started_at, finished_at, duration_ms, + status, error_message, + products_found, products_new, products_updated, products_missing, + brands_found, trigger_type, created_at + ) VALUES ( + $1, $2, 'dutchie', + $3, + $4, $5, $6, + $7, $8, + $9, $10, $11, $12, + $13, 'scheduled', NOW() + ) + RETURNING id`, + [ + crawlResult.dispensaryId, + crawlResult.stateId, + crawlResult.crawlJobId, + crawlResult.startedAt, + crawlResult.finishedAt, + durationMs, + crawlResult.status, + crawlResult.errorMessage, + crawlResult.productsFound, + crawlResult.productsCreated, + crawlResult.productsUpdated, + crawlResult.productsMissing || 0, + crawlResult.brandsFound || 0, + ] + ); + + if (verbose) { + console.log(`[IncrSync] Created crawl_run ${result.rows[0].id}`); + } + + return result.rows[0].id; +} + +// ============================================================ +// SYNC PRODUCTS TO CANONICAL +// ============================================================ + +/** + * Sync dutchie_products to store_products for a single dispensary. + * Called after a crawl completes. + */ +export async function syncProductsToCanonical( + pool: Pool, + dispensaryId: number, + stateId: number | null, + crawlRunId: number | null, + options: SyncOptions = {} +): Promise<{ upserted: number; new: number; updated: number; errors: string[] }> { + const { dryRun = false, verbose = false } = options; + const errors: string[] = []; + let newCount = 0; + let updatedCount = 0; + + // Get all products for this dispensary + const { rows: products } = await pool.query( + `SELECT + dp.id, + dp.external_product_id, + dp.name, + dp.brand_name, + dp.brand_id, + dp.category, + dp.subcategory, + dp.type, + dp.strain_type, + dp.description, + dp.effects, + dp.cannabinoids_v2, + dp.thc, + dp.thc_content, + dp.cbd, + dp.cbd_content, + dp.primary_image_url, + dp.local_image_url, + dp.local_image_thumb_url, + dp.local_image_medium_url, + dp.original_image_url, + dp.additional_images, + dp.stock_status, + dp.c_name, + dp.enterprise_product_id, + dp.weight, + dp.options, + dp.measurements, + dp.status, + dp.featured, + dp.special, + dp.medical_only, + dp.rec_only, + dp.is_below_threshold, + dp.is_below_kiosk_threshold, + dp.total_quantity_available, + dp.total_kiosk_quantity_available, + dp.first_seen_at, + dp.last_seen_at, + dp.updated_at, + d.platform_dispensary_id + FROM dutchie_products dp + LEFT JOIN dispensaries d ON d.id = dp.dispensary_id + WHERE dp.dispensary_id = $1`, + [dispensaryId] + ); + + if (verbose) { + console.log(`[IncrSync] Found ${products.length} products for dispensary ${dispensaryId}`); + } + + // Process in batches + for (let i = 0; i < products.length; i += BATCH_SIZE) { + const batch = products.slice(i, i + BATCH_SIZE); + + for (const p of batch) { + try { + const thcPercent = parseFloat(p.thc) || parseFloat(p.thc_content) || null; + const cbdPercent = parseFloat(p.cbd) || parseFloat(p.cbd_content) || null; + const stockStatus = p.stock_status || 'unknown'; + const isInStock = stockStatus === 'in_stock' || stockStatus === 'unknown'; + + if (dryRun) { + if (verbose) { + console.log(`[IncrSync][DryRun] Would upsert product ${p.external_product_id}`); + } + newCount++; + continue; + } + + const result = await pool.query( + `INSERT INTO store_products ( + dispensary_id, state_id, provider, provider_product_id, + 
provider_brand_id, provider_dispensary_id, enterprise_product_id, + legacy_dutchie_product_id, + name, brand_name, category, subcategory, product_type, strain_type, + description, effects, cannabinoids, + thc_percent, cbd_percent, thc_content_text, cbd_content_text, + is_in_stock, stock_status, stock_quantity, + total_quantity_available, total_kiosk_quantity_available, + image_url, local_image_url, local_image_thumb_url, local_image_medium_url, + original_image_url, additional_images, + is_on_special, is_featured, medical_only, rec_only, + is_below_threshold, is_below_kiosk_threshold, + platform_status, c_name, weight, options, measurements, + first_seen_at, last_seen_at, updated_at + ) VALUES ( + $1, $2, 'dutchie', $3, + $4, $5, $6, + $7, + $8, $9, $10, $11, $12, $13, + $14, $15, $16, + $17, $18, $19, $20, + $21, $22, $23, + $24, $25, + $26, $27, $28, $29, + $30, $31, + $32, $33, $34, $35, + $36, $37, + $38, $39, $40, $41, $42, + $43, $44, NOW() + ) + ON CONFLICT (dispensary_id, provider, provider_product_id) + DO UPDATE SET + legacy_dutchie_product_id = EXCLUDED.legacy_dutchie_product_id, + name = EXCLUDED.name, + brand_name = EXCLUDED.brand_name, + category = EXCLUDED.category, + subcategory = EXCLUDED.subcategory, + is_in_stock = EXCLUDED.is_in_stock, + stock_status = EXCLUDED.stock_status, + thc_percent = EXCLUDED.thc_percent, + cbd_percent = EXCLUDED.cbd_percent, + image_url = EXCLUDED.image_url, + local_image_url = EXCLUDED.local_image_url, + is_on_special = EXCLUDED.is_on_special, + platform_status = EXCLUDED.platform_status, + last_seen_at = NOW(), + updated_at = NOW() + RETURNING (xmax = 0) as is_new`, + [ + dispensaryId, + stateId, + p.external_product_id, + p.brand_id, + p.platform_dispensary_id, + p.enterprise_product_id, + p.id, + p.name, + p.brand_name, + p.category || p.type, + p.subcategory, + p.type, + p.strain_type, + p.description, + p.effects, + p.cannabinoids_v2, + thcPercent, + cbdPercent, + p.thc_content, + p.cbd_content, + isInStock, + stockStatus, + p.total_quantity_available, + p.total_quantity_available, + p.total_kiosk_quantity_available, + p.primary_image_url, + p.local_image_url, + p.local_image_thumb_url, + p.local_image_medium_url, + p.original_image_url, + p.additional_images, + p.special || false, + p.featured || false, + p.medical_only || false, + p.rec_only || false, + p.is_below_threshold || false, + p.is_below_kiosk_threshold || false, + p.status, + p.c_name, + p.weight, + p.options, + p.measurements, + p.first_seen_at || p.updated_at, + p.last_seen_at || p.updated_at, + ] + ); + + if (result.rows[0]?.is_new) { + newCount++; + } else { + updatedCount++; + } + } catch (error: any) { + errors.push(`Product ${p.id}: ${error.message}`); + } + } + } + + return { + upserted: newCount + updatedCount, + new: newCount, + updated: updatedCount, + errors, + }; +} + +// ============================================================ +// SYNC SNAPSHOTS TO CANONICAL +// ============================================================ + +/** + * Sync dutchie_product_snapshots to store_product_snapshots for recent crawls. 
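+ *
+ * Only snapshots with no matching store_product_snapshots.legacy_snapshot_id
+ * are copied, so the sync is idempotent and safe to re-run for the same window.
+ * Pricing, THC/CBD and stock status are parsed out of raw_product_data when present.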
+ */ +export async function syncSnapshotsToCanonical( + pool: Pool, + dispensaryId: number, + stateId: number | null, + crawlRunId: number | null, + since: Date, + options: SyncOptions = {} +): Promise<{ created: number; errors: string[] }> { + const { dryRun = false, verbose = false } = options; + const errors: string[] = []; + let created = 0; + + // Get recent snapshots that haven't been synced yet + const { rows: snapshots } = await pool.query( + `SELECT + dps.id, + dps.dutchie_product_id, + dps.dispensary_id, + dps.options, + dps.raw_product_data, + dps.crawled_at, + dps.created_at, + dp.external_product_id, + dp.name, + dp.brand_name, + dp.category, + dp.subcategory, + sp.id as store_product_id, + d.platform_dispensary_id + FROM dutchie_product_snapshots dps + JOIN dutchie_products dp ON dp.id = dps.dutchie_product_id + LEFT JOIN store_products sp ON sp.dispensary_id = dps.dispensary_id + AND sp.provider_product_id = dp.external_product_id + AND sp.provider = 'dutchie' + LEFT JOIN dispensaries d ON d.id = dps.dispensary_id + LEFT JOIN store_product_snapshots sps ON sps.legacy_snapshot_id = dps.id + WHERE dps.dispensary_id = $1 + AND dps.crawled_at >= $2 + AND sps.id IS NULL + ORDER BY dps.id`, + [dispensaryId, since] + ); + + if (verbose) { + console.log(`[IncrSync] Found ${snapshots.length} new snapshots since ${since.toISOString()}`); + } + + if (snapshots.length === 0) { + return { created: 0, errors: [] }; + } + + for (const s of snapshots) { + try { + // Extract pricing from raw_product_data + let priceRec: number | null = null; + let priceMed: number | null = null; + let priceRecSpecial: number | null = null; + let isOnSpecial = false; + let isInStock = true; + let thcPercent: number | null = null; + let cbdPercent: number | null = null; + let stockStatus = 'unknown'; + let platformStatus: string | null = null; + + if (s.raw_product_data) { + const raw = typeof s.raw_product_data === 'string' + ? JSON.parse(s.raw_product_data) + : s.raw_product_data; + + priceRec = raw.recPrices?.[0] || raw.Prices?.[0] || null; + priceMed = raw.medicalPrices?.[0] || null; + priceRecSpecial = raw.recSpecialPrices?.[0] || null; + isOnSpecial = raw.special === true || (priceRecSpecial !== null); + thcPercent = raw.THCContent?.range?.[0] || raw.THC || null; + cbdPercent = raw.CBDContent?.range?.[0] || raw.CBD || null; + platformStatus = raw.Status || null; + isInStock = platformStatus === 'Active'; + stockStatus = isInStock ? 
'in_stock' : 'out_of_stock'; + } + + if (dryRun) { + if (verbose) { + console.log(`[IncrSync][DryRun] Would create snapshot for legacy ${s.id}`); + } + created++; + continue; + } + + await pool.query( + `INSERT INTO store_product_snapshots ( + dispensary_id, store_product_id, state_id, + provider, provider_product_id, provider_dispensary_id, + crawl_run_id, + legacy_snapshot_id, legacy_dutchie_product_id, + captured_at, + name, brand_name, category, subcategory, + price_rec, price_med, price_rec_special, + is_on_special, is_in_stock, stock_status, + thc_percent, cbd_percent, + platform_status, options, raw_data, + created_at + ) VALUES ( + $1, $2, $3, + 'dutchie', $4, $5, + $6, + $7, $8, + $9, + $10, $11, $12, $13, + $14, $15, $16, + $17, $18, $19, + $20, $21, + $22, $23, $24, + NOW() + )`, + [ + s.dispensary_id, + s.store_product_id, + stateId, + s.external_product_id, + s.platform_dispensary_id, + crawlRunId, + s.id, + s.dutchie_product_id, + s.crawled_at, + s.name, + s.brand_name, + s.category, + s.subcategory, + priceRec, + priceMed, + priceRecSpecial, + isOnSpecial, + isInStock, + stockStatus, + thcPercent, + cbdPercent, + platformStatus, + s.options, + s.raw_product_data, + ] + ); + + created++; + } catch (error: any) { + errors.push(`Snapshot ${s.id}: ${error.message}`); + } + } + + return { created, errors }; +} + +// ============================================================ +// MAIN SYNC FUNCTION +// ============================================================ + +/** + * Sync a single crawl result to canonical tables. + * Call this from the crawler after each crawl completes. + */ +export async function syncCrawlToCanonical( + pool: Pool, + crawlResult: CrawlResult, + options: SyncOptions = {} +): Promise { + const startTime = Date.now(); + const errors: string[] = []; + const { verbose = false, skipSnapshots = false } = options; + + if (verbose) { + console.log(`[IncrSync] Starting sync for dispensary ${crawlResult.dispensaryId}`); + } + + // 1. Create crawl_run record + const crawlRunId = await getOrCreateCrawlRun(pool, crawlResult, options); + + // 2. Sync products + const productResult = await syncProductsToCanonical( + pool, + crawlResult.dispensaryId, + crawlResult.stateId || null, + crawlRunId, + options + ); + errors.push(...productResult.errors); + + // 3. Sync snapshots (if not skipped) + let snapshotsCreated = 0; + if (!skipSnapshots) { + const since = new Date(crawlResult.startedAt.getTime() - 60 * 1000); // 1 min before + const snapshotResult = await syncSnapshotsToCanonical( + pool, + crawlResult.dispensaryId, + crawlResult.stateId || null, + crawlRunId, + since, + options + ); + snapshotsCreated = snapshotResult.created; + errors.push(...snapshotResult.errors); + } + + const durationMs = Date.now() - startTime; + + if (verbose) { + console.log(`[IncrSync] Completed in ${durationMs}ms: ${productResult.upserted} products, ${snapshotsCreated} snapshots`); + } + + return { + crawlRunId, + productsUpserted: productResult.upserted, + productsNew: productResult.new, + productsUpdated: productResult.updated, + snapshotsCreated, + durationMs, + errors, + }; +} + +// ============================================================ +// BATCH SYNC FOR RECENT CRAWLS +// ============================================================ + +export interface RecentSyncOptions extends SyncOptions { + since?: string; // e.g., '1 hour', '30 minutes', '1 day' + dispensaryId?: number; + limit?: number; +} + +/** + * Sync recent crawls that haven't been synced yet. 
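+ * Unsynced jobs are detected via crawl_runs.legacy_dispensary_crawl_job_id,
+ * so crawls that already have a crawl_run are skipped on repeat runs.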
+ * Run this as a background job to catch any missed syncs. + */ +export async function syncRecentCrawls( + pool: Pool, + options: RecentSyncOptions = {} +): Promise<{ synced: number; errors: string[] }> { + const { + since = '1 hour', + dispensaryId, + limit = 100, + verbose = false, + dryRun = false, + } = options; + + const errors: string[] = []; + let synced = 0; + + // Find recent completed crawl jobs that don't have a crawl_run + let query = ` + SELECT + dcj.id as crawl_job_id, + dcj.dispensary_id, + dcj.status, + dcj.started_at, + dcj.completed_at, + dcj.products_found, + dcj.products_created, + dcj.products_updated, + dcj.brands_found, + dcj.error_message, + d.state_id + FROM dispensary_crawl_jobs dcj + LEFT JOIN dispensaries d ON d.id = dcj.dispensary_id + LEFT JOIN crawl_runs cr ON cr.legacy_dispensary_crawl_job_id = dcj.id + WHERE dcj.status IN ('completed', 'failed') + AND dcj.started_at > NOW() - INTERVAL '${since}' + AND cr.id IS NULL + `; + + const params: any[] = []; + let paramIdx = 1; + + if (dispensaryId) { + query += ` AND dcj.dispensary_id = $${paramIdx}`; + params.push(dispensaryId); + paramIdx++; + } + + query += ` ORDER BY dcj.started_at DESC LIMIT $${paramIdx}`; + params.push(limit); + + const { rows: unsynced } = await pool.query(query, params); + + if (verbose) { + console.log(`[IncrSync] Found ${unsynced.length} unsynced crawls from last ${since}`); + } + + for (const job of unsynced) { + try { + const crawlResult: CrawlResult = { + dispensaryId: job.dispensary_id, + stateId: job.state_id, + crawlJobId: job.crawl_job_id, + startedAt: new Date(job.started_at), + finishedAt: job.completed_at ? new Date(job.completed_at) : undefined, + status: job.status === 'completed' ? 'success' : 'failed', + errorMessage: job.error_message, + productsFound: job.products_found || 0, + productsCreated: job.products_created || 0, + productsUpdated: job.products_updated || 0, + brandsFound: job.brands_found || 0, + }; + + await syncCrawlToCanonical(pool, crawlResult, { dryRun, verbose }); + synced++; + } catch (error: any) { + errors.push(`Job ${job.crawl_job_id}: ${error.message}`); + } + } + + return { synced, errors }; +} + +// ============================================================ +// EXPORTS +// ============================================================ + +export { + CrawlResult, + SyncOptions, + SyncResult, +}; diff --git a/backend/src/hydration/index.ts b/backend/src/hydration/index.ts new file mode 100644 index 00000000..caec7b3c --- /dev/null +++ b/backend/src/hydration/index.ts @@ -0,0 +1,96 @@ +/** + * Hydration Module + * + * Central export for the raw payload → canonical hydration pipeline. 
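+ *
+ * Typical flow (informal sketch — exact signatures live in the modules below;
+ * getNormalizer('dutchie') is assumed to resolve the registered Dutchie normalizer):
+ *
+ *   const normalizer = getNormalizer('dutchie');
+ *   const normResult = normalizer.normalize(rawPayload);   // raw JSON -> canonical shapes
+ *   await hydrateToCanonical(pool, dispensaryId, normResult, crawlRunId, { dryRun: false });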
+ * + * Components: + * - Payload Store: Store and retrieve raw payloads + * - Normalizers: Platform-specific JSON → canonical format converters + * - Canonical Upsert: Write normalized data to canonical tables + * - Worker: Process payloads in batches with locking + * - Backfill: Import historical data + * - Producer: Hook for crawlers to store payloads + */ + +// Types +export * from './types'; + +// Payload storage +export { + storeRawPayload, + getUnprocessedPayloads, + markPayloadProcessed, + markPayloadFailed, + getPayloadById, + getPayloadsForDispensary, + getPayloadStats, +} from './payload-store'; + +// Normalizers +export { + getNormalizer, + getRegisteredPlatforms, + isPlatformSupported, + DutchieNormalizer, + INormalizer, + BaseNormalizer, +} from './normalizers'; + +// Canonical upserts +export { + upsertStoreProducts, + createStoreProductSnapshots, + markDiscontinuedProducts, + upsertBrands, + hydrateToCanonical, +} from './canonical-upsert'; + +// Locking +export { + HydrationLockManager, + LOCK_NAMES, +} from './locking'; + +// Worker +export { + HydrationWorker, + runHydrationBatch, + processPayloadById, + reprocessFailedPayloads, +} from './worker'; + +// Backfill +export { + runBackfill, + backfillFromDutchieProducts, + backfillFromSnapshots, + backfillFromCacheFiles, + BackfillOptions, + BackfillResult, +} from './backfill'; + +// Producer +export { + producePayload, + createProducer, + onCrawlComplete, + ProducerOptions, +} from './producer'; + +// Legacy Backfill +export { + runLegacyBackfill, +} from './legacy-backfill'; + +// Incremental Sync +export { + syncCrawlToCanonical, + syncRecentCrawls, + syncProductsToCanonical, + syncSnapshotsToCanonical, + getOrCreateCrawlRun, + CrawlResult, + SyncOptions, + SyncResult, + RecentSyncOptions, +} from './incremental-sync'; diff --git a/backend/src/hydration/legacy-backfill.ts b/backend/src/hydration/legacy-backfill.ts new file mode 100644 index 00000000..193c7261 --- /dev/null +++ b/backend/src/hydration/legacy-backfill.ts @@ -0,0 +1,851 @@ +/** + * Legacy Backfill Script + * + * Directly hydrates canonical tables from legacy dutchie_* tables. + * This bypasses the payload-store and normalizer pipeline for efficiency. 
+ * + * Source Tables (READ-ONLY): + * - dutchie_products → store_products + * - dutchie_product_snapshots → store_product_snapshots + * - dispensary_crawl_jobs → crawl_runs + * + * This script is: + * - IDEMPOTENT: Can be run multiple times safely + * - BATCH-ORIENTED: Processes in chunks to avoid OOM + * - RESUMABLE: Can start from a specific ID if interrupted + * + * Usage: + * npx tsx src/hydration/legacy-backfill.ts + * npx tsx src/hydration/legacy-backfill.ts --dispensary-id 123 + * npx tsx src/hydration/legacy-backfill.ts --dry-run + * npx tsx src/hydration/legacy-backfill.ts --start-from 5000 + */ + +import { Pool } from 'pg'; +import dotenv from 'dotenv'; + +dotenv.config(); + +// ============================================================ +// CONFIGURATION +// ============================================================ + +const BATCH_SIZE = 100; + +interface LegacyBackfillOptions { + dryRun: boolean; + dispensaryId?: number; + startFromProductId?: number; + startFromSnapshotId?: number; + startFromJobId?: number; + verbose: boolean; +} + +interface LegacyBackfillStats { + productsProcessed: number; + productsInserted: number; + productsUpdated: number; + productsSkipped: number; + productErrors: number; + + snapshotsProcessed: number; + snapshotsInserted: number; + snapshotsSkipped: number; + snapshotErrors: number; + + crawlRunsProcessed: number; + crawlRunsInserted: number; + crawlRunsSkipped: number; + crawlRunErrors: number; + + startedAt: Date; + completedAt?: Date; + durationMs?: number; +} + +// ============================================================ +// DATABASE CONNECTION +// ============================================================ + +function getConnectionString(): string { + if (process.env.CANNAIQ_DB_URL) { + return process.env.CANNAIQ_DB_URL; + } + + const host = process.env.CANNAIQ_DB_HOST; + const port = process.env.CANNAIQ_DB_PORT; + const name = process.env.CANNAIQ_DB_NAME; + const user = process.env.CANNAIQ_DB_USER; + const pass = process.env.CANNAIQ_DB_PASS; + + if (host && port && name && user && pass) { + return `postgresql://${user}:${pass}@${host}:${port}/${name}`; + } + + throw new Error('Missing CANNAIQ_DB_* environment variables'); +} + +// ============================================================ +// STEP 1: HYDRATE CRAWL RUNS FROM dispensary_crawl_jobs +// ============================================================ + +async function hydrateCrawlRuns( + pool: Pool, + options: LegacyBackfillOptions, + stats: LegacyBackfillStats +): Promise> { + console.log('\n=== STEP 1: Hydrate crawl_runs from dispensary_crawl_jobs ==='); + + // Map from legacy job ID to canonical crawl_run ID + const jobToCrawlRunMap = new Map(); + + // Build query + let query = ` + SELECT + dcj.id, + dcj.dispensary_id, + dcj.schedule_id, + dcj.status, + dcj.job_type, + dcj.started_at, + dcj.completed_at, + dcj.products_found, + dcj.products_created, + dcj.products_updated, + dcj.brands_found, + dcj.error_message, + dcj.retry_count, + dcj.created_at, + d.state_id + FROM dispensary_crawl_jobs dcj + LEFT JOIN dispensaries d ON d.id = dcj.dispensary_id + WHERE dcj.status IN ('completed', 'failed') + AND dcj.started_at IS NOT NULL + `; + + const params: any[] = []; + let paramIndex = 1; + + if (options.dispensaryId) { + query += ` AND dcj.dispensary_id = $${paramIndex}`; + params.push(options.dispensaryId); + paramIndex++; + } + + if (options.startFromJobId) { + query += ` AND dcj.id >= $${paramIndex}`; + params.push(options.startFromJobId); + paramIndex++; + } + + query += 
` ORDER BY dcj.id`; + + const { rows: jobs } = await pool.query(query, params); + console.log(` Found ${jobs.length} crawl jobs to hydrate`); + + for (const job of jobs) { + stats.crawlRunsProcessed++; + + try { + // Check if already hydrated + const existing = await pool.query( + `SELECT id FROM crawl_runs WHERE legacy_dispensary_crawl_job_id = $1`, + [job.id] + ); + + if (existing.rows.length > 0) { + jobToCrawlRunMap.set(job.id, existing.rows[0].id); + stats.crawlRunsSkipped++; + continue; + } + + if (options.dryRun) { + if (options.verbose) { + console.log(` [DryRun] Would insert crawl_run for job ${job.id}`); + } + stats.crawlRunsInserted++; + continue; + } + + // Calculate duration + const durationMs = job.completed_at && job.started_at + ? new Date(job.completed_at).getTime() - new Date(job.started_at).getTime() + : null; + + // Map status + const status = job.status === 'completed' ? 'success' : 'failed'; + + // Insert crawl_run + const result = await pool.query( + `INSERT INTO crawl_runs ( + dispensary_id, state_id, provider, + legacy_dispensary_crawl_job_id, schedule_id, job_type, + started_at, finished_at, duration_ms, + status, error_message, + products_found, products_new, products_updated, products_missing, + snapshots_written, brands_found, + trigger_type, retry_count, created_at + ) VALUES ( + $1, $2, 'dutchie', + $3, $4, $5, + $6, $7, $8, + $9, $10, + $11, $12, $13, 0, + 0, $14, + 'scheduled', $15, $16 + ) + RETURNING id`, + [ + job.dispensary_id, + job.state_id, + job.id, + job.schedule_id, + job.job_type || 'full', + job.started_at, + job.completed_at, + durationMs, + status, + job.error_message, + job.products_found || 0, + job.products_created || 0, + job.products_updated || 0, + job.brands_found || 0, + job.retry_count || 0, + job.created_at, + ] + ); + + jobToCrawlRunMap.set(job.id, result.rows[0].id); + stats.crawlRunsInserted++; + + if (options.verbose && stats.crawlRunsInserted % 100 === 0) { + console.log(` Inserted ${stats.crawlRunsInserted} crawl runs...`); + } + } catch (error: any) { + stats.crawlRunErrors++; + console.error(` Error hydrating job ${job.id}: ${error.message}`); + } + } + + console.log(` Crawl runs: ${stats.crawlRunsInserted} inserted, ${stats.crawlRunsSkipped} skipped, ${stats.crawlRunErrors} errors`); + return jobToCrawlRunMap; +} + +// ============================================================ +// STEP 2: HYDRATE STORE_PRODUCTS FROM dutchie_products +// ============================================================ + +async function hydrateStoreProducts( + pool: Pool, + options: LegacyBackfillOptions, + stats: LegacyBackfillStats +): Promise> { + console.log('\n=== STEP 2: Hydrate store_products from dutchie_products ==='); + + // Map from legacy dutchie_product.id to canonical store_product.id + const productIdMap = new Map(); + + // Get total count + let countQuery = `SELECT COUNT(*) as cnt FROM dutchie_products`; + const countParams: any[] = []; + + if (options.dispensaryId) { + countQuery += ` WHERE dispensary_id = $1`; + countParams.push(options.dispensaryId); + } + + const { rows: countRows } = await pool.query(countQuery, countParams); + const totalCount = parseInt(countRows[0].cnt, 10); + console.log(` Total dutchie_products: ${totalCount}`); + + let offset = options.startFromProductId ? 
0 : 0; + let processed = 0; + + while (processed < totalCount) { + // Fetch batch + let query = ` + SELECT + dp.id, + dp.dispensary_id, + dp.external_product_id, + dp.name, + dp.brand_name, + dp.brand_id, + dp.brand_logo_url, + dp.category, + dp.subcategory, + dp.strain_type, + dp.description, + dp.effects, + dp.thc, + dp.thc_content, + dp.cbd, + dp.cbd_content, + dp.cannabinoids_v2, + dp.primary_image_url, + dp.additional_images, + dp.local_image_url, + dp.local_image_thumb_url, + dp.local_image_medium_url, + dp.original_image_url, + dp.stock_status, + dp.type, + dp.c_name, + dp.enterprise_product_id, + dp.weight, + dp.options, + dp.measurements, + dp.status, + dp.featured, + dp.special, + dp.medical_only, + dp.rec_only, + dp.is_below_threshold, + dp.is_below_kiosk_threshold, + dp.total_quantity_available, + dp.total_kiosk_quantity_available, + dp.first_seen_at, + dp.last_seen_at, + dp.created_at, + dp.updated_at, + d.state_id, + d.platform_dispensary_id + FROM dutchie_products dp + LEFT JOIN dispensaries d ON d.id = dp.dispensary_id + `; + + const params: any[] = []; + let paramIndex = 1; + + if (options.dispensaryId) { + query += ` WHERE dp.dispensary_id = $${paramIndex}`; + params.push(options.dispensaryId); + paramIndex++; + } + + if (options.startFromProductId && processed === 0) { + query += options.dispensaryId ? ` AND` : ` WHERE`; + query += ` dp.id >= $${paramIndex}`; + params.push(options.startFromProductId); + paramIndex++; + } + + query += ` ORDER BY dp.id LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`; + params.push(BATCH_SIZE, offset); + + const { rows: products } = await pool.query(query, params); + + if (products.length === 0) break; + + for (const p of products) { + stats.productsProcessed++; + + try { + // Check if already hydrated by legacy ID + const existingByLegacy = await pool.query( + `SELECT id FROM store_products WHERE legacy_dutchie_product_id = $1`, + [p.id] + ); + + if (existingByLegacy.rows.length > 0) { + productIdMap.set(p.id, existingByLegacy.rows[0].id); + stats.productsSkipped++; + continue; + } + + // Parse THC/CBD percent from text + const thcPercent = parseFloat(p.thc) || parseFloat(p.thc_content) || null; + const cbdPercent = parseFloat(p.cbd) || parseFloat(p.cbd_content) || null; + + // Determine stock status + const stockStatus = p.stock_status || 'unknown'; + const isInStock = stockStatus === 'in_stock' || stockStatus === 'unknown'; + + if (options.dryRun) { + if (options.verbose) { + console.log(` [DryRun] Would upsert store_product for legacy ID ${p.id}`); + } + stats.productsInserted++; + continue; + } + + // Upsert store_product + const result = await pool.query( + `INSERT INTO store_products ( + dispensary_id, state_id, provider, provider_product_id, + provider_brand_id, provider_dispensary_id, enterprise_product_id, + legacy_dutchie_product_id, + name, brand_name, category, subcategory, product_type, strain_type, + description, effects, cannabinoids, + thc_percent, cbd_percent, thc_content_text, cbd_content_text, + is_in_stock, stock_status, stock_quantity, + total_quantity_available, total_kiosk_quantity_available, + image_url, local_image_url, local_image_thumb_url, local_image_medium_url, + original_image_url, additional_images, + is_on_special, is_featured, medical_only, rec_only, + is_below_threshold, is_below_kiosk_threshold, + platform_status, c_name, weight, options, measurements, + first_seen_at, last_seen_at, created_at, updated_at + ) VALUES ( + $1, $2, 'dutchie', $3, + $4, $5, $6, + $7, + $8, $9, $10, $11, $12, $13, + $14, $15, 
$16, + $17, $18, $19, $20, + $21, $22, $23, + $24, $25, + $26, $27, $28, $29, + $30, $31, + $32, $33, $34, $35, + $36, $37, + $38, $39, $40, $41, $42, + $43, $44, $45, $46 + ) + ON CONFLICT (dispensary_id, provider, provider_product_id) + DO UPDATE SET + legacy_dutchie_product_id = EXCLUDED.legacy_dutchie_product_id, + name = EXCLUDED.name, + brand_name = EXCLUDED.brand_name, + category = EXCLUDED.category, + subcategory = EXCLUDED.subcategory, + is_in_stock = EXCLUDED.is_in_stock, + stock_status = EXCLUDED.stock_status, + last_seen_at = EXCLUDED.last_seen_at, + updated_at = NOW() + RETURNING id, (xmax = 0) as is_new`, + [ + p.dispensary_id, + p.state_id, + p.external_product_id, + p.brand_id, + p.platform_dispensary_id, + p.enterprise_product_id, + p.id, // legacy_dutchie_product_id + p.name, + p.brand_name, + p.category || p.type, + p.subcategory, + p.type, + p.strain_type, + p.description, + p.effects, + p.cannabinoids_v2, + thcPercent, + cbdPercent, + p.thc_content, + p.cbd_content, + isInStock, + stockStatus, + p.total_quantity_available, + p.total_quantity_available, + p.total_kiosk_quantity_available, + p.primary_image_url, + p.local_image_url, + p.local_image_thumb_url, + p.local_image_medium_url, + p.original_image_url, + p.additional_images, + p.special || false, + p.featured || false, + p.medical_only || false, + p.rec_only || false, + p.is_below_threshold || false, + p.is_below_kiosk_threshold || false, + p.status, + p.c_name, + p.weight, + p.options, + p.measurements, + p.first_seen_at || p.created_at, + p.last_seen_at || p.updated_at, + p.created_at, + p.updated_at, + ] + ); + + productIdMap.set(p.id, result.rows[0].id); + + if (result.rows[0].is_new) { + stats.productsInserted++; + } else { + stats.productsUpdated++; + } + } catch (error: any) { + stats.productErrors++; + if (options.verbose) { + console.error(` Error hydrating product ${p.id}: ${error.message}`); + } + } + } + + offset += BATCH_SIZE; + processed += products.length; + console.log(` Processed ${processed}/${totalCount} products...`); + } + + console.log(` Products: ${stats.productsInserted} inserted, ${stats.productsUpdated} updated, ${stats.productsSkipped} skipped, ${stats.productErrors} errors`); + return productIdMap; +} + +// ============================================================ +// STEP 3: HYDRATE STORE_PRODUCT_SNAPSHOTS FROM dutchie_product_snapshots +// ============================================================ + +async function hydrateSnapshots( + pool: Pool, + options: LegacyBackfillOptions, + stats: LegacyBackfillStats, + productIdMap: Map, + _jobToCrawlRunMap: Map +): Promise { + console.log('\n=== STEP 3: Hydrate store_product_snapshots from dutchie_product_snapshots ==='); + + // Get total count + let countQuery = `SELECT COUNT(*) as cnt FROM dutchie_product_snapshots`; + const countParams: any[] = []; + + if (options.dispensaryId) { + countQuery += ` WHERE dispensary_id = $1`; + countParams.push(options.dispensaryId); + } + + const { rows: countRows } = await pool.query(countQuery, countParams); + const totalCount = parseInt(countRows[0].cnt, 10); + console.log(` Total dutchie_product_snapshots: ${totalCount}`); + + if (totalCount === 0) { + console.log(' No snapshots to hydrate'); + return; + } + + let offset = 0; + let processed = 0; + + while (processed < totalCount) { + // Fetch batch with product info + let query = ` + SELECT + dps.id, + dps.dutchie_product_id, + dps.dispensary_id, + dps.options, + dps.raw_product_data, + dps.crawled_at, + dps.created_at, + dp.external_product_id, 
+ dp.name, + dp.brand_name, + dp.category, + dp.subcategory, + d.state_id, + d.platform_dispensary_id + FROM dutchie_product_snapshots dps + JOIN dutchie_products dp ON dp.id = dps.dutchie_product_id + LEFT JOIN dispensaries d ON d.id = dps.dispensary_id + `; + + const params: any[] = []; + let paramIndex = 1; + + if (options.dispensaryId) { + query += ` WHERE dps.dispensary_id = $${paramIndex}`; + params.push(options.dispensaryId); + paramIndex++; + } + + if (options.startFromSnapshotId && processed === 0) { + query += options.dispensaryId ? ` AND` : ` WHERE`; + query += ` dps.id >= $${paramIndex}`; + params.push(options.startFromSnapshotId); + paramIndex++; + } + + query += ` ORDER BY dps.id LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`; + params.push(BATCH_SIZE, offset); + + const { rows: snapshots } = await pool.query(query, params); + + if (snapshots.length === 0) break; + + for (const s of snapshots) { + stats.snapshotsProcessed++; + + try { + // Check if already hydrated + const existing = await pool.query( + `SELECT 1 FROM store_product_snapshots WHERE legacy_snapshot_id = $1`, + [s.id] + ); + + if (existing.rows.length > 0) { + stats.snapshotsSkipped++; + continue; + } + + // Get canonical store_product_id + const storeProductId = productIdMap.get(s.dutchie_product_id); + + // Extract pricing from raw_product_data if available + let priceRec: number | null = null; + let priceMed: number | null = null; + let priceRecSpecial: number | null = null; + let isOnSpecial = false; + let isInStock = true; + let thcPercent: number | null = null; + let cbdPercent: number | null = null; + let stockStatus = 'unknown'; + let platformStatus: string | null = null; + + if (s.raw_product_data) { + const raw = typeof s.raw_product_data === 'string' + ? JSON.parse(s.raw_product_data) + : s.raw_product_data; + + priceRec = raw.recPrices?.[0] || raw.Prices?.[0] || null; + priceMed = raw.medicalPrices?.[0] || null; + priceRecSpecial = raw.recSpecialPrices?.[0] || null; + isOnSpecial = raw.special === true || (priceRecSpecial !== null); + thcPercent = raw.THCContent?.range?.[0] || raw.THC || null; + cbdPercent = raw.CBDContent?.range?.[0] || raw.CBD || null; + platformStatus = raw.Status || null; + isInStock = platformStatus === 'Active'; + stockStatus = isInStock ? 
'in_stock' : 'out_of_stock'; + } + + if (options.dryRun) { + if (options.verbose) { + console.log(` [DryRun] Would insert snapshot for legacy ID ${s.id}`); + } + stats.snapshotsInserted++; + continue; + } + + // Insert snapshot + await pool.query( + `INSERT INTO store_product_snapshots ( + dispensary_id, store_product_id, state_id, + provider, provider_product_id, provider_dispensary_id, + legacy_snapshot_id, legacy_dutchie_product_id, + captured_at, + name, brand_name, category, subcategory, + price_rec, price_med, price_rec_special, + is_on_special, is_in_stock, stock_status, + thc_percent, cbd_percent, + platform_status, options, raw_data, + created_at + ) VALUES ( + $1, $2, $3, + 'dutchie', $4, $5, + $6, $7, + $8, + $9, $10, $11, $12, + $13, $14, $15, + $16, $17, $18, + $19, $20, + $21, $22, $23, + $24 + )`, + [ + s.dispensary_id, + storeProductId, + s.state_id, + s.external_product_id, + s.platform_dispensary_id, + s.id, // legacy_snapshot_id + s.dutchie_product_id, + s.crawled_at, + s.name, + s.brand_name, + s.category, + s.subcategory, + priceRec, + priceMed, + priceRecSpecial, + isOnSpecial, + isInStock, + stockStatus, + thcPercent, + cbdPercent, + platformStatus, + s.options, + s.raw_product_data, + s.created_at, + ] + ); + + stats.snapshotsInserted++; + } catch (error: any) { + stats.snapshotErrors++; + if (options.verbose) { + console.error(` Error hydrating snapshot ${s.id}: ${error.message}`); + } + } + } + + offset += BATCH_SIZE; + processed += snapshots.length; + + if (processed % 1000 === 0) { + console.log(` Processed ${processed}/${totalCount} snapshots...`); + } + } + + console.log(` Snapshots: ${stats.snapshotsInserted} inserted, ${stats.snapshotsSkipped} skipped, ${stats.snapshotErrors} errors`); +} + +// ============================================================ +// MAIN BACKFILL FUNCTION +// ============================================================ + +export async function runLegacyBackfill( + pool: Pool, + options: LegacyBackfillOptions +): Promise { + const stats: LegacyBackfillStats = { + productsProcessed: 0, + productsInserted: 0, + productsUpdated: 0, + productsSkipped: 0, + productErrors: 0, + snapshotsProcessed: 0, + snapshotsInserted: 0, + snapshotsSkipped: 0, + snapshotErrors: 0, + crawlRunsProcessed: 0, + crawlRunsInserted: 0, + crawlRunsSkipped: 0, + crawlRunErrors: 0, + startedAt: new Date(), + }; + + console.log('============================================================'); + console.log('Legacy → Canonical Hydration Backfill'); + console.log('============================================================'); + console.log(`Mode: ${options.dryRun ? 
'DRY RUN' : 'LIVE'}`); + if (options.dispensaryId) { + console.log(`Dispensary: ${options.dispensaryId}`); + } + console.log(`Batch size: ${BATCH_SIZE}`); + console.log(''); + + try { + // Step 1: Hydrate crawl_runs + const jobToCrawlRunMap = await hydrateCrawlRuns(pool, options, stats); + + // Step 2: Hydrate store_products + const productIdMap = await hydrateStoreProducts(pool, options, stats); + + // Step 3: Hydrate store_product_snapshots + await hydrateSnapshots(pool, options, stats, productIdMap, jobToCrawlRunMap); + + stats.completedAt = new Date(); + stats.durationMs = stats.completedAt.getTime() - stats.startedAt.getTime(); + + console.log('\n============================================================'); + console.log('SUMMARY'); + console.log('============================================================'); + console.log(`Duration: ${(stats.durationMs / 1000).toFixed(1)}s`); + console.log(''); + console.log('Crawl Runs:'); + console.log(` Processed: ${stats.crawlRunsProcessed}`); + console.log(` Inserted: ${stats.crawlRunsInserted}`); + console.log(` Skipped: ${stats.crawlRunsSkipped}`); + console.log(` Errors: ${stats.crawlRunErrors}`); + console.log(''); + console.log('Products:'); + console.log(` Processed: ${stats.productsProcessed}`); + console.log(` Inserted: ${stats.productsInserted}`); + console.log(` Updated: ${stats.productsUpdated}`); + console.log(` Skipped: ${stats.productsSkipped}`); + console.log(` Errors: ${stats.productErrors}`); + console.log(''); + console.log('Snapshots:'); + console.log(` Processed: ${stats.snapshotsProcessed}`); + console.log(` Inserted: ${stats.snapshotsInserted}`); + console.log(` Skipped: ${stats.snapshotsSkipped}`); + console.log(` Errors: ${stats.snapshotErrors}`); + console.log(''); + + return stats; + } catch (error: any) { + console.error('\nFATAL ERROR:', error.message); + throw error; + } +} + +// ============================================================ +// CLI ENTRYPOINT +// ============================================================ + +async function main() { + const args = process.argv.slice(2); + + const options: LegacyBackfillOptions = { + dryRun: args.includes('--dry-run'), + verbose: args.includes('--verbose') || args.includes('-v'), + }; + + // Parse --dispensary-id + const dispIdx = args.indexOf('--dispensary-id'); + if (dispIdx !== -1 && args[dispIdx + 1]) { + options.dispensaryId = parseInt(args[dispIdx + 1], 10); + } + + // Parse --start-from + const startIdx = args.indexOf('--start-from'); + if (startIdx !== -1 && args[startIdx + 1]) { + options.startFromProductId = parseInt(args[startIdx + 1], 10); + } + + // Show help + if (args.includes('--help') || args.includes('-h')) { + console.log(` +Legacy Backfill Script - Hydrates canonical tables from dutchie_* tables + +Usage: + npx tsx src/hydration/legacy-backfill.ts [options] + +Options: + --dry-run Print what would be done without modifying the database + --dispensary-id N Only process a specific dispensary + --start-from N Resume from a specific product ID + --verbose, -v Print detailed progress for each record + --help, -h Show this help message + +Examples: + # Full backfill + npx tsx src/hydration/legacy-backfill.ts + + # Dry run for one dispensary + npx tsx src/hydration/legacy-backfill.ts --dry-run --dispensary-id 123 + + # Resume from product ID 5000 + npx tsx src/hydration/legacy-backfill.ts --start-from 5000 +`); + process.exit(0); + } + + const pool = new Pool({ + connectionString: getConnectionString(), + max: 5, + }); + + try { + // Verify connection + 
await pool.query('SELECT 1');
+    console.log('Database connection: OK');
+
+    await runLegacyBackfill(pool, options);
+  } catch (error: any) {
+    console.error('Error:', error.message);
+    process.exit(1);
+  } finally {
+    await pool.end();
+  }
+}
+
+// Run if called directly
+if (require.main === module) {
+  main();
+}
diff --git a/backend/src/hydration/locking.ts b/backend/src/hydration/locking.ts
new file mode 100644
index 00000000..bd2fe995
--- /dev/null
+++ b/backend/src/hydration/locking.ts
@@ -0,0 +1,194 @@
+/**
+ * Distributed Locking for Hydration Workers
+ *
+ * Prevents multiple workers from processing the same payloads.
+ * Uses a PostgreSQL lock table (hydration_locks) with timeout-based expiry and heartbeats.
+ */
+
+import { Pool } from 'pg';
+import { v4 as uuidv4 } from 'uuid';
+
+const DEFAULT_LOCK_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes
+const HEARTBEAT_INTERVAL_MS = 30 * 1000; // 30 seconds
+
+// ============================================================
+// LOCK MANAGER
+// ============================================================
+
+export class HydrationLockManager {
+  private pool: Pool;
+  private workerId: string;
+  private heartbeatInterval: NodeJS.Timeout | null = null;
+  private activeLocks: Set<string> = new Set();
+
+  constructor(pool: Pool, workerId?: string) {
+    this.pool = pool;
+    this.workerId = workerId || `worker-${uuidv4().slice(0, 8)}`;
+  }
+
+  /**
+   * Acquire a named lock
+   * Returns true if lock was acquired, false if already held by another worker
+   */
+  async acquireLock(
+    lockName: string,
+    timeoutMs: number = DEFAULT_LOCK_TIMEOUT_MS
+  ): Promise<boolean> {
+    const expiresAt = new Date(Date.now() + timeoutMs);
+
+    try {
+      // First, clean up expired locks
+      await this.pool.query(
+        `DELETE FROM hydration_locks WHERE expires_at < NOW()`
+      );
+
+      // Try to insert the lock
+      const result = await this.pool.query(
+        `INSERT INTO hydration_locks (lock_name, worker_id, acquired_at, expires_at, heartbeat_at)
+         VALUES ($1, $2, NOW(), $3, NOW())
+         ON CONFLICT (lock_name) DO NOTHING
+         RETURNING id`,
+        [lockName, this.workerId, expiresAt]
+      );
+
+      if (result.rows.length > 0) {
+        this.activeLocks.add(lockName);
+        this.startHeartbeat();
+        console.log(`[HydrationLock] Acquired lock: ${lockName} (worker: ${this.workerId})`);
+        return true;
+      }
+
+      // Check if we already own the lock
+      const existing = await this.pool.query(
+        `SELECT worker_id FROM hydration_locks WHERE lock_name = $1`,
+        [lockName]
+      );
+
+      if (existing.rows.length > 0 && existing.rows[0].worker_id === this.workerId) {
+        // Refresh our own lock
+        await this.refreshLock(lockName, timeoutMs);
+        return true;
+      }
+
+      console.log(`[HydrationLock] Lock ${lockName} already held by ${existing.rows[0]?.worker_id}`);
+      return false;
+    } catch (error: any) {
+      console.error(`[HydrationLock] Error acquiring lock ${lockName}:`, error.message);
+      return false;
+    }
+  }
+
+  /**
+   * Release a named lock
+   */
+  async releaseLock(lockName: string): Promise<void> {
+    try {
+      await this.pool.query(
+        `DELETE FROM hydration_locks WHERE lock_name = $1 AND worker_id = $2`,
+        [lockName, this.workerId]
+      );
+      this.activeLocks.delete(lockName);
+      console.log(`[HydrationLock] Released lock: ${lockName}`);
+
+      if (this.activeLocks.size === 0) {
+        this.stopHeartbeat();
+      }
+    } catch (error: any) {
+      console.error(`[HydrationLock] Error releasing lock ${lockName}:`, error.message);
+    }
+  }
+
+  /**
+   * Refresh lock expiry
+   */
+  async refreshLock(lockName: string, timeoutMs: number = DEFAULT_LOCK_TIMEOUT_MS): Promise<void> {
+    const expiresAt = new Date(Date.now() + timeoutMs);
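+    // Push the expiry window forward and record a heartbeat so other workers
+    // can see the lock is still actively held.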
+    await this.pool.query(
+      `UPDATE hydration_locks
+       SET expires_at = $1, heartbeat_at = NOW()
+       WHERE lock_name = $2 AND worker_id = $3`,
+      [expiresAt, lockName, this.workerId]
+    );
+  }
+
+  /**
+   * Release all locks held by this worker
+   */
+  async releaseAllLocks(): Promise<void> {
+    this.stopHeartbeat();
+    await this.pool.query(
+      `DELETE FROM hydration_locks WHERE worker_id = $1`,
+      [this.workerId]
+    );
+    this.activeLocks.clear();
+    console.log(`[HydrationLock] Released all locks for worker: ${this.workerId}`);
+  }
+
+  /**
+   * Check if a lock is held (by any worker)
+   */
+  async isLockHeld(lockName: string): Promise<boolean> {
+    const result = await this.pool.query(
+      `SELECT 1 FROM hydration_locks
+       WHERE lock_name = $1 AND expires_at > NOW()`,
+      [lockName]
+    );
+    return result.rows.length > 0;
+  }
+
+  /**
+   * Get current lock holder
+   */
+  async getLockHolder(lockName: string): Promise<string | null> {
+    const result = await this.pool.query(
+      `SELECT worker_id FROM hydration_locks
+       WHERE lock_name = $1 AND expires_at > NOW()`,
+      [lockName]
+    );
+    return result.rows[0]?.worker_id || null;
+  }
+
+  /**
+   * Start heartbeat to keep locks alive
+   */
+  private startHeartbeat(): void {
+    if (this.heartbeatInterval) return;
+
+    this.heartbeatInterval = setInterval(async () => {
+      for (const lockName of this.activeLocks) {
+        try {
+          await this.refreshLock(lockName);
+        } catch (error: any) {
+          console.error(`[HydrationLock] Heartbeat failed for ${lockName}:`, error.message);
+        }
+      }
+    }, HEARTBEAT_INTERVAL_MS);
+  }
+
+  /**
+   * Stop heartbeat
+   */
+  private stopHeartbeat(): void {
+    if (this.heartbeatInterval) {
+      clearInterval(this.heartbeatInterval);
+      this.heartbeatInterval = null;
+    }
+  }
+
+  /**
+   * Get worker ID
+   */
+  getWorkerId(): string {
+    return this.workerId;
+  }
+}
+
+// ============================================================
+// SINGLETON LOCK NAMES
+// ============================================================
+
+export const LOCK_NAMES = {
+  HYDRATION_BATCH: 'hydration:batch',
+  HYDRATION_CATCHUP: 'hydration:catchup',
+  BACKFILL: 'hydration:backfill',
+} as const;
diff --git a/backend/src/hydration/normalizers/base.ts b/backend/src/hydration/normalizers/base.ts
new file mode 100644
index 00000000..47bfc8f8
--- /dev/null
+++ b/backend/src/hydration/normalizers/base.ts
@@ -0,0 +1,210 @@
+/**
+ * Base Normalizer Interface
+ *
+ * Abstract interface for platform-specific normalizers.
+ * Each platform (Dutchie, Jane, Treez, etc.) implements this interface.
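+ *
+ * A minimal sketch of what a future platform normalizer could look like. The
+ * `JaneNormalizer` name below is hypothetical and the abstract method bodies
+ * are elided, so treat it as an illustration of the contract rather than a
+ * working implementation:
+ *
+ * @example
+ * class JaneNormalizer extends BaseNormalizer {
+ *   readonly platform = 'jane';
+ *   readonly supportedVersions = [1];
+ *
+ *   extractProducts(rawJson: any): any[] {
+ *     return Array.isArray(rawJson?.products) ? rawJson.products : [];
+ *   }
+ *
+ *   // validatePayload, normalizeProduct, normalizePricing,
+ *   // normalizeAvailability, extractBrand and extractCategory would map
+ *   // Jane-specific fields into the Normalized* types.
+ * }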
+ */ + +import { + RawPayload, + NormalizationResult, + NormalizedProduct, + NormalizedPricing, + NormalizedAvailability, + NormalizedBrand, + NormalizedCategory, + NormalizationError, +} from '../types'; + +// ============================================================ +// NORMALIZER INTERFACE +// ============================================================ + +export interface INormalizer { + /** + * Platform identifier (e.g., 'dutchie', 'jane', 'treez') + */ + readonly platform: string; + + /** + * Supported payload versions + */ + readonly supportedVersions: number[]; + + /** + * Normalize a raw payload into canonical format + */ + normalize(payload: RawPayload): NormalizationResult; + + /** + * Extract products from raw JSON + */ + extractProducts(rawJson: any): any[]; + + /** + * Validate raw JSON structure + */ + validatePayload(rawJson: any): { valid: boolean; errors: string[] }; +} + +// ============================================================ +// BASE NORMALIZER (abstract) +// ============================================================ + +export abstract class BaseNormalizer implements INormalizer { + abstract readonly platform: string; + abstract readonly supportedVersions: number[]; + + /** + * Main normalization entry point + */ + normalize(payload: RawPayload): NormalizationResult { + const errors: NormalizationError[] = []; + const rawProducts = this.extractProducts(payload.raw_json); + + const products: NormalizedProduct[] = []; + const pricing = new Map(); + const availability = new Map(); + const brandsMap = new Map(); + const categoriesMap = new Map(); + + for (const rawProduct of rawProducts) { + try { + // Normalize product identity + const product = this.normalizeProduct(rawProduct, payload.dispensary_id); + if (!product) continue; + + products.push(product); + + // Normalize pricing + const productPricing = this.normalizePricing(rawProduct); + if (productPricing) { + pricing.set(product.externalProductId, productPricing); + } + + // Normalize availability + const productAvailability = this.normalizeAvailability(rawProduct); + if (productAvailability) { + availability.set(product.externalProductId, productAvailability); + } + + // Extract brand + const brand = this.extractBrand(rawProduct); + if (brand && brand.name) { + brandsMap.set(brand.slug, brand); + } + + // Extract category + const category = this.extractCategory(rawProduct); + if (category && category.name) { + categoriesMap.set(category.slug, category); + } + } catch (error: any) { + errors.push({ + productId: rawProduct._id || rawProduct.id || null, + field: 'normalization', + message: error.message, + rawValue: rawProduct, + }); + } + } + + return { + products, + pricing, + availability, + brands: Array.from(brandsMap.values()), + categories: Array.from(categoriesMap.values()), + productCount: products.length, + timestamp: new Date(), + errors, + }; + } + + /** + * Extract products array from raw JSON + */ + abstract extractProducts(rawJson: any): any[]; + + /** + * Validate raw JSON structure + */ + abstract validatePayload(rawJson: any): { valid: boolean; errors: string[] }; + + /** + * Normalize a single product + */ + protected abstract normalizeProduct(rawProduct: any, dispensaryId: number): NormalizedProduct | null; + + /** + * Normalize pricing for a product + */ + protected abstract normalizePricing(rawProduct: any): NormalizedPricing | null; + + /** + * Normalize availability for a product + */ + protected abstract normalizeAvailability(rawProduct: any): NormalizedAvailability | null; + + /** 
+ * Extract brand from product + */ + protected abstract extractBrand(rawProduct: any): NormalizedBrand | null; + + /** + * Extract category from product + */ + protected abstract extractCategory(rawProduct: any): NormalizedCategory | null; + + // ============================================================ + // UTILITY METHODS + // ============================================================ + + /** + * Convert dollars to cents + */ + protected toCents(price?: number | null): number | null { + if (price === undefined || price === null) return null; + return Math.round(price * 100); + } + + /** + * Slugify a string + */ + protected slugify(str: string): string { + return str + .toLowerCase() + .trim() + .replace(/[^\w\s-]/g, '') + .replace(/[\s_-]+/g, '-') + .replace(/^-+|-+$/g, ''); + } + + /** + * Safely parse boolean values + */ + protected toBool(value: any, defaultVal: boolean = false): boolean { + if (value === true) return true; + if (value === false) return false; + if (typeof value === 'object') return defaultVal; + return defaultVal; + } + + /** + * Get min value from array + */ + protected getMin(arr?: number[]): number | null { + if (!arr || arr.length === 0) return null; + const valid = arr.filter((n) => n !== null && n !== undefined); + return valid.length > 0 ? Math.min(...valid) : null; + } + + /** + * Get max value from array + */ + protected getMax(arr?: number[]): number | null { + if (!arr || arr.length === 0) return null; + const valid = arr.filter((n) => n !== null && n !== undefined); + return valid.length > 0 ? Math.max(...valid) : null; + } +} diff --git a/backend/src/hydration/normalizers/dutchie.ts b/backend/src/hydration/normalizers/dutchie.ts new file mode 100644 index 00000000..3aac098a --- /dev/null +++ b/backend/src/hydration/normalizers/dutchie.ts @@ -0,0 +1,275 @@ +/** + * Dutchie Platform Normalizer + * + * Normalizes raw Dutchie GraphQL responses to canonical format. + */ + +import { BaseNormalizer } from './base'; +import { + NormalizedProduct, + NormalizedPricing, + NormalizedAvailability, + NormalizedBrand, + NormalizedCategory, +} from '../types'; + +export class DutchieNormalizer extends BaseNormalizer { + readonly platform = 'dutchie'; + readonly supportedVersions = [1, 2]; + + // ============================================================ + // EXTRACTION + // ============================================================ + + extractProducts(rawJson: any): any[] { + // Handle different payload structures + if (Array.isArray(rawJson)) { + return rawJson; + } + + // GraphQL response format: { data: { filteredProducts: { products: [...] } } } + if (rawJson?.data?.filteredProducts?.products) { + return rawJson.data.filteredProducts.products; + } + + // Direct products array + if (rawJson?.products && Array.isArray(rawJson.products)) { + return rawJson.products; + } + + // Merged mode format: { modeA: [...], modeB: [...], merged: [...] 
} + if (rawJson?.merged && Array.isArray(rawJson.merged)) { + return rawJson.merged; + } + + // Fallback: try products_a and products_b + if (rawJson?.products_a || rawJson?.products_b) { + const productsMap = new Map(); + + // Add mode_a products (have pricing) + for (const p of rawJson.products_a || []) { + const id = p._id || p.id; + if (id) productsMap.set(id, { ...p, _crawlMode: 'mode_a' }); + } + + // Add mode_b products (full coverage) + for (const p of rawJson.products_b || []) { + const id = p._id || p.id; + if (id && !productsMap.has(id)) { + productsMap.set(id, { ...p, _crawlMode: 'mode_b' }); + } + } + + return Array.from(productsMap.values()); + } + + console.warn('[DutchieNormalizer] Could not extract products from payload'); + return []; + } + + validatePayload(rawJson: any): { valid: boolean; errors: string[] } { + const errors: string[] = []; + + if (!rawJson) { + errors.push('Payload is null or undefined'); + return { valid: false, errors }; + } + + const products = this.extractProducts(rawJson); + if (products.length === 0) { + errors.push('No products found in payload'); + } + + // Check for GraphQL errors + if (rawJson?.errors && Array.isArray(rawJson.errors)) { + for (const err of rawJson.errors) { + errors.push(`GraphQL error: ${err.message || JSON.stringify(err)}`); + } + } + + return { valid: errors.length === 0, errors }; + } + + // ============================================================ + // NORMALIZATION + // ============================================================ + + protected normalizeProduct(rawProduct: any, dispensaryId: number): NormalizedProduct | null { + const externalId = rawProduct._id || rawProduct.id; + if (!externalId) { + console.warn('[DutchieNormalizer] Product missing ID, skipping'); + return null; + } + + const name = rawProduct.Name || rawProduct.name; + if (!name) { + console.warn(`[DutchieNormalizer] Product ${externalId} missing name, skipping`); + return null; + } + + return { + externalProductId: externalId, + dispensaryId, + platform: 'dutchie', + platformDispensaryId: rawProduct.DispensaryID || rawProduct.dispensaryId || '', + + // Core fields + name, + brandName: rawProduct.brandName || rawProduct.brand?.name || null, + brandId: rawProduct.brandId || rawProduct.brand?.id || rawProduct.brand?._id || null, + category: rawProduct.type || null, + subcategory: rawProduct.subcategory || null, + type: rawProduct.type || null, + strainType: rawProduct.strainType || null, + + // Potency + thcPercent: this.extractPotency(rawProduct.THCContent) || rawProduct.THC || null, + cbdPercent: this.extractPotency(rawProduct.CBDContent) || rawProduct.CBD || null, + thcContent: rawProduct.THCContent?.range?.[0] || null, + cbdContent: rawProduct.CBDContent?.range?.[0] || null, + + // Status + status: rawProduct.Status || null, + isActive: rawProduct.Status === 'Active', + medicalOnly: this.toBool(rawProduct.medicalOnly, false), + recOnly: this.toBool(rawProduct.recOnly, false), + + // Images + primaryImageUrl: rawProduct.Image || rawProduct.images?.[0]?.url || null, + images: rawProduct.images || [], + + // Raw reference + rawProduct, + }; + } + + protected normalizePricing(rawProduct: any): NormalizedPricing | null { + const externalId = rawProduct._id || rawProduct.id; + if (!externalId) return null; + + // Extract prices from various sources + const recPrices = rawProduct.recPrices || []; + const medPrices = rawProduct.medicalPrices || []; + const recSpecialPrices = rawProduct.recSpecialPrices || []; + const medSpecialPrices = 
rawProduct.medSpecialPrices || []; + + // Also check POSMetaData children for option-level pricing + const children = rawProduct.POSMetaData?.children || []; + const childRecPrices = children.map((c: any) => c.recPrice).filter(Boolean); + const childMedPrices = children.map((c: any) => c.medPrice).filter(Boolean); + + const allRecPrices = [...recPrices, ...childRecPrices]; + const allMedPrices = [...medPrices, ...childMedPrices]; + + // Determine if on special + const isOnSpecial = recSpecialPrices.length > 0 || medSpecialPrices.length > 0; + + // Calculate discount percent if on special + let discountPercent: number | null = null; + if (isOnSpecial && allRecPrices.length > 0 && recSpecialPrices.length > 0) { + const regularMin = Math.min(...allRecPrices); + const specialMin = Math.min(...recSpecialPrices); + if (regularMin > 0) { + discountPercent = Math.round(((regularMin - specialMin) / regularMin) * 100); + } + } + + return { + externalProductId: externalId, + + priceRec: this.toCents(this.getMin(allRecPrices)), + priceRecMin: this.toCents(this.getMin(allRecPrices)), + priceRecMax: this.toCents(this.getMax(allRecPrices)), + priceRecSpecial: this.toCents(this.getMin(recSpecialPrices)), + + priceMed: this.toCents(this.getMin(allMedPrices)), + priceMedMin: this.toCents(this.getMin(allMedPrices)), + priceMedMax: this.toCents(this.getMax(allMedPrices)), + priceMedSpecial: this.toCents(this.getMin(medSpecialPrices)), + + isOnSpecial, + specialName: rawProduct.specialName || null, + discountPercent, + }; + } + + protected normalizeAvailability(rawProduct: any): NormalizedAvailability | null { + const externalId = rawProduct._id || rawProduct.id; + if (!externalId) return null; + + // Calculate total quantity from POSMetaData children + const children = rawProduct.POSMetaData?.children || []; + let totalQuantity = 0; + for (const child of children) { + totalQuantity += (child.quantityAvailable || child.quantity || 0); + } + + // Derive stock status + const status = rawProduct.Status; + const isBelowThreshold = this.toBool(rawProduct.isBelowThreshold, false); + const optionsBelowThreshold = this.toBool(rawProduct.optionsBelowThreshold, false); + + let stockStatus: 'in_stock' | 'out_of_stock' | 'low_stock' | 'unknown' = 'unknown'; + let inStock = true; + + if (status === 'Active') { + if (isBelowThreshold || optionsBelowThreshold) { + stockStatus = 'low_stock'; + } else { + stockStatus = 'in_stock'; + } + } else if (status === 'Inactive' || status === 'archived') { + stockStatus = 'out_of_stock'; + inStock = false; + } else if (totalQuantity === 0) { + stockStatus = 'out_of_stock'; + inStock = false; + } + + return { + externalProductId: externalId, + inStock, + stockStatus, + quantity: totalQuantity || null, + quantityAvailable: totalQuantity || null, + isBelowThreshold, + optionsBelowThreshold, + }; + } + + protected extractBrand(rawProduct: any): NormalizedBrand | null { + const brandName = rawProduct.brandName || rawProduct.brand?.name; + if (!brandName) return null; + + return { + externalBrandId: rawProduct.brandId || rawProduct.brand?.id || rawProduct.brand?._id || null, + name: brandName, + slug: this.slugify(brandName), + logoUrl: rawProduct.brandLogo || rawProduct.brand?.imageUrl || null, + }; + } + + protected extractCategory(rawProduct: any): NormalizedCategory | null { + const categoryName = rawProduct.type; + if (!categoryName) return null; + + return { + name: categoryName, + slug: this.slugify(categoryName), + parentCategory: null, + }; + } + + // 
============================================================ + // HELPERS + // ============================================================ + + private extractPotency(content: any): number | null { + if (!content) return null; + if (typeof content === 'number') return content; + if (content.range && Array.isArray(content.range) && content.range.length > 0) { + return content.range[0]; + } + return null; + } +} diff --git a/backend/src/hydration/normalizers/index.ts b/backend/src/hydration/normalizers/index.ts new file mode 100644 index 00000000..ca887fa5 --- /dev/null +++ b/backend/src/hydration/normalizers/index.ts @@ -0,0 +1,47 @@ +/** + * Normalizer Registry + * + * Central registry for platform-specific normalizers. + */ + +import { INormalizer } from './base'; +import { DutchieNormalizer } from './dutchie'; + +// ============================================================ +// NORMALIZER REGISTRY +// ============================================================ + +const normalizers: Map = new Map(); + +// Register platform normalizers +normalizers.set('dutchie', new DutchieNormalizer()); + +// Future platforms: +// normalizers.set('jane', new JaneNormalizer()); +// normalizers.set('treez', new TreezNormalizer()); +// normalizers.set('weedmaps', new WeedmapsNormalizer()); + +/** + * Get normalizer for a platform + */ +export function getNormalizer(platform: string): INormalizer | null { + return normalizers.get(platform.toLowerCase()) || null; +} + +/** + * Get all registered platforms + */ +export function getRegisteredPlatforms(): string[] { + return Array.from(normalizers.keys()); +} + +/** + * Check if platform is supported + */ +export function isPlatformSupported(platform: string): boolean { + return normalizers.has(platform.toLowerCase()); +} + +// Export individual normalizers for direct use +export { DutchieNormalizer } from './dutchie'; +export { INormalizer, BaseNormalizer } from './base'; diff --git a/backend/src/hydration/payload-store.ts b/backend/src/hydration/payload-store.ts new file mode 100644 index 00000000..40d9c4b1 --- /dev/null +++ b/backend/src/hydration/payload-store.ts @@ -0,0 +1,260 @@ +/** + * Payload Store + * + * Database operations for raw_payloads table. + * Handles storing, retrieving, and marking payloads as processed. 
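+ *
+ * A rough usage sketch of the intended worker-side flow, using only the
+ * helpers defined in this module (the normalize/upsert step in the middle is
+ * assumed to happen elsewhere, e.g. in the hydration worker):
+ *
+ * @example
+ * const payloads = await getUnprocessedPayloads(pool, { platform: 'dutchie', limit: 10 });
+ * for (const p of payloads) {
+ *   try {
+ *     // ...normalize and upsert the payload here...
+ *     await markPayloadProcessed(pool, p.id);
+ *   } catch (err: any) {
+ *     await markPayloadFailed(pool, p.id, err.message);
+ *   }
+ * }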
+ */ + +import { Pool, PoolClient } from 'pg'; +import { RawPayload } from './types'; + +// ============================================================ +// PAYLOAD STORAGE +// ============================================================ + +/** + * Store a raw payload from a crawler + * Automatically looks up and stores the dispensary's state for query optimization + */ +export async function storeRawPayload( + pool: Pool, + params: { + dispensaryId: number; + crawlRunId?: number | null; + platform: string; + payloadVersion?: number; + rawJson: any; + productCount?: number; + pricingType?: string; + crawlMode?: string; + fetchedAt?: Date; + state?: string; // Optional: if not provided, will be looked up + } +): Promise { + const { + dispensaryId, + crawlRunId = null, + platform, + payloadVersion = 1, + rawJson, + productCount = null, + pricingType = null, + crawlMode = null, + fetchedAt = new Date(), + } = params; + + // Look up state from dispensary if not provided + let state = params.state; + if (!state) { + const dispResult = await pool.query( + `SELECT state FROM dispensaries WHERE id = $1`, + [dispensaryId] + ); + state = dispResult.rows[0]?.state || 'AZ'; + } + + const result = await pool.query( + `INSERT INTO raw_payloads ( + dispensary_id, crawl_run_id, platform, payload_version, + raw_json, product_count, pricing_type, crawl_mode, fetched_at, state + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) + RETURNING id`, + [ + dispensaryId, + crawlRunId, + platform, + payloadVersion, + JSON.stringify(rawJson), + productCount, + pricingType, + crawlMode, + fetchedAt, + state, + ] + ); + + return result.rows[0].id; +} + +/** + * Get unprocessed payloads in FIFO order + * Supports filtering by state for multi-state processing + */ +export async function getUnprocessedPayloads( + pool: Pool, + options: { + limit?: number; + platform?: string; + state?: string; + states?: string[]; + maxAttempts?: number; + } = {} +): Promise { + const { limit = 100, platform = null, state = null, states = null, maxAttempts = 3 } = options; + + let query = ` + SELECT * FROM raw_payloads + WHERE processed = FALSE + AND hydration_attempts < $1 + `; + const params: any[] = [maxAttempts]; + let paramIndex = 2; + + if (platform) { + query += ` AND platform = $${paramIndex}`; + params.push(platform); + paramIndex++; + } + + // Single state filter + if (state) { + query += ` AND state = $${paramIndex}`; + params.push(state); + paramIndex++; + } + + // Multi-state filter + if (states && states.length > 0) { + query += ` AND state = ANY($${paramIndex})`; + params.push(states); + paramIndex++; + } + + query += ` ORDER BY fetched_at ASC LIMIT $${paramIndex}`; + params.push(limit); + + const result = await pool.query(query, params); + return result.rows; +} + +/** + * Mark payload as processed (successful hydration) + */ +export async function markPayloadProcessed( + pool: Pool, + payloadId: string, + client?: PoolClient +): Promise { + const conn = client || pool; + await conn.query( + `UPDATE raw_payloads + SET processed = TRUE, + normalized_at = NOW(), + hydration_error = NULL + WHERE id = $1`, + [payloadId] + ); +} + +/** + * Mark payload as failed (record error) + */ +export async function markPayloadFailed( + pool: Pool, + payloadId: string, + error: string, + client?: PoolClient +): Promise { + const conn = client || pool; + await conn.query( + `UPDATE raw_payloads + SET hydration_error = $2, + hydration_attempts = hydration_attempts + 1 + WHERE id = $1`, + [payloadId, error] + ); +} + +/** + * Get payload by 
ID + */ +export async function getPayloadById( + pool: Pool, + payloadId: string +): Promise { + const result = await pool.query( + `SELECT * FROM raw_payloads WHERE id = $1`, + [payloadId] + ); + return result.rows[0] || null; +} + +/** + * Get payloads for a dispensary + */ +export async function getPayloadsForDispensary( + pool: Pool, + dispensaryId: number, + options: { limit?: number; offset?: number; processed?: boolean } = {} +): Promise<{ payloads: RawPayload[]; total: number }> { + const { limit = 50, offset = 0, processed } = options; + + let whereClause = 'WHERE dispensary_id = $1'; + const params: any[] = [dispensaryId]; + let paramIndex = 2; + + if (processed !== undefined) { + whereClause += ` AND processed = $${paramIndex}`; + params.push(processed); + paramIndex++; + } + + // Get total count + const countResult = await pool.query( + `SELECT COUNT(*) as total FROM raw_payloads ${whereClause}`, + params.slice(0, paramIndex - 1) + ); + + // Get paginated results + params.push(limit, offset); + const result = await pool.query( + `SELECT * FROM raw_payloads ${whereClause} + ORDER BY fetched_at DESC + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + params + ); + + return { + payloads: result.rows, + total: parseInt(countResult.rows[0].total, 10), + }; +} + +/** + * Get payload statistics + */ +export async function getPayloadStats(pool: Pool): Promise<{ + total: number; + processed: number; + unprocessed: number; + failed: number; + byPlatform: Record; +}> { + const result = await pool.query(` + SELECT + COUNT(*) as total, + COUNT(*) FILTER (WHERE processed = TRUE) as processed, + COUNT(*) FILTER (WHERE processed = FALSE) as unprocessed, + COUNT(*) FILTER (WHERE hydration_error IS NOT NULL) as failed, + jsonb_object_agg( + platform, + platform_count + ) as by_platform + FROM raw_payloads, + LATERAL ( + SELECT platform, COUNT(*) as platform_count + FROM raw_payloads rp2 + WHERE rp2.platform = raw_payloads.platform + GROUP BY platform + ) counts + `); + + const row = result.rows[0] || {}; + return { + total: parseInt(row.total || '0', 10), + processed: parseInt(row.processed || '0', 10), + unprocessed: parseInt(row.unprocessed || '0', 10), + failed: parseInt(row.failed || '0', 10), + byPlatform: row.by_platform || {}, + }; +} diff --git a/backend/src/hydration/producer.ts b/backend/src/hydration/producer.ts new file mode 100644 index 00000000..996014b8 --- /dev/null +++ b/backend/src/hydration/producer.ts @@ -0,0 +1,121 @@ +/** + * Payload Producer + * + * Hook for crawlers to store raw payloads immediately after fetching. + * This is called by the crawler before any normalization. 
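+ *
+ * A hedged sketch of a crawler handing its raw response to the producer; the
+ * `crawlerResult` object and the numeric IDs here are illustrative only, not
+ * real crawler types or data:
+ *
+ * @example
+ * const payloadId = await producePayload(pool, {
+ *   dispensaryId: 123,
+ *   crawlRunId: 456,
+ *   rawJson: crawlerResult,
+ *   productCount: crawlerResult.products?.length,
+ *   pricingType: 'rec',
+ *   crawlMode: 'mode_a',
+ * });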
+ */ + +import { Pool } from 'pg'; +import { storeRawPayload } from './payload-store'; + +export interface ProducerOptions { + autoTriggerHydration?: boolean; + platform?: string; + payloadVersion?: number; +} + +/** + * Store raw crawler response as a payload + * Called by crawlers immediately after successful fetch + */ +export async function producePayload( + pool: Pool, + params: { + dispensaryId: number; + crawlRunId?: number | null; + rawJson: any; + productCount?: number; + pricingType?: 'rec' | 'med' | 'both'; + crawlMode?: 'mode_a' | 'mode_b' | 'dual'; + }, + options: ProducerOptions = {} +): Promise { + const { + platform = 'dutchie', + payloadVersion = 1, + autoTriggerHydration = false, + } = options; + + const payloadId = await storeRawPayload(pool, { + dispensaryId: params.dispensaryId, + crawlRunId: params.crawlRunId, + platform, + payloadVersion, + rawJson: params.rawJson, + productCount: params.productCount, + pricingType: params.pricingType, + crawlMode: params.crawlMode, + }); + + console.log( + `[PayloadProducer] Stored payload ${payloadId} for dispensary ${params.dispensaryId} ` + + `(${params.productCount || 0} products)` + ); + + // Optionally trigger immediate hydration + if (autoTriggerHydration) { + // Queue hydration job - this would integrate with the job queue system + // For now, we just log that hydration should be triggered + console.log(`[PayloadProducer] Hydration auto-trigger requested for ${payloadId}`); + } + + return payloadId; +} + +/** + * Create a producer instance bound to a pool + * Useful for dependency injection in crawlers + */ +export function createProducer(pool: Pool, options: ProducerOptions = {}) { + return { + produce: (params: Parameters[1]) => + producePayload(pool, params, options), + }; +} + +/** + * Wrapper to integrate with existing crawler result handling + * Call this at the end of a successful crawl + */ +export async function onCrawlComplete( + pool: Pool, + result: { + dispensaryId: number; + crawlRunId?: number; + rawProducts: any[]; + pricingType: 'rec' | 'med' | 'both'; + crawlMode: 'mode_a' | 'mode_b' | 'dual'; + modeAProducts?: any[]; + modeBProducts?: any[]; + } +): Promise { + if (!result.rawProducts || result.rawProducts.length === 0) { + console.log('[PayloadProducer] No products to store'); + return null; + } + + // Build the raw payload structure + const rawJson: any = { + products: result.rawProducts, + crawl_mode: result.crawlMode, + pricing_type: result.pricingType, + captured_at: new Date().toISOString(), + }; + + // Include mode-specific data if available + if (result.modeAProducts) { + rawJson.products_a = result.modeAProducts; + } + if (result.modeBProducts) { + rawJson.products_b = result.modeBProducts; + } + + return producePayload(pool, { + dispensaryId: result.dispensaryId, + crawlRunId: result.crawlRunId, + rawJson, + productCount: result.rawProducts.length, + pricingType: result.pricingType, + crawlMode: result.crawlMode, + }); +} diff --git a/backend/src/hydration/types.ts b/backend/src/hydration/types.ts new file mode 100644 index 00000000..dbaf2717 --- /dev/null +++ b/backend/src/hydration/types.ts @@ -0,0 +1,202 @@ +/** + * Hydration Pipeline Types + * + * Type definitions for the raw payload → canonical hydration pipeline. 
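+ *
+ * Rough shape of the flow these types describe, one payload at a time:
+ *
+ *   RawPayload (raw_payloads row)
+ *     -> NormalizationResult (platform normalizer output)
+ *     -> HydrationResult (canonical upsert outcome for one payload)
+ *     -> HydrationBatchResult (totals across a worker batch)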
+ */ + +// ============================================================ +// RAW PAYLOAD TYPES +// ============================================================ + +export interface RawPayload { + id: string; // UUID + dispensary_id: number; + crawl_run_id: number | null; + platform: string; + payload_version: number; + raw_json: any; + product_count: number | null; + pricing_type: string | null; + crawl_mode: string | null; + fetched_at: Date; + processed: boolean; + normalized_at: Date | null; + hydration_error: string | null; + hydration_attempts: number; + created_at: Date; +} + +// ============================================================ +// NORMALIZED OUTPUT TYPES +// ============================================================ + +export interface NormalizedProduct { + // Identity + externalProductId: string; + dispensaryId: number; + platform: string; + platformDispensaryId: string; + + // Core fields + name: string; + brandName: string | null; + brandId: string | null; + category: string | null; + subcategory: string | null; + type: string | null; + strainType: string | null; + + // Potency + thcPercent: number | null; + cbdPercent: number | null; + thcContent: number | null; + cbdContent: number | null; + + // Status + status: string | null; + isActive: boolean; + medicalOnly: boolean; + recOnly: boolean; + + // Images + primaryImageUrl: string | null; + images: any[]; + + // Raw payload reference + rawProduct: any; +} + +export interface NormalizedPricing { + externalProductId: string; + + // Recreational pricing + priceRec: number | null; // in cents + priceRecMin: number | null; + priceRecMax: number | null; + priceRecSpecial: number | null; + + // Medical pricing + priceMed: number | null; + priceMedMin: number | null; + priceMedMax: number | null; + priceMedSpecial: number | null; + + // Specials + isOnSpecial: boolean; + specialName: string | null; + discountPercent: number | null; +} + +export interface NormalizedAvailability { + externalProductId: string; + + // Stock status + inStock: boolean; + stockStatus: 'in_stock' | 'out_of_stock' | 'low_stock' | 'unknown'; + quantity: number | null; + quantityAvailable: number | null; + + // Threshold flags + isBelowThreshold: boolean; + optionsBelowThreshold: boolean; +} + +export interface NormalizedBrand { + externalBrandId: string | null; + name: string; + slug: string; + logoUrl: string | null; +} + +export interface NormalizedCategory { + name: string; + slug: string; + parentCategory: string | null; +} + +// ============================================================ +// NORMALIZATION RESULT +// ============================================================ + +export interface NormalizationResult { + products: NormalizedProduct[]; + pricing: Map; // keyed by externalProductId + availability: Map; // keyed by externalProductId + brands: NormalizedBrand[]; + categories: NormalizedCategory[]; + + // Metadata + productCount: number; + timestamp: Date; + errors: NormalizationError[]; +} + +export interface NormalizationError { + productId: string | null; + field: string; + message: string; + rawValue: any; +} + +// ============================================================ +// HYDRATION RESULT +// ============================================================ + +export interface HydrationResult { + success: boolean; + payloadId: string; + dispensaryId: number; + + // Counts + productsUpserted: number; + productsNew: number; + productsUpdated: number; + productsDiscontinued: number; + snapshotsCreated: number; + brandsCreated: 
number; + categoriesCreated: number; + + // Errors + errors: string[]; + + // Timing + startedAt: Date; + finishedAt: Date; + durationMs: number; +} + +export interface HydrationBatchResult { + payloadsProcessed: number; + payloadsFailed: number; + totalProductsUpserted: number; + totalSnapshotsCreated: number; + totalBrandsCreated: number; + errors: Array<{ payloadId: string; error: string }>; + durationMs: number; +} + +// ============================================================ +// HYDRATION OPTIONS +// ============================================================ + +export interface HydrationOptions { + dryRun?: boolean; // Print changes without modifying DB + batchSize?: number; // Number of payloads per batch + maxRetries?: number; // Max hydration attempts per payload + lockTimeoutMs?: number; // Lock expiry time + skipBrandCreation?: boolean; // Skip creating new brands + skipCategoryNormalization?: boolean; +} + +// ============================================================ +// LOCK TYPES +// ============================================================ + +export interface HydrationLock { + id: number; + lock_name: string; + worker_id: string; + acquired_at: Date; + expires_at: Date; + heartbeat_at: Date; +} diff --git a/backend/src/hydration/worker.ts b/backend/src/hydration/worker.ts new file mode 100644 index 00000000..a12671b8 --- /dev/null +++ b/backend/src/hydration/worker.ts @@ -0,0 +1,370 @@ +/** + * Hydration Worker + * + * Processes raw payloads and hydrates them to canonical tables. + * Features: + * - Distributed locking to prevent double-processing + * - Batch processing for efficiency + * - Automatic retry with backoff + * - Dry-run mode for testing + */ + +import { Pool } from 'pg'; +import { v4 as uuidv4 } from 'uuid'; +import { + RawPayload, + HydrationOptions, + HydrationResult, + HydrationBatchResult, +} from './types'; +import { getNormalizer } from './normalizers'; +import { + getUnprocessedPayloads, + markPayloadProcessed, + markPayloadFailed, +} from './payload-store'; +import { HydrationLockManager, LOCK_NAMES } from './locking'; +import { hydrateToCanonical } from './canonical-upsert'; + +const DEFAULT_BATCH_SIZE = 50; +const DEFAULT_MAX_RETRIES = 3; +const DEFAULT_LOCK_TIMEOUT_MS = 10 * 60 * 1000; // 10 minutes + +// ============================================================ +// HYDRATION WORKER CLASS +// ============================================================ + +export class HydrationWorker { + private pool: Pool; + private lockManager: HydrationLockManager; + private workerId: string; + private options: HydrationOptions; + private isRunning: boolean = false; + + constructor(pool: Pool, options: HydrationOptions = {}) { + this.pool = pool; + this.workerId = `hydration-${uuidv4().slice(0, 8)}`; + this.lockManager = new HydrationLockManager(pool, this.workerId); + this.options = { + dryRun: false, + batchSize: DEFAULT_BATCH_SIZE, + maxRetries: DEFAULT_MAX_RETRIES, + lockTimeoutMs: DEFAULT_LOCK_TIMEOUT_MS, + ...options, + }; + } + + /** + * Process a single payload + */ + async processPayload(payload: RawPayload): Promise { + const startedAt = new Date(); + const errors: string[] = []; + + try { + // Get normalizer for this platform + const normalizer = getNormalizer(payload.platform); + if (!normalizer) { + throw new Error(`No normalizer found for platform: ${payload.platform}`); + } + + // Validate payload + const validation = normalizer.validatePayload(payload.raw_json); + if (!validation.valid) { + errors.push(...validation.errors); + if 
(errors.length > 0 && !payload.raw_json) { + throw new Error(`Invalid payload: ${errors.join(', ')}`); + } + } + + // Normalize + const normResult = normalizer.normalize(payload); + + if (normResult.errors.length > 0) { + errors.push(...normResult.errors.map((e) => `${e.field}: ${e.message}`)); + } + + if (normResult.products.length === 0) { + console.warn(`[HydrationWorker] No products in payload ${payload.id}`); + } + + // Hydrate to canonical tables + const hydrateResult = await hydrateToCanonical( + this.pool, + payload.dispensary_id, + normResult, + payload.crawl_run_id, + { dryRun: this.options.dryRun } + ); + + // Mark as processed + if (!this.options.dryRun) { + await markPayloadProcessed(this.pool, payload.id); + } + + const finishedAt = new Date(); + + console.log( + `[HydrationWorker] ${this.options.dryRun ? '[DryRun] ' : ''}Processed payload ${payload.id}: ` + + `${hydrateResult.productsNew} new, ${hydrateResult.productsUpdated} updated, ` + + `${hydrateResult.productsDiscontinued} discontinued, ${hydrateResult.snapshotsCreated} snapshots` + ); + + return { + success: true, + payloadId: payload.id, + dispensaryId: payload.dispensary_id, + productsUpserted: hydrateResult.productsUpserted, + productsNew: hydrateResult.productsNew, + productsUpdated: hydrateResult.productsUpdated, + productsDiscontinued: hydrateResult.productsDiscontinued, + snapshotsCreated: hydrateResult.snapshotsCreated, + brandsCreated: hydrateResult.brandsCreated, + categoriesCreated: 0, + errors, + startedAt, + finishedAt, + durationMs: finishedAt.getTime() - startedAt.getTime(), + }; + } catch (error: any) { + const finishedAt = new Date(); + errors.push(error.message); + + // Mark as failed + if (!this.options.dryRun) { + await markPayloadFailed(this.pool, payload.id, error.message); + } + + console.error(`[HydrationWorker] Failed to process payload ${payload.id}:`, error.message); + + return { + success: false, + payloadId: payload.id, + dispensaryId: payload.dispensary_id, + productsUpserted: 0, + productsNew: 0, + productsUpdated: 0, + productsDiscontinued: 0, + snapshotsCreated: 0, + brandsCreated: 0, + categoriesCreated: 0, + errors, + startedAt, + finishedAt, + durationMs: finishedAt.getTime() - startedAt.getTime(), + }; + } + } + + /** + * Process a batch of payloads + */ + async processBatch( + platform?: string + ): Promise { + const startTime = Date.now(); + const errors: Array<{ payloadId: string; error: string }> = []; + let payloadsProcessed = 0; + let payloadsFailed = 0; + let totalProductsUpserted = 0; + let totalSnapshotsCreated = 0; + let totalBrandsCreated = 0; + + // Acquire lock + const lockAcquired = await this.lockManager.acquireLock( + LOCK_NAMES.HYDRATION_BATCH, + this.options.lockTimeoutMs + ); + + if (!lockAcquired) { + console.log('[HydrationWorker] Could not acquire batch lock, skipping'); + return { + payloadsProcessed: 0, + payloadsFailed: 0, + totalProductsUpserted: 0, + totalSnapshotsCreated: 0, + totalBrandsCreated: 0, + errors: [], + durationMs: Date.now() - startTime, + }; + } + + try { + // Create hydration run record + let runId: number | null = null; + if (!this.options.dryRun) { + const result = await this.pool.query( + `INSERT INTO hydration_runs (worker_id, started_at, status) + VALUES ($1, NOW(), 'running') + RETURNING id`, + [this.workerId] + ); + runId = result.rows[0].id; + } + + // Get unprocessed payloads + const payloads = await getUnprocessedPayloads(this.pool, { + limit: this.options.batchSize, + platform, + maxAttempts: this.options.maxRetries, + }); + + 
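+      // Payloads are processed one at a time while this worker holds the
+      // HYDRATION_BATCH lock; a payload that fails is recorded below and does
+      // not stop the rest of the batch.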
console.log(`[HydrationWorker] Processing ${payloads.length} payloads`); + + // Process each payload + for (const payload of payloads) { + const result = await this.processPayload(payload); + + if (result.success) { + payloadsProcessed++; + totalProductsUpserted += result.productsUpserted; + totalSnapshotsCreated += result.snapshotsCreated; + totalBrandsCreated += result.brandsCreated; + } else { + payloadsFailed++; + if (result.errors.length > 0) { + errors.push({ + payloadId: payload.id, + error: result.errors.join('; '), + }); + } + } + } + + // Update hydration run record + if (!this.options.dryRun && runId) { + await this.pool.query( + `UPDATE hydration_runs SET + finished_at = NOW(), + status = $2, + payloads_processed = $3, + products_upserted = $4, + snapshots_created = $5, + brands_created = $6, + errors_count = $7 + WHERE id = $1`, + [ + runId, + payloadsFailed > 0 ? 'completed_with_errors' : 'completed', + payloadsProcessed, + totalProductsUpserted, + totalSnapshotsCreated, + totalBrandsCreated, + payloadsFailed, + ] + ); + } + + console.log( + `[HydrationWorker] Batch complete: ${payloadsProcessed} processed, ${payloadsFailed} failed` + ); + + return { + payloadsProcessed, + payloadsFailed, + totalProductsUpserted, + totalSnapshotsCreated, + totalBrandsCreated, + errors, + durationMs: Date.now() - startTime, + }; + } finally { + await this.lockManager.releaseLock(LOCK_NAMES.HYDRATION_BATCH); + } + } + + /** + * Run continuous hydration loop + */ + async runLoop(intervalMs: number = 30000): Promise { + this.isRunning = true; + console.log(`[HydrationWorker] Starting loop (interval: ${intervalMs}ms)`); + + while (this.isRunning) { + try { + const result = await this.processBatch(); + + if (result.payloadsProcessed === 0) { + // No work to do, wait before checking again + await new Promise((resolve) => setTimeout(resolve, intervalMs)); + } + } catch (error: any) { + console.error('[HydrationWorker] Loop error:', error.message); + await new Promise((resolve) => setTimeout(resolve, intervalMs)); + } + } + + await this.lockManager.releaseAllLocks(); + console.log('[HydrationWorker] Loop stopped'); + } + + /** + * Stop the hydration loop + */ + stop(): void { + this.isRunning = false; + } + + /** + * Get worker ID + */ + getWorkerId(): string { + return this.workerId; + } +} + +// ============================================================ +// STANDALONE FUNCTIONS +// ============================================================ + +/** + * Run a single hydration batch (for cron jobs) + */ +export async function runHydrationBatch( + pool: Pool, + options: HydrationOptions = {} +): Promise { + const worker = new HydrationWorker(pool, options); + return worker.processBatch(); +} + +/** + * Process a specific payload by ID + */ +export async function processPayloadById( + pool: Pool, + payloadId: string, + options: HydrationOptions = {} +): Promise { + const result = await pool.query( + `SELECT * FROM raw_payloads WHERE id = $1`, + [payloadId] + ); + + if (result.rows.length === 0) { + throw new Error(`Payload not found: ${payloadId}`); + } + + const worker = new HydrationWorker(pool, options); + return worker.processPayload(result.rows[0]); +} + +/** + * Reprocess failed payloads + */ +export async function reprocessFailedPayloads( + pool: Pool, + options: HydrationOptions = {} +): Promise { + // Reset failed payloads for reprocessing + await pool.query( + `UPDATE raw_payloads + SET hydration_attempts = 0, + hydration_error = NULL + WHERE processed = FALSE + AND hydration_error IS NOT 
NULL` + ); + + const worker = new HydrationWorker(pool, options); + return worker.processBatch(); +} diff --git a/backend/src/index.ts b/backend/src/index.ts index 32f00193..15831602 100755 --- a/backend/src/index.ts +++ b/backend/src/index.ts @@ -60,11 +60,22 @@ import versionRoutes from './routes/version'; import publicApiRoutes from './routes/public-api'; import usersRoutes from './routes/users'; import staleProcessesRoutes from './routes/stale-processes'; +import orchestratorAdminRoutes from './routes/orchestrator-admin'; +import adminRoutes from './routes/admin'; import { dutchieAZRouter, startScheduler as startDutchieAZScheduler, initializeDefaultSchedules } from './dutchie-az'; +import { getPool } from './dutchie-az/db/connection'; +import { createAnalyticsRouter } from './dutchie-az/routes/analytics'; +import { createMultiStateRoutes } from './multi-state'; import { trackApiUsage, checkRateLimit } from './middleware/apiTokenTracker'; import { startCrawlScheduler } from './services/crawl-scheduler'; import { validateWordPressPermissions } from './middleware/wordpressPermissions'; import { markTrustedDomains } from './middleware/trustedDomains'; +import { createSystemRouter, createPrometheusRouter } from './system/routes'; +import { createPortalRoutes } from './portals'; +import { createStatesRouter } from './routes/states'; +import { createAnalyticsV2Router } from './routes/analytics-v2'; +import { createDiscoveryRoutes } from './discovery'; +import { createDutchieDiscoveryRoutes, promoteDiscoveryLocation } from './dutchie-az/discovery'; // Mark requests from trusted domains (cannaiq.co, findagram.co, findadispo.com) // These domains can access the API without authentication @@ -98,15 +109,124 @@ app.use('/api/crawler-sandbox', crawlerSandboxRoutes); app.use('/api/version', versionRoutes); app.use('/api/users', usersRoutes); app.use('/api/stale-processes', staleProcessesRoutes); +// Admin routes - operator actions (crawl triggers, health checks) +app.use('/api/admin', adminRoutes); +app.use('/api/admin/orchestrator', orchestratorAdminRoutes); // Vendor-agnostic AZ data pipeline routes (new public surface) app.use('/api/az', dutchieAZRouter); // Legacy alias (kept temporarily for backward compatibility) app.use('/api/dutchie-az', dutchieAZRouter); +// Phase 3: Analytics Dashboards - price trends, penetration, category growth, etc. +try { + const analyticsRouter = createAnalyticsRouter(getPool()); + app.use('/api/az/analytics', analyticsRouter); + console.log('[Analytics] Routes registered at /api/az/analytics'); +} catch (error) { + console.warn('[Analytics] Failed to register routes:', error); +} + +// Phase 3: Analytics V2 - Enhanced analytics with rec/med state segmentation +try { + const analyticsV2Router = createAnalyticsV2Router(getPool()); + app.use('/api/analytics/v2', analyticsV2Router); + console.log('[AnalyticsV2] Routes registered at /api/analytics/v2'); +} catch (error) { + console.warn('[AnalyticsV2] Failed to register routes:', error); +} + // Public API v1 - External consumer endpoints (WordPress, etc.) 
// Uses dutchie_az data pipeline with per-dispensary API key auth app.use('/api/v1', publicApiRoutes); +// Multi-state API routes - national analytics and cross-state comparisons +// Phase 4: Multi-State Expansion +try { + const multiStateRoutes = createMultiStateRoutes(getPool()); + app.use('/api', multiStateRoutes); + console.log('[MultiState] Routes registered'); +} catch (error) { + console.warn('[MultiState] Failed to register routes (DB may not be configured):', error); +} + +// States API routes - cannabis legalization status and targeting +try { + const statesRouter = createStatesRouter(getPool()); + app.use('/api/states', statesRouter); + console.log('[States] Routes registered at /api/states'); +} catch (error) { + console.warn('[States] Failed to register routes:', error); +} + +// Phase 5: Production Sync + Monitoring +// System orchestrator, DLQ, integrity checks, auto-fix routines, alerts +try { + const systemRouter = createSystemRouter(getPool()); + const prometheusRouter = createPrometheusRouter(getPool()); + app.use('/api/system', systemRouter); + app.use('/metrics', prometheusRouter); + console.log('[System] Routes registered at /api/system and /metrics'); +} catch (error) { + console.warn('[System] Failed to register routes:', error); +} + +// Phase 6 & 7: Portals (Brand, Buyer), Intelligence, Orders, Inventory, Pricing +try { + const portalRoutes = createPortalRoutes(getPool()); + app.use('/api/portal', portalRoutes); + console.log('[Portals] Routes registered at /api/portal'); +} catch (error) { + console.warn('[Portals] Failed to register routes:', error); +} + +// Discovery Pipeline - Store discovery from Dutchie with verification workflow +try { + const discoveryRoutes = createDiscoveryRoutes(getPool()); + app.use('/api/discovery', discoveryRoutes); + console.log('[Discovery] Routes registered at /api/discovery'); +} catch (error) { + console.warn('[Discovery] Failed to register routes:', error); +} + +// Platform-specific Discovery Routes +// Uses neutral slugs to avoid trademark issues in URLs: +// dt = Dutchie, jn = Jane, wm = Weedmaps, etc. +// Routes: /api/discovery/platforms/:platformSlug/* +try { + const dtDiscoveryRoutes = createDutchieDiscoveryRoutes(getPool()); + app.use('/api/discovery/platforms/dt', dtDiscoveryRoutes); + console.log('[Discovery] Platform routes registered at /api/discovery/platforms/dt'); +} catch (error) { + console.warn('[Discovery] Failed to register platform routes:', error); +} + +// Orchestrator promotion endpoint (platform-agnostic) +// Route: /api/orchestrator/platforms/:platformSlug/promote/:id +app.post('/api/orchestrator/platforms/:platformSlug/promote/:id', async (req, res) => { + try { + const { platformSlug, id } = req.params; + + // Validate platform slug + const validPlatforms = ['dt']; // dt = Dutchie + if (!validPlatforms.includes(platformSlug)) { + return res.status(400).json({ + success: false, + error: `Invalid platform slug: ${platformSlug}. 
Valid slugs: ${validPlatforms.join(', ')}` + }); + } + + const result = await promoteDiscoveryLocation(getPool(), parseInt(id, 10)); + if (result.success) { + res.json(result); + } else { + res.status(400).json(result); + } + } catch (error: any) { + console.error('[Orchestrator] Promotion error:', error); + res.status(500).json({ success: false, error: error.message }); + } +}); + async function startServer() { try { logger.info('system', 'Starting server...'); diff --git a/backend/src/middleware/apiTokenTracker.ts b/backend/src/middleware/apiTokenTracker.ts index 38582c35..f73b0243 100644 --- a/backend/src/middleware/apiTokenTracker.ts +++ b/backend/src/middleware/apiTokenTracker.ts @@ -1,5 +1,5 @@ import { Request, Response, NextFunction } from 'express'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; interface TrackedRequest extends Request { apiToken?: { diff --git a/backend/src/middleware/wordpressPermissions.ts b/backend/src/middleware/wordpressPermissions.ts index 66db2f89..152a5675 100644 --- a/backend/src/middleware/wordpressPermissions.ts +++ b/backend/src/middleware/wordpressPermissions.ts @@ -1,5 +1,5 @@ import { Request, Response, NextFunction } from 'express'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import ipaddr from 'ipaddr.js'; interface WordPressPermissionRequest extends Request { diff --git a/backend/src/migrations-runner/009_image_sizes.ts b/backend/src/migrations-runner/009_image_sizes.ts index 0a08b69e..71626e44 100644 --- a/backend/src/migrations-runner/009_image_sizes.ts +++ b/backend/src/migrations-runner/009_image_sizes.ts @@ -1,4 +1,4 @@ -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; (async () => { try { diff --git a/backend/src/migrations/047_multi_state_enhancements.sql b/backend/src/migrations/047_multi_state_enhancements.sql new file mode 100644 index 00000000..a3db9b73 --- /dev/null +++ b/backend/src/migrations/047_multi_state_enhancements.sql @@ -0,0 +1,304 @@ +-- Migration: 047_multi_state_enhancements.sql +-- Purpose: Enhance multi-state support with additional indexes and optimizations +-- Phase 4: Multi-State Expansion + +-- ============================================================================ +-- 1. Ensure states table has all US cannabis-legal states +-- ============================================================================ + +INSERT INTO states (code, name) VALUES + ('AK', 'Alaska'), + ('AR', 'Arkansas'), + ('CT', 'Connecticut'), + ('DE', 'Delaware'), + ('HI', 'Hawaii'), + ('ME', 'Maine'), + ('MN', 'Minnesota'), + ('MS', 'Mississippi'), + ('MT', 'Montana'), + ('NH', 'New Hampshire'), + ('NM', 'New Mexico'), + ('ND', 'North Dakota'), + ('RI', 'Rhode Island'), + ('SD', 'South Dakota'), + ('UT', 'Utah'), + ('VT', 'Vermont'), + ('VA', 'Virginia'), + ('WV', 'West Virginia'), + ('DC', 'District of Columbia') +ON CONFLICT (code) DO NOTHING; + +-- ============================================================================ +-- 2. 
Add state column to raw_payloads for query optimization +-- ============================================================================ + +ALTER TABLE raw_payloads + ADD COLUMN IF NOT EXISTS state CHAR(2); + +-- Backfill state from dispensary +UPDATE raw_payloads rp +SET state = d.state +FROM dispensaries d +WHERE rp.dispensary_id = d.id + AND rp.state IS NULL; + +-- Set default for future inserts +ALTER TABLE raw_payloads + ALTER COLUMN state SET DEFAULT 'AZ'; + +-- Index for state-based queries on raw_payloads +CREATE INDEX IF NOT EXISTS idx_raw_payloads_state + ON raw_payloads(state); + +CREATE INDEX IF NOT EXISTS idx_raw_payloads_state_unprocessed + ON raw_payloads(state, processed) + WHERE processed = FALSE; + +-- ============================================================================ +-- 3. Additional composite indexes for multi-state queries +-- ============================================================================ + +-- Dispensary state-based lookups +CREATE INDEX IF NOT EXISTS idx_dispensaries_state_menu_type + ON dispensaries(state, menu_type); + +CREATE INDEX IF NOT EXISTS idx_dispensaries_state_crawl_status + ON dispensaries(state, crawl_status); + +CREATE INDEX IF NOT EXISTS idx_dispensaries_state_active + ON dispensaries(state) + WHERE crawl_status != 'disabled'; + +-- Store products by state (via dispensary join optimization) +CREATE INDEX IF NOT EXISTS idx_store_products_dispensary_stock + ON store_products(dispensary_id, is_in_stock); + +CREATE INDEX IF NOT EXISTS idx_store_products_dispensary_special + ON store_products(dispensary_id, is_on_special) + WHERE is_on_special = TRUE; + +-- Snapshots by date range for state analytics +CREATE INDEX IF NOT EXISTS idx_snapshots_captured_date + ON store_product_snapshots(DATE(captured_at)); + +CREATE INDEX IF NOT EXISTS idx_snapshots_dispensary_date + ON store_product_snapshots(dispensary_id, DATE(captured_at)); + +-- ============================================================================ +-- 4. Create state_metrics materialized view for fast dashboard queries +-- ============================================================================ + +DROP MATERIALIZED VIEW IF EXISTS mv_state_metrics; + +CREATE MATERIALIZED VIEW mv_state_metrics AS +SELECT + d.state, + s.name AS state_name, + COUNT(DISTINCT d.id) AS store_count, + COUNT(DISTINCT CASE WHEN d.menu_type = 'dutchie' THEN d.id END) AS dutchie_stores, + COUNT(DISTINCT CASE WHEN d.crawl_status = 'active' THEN d.id END) AS active_stores, + COUNT(DISTINCT sp.id) AS total_products, + COUNT(DISTINCT CASE WHEN sp.is_in_stock THEN sp.id END) AS in_stock_products, + COUNT(DISTINCT CASE WHEN sp.is_on_special THEN sp.id END) AS on_special_products, + COUNT(DISTINCT sp.brand_id) AS unique_brands, + COUNT(DISTINCT sp.category_raw) AS unique_categories, + AVG(sp.price_rec)::NUMERIC(10,2) AS avg_price_rec, + MIN(sp.price_rec)::NUMERIC(10,2) AS min_price_rec, + MAX(sp.price_rec)::NUMERIC(10,2) AS max_price_rec, + NOW() AS refreshed_at +FROM dispensaries d +LEFT JOIN states s ON d.state = s.code +LEFT JOIN store_products sp ON d.id = sp.dispensary_id +WHERE d.state IS NOT NULL +GROUP BY d.state, s.name; + +CREATE UNIQUE INDEX IF NOT EXISTS idx_mv_state_metrics_state + ON mv_state_metrics(state); + +-- ============================================================================ +-- 5. 
Create brand_state_presence view for cross-state brand analytics +-- ============================================================================ + +DROP VIEW IF EXISTS v_brand_state_presence; + +CREATE VIEW v_brand_state_presence AS +SELECT + b.id AS brand_id, + b.name AS brand_name, + b.slug AS brand_slug, + d.state, + COUNT(DISTINCT d.id) AS store_count, + COUNT(DISTINCT sp.id) AS product_count, + AVG(sp.price_rec)::NUMERIC(10,2) AS avg_price, + MIN(sp.first_seen_at) AS first_seen_in_state, + MAX(sp.last_seen_at) AS last_seen_in_state +FROM brands b +JOIN store_products sp ON b.id = sp.brand_id +JOIN dispensaries d ON sp.dispensary_id = d.id +WHERE d.state IS NOT NULL +GROUP BY b.id, b.name, b.slug, d.state; + +-- ============================================================================ +-- 6. Create category_state_distribution view +-- ============================================================================ + +DROP VIEW IF EXISTS v_category_state_distribution; + +CREATE VIEW v_category_state_distribution AS +SELECT + d.state, + sp.category_raw AS category, + COUNT(DISTINCT sp.id) AS product_count, + COUNT(DISTINCT d.id) AS store_count, + AVG(sp.price_rec)::NUMERIC(10,2) AS avg_price, + COUNT(DISTINCT CASE WHEN sp.is_in_stock THEN sp.id END) AS in_stock_count, + COUNT(DISTINCT CASE WHEN sp.is_on_special THEN sp.id END) AS on_special_count +FROM dispensaries d +JOIN store_products sp ON d.id = sp.dispensary_id +WHERE d.state IS NOT NULL + AND sp.category_raw IS NOT NULL +GROUP BY d.state, sp.category_raw; + +-- ============================================================================ +-- 7. Create store_state_summary view for quick state dashboards +-- ============================================================================ + +DROP VIEW IF EXISTS v_store_state_summary; + +CREATE VIEW v_store_state_summary AS +SELECT + d.id AS dispensary_id, + d.name AS dispensary_name, + d.slug AS dispensary_slug, + d.state, + d.city, + d.menu_type, + d.crawl_status, + d.last_crawl_at, + COUNT(DISTINCT sp.id) AS product_count, + COUNT(DISTINCT CASE WHEN sp.is_in_stock THEN sp.id END) AS in_stock_count, + COUNT(DISTINCT sp.brand_id) AS brand_count, + AVG(sp.price_rec)::NUMERIC(10,2) AS avg_price, + COUNT(DISTINCT CASE WHEN sp.is_on_special THEN sp.id END) AS special_count +FROM dispensaries d +LEFT JOIN store_products sp ON d.id = sp.dispensary_id +WHERE d.state IS NOT NULL +GROUP BY d.id, d.name, d.slug, d.state, d.city, d.menu_type, d.crawl_status, d.last_crawl_at; + +-- ============================================================================ +-- 8. 
Create national price comparison function +-- ============================================================================ + +CREATE OR REPLACE FUNCTION fn_national_price_comparison( + p_category VARCHAR DEFAULT NULL, + p_brand_id INTEGER DEFAULT NULL +) +RETURNS TABLE ( + state CHAR(2), + state_name VARCHAR, + product_count BIGINT, + avg_price NUMERIC, + min_price NUMERIC, + max_price NUMERIC, + median_price NUMERIC, + price_stddev NUMERIC +) AS $$ +BEGIN + RETURN QUERY + SELECT + d.state, + s.name AS state_name, + COUNT(DISTINCT sp.id)::BIGINT AS product_count, + AVG(sp.price_rec)::NUMERIC(10,2) AS avg_price, + MIN(sp.price_rec)::NUMERIC(10,2) AS min_price, + MAX(sp.price_rec)::NUMERIC(10,2) AS max_price, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_rec)::NUMERIC(10,2) AS median_price, + STDDEV(sp.price_rec)::NUMERIC(10,2) AS price_stddev + FROM dispensaries d + JOIN states s ON d.state = s.code + JOIN store_products sp ON d.id = sp.dispensary_id + WHERE d.state IS NOT NULL + AND sp.price_rec IS NOT NULL + AND sp.price_rec > 0 + AND (p_category IS NULL OR sp.category_raw = p_category) + AND (p_brand_id IS NULL OR sp.brand_id = p_brand_id) + GROUP BY d.state, s.name + ORDER BY avg_price DESC; +END; +$$ LANGUAGE plpgsql; + +-- ============================================================================ +-- 9. Create brand penetration by state function +-- ============================================================================ + +CREATE OR REPLACE FUNCTION fn_brand_state_penetration( + p_brand_id INTEGER +) +RETURNS TABLE ( + state CHAR(2), + state_name VARCHAR, + total_stores BIGINT, + stores_with_brand BIGINT, + penetration_pct NUMERIC, + product_count BIGINT, + avg_price NUMERIC +) AS $$ +BEGIN + RETURN QUERY + WITH state_totals AS ( + SELECT + d.state, + COUNT(DISTINCT d.id) AS total_stores + FROM dispensaries d + WHERE d.state IS NOT NULL + AND d.menu_type IS NOT NULL + GROUP BY d.state + ), + brand_presence AS ( + SELECT + d.state, + COUNT(DISTINCT d.id) AS stores_with_brand, + COUNT(DISTINCT sp.id) AS product_count, + AVG(sp.price_rec) AS avg_price + FROM dispensaries d + JOIN store_products sp ON d.id = sp.dispensary_id + WHERE sp.brand_id = p_brand_id + AND d.state IS NOT NULL + GROUP BY d.state + ) + SELECT + st.state, + s.name AS state_name, + st.total_stores, + COALESCE(bp.stores_with_brand, 0)::BIGINT AS stores_with_brand, + ROUND(COALESCE(bp.stores_with_brand, 0)::NUMERIC / st.total_stores * 100, 2) AS penetration_pct, + COALESCE(bp.product_count, 0)::BIGINT AS product_count, + bp.avg_price::NUMERIC(10,2) AS avg_price + FROM state_totals st + JOIN states s ON st.state = s.code + LEFT JOIN brand_presence bp ON st.state = bp.state + ORDER BY penetration_pct DESC NULLS LAST; +END; +$$ LANGUAGE plpgsql; + +-- ============================================================================ +-- 10. 
Refresh function for materialized view +-- ============================================================================ + +CREATE OR REPLACE FUNCTION refresh_state_metrics() +RETURNS void AS $$ +BEGIN + REFRESH MATERIALIZED VIEW CONCURRENTLY mv_state_metrics; +END; +$$ LANGUAGE plpgsql; + +-- Initial refresh +SELECT refresh_state_metrics(); + +-- ============================================================================ +-- Record migration +-- ============================================================================ + +INSERT INTO schema_migrations (version, name, applied_at) +VALUES (47, '047_multi_state_enhancements', NOW()) +ON CONFLICT (version) DO NOTHING; diff --git a/backend/src/migrations/048_phase6_portals_intelligence.sql b/backend/src/migrations/048_phase6_portals_intelligence.sql new file mode 100644 index 00000000..518fe7bd --- /dev/null +++ b/backend/src/migrations/048_phase6_portals_intelligence.sql @@ -0,0 +1,574 @@ +-- Migration: 048_phase6_portals_intelligence.sql +-- Purpose: Phase 6 - Brand Portal, Buyer Portal, Intelligence Engine +-- Creates tables for: roles, permissions, businesses, messaging, notifications, intelligence + +-- ============================================================================ +-- 1. ROLES AND PERMISSIONS SYSTEM +-- ============================================================================ + +-- Roles table +CREATE TABLE IF NOT EXISTS roles ( + id SERIAL PRIMARY KEY, + name VARCHAR(50) NOT NULL UNIQUE, + display_name VARCHAR(100) NOT NULL, + description TEXT, + is_system BOOLEAN DEFAULT FALSE, -- System roles cannot be deleted + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Seed default roles +INSERT INTO roles (name, display_name, description, is_system) VALUES + ('internal_admin', 'Internal Admin', 'Full system access for CannaiQ staff', TRUE), + ('enterprise_manager', 'Enterprise Manager', 'Multi-brand/multi-store management', TRUE), + ('brand_manager', 'Brand Manager', 'Manage brand catalog, view analytics, messaging', TRUE), + ('buyer_manager', 'Buyer Manager', 'Dispensary purchasing, view catalogs, messaging', TRUE), + ('brand_viewer', 'Brand Viewer', 'Read-only brand portal access', TRUE), + ('buyer_viewer', 'Buyer Viewer', 'Read-only buyer portal access', TRUE) +ON CONFLICT (name) DO NOTHING; + +-- Permissions table +CREATE TABLE IF NOT EXISTS permissions ( + id SERIAL PRIMARY KEY, + name VARCHAR(100) NOT NULL UNIQUE, + display_name VARCHAR(150) NOT NULL, + category VARCHAR(50) NOT NULL, -- portal, messaging, intelligence, catalog, etc. 
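+                                  -- (Seed data below uses the categories: portal, catalog,
+                                  --  analytics, pricing, intelligence, messaging, orders, inventory)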
+ description TEXT, + created_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Seed permissions +INSERT INTO permissions (name, display_name, category, description) VALUES + -- Portal access + ('view_brand_portal', 'View Brand Portal', 'portal', 'Access brand portal dashboard'), + ('view_buyer_portal', 'View Buyer Portal', 'portal', 'Access buyer portal dashboard'), + ('view_admin_portal', 'View Admin Portal', 'portal', 'Access internal admin portal'), + + -- Brand management + ('manage_brand_catalog', 'Manage Brand Catalog', 'catalog', 'Edit brand SKUs, images, details'), + ('view_brand_analytics', 'View Brand Analytics', 'analytics', 'View brand distribution and performance'), + ('manage_brand_pricing', 'Manage Brand Pricing', 'pricing', 'Set and adjust brand pricing'), + + -- Buyer management + ('manage_buyer_catalog', 'Manage Buyer Catalog', 'catalog', 'Manage dispensary product listings'), + ('view_buyer_analytics', 'View Buyer Analytics', 'analytics', 'View buyer purchasing analytics'), + + -- Competition and intelligence + ('view_competition', 'View Competition', 'intelligence', 'View competitor data and analysis'), + ('view_intelligence_insights', 'View Intelligence Insights', 'intelligence', 'Access AI-powered recommendations'), + ('manage_intelligence_rules', 'Manage Intelligence Rules', 'intelligence', 'Configure alert and recommendation rules'), + + -- Messaging + ('send_messages', 'Send Messages', 'messaging', 'Send messages to brands/buyers'), + ('receive_messages', 'Receive Messages', 'messaging', 'Receive and view messages'), + ('manage_notifications', 'Manage Notifications', 'messaging', 'Configure notification preferences'), + + -- Orders (Phase 7) + ('create_orders', 'Create Orders', 'orders', 'Place orders with brands'), + ('manage_orders', 'Manage Orders', 'orders', 'Accept, reject, fulfill orders'), + ('view_orders', 'View Orders', 'orders', 'View order history'), + + -- Inventory (Phase 7) + ('manage_inventory', 'Manage Inventory', 'inventory', 'Update inventory levels'), + ('view_inventory', 'View Inventory', 'inventory', 'View inventory data'), + + -- Pricing (Phase 7) + ('manage_pricing_rules', 'Manage Pricing Rules', 'pricing', 'Configure automated pricing'), + ('approve_pricing', 'Approve Pricing', 'pricing', 'Approve pricing suggestions') +ON CONFLICT (name) DO NOTHING; + +-- Role-Permission mapping +CREATE TABLE IF NOT EXISTS role_permissions ( + id SERIAL PRIMARY KEY, + role_id INTEGER NOT NULL REFERENCES roles(id) ON DELETE CASCADE, + permission_id INTEGER NOT NULL REFERENCES permissions(id) ON DELETE CASCADE, + created_at TIMESTAMPTZ DEFAULT NOW(), + UNIQUE(role_id, permission_id) +); + +-- Assign permissions to roles +INSERT INTO role_permissions (role_id, permission_id) +SELECT r.id, p.id FROM roles r, permissions p +WHERE r.name = 'internal_admin' +ON CONFLICT DO NOTHING; + +INSERT INTO role_permissions (role_id, permission_id) +SELECT r.id, p.id FROM roles r, permissions p +WHERE r.name = 'brand_manager' AND p.name IN ( + 'view_brand_portal', 'manage_brand_catalog', 'view_brand_analytics', + 'manage_brand_pricing', 'view_competition', 'view_intelligence_insights', + 'send_messages', 'receive_messages', 'manage_notifications', + 'manage_orders', 'view_orders', 'manage_inventory', 'view_inventory', + 'manage_pricing_rules', 'approve_pricing' +) +ON CONFLICT DO NOTHING; + +INSERT INTO role_permissions (role_id, permission_id) +SELECT r.id, p.id FROM roles r, permissions p +WHERE r.name = 'buyer_manager' AND p.name IN ( + 'view_buyer_portal', 'manage_buyer_catalog', 
'view_buyer_analytics', + 'view_competition', 'view_intelligence_insights', + 'send_messages', 'receive_messages', 'manage_notifications', + 'create_orders', 'view_orders', 'view_inventory' +) +ON CONFLICT DO NOTHING; + +-- User roles junction +CREATE TABLE IF NOT EXISTS user_roles ( + id SERIAL PRIMARY KEY, + user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE, + role_id INTEGER NOT NULL REFERENCES roles(id) ON DELETE CASCADE, + granted_by INTEGER REFERENCES users(id), + granted_at TIMESTAMPTZ DEFAULT NOW(), + expires_at TIMESTAMPTZ, -- Optional expiry + UNIQUE(user_id, role_id) +); + +-- ============================================================================ +-- 2. BUSINESS ENTITIES (Brands and Buyers) +-- ============================================================================ + +-- Brand businesses (companies that own brands) +CREATE TABLE IF NOT EXISTS brand_businesses ( + id SERIAL PRIMARY KEY, + name VARCHAR(255) NOT NULL, + slug VARCHAR(255) NOT NULL UNIQUE, + legal_name VARCHAR(255), + logo_url TEXT, + website_url TEXT, + description TEXT, + contact_email VARCHAR(255), + contact_phone VARCHAR(50), + headquarters_state CHAR(2), + headquarters_city VARCHAR(100), + license_number VARCHAR(100), + license_state CHAR(2), + is_verified BOOLEAN DEFAULT FALSE, + is_active BOOLEAN DEFAULT TRUE, + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Link brands table to brand_businesses +ALTER TABLE brands ADD COLUMN IF NOT EXISTS business_id INTEGER REFERENCES brand_businesses(id); +ALTER TABLE brands ADD COLUMN IF NOT EXISTS is_verified BOOLEAN DEFAULT FALSE; + +-- Buyer businesses (dispensaries as business entities) +CREATE TABLE IF NOT EXISTS buyer_businesses ( + id SERIAL PRIMARY KEY, + dispensary_id INTEGER REFERENCES dispensaries(id), -- Link to existing dispensary + name VARCHAR(255) NOT NULL, + slug VARCHAR(255) NOT NULL UNIQUE, + legal_name VARCHAR(255), + logo_url TEXT, + contact_email VARCHAR(255), + contact_phone VARCHAR(50), + billing_address TEXT, + license_number VARCHAR(100), + license_state CHAR(2), + is_verified BOOLEAN DEFAULT FALSE, + is_active BOOLEAN DEFAULT TRUE, + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +-- User-Business associations +CREATE TABLE IF NOT EXISTS user_businesses ( + id SERIAL PRIMARY KEY, + user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE, + brand_business_id INTEGER REFERENCES brand_businesses(id) ON DELETE CASCADE, + buyer_business_id INTEGER REFERENCES buyer_businesses(id) ON DELETE CASCADE, + is_primary BOOLEAN DEFAULT FALSE, + created_at TIMESTAMPTZ DEFAULT NOW(), + CONSTRAINT chk_one_business CHECK ( + (brand_business_id IS NOT NULL AND buyer_business_id IS NULL) OR + (brand_business_id IS NULL AND buyer_business_id IS NOT NULL) + ) +); + +CREATE INDEX IF NOT EXISTS idx_user_businesses_user ON user_businesses(user_id); +CREATE INDEX IF NOT EXISTS idx_user_businesses_brand ON user_businesses(brand_business_id); +CREATE INDEX IF NOT EXISTS idx_user_businesses_buyer ON user_businesses(buyer_business_id); + +-- ============================================================================ +-- 3. 
MESSAGING SYSTEM +-- ============================================================================ + +-- Message threads +CREATE TABLE IF NOT EXISTS message_threads ( + id SERIAL PRIMARY KEY, + subject VARCHAR(255), + thread_type VARCHAR(50) NOT NULL DEFAULT 'direct', -- direct, order, inquiry, support + brand_business_id INTEGER REFERENCES brand_businesses(id), + buyer_business_id INTEGER REFERENCES buyer_businesses(id), + order_id INTEGER, -- Will reference orders table (Phase 7) + is_archived BOOLEAN DEFAULT FALSE, + last_message_at TIMESTAMPTZ, + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_message_threads_brand ON message_threads(brand_business_id); +CREATE INDEX IF NOT EXISTS idx_message_threads_buyer ON message_threads(buyer_business_id); +CREATE INDEX IF NOT EXISTS idx_message_threads_last_msg ON message_threads(last_message_at DESC); + +-- Messages +CREATE TABLE IF NOT EXISTS messages ( + id SERIAL PRIMARY KEY, + thread_id INTEGER NOT NULL REFERENCES message_threads(id) ON DELETE CASCADE, + sender_user_id INTEGER NOT NULL REFERENCES users(id), + sender_type VARCHAR(20) NOT NULL, -- brand, buyer, system + content TEXT NOT NULL, + content_type VARCHAR(20) DEFAULT 'text', -- text, html, markdown + is_read BOOLEAN DEFAULT FALSE, + read_at TIMESTAMPTZ, + is_system_message BOOLEAN DEFAULT FALSE, + metadata JSONB DEFAULT '{}', + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_messages_thread ON messages(thread_id); +CREATE INDEX IF NOT EXISTS idx_messages_sender ON messages(sender_user_id); +CREATE INDEX IF NOT EXISTS idx_messages_unread ON messages(thread_id, is_read) WHERE is_read = FALSE; + +-- Message attachments +CREATE TABLE IF NOT EXISTS message_attachments ( + id SERIAL PRIMARY KEY, + message_id INTEGER NOT NULL REFERENCES messages(id) ON DELETE CASCADE, + filename VARCHAR(255) NOT NULL, + file_url TEXT NOT NULL, + file_size INTEGER, + mime_type VARCHAR(100), + created_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Thread participants (for tracking read status per user) +CREATE TABLE IF NOT EXISTS thread_participants ( + id SERIAL PRIMARY KEY, + thread_id INTEGER NOT NULL REFERENCES message_threads(id) ON DELETE CASCADE, + user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE, + last_read_message_id INTEGER REFERENCES messages(id), + unread_count INTEGER DEFAULT 0, + is_muted BOOLEAN DEFAULT FALSE, + created_at TIMESTAMPTZ DEFAULT NOW(), + UNIQUE(thread_id, user_id) +); + +-- ============================================================================ +-- 4. 
NOTIFICATION SYSTEM +-- ============================================================================ + +-- Notification types +CREATE TABLE IF NOT EXISTS notification_types ( + id SERIAL PRIMARY KEY, + name VARCHAR(100) NOT NULL UNIQUE, + display_name VARCHAR(150) NOT NULL, + category VARCHAR(50) NOT NULL, -- intelligence, messaging, orders, inventory, pricing + description TEXT, + default_channels JSONB DEFAULT '["in_app"]', -- in_app, email, webhook + is_active BOOLEAN DEFAULT TRUE +); + +-- Seed notification types +INSERT INTO notification_types (name, display_name, category, description, default_channels) VALUES + -- Intelligence alerts + ('price_spike', 'Price Spike Alert', 'intelligence', 'SKU price increased significantly', '["in_app", "email"]'), + ('price_crash', 'Price Crash Alert', 'intelligence', 'SKU price decreased significantly', '["in_app", "email"]'), + ('oos_risk', 'Out-of-Stock Risk', 'intelligence', 'SKU at risk of going out of stock', '["in_app", "email"]'), + ('competitive_intrusion', 'Competitive Intrusion', 'intelligence', 'Competitor entered your stores', '["in_app", "email"]'), + ('category_surge', 'Category Surge', 'intelligence', 'Category seeing increased demand', '["in_app"]'), + ('category_decline', 'Category Decline', 'intelligence', 'Category seeing decreased demand', '["in_app"]'), + ('sku_gap_opportunity', 'SKU Gap Opportunity', 'intelligence', 'Missing SKU opportunity identified', '["in_app"]'), + ('brand_dropoff', 'Brand Drop-off', 'intelligence', 'Brand lost presence in stores', '["in_app", "email"]'), + + -- Messaging + ('new_message', 'New Message', 'messaging', 'Received a new message', '["in_app", "email"]'), + ('message_reply', 'Message Reply', 'messaging', 'Someone replied to your message', '["in_app"]'), + + -- Orders + ('order_submitted', 'Order Submitted', 'orders', 'New order received', '["in_app", "email"]'), + ('order_accepted', 'Order Accepted', 'orders', 'Your order was accepted', '["in_app", "email"]'), + ('order_rejected', 'Order Rejected', 'orders', 'Your order was rejected', '["in_app", "email"]'), + ('order_shipped', 'Order Shipped', 'orders', 'Order has been shipped', '["in_app", "email"]'), + ('order_delivered', 'Order Delivered', 'orders', 'Order has been delivered', '["in_app"]'), + + -- Inventory + ('inventory_low', 'Low Inventory', 'inventory', 'Inventory running low', '["in_app", "email"]'), + ('inventory_oos', 'Out of Stock', 'inventory', 'Item is out of stock', '["in_app", "email"]'), + ('inventory_restock', 'Restocked', 'inventory', 'Item has been restocked', '["in_app"]'), + + -- Pricing + ('pricing_suggestion', 'Pricing Suggestion', 'pricing', 'New pricing recommendation available', '["in_app"]'), + ('pricing_accepted', 'Pricing Accepted', 'pricing', 'Pricing suggestion was accepted', '["in_app"]'), + ('pricing_override', 'Pricing Override', 'pricing', 'Pricing was manually overridden', '["in_app"]') +ON CONFLICT (name) DO NOTHING; + +-- User notification preferences +CREATE TABLE IF NOT EXISTS user_notification_preferences ( + id SERIAL PRIMARY KEY, + user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE, + notification_type_id INTEGER NOT NULL REFERENCES notification_types(id), + channels JSONB DEFAULT '["in_app"]', -- Override default channels + is_enabled BOOLEAN DEFAULT TRUE, + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW(), + UNIQUE(user_id, notification_type_id) +); + +-- Notifications +CREATE TABLE IF NOT EXISTS notifications ( + id SERIAL PRIMARY KEY, + user_id INTEGER NOT 
NULL REFERENCES users(id) ON DELETE CASCADE, + notification_type_id INTEGER NOT NULL REFERENCES notification_types(id), + title VARCHAR(255) NOT NULL, + body TEXT, + data JSONB DEFAULT '{}', -- Additional context (SKU, store, brand, etc.) + action_url TEXT, -- Link to relevant page + is_read BOOLEAN DEFAULT FALSE, + read_at TIMESTAMPTZ, + channels_sent JSONB DEFAULT '[]', -- Which channels were actually used + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_notifications_user ON notifications(user_id); +CREATE INDEX IF NOT EXISTS idx_notifications_unread ON notifications(user_id, is_read) WHERE is_read = FALSE; +CREATE INDEX IF NOT EXISTS idx_notifications_created ON notifications(created_at DESC); +CREATE INDEX IF NOT EXISTS idx_notifications_type ON notifications(notification_type_id); + +-- ============================================================================ +-- 5. INTELLIGENCE ENGINE +-- ============================================================================ + +-- Intelligence alerts (generated by the engine) +CREATE TABLE IF NOT EXISTS intelligence_alerts ( + id SERIAL PRIMARY KEY, + alert_type VARCHAR(50) NOT NULL, -- price_spike, oos_risk, competitive_intrusion, etc. + severity VARCHAR(20) NOT NULL DEFAULT 'medium', -- low, medium, high, critical + + -- Target entity + brand_id INTEGER REFERENCES brands(id), + brand_business_id INTEGER REFERENCES brand_businesses(id), + buyer_business_id INTEGER REFERENCES buyer_businesses(id), + dispensary_id INTEGER REFERENCES dispensaries(id), + store_product_id INTEGER REFERENCES store_products(id), + + -- Context + state CHAR(2), + category VARCHAR(100), + + -- Alert content + title VARCHAR(255) NOT NULL, + description TEXT, + data JSONB DEFAULT '{}', -- Detailed metrics, comparisons, etc. + + -- Recommendations + recommended_action TEXT, + action_data JSONB DEFAULT '{}', -- Structured action data + + -- Status + status VARCHAR(20) DEFAULT 'active', -- active, acknowledged, resolved, dismissed + acknowledged_at TIMESTAMPTZ, + acknowledged_by INTEGER REFERENCES users(id), + resolved_at TIMESTAMPTZ, + + -- Metadata + confidence_score NUMERIC(5,2), -- 0-100 confidence in the alert + expires_at TIMESTAMPTZ, + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_intel_alerts_brand ON intelligence_alerts(brand_id); +CREATE INDEX IF NOT EXISTS idx_intel_alerts_brand_biz ON intelligence_alerts(brand_business_id); +CREATE INDEX IF NOT EXISTS idx_intel_alerts_buyer_biz ON intelligence_alerts(buyer_business_id); +CREATE INDEX IF NOT EXISTS idx_intel_alerts_dispensary ON intelligence_alerts(dispensary_id); +CREATE INDEX IF NOT EXISTS idx_intel_alerts_type ON intelligence_alerts(alert_type); +CREATE INDEX IF NOT EXISTS idx_intel_alerts_status ON intelligence_alerts(status); +CREATE INDEX IF NOT EXISTS idx_intel_alerts_state ON intelligence_alerts(state); +CREATE INDEX IF NOT EXISTS idx_intel_alerts_active ON intelligence_alerts(status, created_at DESC) WHERE status = 'active'; + +-- Intelligence recommendations +CREATE TABLE IF NOT EXISTS intelligence_recommendations ( + id SERIAL PRIMARY KEY, + recommendation_type VARCHAR(50) NOT NULL, -- pricing, sku_add, distribution, promo, etc. 
+ + -- Target + brand_id INTEGER REFERENCES brands(id), + brand_business_id INTEGER REFERENCES brand_businesses(id), + buyer_business_id INTEGER REFERENCES buyer_businesses(id), + store_product_id INTEGER REFERENCES store_products(id), + + -- Context + state CHAR(2), + category VARCHAR(100), + + -- Recommendation content + title VARCHAR(255) NOT NULL, + description TEXT, + rationale TEXT, -- Why this recommendation + + -- Impact projection + projected_impact JSONB DEFAULT '{}', -- revenue, margin, penetration, etc. + confidence_score NUMERIC(5,2), + + -- Action + action_type VARCHAR(50), -- adjust_price, add_sku, run_promo, etc. + action_data JSONB DEFAULT '{}', + + -- Status + status VARCHAR(20) DEFAULT 'pending', -- pending, accepted, rejected, expired + decision_at TIMESTAMPTZ, + decision_by INTEGER REFERENCES users(id), + decision_notes TEXT, + + expires_at TIMESTAMPTZ, + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_intel_recs_brand ON intelligence_recommendations(brand_id); +CREATE INDEX IF NOT EXISTS idx_intel_recs_brand_biz ON intelligence_recommendations(brand_business_id); +CREATE INDEX IF NOT EXISTS idx_intel_recs_status ON intelligence_recommendations(status); +CREATE INDEX IF NOT EXISTS idx_intel_recs_type ON intelligence_recommendations(recommendation_type); + +-- Intelligence summaries (daily/weekly digests) +CREATE TABLE IF NOT EXISTS intelligence_summaries ( + id SERIAL PRIMARY KEY, + summary_type VARCHAR(20) NOT NULL, -- daily, weekly, monthly + period_start DATE NOT NULL, + period_end DATE NOT NULL, + + -- Target + brand_business_id INTEGER REFERENCES brand_businesses(id), + buyer_business_id INTEGER REFERENCES buyer_businesses(id), + state CHAR(2), + + -- Content + title VARCHAR(255), + executive_summary TEXT, + highlights JSONB DEFAULT '[]', -- Key points + metrics JSONB DEFAULT '{}', -- Period metrics + alerts_summary JSONB DEFAULT '{}', -- Alert counts by type + recommendations_summary JSONB DEFAULT '{}', + + -- Status + is_sent BOOLEAN DEFAULT FALSE, + sent_at TIMESTAMPTZ, + + created_at TIMESTAMPTZ DEFAULT NOW(), + UNIQUE(summary_type, period_start, brand_business_id, buyer_business_id, state) +); + +-- Intelligence rules (user-configurable alert thresholds) +CREATE TABLE IF NOT EXISTS intelligence_rules ( + id SERIAL PRIMARY KEY, + rule_type VARCHAR(50) NOT NULL, -- price_change, penetration_drop, inventory_low, etc. 
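+  -- Illustrative example only (shape is not enforced by the schema): a 'price_change'
+  -- rule might use conditions {"price_drop_percent": 10, "window_hours": 24}
+  -- and actions {"create_alert": true, "severity": "high", "channels": ["in_app", "email"]}.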
+ + -- Owner + brand_business_id INTEGER REFERENCES brand_businesses(id), + buyer_business_id INTEGER REFERENCES buyer_businesses(id), + + -- Scope + state CHAR(2), -- NULL = all states + category VARCHAR(100), -- NULL = all categories + brand_id INTEGER REFERENCES brands(id), -- NULL = all brands + + -- Rule configuration + name VARCHAR(255) NOT NULL, + description TEXT, + conditions JSONB NOT NULL, -- Threshold conditions + actions JSONB NOT NULL, -- What to do when triggered + + -- Settings + is_enabled BOOLEAN DEFAULT TRUE, + cooldown_hours INTEGER DEFAULT 24, -- Minimum hours between re-triggering + last_triggered_at TIMESTAMPTZ, + + created_by INTEGER REFERENCES users(id), + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_intel_rules_brand_biz ON intelligence_rules(brand_business_id); +CREATE INDEX IF NOT EXISTS idx_intel_rules_buyer_biz ON intelligence_rules(buyer_business_id); +CREATE INDEX IF NOT EXISTS idx_intel_rules_enabled ON intelligence_rules(is_enabled) WHERE is_enabled = TRUE; + +-- ============================================================================ +-- 6. BRAND CATALOG EXTENSIONS +-- ============================================================================ + +-- Brand product catalog (master SKU list per brand) +CREATE TABLE IF NOT EXISTS brand_catalog_items ( + id SERIAL PRIMARY KEY, + brand_id INTEGER NOT NULL REFERENCES brands(id), + + -- Product info + sku VARCHAR(100) NOT NULL, + name VARCHAR(255) NOT NULL, + category VARCHAR(100), + subcategory VARCHAR(100), + description TEXT, + + -- Attributes + thc_percent NUMERIC(5,2), + cbd_percent NUMERIC(5,2), + weight_grams NUMERIC(10,3), + unit_count INTEGER, + + -- Media + image_url TEXT, + additional_images JSONB DEFAULT '[]', + lab_results_url TEXT, + + -- Pricing (MSRP) + msrp NUMERIC(10,2), + wholesale_price NUMERIC(10,2), + + -- Availability + states_available CHAR(2)[] DEFAULT '{}', + is_active BOOLEAN DEFAULT TRUE, + launch_date DATE, + discontinue_date DATE, + + -- Flags + needs_review BOOLEAN DEFAULT FALSE, + review_notes TEXT, + + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW(), + UNIQUE(brand_id, sku) +); + +CREATE INDEX IF NOT EXISTS idx_brand_catalog_brand ON brand_catalog_items(brand_id); +CREATE INDEX IF NOT EXISTS idx_brand_catalog_category ON brand_catalog_items(category); +CREATE INDEX IF NOT EXISTS idx_brand_catalog_active ON brand_catalog_items(is_active) WHERE is_active = TRUE; + +-- Catalog-to-store mapping (which catalog items are in which stores) +CREATE TABLE IF NOT EXISTS brand_catalog_distribution ( + id SERIAL PRIMARY KEY, + catalog_item_id INTEGER NOT NULL REFERENCES brand_catalog_items(id), + dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id), + store_product_id INTEGER REFERENCES store_products(id), -- Link to crawled product + + -- Status + status VARCHAR(20) DEFAULT 'active', -- active, discontinued, pending + first_seen_at TIMESTAMPTZ, + last_seen_at TIMESTAMPTZ, + + -- Pricing at store + current_price NUMERIC(10,2), + price_vs_msrp_percent NUMERIC(5,2), -- % difference from MSRP + + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW(), + UNIQUE(catalog_item_id, dispensary_id) +); + +CREATE INDEX IF NOT EXISTS idx_catalog_dist_item ON brand_catalog_distribution(catalog_item_id); +CREATE INDEX IF NOT EXISTS idx_catalog_dist_dispensary ON brand_catalog_distribution(dispensary_id); + +-- 
============================================================================ +-- Record migration +-- ============================================================================ + +INSERT INTO schema_migrations (version, name, applied_at) +VALUES (48, '048_phase6_portals_intelligence', NOW()) +ON CONFLICT (version) DO NOTHING; diff --git a/backend/src/migrations/049_phase7_orders_inventory_pricing.sql b/backend/src/migrations/049_phase7_orders_inventory_pricing.sql new file mode 100644 index 00000000..ff8ac6b5 --- /dev/null +++ b/backend/src/migrations/049_phase7_orders_inventory_pricing.sql @@ -0,0 +1,462 @@ +-- Migration: 049_phase7_orders_inventory_pricing.sql +-- Purpose: Phase 7 - Live Ordering, Inventory Sync, Pricing Automation +-- Creates tables for: orders, inventory, pricing rules, and automation + +-- ============================================================================ +-- 1. ORDERS SYSTEM +-- ============================================================================ + +-- Orders +CREATE TABLE IF NOT EXISTS orders ( + id SERIAL PRIMARY KEY, + order_number VARCHAR(50) NOT NULL UNIQUE, -- Human-readable order number + + -- Parties + buyer_business_id INTEGER NOT NULL REFERENCES buyer_businesses(id), + seller_brand_business_id INTEGER NOT NULL REFERENCES brand_businesses(id), + + -- Location + state CHAR(2) NOT NULL, + shipping_address JSONB, -- Street, city, state, zip + + -- Financials + subtotal NUMERIC(12,2) NOT NULL DEFAULT 0, + tax_amount NUMERIC(12,2) DEFAULT 0, + discount_amount NUMERIC(12,2) DEFAULT 0, + shipping_cost NUMERIC(12,2) DEFAULT 0, + total NUMERIC(12,2) NOT NULL DEFAULT 0, + currency CHAR(3) DEFAULT 'USD', + + -- Status + status VARCHAR(30) NOT NULL DEFAULT 'draft', + -- draft, submitted, accepted, rejected, processing, packed, shipped, delivered, cancelled + + -- Timestamps + submitted_at TIMESTAMPTZ, + accepted_at TIMESTAMPTZ, + rejected_at TIMESTAMPTZ, + processing_at TIMESTAMPTZ, + packed_at TIMESTAMPTZ, + shipped_at TIMESTAMPTZ, + delivered_at TIMESTAMPTZ, + cancelled_at TIMESTAMPTZ, + + -- Tracking + tracking_number VARCHAR(100), + carrier VARCHAR(50), + estimated_delivery_date DATE, + + -- Notes + buyer_notes TEXT, + seller_notes TEXT, + internal_notes TEXT, + + -- Metadata + po_number VARCHAR(100), -- Buyer's PO reference + manifest_number VARCHAR(100), -- State compliance + metadata JSONB DEFAULT '{}', + + created_by INTEGER REFERENCES users(id), + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_orders_buyer ON orders(buyer_business_id); +CREATE INDEX IF NOT EXISTS idx_orders_seller ON orders(seller_brand_business_id); +CREATE INDEX IF NOT EXISTS idx_orders_state ON orders(state); +CREATE INDEX IF NOT EXISTS idx_orders_status ON orders(status); +CREATE INDEX IF NOT EXISTS idx_orders_composite ON orders(state, seller_brand_business_id, buyer_business_id); +CREATE INDEX IF NOT EXISTS idx_orders_submitted ON orders(submitted_at DESC) WHERE submitted_at IS NOT NULL; + +-- Order items +CREATE TABLE IF NOT EXISTS order_items ( + id SERIAL PRIMARY KEY, + order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE, + + -- Product reference + catalog_item_id INTEGER REFERENCES brand_catalog_items(id), + store_product_id INTEGER REFERENCES store_products(id), + sku VARCHAR(100) NOT NULL, + name VARCHAR(255) NOT NULL, + category VARCHAR(100), + + -- Quantity and pricing + quantity INTEGER NOT NULL, + unit_price NUMERIC(10,2) NOT NULL, + discount_percent NUMERIC(5,2) DEFAULT 0, + 
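+  -- Note: line_total is expected to equal quantity * unit_price minus discounts;
+  -- this is an application-level convention and is not enforced by a CHECK constraint.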
discount_amount NUMERIC(10,2) DEFAULT 0, + line_total NUMERIC(12,2) NOT NULL, + + -- Fulfillment + quantity_fulfilled INTEGER DEFAULT 0, + fulfillment_status VARCHAR(20) DEFAULT 'pending', -- pending, partial, complete, cancelled + + -- Notes + notes TEXT, + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_order_items_order ON order_items(order_id); +CREATE INDEX IF NOT EXISTS idx_order_items_catalog ON order_items(catalog_item_id); +CREATE INDEX IF NOT EXISTS idx_order_items_sku ON order_items(sku); + +-- Order status history +CREATE TABLE IF NOT EXISTS order_status_history ( + id SERIAL PRIMARY KEY, + order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE, + from_status VARCHAR(30), + to_status VARCHAR(30) NOT NULL, + changed_by INTEGER REFERENCES users(id), + reason TEXT, + metadata JSONB DEFAULT '{}', + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_order_history_order ON order_status_history(order_id); +CREATE INDEX IF NOT EXISTS idx_order_history_created ON order_status_history(created_at DESC); + +-- Order documents +CREATE TABLE IF NOT EXISTS order_documents ( + id SERIAL PRIMARY KEY, + order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE, + document_type VARCHAR(50) NOT NULL, -- po, invoice, manifest, packing_slip, other + filename VARCHAR(255) NOT NULL, + file_url TEXT NOT NULL, + file_size INTEGER, + mime_type VARCHAR(100), + uploaded_by INTEGER REFERENCES users(id), + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_order_docs_order ON order_documents(order_id); + +-- ============================================================================ +-- 2. INVENTORY SYSTEM +-- ============================================================================ + +-- Brand inventory (master inventory per brand per state) +CREATE TABLE IF NOT EXISTS brand_inventory ( + id SERIAL PRIMARY KEY, + brand_id INTEGER NOT NULL REFERENCES brands(id), + catalog_item_id INTEGER NOT NULL REFERENCES brand_catalog_items(id), + state CHAR(2) NOT NULL, + + -- Quantities + quantity_on_hand INTEGER NOT NULL DEFAULT 0, + quantity_reserved INTEGER DEFAULT 0, -- Reserved for pending orders + quantity_available INTEGER GENERATED ALWAYS AS (quantity_on_hand - COALESCE(quantity_reserved, 0)) STORED, + reorder_point INTEGER DEFAULT 0, -- Alert when qty drops below + + -- Status + inventory_status VARCHAR(20) DEFAULT 'in_stock', -- in_stock, low, oos, preorder, discontinued + available_date DATE, -- For preorders + + -- Source + last_sync_source VARCHAR(50), -- api, manual, erp + last_sync_at TIMESTAMPTZ, + + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW(), + UNIQUE(catalog_item_id, state) +); + +CREATE INDEX IF NOT EXISTS idx_brand_inventory_brand ON brand_inventory(brand_id); +CREATE INDEX IF NOT EXISTS idx_brand_inventory_catalog ON brand_inventory(catalog_item_id); +CREATE INDEX IF NOT EXISTS idx_brand_inventory_state ON brand_inventory(state); +CREATE INDEX IF NOT EXISTS idx_brand_inventory_status ON brand_inventory(inventory_status); +CREATE INDEX IF NOT EXISTS idx_brand_inventory_low ON brand_inventory(quantity_on_hand, reorder_point) WHERE quantity_on_hand <= reorder_point; + +-- Inventory history (audit trail) +CREATE TABLE IF NOT EXISTS inventory_history ( + id SERIAL PRIMARY KEY, + brand_inventory_id INTEGER NOT NULL REFERENCES brand_inventory(id), + change_type VARCHAR(30) NOT NULL, -- adjustment, order_reserve, order_fulfill, sync, restock, write_off + quantity_change 
INTEGER NOT NULL, + quantity_before INTEGER NOT NULL, + quantity_after INTEGER NOT NULL, + order_id INTEGER REFERENCES orders(id), + reason TEXT, + changed_by INTEGER REFERENCES users(id), + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_inv_history_inventory ON inventory_history(brand_inventory_id); +CREATE INDEX IF NOT EXISTS idx_inv_history_created ON inventory_history(created_at DESC); + +-- Inventory sync log +CREATE TABLE IF NOT EXISTS inventory_sync_log ( + id SERIAL PRIMARY KEY, + brand_business_id INTEGER NOT NULL REFERENCES brand_businesses(id), + sync_source VARCHAR(50) NOT NULL, -- api, webhook, manual, erp + status VARCHAR(20) NOT NULL, -- pending, processing, completed, failed + items_synced INTEGER DEFAULT 0, + items_failed INTEGER DEFAULT 0, + error_message TEXT, + request_payload JSONB, + response_summary JSONB, + started_at TIMESTAMPTZ DEFAULT NOW(), + completed_at TIMESTAMPTZ +); + +CREATE INDEX IF NOT EXISTS idx_inv_sync_brand ON inventory_sync_log(brand_business_id); +CREATE INDEX IF NOT EXISTS idx_inv_sync_status ON inventory_sync_log(status); + +-- ============================================================================ +-- 3. PRICING SYSTEM +-- ============================================================================ + +-- Pricing rules (automation configuration) +CREATE TABLE IF NOT EXISTS pricing_rules ( + id SERIAL PRIMARY KEY, + brand_business_id INTEGER NOT NULL REFERENCES brand_businesses(id), + + -- Scope + name VARCHAR(255) NOT NULL, + description TEXT, + state CHAR(2), -- NULL = all states + category VARCHAR(100), -- NULL = all categories + catalog_item_id INTEGER REFERENCES brand_catalog_items(id), -- NULL = all items + + -- Rule type + rule_type VARCHAR(50) NOT NULL, -- floor, ceiling, competitive, margin, velocity + + -- Conditions + conditions JSONB NOT NULL DEFAULT '{}', + -- Example: {"competitor_price_below": 10, "velocity_above": 50} + + -- Actions + actions JSONB NOT NULL DEFAULT '{}', + -- Example: {"adjust_price_by_percent": -5, "min_price": 25} + + -- Constraints + min_price NUMERIC(10,2), + max_price NUMERIC(10,2), + max_adjustment_percent NUMERIC(5,2) DEFAULT 15, -- Max % change per adjustment + + -- Settings + priority INTEGER DEFAULT 0, -- Higher = more important + is_enabled BOOLEAN DEFAULT TRUE, + requires_approval BOOLEAN DEFAULT FALSE, + cooldown_hours INTEGER DEFAULT 24, + last_triggered_at TIMESTAMPTZ, + + created_by INTEGER REFERENCES users(id), + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_pricing_rules_brand ON pricing_rules(brand_business_id); +CREATE INDEX IF NOT EXISTS idx_pricing_rules_enabled ON pricing_rules(is_enabled) WHERE is_enabled = TRUE; + +-- Pricing suggestions (generated by automation) +CREATE TABLE IF NOT EXISTS pricing_suggestions ( + id SERIAL PRIMARY KEY, + catalog_item_id INTEGER NOT NULL REFERENCES brand_catalog_items(id), + brand_business_id INTEGER NOT NULL REFERENCES brand_businesses(id), + state CHAR(2) NOT NULL, + + -- Current vs suggested + current_price NUMERIC(10,2) NOT NULL, + suggested_price NUMERIC(10,2) NOT NULL, + price_change_amount NUMERIC(10,2), + price_change_percent NUMERIC(5,2), + + -- Rationale + suggestion_type VARCHAR(50) NOT NULL, -- competitive, margin, velocity, promotional + rationale TEXT, + supporting_data JSONB DEFAULT '{}', -- Competitor prices, velocity data, etc. 
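+  -- Illustrative supporting_data shape (keys are examples, not enforced by the schema):
+  --   {"competitor_avg_price": 24.50, "competitor_min_price": 19.99, "velocity_30d": 120}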
+ + -- Projections + projected_revenue_impact NUMERIC(12,2), + projected_margin_impact NUMERIC(5,2), + confidence_score NUMERIC(5,2), -- 0-100 + + -- Triggering rule + triggered_by_rule_id INTEGER REFERENCES pricing_rules(id), + + -- Status + status VARCHAR(20) DEFAULT 'pending', -- pending, accepted, rejected, expired, auto_applied + decision_at TIMESTAMPTZ, + decision_by INTEGER REFERENCES users(id), + decision_notes TEXT, + + expires_at TIMESTAMPTZ, + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_pricing_sugg_catalog ON pricing_suggestions(catalog_item_id); +CREATE INDEX IF NOT EXISTS idx_pricing_sugg_brand ON pricing_suggestions(brand_business_id); +CREATE INDEX IF NOT EXISTS idx_pricing_sugg_state ON pricing_suggestions(state); +CREATE INDEX IF NOT EXISTS idx_pricing_sugg_status ON pricing_suggestions(status); +CREATE INDEX IF NOT EXISTS idx_pricing_sugg_pending ON pricing_suggestions(status, created_at DESC) WHERE status = 'pending'; + +-- Pricing history (track all price changes) +CREATE TABLE IF NOT EXISTS pricing_history ( + id SERIAL PRIMARY KEY, + catalog_item_id INTEGER NOT NULL REFERENCES brand_catalog_items(id), + state CHAR(2), + + -- Change details + field_changed VARCHAR(50) NOT NULL, -- msrp, wholesale_price + old_value NUMERIC(10,2), + new_value NUMERIC(10,2), + change_percent NUMERIC(5,2), + + -- Source + change_source VARCHAR(50) NOT NULL, -- manual, rule_auto, suggestion_accepted, sync + suggestion_id INTEGER REFERENCES pricing_suggestions(id), + rule_id INTEGER REFERENCES pricing_rules(id), + + changed_by INTEGER REFERENCES users(id), + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_pricing_hist_catalog ON pricing_history(catalog_item_id); +CREATE INDEX IF NOT EXISTS idx_pricing_hist_created ON pricing_history(created_at DESC); + +-- ============================================================================ +-- 4. BUYER CARTS (Pre-order staging) +-- ============================================================================ + +CREATE TABLE IF NOT EXISTS buyer_carts ( + id SERIAL PRIMARY KEY, + buyer_business_id INTEGER NOT NULL REFERENCES buyer_businesses(id), + seller_brand_business_id INTEGER NOT NULL REFERENCES brand_businesses(id), + state CHAR(2) NOT NULL, + + -- Status + status VARCHAR(20) DEFAULT 'active', -- active, abandoned, converted + converted_to_order_id INTEGER REFERENCES orders(id), + + -- Timestamps + last_activity_at TIMESTAMPTZ DEFAULT NOW(), + expires_at TIMESTAMPTZ, + + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW(), + UNIQUE(buyer_business_id, seller_brand_business_id, state) +); + +CREATE INDEX IF NOT EXISTS idx_carts_buyer ON buyer_carts(buyer_business_id); +CREATE INDEX IF NOT EXISTS idx_carts_seller ON buyer_carts(seller_brand_business_id); + +CREATE TABLE IF NOT EXISTS cart_items ( + id SERIAL PRIMARY KEY, + cart_id INTEGER NOT NULL REFERENCES buyer_carts(id) ON DELETE CASCADE, + catalog_item_id INTEGER NOT NULL REFERENCES brand_catalog_items(id), + quantity INTEGER NOT NULL DEFAULT 1, + unit_price NUMERIC(10,2) NOT NULL, + notes TEXT, + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW(), + UNIQUE(cart_id, catalog_item_id) +); + +CREATE INDEX IF NOT EXISTS idx_cart_items_cart ON cart_items(cart_id); + +-- ============================================================================ +-- 5. 
DISCOVERY FEED (For Buyer Portal) +-- ============================================================================ + +CREATE TABLE IF NOT EXISTS discovery_feed_items ( + id SERIAL PRIMARY KEY, + item_type VARCHAR(50) NOT NULL, -- new_brand, new_sku, trending, recommendation, expansion + state CHAR(2) NOT NULL, + + -- Target + brand_id INTEGER REFERENCES brands(id), + catalog_item_id INTEGER REFERENCES brand_catalog_items(id), + category VARCHAR(100), + + -- Content + title VARCHAR(255) NOT NULL, + description TEXT, + image_url TEXT, + data JSONB DEFAULT '{}', + + -- Targeting + target_buyer_business_ids INTEGER[], -- NULL = all buyers in state + target_categories VARCHAR(100)[], + + -- Display + priority INTEGER DEFAULT 0, + is_featured BOOLEAN DEFAULT FALSE, + cta_text VARCHAR(100), + cta_url TEXT, + + -- Lifecycle + is_active BOOLEAN DEFAULT TRUE, + starts_at TIMESTAMPTZ DEFAULT NOW(), + expires_at TIMESTAMPTZ, + + created_at TIMESTAMPTZ DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_discovery_state ON discovery_feed_items(state); +CREATE INDEX IF NOT EXISTS idx_discovery_type ON discovery_feed_items(item_type); +CREATE INDEX IF NOT EXISTS idx_discovery_active ON discovery_feed_items(is_active, starts_at, expires_at) + WHERE is_active = TRUE; + +-- ============================================================================ +-- 6. ANALYTICS HELPERS +-- ============================================================================ + +-- View for order analytics +CREATE OR REPLACE VIEW v_order_analytics AS +SELECT + o.state, + o.seller_brand_business_id, + o.buyer_business_id, + DATE(o.submitted_at) AS order_date, + o.status, + o.total, + COUNT(oi.id) AS item_count, + SUM(oi.quantity) AS total_units +FROM orders o +LEFT JOIN order_items oi ON o.id = oi.order_id +WHERE o.submitted_at IS NOT NULL +GROUP BY o.id, o.state, o.seller_brand_business_id, o.buyer_business_id, DATE(o.submitted_at), o.status, o.total; + +-- View for inventory alerts +CREATE OR REPLACE VIEW v_inventory_alerts AS +SELECT + bi.id AS inventory_id, + bi.brand_id, + bi.catalog_item_id, + bci.sku, + bci.name AS product_name, + bi.state, + bi.quantity_on_hand, + bi.quantity_available, + bi.reorder_point, + bi.inventory_status, + CASE + WHEN bi.quantity_on_hand = 0 THEN 'critical' + WHEN bi.quantity_on_hand <= bi.reorder_point THEN 'warning' + ELSE 'normal' + END AS alert_level +FROM brand_inventory bi +JOIN brand_catalog_items bci ON bi.catalog_item_id = bci.id +WHERE bi.quantity_on_hand <= bi.reorder_point + OR bi.inventory_status = 'oos'; + +-- Function to generate order number +CREATE OR REPLACE FUNCTION generate_order_number() +RETURNS VARCHAR AS $$ +DECLARE + new_number VARCHAR; +BEGIN + SELECT 'ORD-' || TO_CHAR(NOW(), 'YYYYMMDD') || '-' || LPAD(NEXTVAL('orders_id_seq')::TEXT, 6, '0') + INTO new_number; + RETURN new_number; +END; +$$ LANGUAGE plpgsql; + +-- ============================================================================ +-- Record migration +-- ============================================================================ + +INSERT INTO schema_migrations (version, name, applied_at) +VALUES (49, '049_phase7_orders_inventory_pricing', NOW()) +ON CONFLICT (version) DO NOTHING; diff --git a/backend/src/migrations/050_canonical_hydration_schema.sql b/backend/src/migrations/050_canonical_hydration_schema.sql new file mode 100644 index 00000000..84f293f5 --- /dev/null +++ b/backend/src/migrations/050_canonical_hydration_schema.sql @@ -0,0 +1,33 @@ +-- Migration 050: Canonical Hydration Schema Adjustments +-- Adds 
columns and constraints for idempotent hydration from dutchie_* to canonical tables + +-- Add source job tracking columns to crawl_runs for linking back to source jobs +ALTER TABLE crawl_runs ADD COLUMN IF NOT EXISTS source_job_type VARCHAR(50); +ALTER TABLE crawl_runs ADD COLUMN IF NOT EXISTS source_job_id INTEGER; + +-- Add unique constraint for idempotent crawl_runs insertion +-- This allows re-running hydration without creating duplicates +CREATE UNIQUE INDEX IF NOT EXISTS idx_crawl_runs_source_job +ON crawl_runs (source_job_type, source_job_id) +WHERE source_job_id IS NOT NULL; + +-- Add unique constraint for idempotent store_product_snapshots insertion +-- One snapshot per product per crawl run +CREATE UNIQUE INDEX IF NOT EXISTS idx_store_product_snapshots_product_crawl +ON store_product_snapshots (store_product_id, crawl_run_id) +WHERE store_product_id IS NOT NULL AND crawl_run_id IS NOT NULL; + +-- Add index to speed up snapshot queries by crawl_run_id +CREATE INDEX IF NOT EXISTS idx_store_product_snapshots_crawl_run +ON store_product_snapshots (crawl_run_id); + +-- Add index to speed up hydration queries on dutchie_product_snapshots +CREATE INDEX IF NOT EXISTS idx_dutchie_product_snapshots_crawled_at +ON dutchie_product_snapshots (crawled_at); + +-- Add index for finding unhydrated snapshots (snapshots without corresponding store_product_snapshots) +CREATE INDEX IF NOT EXISTS idx_dutchie_product_snapshots_dispensary_crawled +ON dutchie_product_snapshots (dispensary_id, crawled_at); + +COMMENT ON COLUMN crawl_runs.source_job_type IS 'Source job type for hydration tracking: dispensary_crawl_jobs, crawl_jobs, job_run_logs'; +COMMENT ON COLUMN crawl_runs.source_job_id IS 'Source job ID for hydration tracking - links back to original job table'; diff --git a/backend/src/migrations/051_create_mv_state_metrics.sql b/backend/src/migrations/051_create_mv_state_metrics.sql new file mode 100644 index 00000000..6eff9522 --- /dev/null +++ b/backend/src/migrations/051_create_mv_state_metrics.sql @@ -0,0 +1,83 @@ +-- Migration: 051_create_mv_state_metrics.sql +-- Purpose: Create mv_state_metrics materialized view for national heatmap +-- This is a READ-ONLY aggregate using canonical schema tables. +-- NO DROPS, NO TRUNCATES, NO DELETES. + +-- ============================================================================ +-- 1. Create mv_state_metrics materialized view (if not exists) +-- ============================================================================ + +-- PostgreSQL doesn't have CREATE MATERIALIZED VIEW IF NOT EXISTS, +-- so we use a DO block to check existence first. 
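+-- The existence guard below reads the pg_matviews catalog; the same check can be run manually:
+--
+--   SELECT matviewname FROM pg_matviews WHERE matviewname = 'mv_state_metrics';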
+ +DO $$ +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_matviews WHERE matviewname = 'mv_state_metrics' + ) THEN + EXECUTE ' + CREATE MATERIALIZED VIEW mv_state_metrics AS + SELECT + d.state, + s.name AS state_name, + COUNT(DISTINCT d.id) AS store_count, + COUNT(DISTINCT CASE WHEN d.menu_type = ''dutchie'' THEN d.id END) AS dutchie_stores, + COUNT(DISTINCT CASE WHEN d.crawl_status = ''active'' THEN d.id END) AS active_stores, + COUNT(DISTINCT sp.id) AS total_products, + COUNT(DISTINCT CASE WHEN sp.is_in_stock THEN sp.id END) AS in_stock_products, + COUNT(DISTINCT CASE WHEN sp.is_on_special THEN sp.id END) AS on_special_products, + COUNT(DISTINCT sp.brand_id) AS unique_brands, + COUNT(DISTINCT sp.category_raw) AS unique_categories, + AVG(sp.price_rec)::NUMERIC(10,2) AS avg_price_rec, + MIN(sp.price_rec)::NUMERIC(10,2) AS min_price_rec, + MAX(sp.price_rec)::NUMERIC(10,2) AS max_price_rec, + NOW() AS refreshed_at + FROM dispensaries d + LEFT JOIN states s ON d.state = s.code + LEFT JOIN store_products sp ON d.id = sp.dispensary_id + WHERE d.state IS NOT NULL + GROUP BY d.state, s.name + '; + RAISE NOTICE 'Created materialized view mv_state_metrics'; + ELSE + RAISE NOTICE 'Materialized view mv_state_metrics already exists, skipping creation'; + END IF; +END $$; + +-- ============================================================================ +-- 2. Create unique index for CONCURRENTLY refresh support +-- ============================================================================ + +CREATE UNIQUE INDEX IF NOT EXISTS idx_mv_state_metrics_state + ON mv_state_metrics(state); + +-- ============================================================================ +-- 3. Ensure refresh function exists +-- ============================================================================ + +CREATE OR REPLACE FUNCTION refresh_state_metrics() +RETURNS void AS $$ +BEGIN + REFRESH MATERIALIZED VIEW CONCURRENTLY mv_state_metrics; +END; +$$ LANGUAGE plpgsql; + +-- ============================================================================ +-- 4. Record migration +-- ============================================================================ + +INSERT INTO schema_migrations (version, name, applied_at) +VALUES (51, '051_create_mv_state_metrics', NOW()) +ON CONFLICT (version) DO NOTHING; + +-- ============================================================================ +-- NOTE: To refresh the view after migration, run manually: +-- +-- REFRESH MATERIALIZED VIEW CONCURRENTLY mv_state_metrics; +-- +-- Or call the helper function: +-- +-- SELECT refresh_state_metrics(); +-- +-- Do NOT run refresh automatically in this migration. 
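+--
+-- Example read for the national heatmap (illustrative query; columns as defined above):
+--
+--   SELECT state, state_name, store_count, active_stores, total_products, avg_price_rec
+--   FROM mv_state_metrics
+--   ORDER BY store_count DESC;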
+-- ============================================================================ diff --git a/backend/src/portals/index.ts b/backend/src/portals/index.ts new file mode 100644 index 00000000..9272e14c --- /dev/null +++ b/backend/src/portals/index.ts @@ -0,0 +1,20 @@ +/** + * Portals Module + * Phase 6: Brand Portal + Buyer Portal + Intelligence Engine + * Phase 7: Orders + Inventory + Pricing Automation + */ + +// Types +export * from './types'; + +// Services +export { BrandPortalService } from './services/brand-portal'; +export { BuyerPortalService } from './services/buyer-portal'; +export { IntelligenceEngineService } from './services/intelligence'; +export { MessagingService } from './services/messaging'; +export { OrdersService } from './services/orders'; +export { InventoryService } from './services/inventory'; +export { PricingAutomationService } from './services/pricing'; + +// Routes +export { createPortalRoutes } from './routes'; diff --git a/backend/src/portals/routes.ts b/backend/src/portals/routes.ts new file mode 100644 index 00000000..235668de --- /dev/null +++ b/backend/src/portals/routes.ts @@ -0,0 +1,746 @@ +/** + * Portal Routes + * Phase 6 & 7: Brand Portal, Buyer Portal, Intelligence, Orders, Inventory, Pricing + */ + +import { Router, Request, Response, NextFunction } from 'express'; +import { Pool } from 'pg'; +import { BrandPortalService } from './services/brand-portal'; +import { BuyerPortalService } from './services/buyer-portal'; +import { IntelligenceEngineService } from './services/intelligence'; +import { MessagingService } from './services/messaging'; +import { OrdersService } from './services/orders'; +import { InventoryService } from './services/inventory'; +import { PricingAutomationService } from './services/pricing'; + +export function createPortalRoutes(pool: Pool): Router { + const router = Router(); + + // Initialize services + const brandPortal = new BrandPortalService(pool); + const buyerPortal = new BuyerPortalService(pool); + const intelligence = new IntelligenceEngineService(pool); + const messaging = new MessagingService(pool); + const orders = new OrdersService(pool); + const inventory = new InventoryService(pool); + const pricing = new PricingAutomationService(pool); + + // Async handler wrapper + const asyncHandler = (fn: (req: Request, res: Response, next: NextFunction) => Promise) => + (req: Request, res: Response, next: NextFunction) => Promise.resolve(fn(req, res, next)).catch(next); + + // ============================================================================ + // BRAND PORTAL ROUTES + // ============================================================================ + + // Dashboard + router.get('/brand/:brandBusinessId/dashboard', asyncHandler(async (req, res) => { + const brandBusinessId = parseInt(req.params.brandBusinessId); + const metrics = await brandPortal.getDashboardMetrics(brandBusinessId); + res.json({ success: true, data: metrics }); + })); + + // Business profile + router.get('/brand/:brandBusinessId', asyncHandler(async (req, res) => { + const brandBusinessId = parseInt(req.params.brandBusinessId); + const business = await brandPortal.getBrandBusiness(brandBusinessId); + if (!business) { + return res.status(404).json({ success: false, error: 'Brand business not found' }); + } + res.json({ success: true, data: business }); + })); + + router.patch('/brand/:brandBusinessId', asyncHandler(async (req, res) => { + const brandBusinessId = parseInt(req.params.brandBusinessId); + const business = await 
brandPortal.updateBrandBusiness(brandBusinessId, req.body); + res.json({ success: true, data: business }); + })); + + // Catalog management + router.get('/brand/:brandBusinessId/catalog', asyncHandler(async (req, res) => { + const brandBusinessId = parseInt(req.params.brandBusinessId); + const { limit, offset, category, search, sortBy, sortDir } = req.query; + const result = await brandPortal.getCatalogItems(brandBusinessId, { + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? parseInt(offset as string) : undefined, + category: category as string, + search: search as string, + sortBy: sortBy as string, + sortDir: sortDir as 'asc' | 'desc', + }); + res.json({ success: true, data: result }); + })); + + router.post('/brand/:brandBusinessId/catalog', asyncHandler(async (req, res) => { + const brandBusinessId = parseInt(req.params.brandBusinessId); + const item = await brandPortal.createCatalogItem(brandBusinessId, req.body); + res.status(201).json({ success: true, data: item }); + })); + + router.get('/brand/catalog/:itemId', asyncHandler(async (req, res) => { + const item = await brandPortal.getCatalogItem(parseInt(req.params.itemId)); + if (!item) { + return res.status(404).json({ success: false, error: 'Catalog item not found' }); + } + res.json({ success: true, data: item }); + })); + + router.patch('/brand/catalog/:itemId', asyncHandler(async (req, res) => { + const item = await brandPortal.updateCatalogItem(parseInt(req.params.itemId), req.body); + res.json({ success: true, data: item }); + })); + + router.delete('/brand/catalog/:itemId', asyncHandler(async (req, res) => { + await brandPortal.deleteCatalogItem(parseInt(req.params.itemId)); + res.json({ success: true }); + })); + + // Store presence + router.get('/brand/:brandBusinessId/presence', asyncHandler(async (req, res) => { + const brandBusinessId = parseInt(req.params.brandBusinessId); + const { limit, offset, state, sortBy, sortDir } = req.query; + const result = await brandPortal.getStorePresence(brandBusinessId, { + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? 
parseInt(offset as string) : undefined, + state: state as string, + sortBy: sortBy as string, + sortDir: sortDir as 'asc' | 'desc', + }); + res.json({ success: true, data: result }); + })); + + // Competitor analysis + router.get('/brand/:brandBusinessId/competitors', asyncHandler(async (req, res) => { + const brandBusinessId = parseInt(req.params.brandBusinessId); + const { state, category } = req.query; + const result = await brandPortal.getCompetitorAnalysis(brandBusinessId, { + state: state as string, + category: category as string, + }); + res.json({ success: true, data: result }); + })); + + // ============================================================================ + // BUYER PORTAL ROUTES + // ============================================================================ + + // Dashboard + router.get('/buyer/:buyerBusinessId/dashboard', asyncHandler(async (req, res) => { + const buyerBusinessId = parseInt(req.params.buyerBusinessId); + const metrics = await buyerPortal.getDashboardMetrics(buyerBusinessId); + res.json({ success: true, data: metrics }); + })); + + // Business profile + router.get('/buyer/:buyerBusinessId', asyncHandler(async (req, res) => { + const buyerBusinessId = parseInt(req.params.buyerBusinessId); + const business = await buyerPortal.getBuyerBusiness(buyerBusinessId); + if (!business) { + return res.status(404).json({ success: false, error: 'Buyer business not found' }); + } + res.json({ success: true, data: business }); + })); + + router.patch('/buyer/:buyerBusinessId', asyncHandler(async (req, res) => { + const buyerBusinessId = parseInt(req.params.buyerBusinessId); + const business = await buyerPortal.updateBuyerBusiness(buyerBusinessId, req.body); + res.json({ success: true, data: business }); + })); + + // Discovery feed + router.get('/buyer/:buyerBusinessId/discovery', asyncHandler(async (req, res) => { + const buyerBusinessId = parseInt(req.params.buyerBusinessId); + const { limit, offset, category } = req.query; + const result = await buyerPortal.getDiscoveryFeed(buyerBusinessId, { + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? parseInt(offset as string) : undefined, + category: category as string, + }); + res.json({ success: true, data: result }); + })); + + // Catalog browsing + router.get('/buyer/:buyerBusinessId/browse', asyncHandler(async (req, res) => { + const buyerBusinessId = parseInt(req.params.buyerBusinessId); + const { limit, offset, brandId, category, search, sortBy, sortDir } = req.query; + const result = await buyerPortal.browseCatalog(buyerBusinessId, { + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? parseInt(offset as string) : undefined, + brandId: brandId ? 
parseInt(brandId as string) : undefined, + category: category as string, + search: search as string, + sortBy: sortBy as string, + sortDir: sortDir as 'asc' | 'desc', + }); + res.json({ success: true, data: result }); + })); + + router.get('/buyer/:buyerBusinessId/brands', asyncHandler(async (req, res) => { + const buyerBusinessId = parseInt(req.params.buyerBusinessId); + const result = await buyerPortal.getBrandsForBuyer(buyerBusinessId); + res.json({ success: true, data: result }); + })); + + // Cart management + router.get('/buyer/:buyerBusinessId/carts', asyncHandler(async (req, res) => { + const buyerBusinessId = parseInt(req.params.buyerBusinessId); + const result = await buyerPortal.getActiveCarts(buyerBusinessId); + res.json({ success: true, data: result }); + })); + + router.post('/buyer/:buyerBusinessId/cart', asyncHandler(async (req, res) => { + const buyerBusinessId = parseInt(req.params.buyerBusinessId); + const { sellerBrandBusinessId, state } = req.body; + const cart = await buyerPortal.getOrCreateCart(buyerBusinessId, sellerBrandBusinessId, state); + res.json({ success: true, data: cart }); + })); + + router.get('/buyer/cart/:cartId/items', asyncHandler(async (req, res) => { + const cartId = parseInt(req.params.cartId); + const items = await buyerPortal.getCartItems(cartId); + res.json({ success: true, data: items }); + })); + + router.post('/buyer/cart/:cartId/items', asyncHandler(async (req, res) => { + const cartId = parseInt(req.params.cartId); + const { catalogItemId, quantity, notes } = req.body; + const item = await buyerPortal.addToCart(cartId, catalogItemId, quantity, notes); + res.json({ success: true, data: item }); + })); + + router.patch('/buyer/cart/item/:itemId', asyncHandler(async (req, res) => { + const itemId = parseInt(req.params.itemId); + const { quantity, notes } = req.body; + const item = await buyerPortal.updateCartItem(itemId, quantity, notes); + res.json({ success: true, data: item }); + })); + + router.delete('/buyer/cart/item/:itemId', asyncHandler(async (req, res) => { + await buyerPortal.removeFromCart(parseInt(req.params.itemId)); + res.json({ success: true }); + })); + + // ============================================================================ + // INTELLIGENCE ROUTES + // ============================================================================ + + // Alerts + router.get('/intelligence/alerts', asyncHandler(async (req, res) => { + const { brandBusinessId, buyerBusinessId, alertType, severity, status, state, limit, offset } = req.query; + const result = await intelligence.getAlerts({ + brandBusinessId: brandBusinessId ? parseInt(brandBusinessId as string) : undefined, + buyerBusinessId: buyerBusinessId ? parseInt(buyerBusinessId as string) : undefined, + alertType: alertType as string, + severity: severity as string, + status: status as string, + state: state as string, + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? 
parseInt(offset as string) : undefined, + }); + res.json({ success: true, data: result }); + })); + + router.post('/intelligence/alerts/:alertId/acknowledge', asyncHandler(async (req, res) => { + const alertId = parseInt(req.params.alertId); + const userId = (req as any).user?.id || 1; // TODO: Get from auth + const alert = await intelligence.acknowledgeAlert(alertId, userId); + res.json({ success: true, data: alert }); + })); + + router.post('/intelligence/alerts/:alertId/resolve', asyncHandler(async (req, res) => { + const alertId = parseInt(req.params.alertId); + const alert = await intelligence.resolveAlert(alertId); + res.json({ success: true, data: alert }); + })); + + router.post('/intelligence/alerts/:alertId/dismiss', asyncHandler(async (req, res) => { + await intelligence.dismissAlert(parseInt(req.params.alertId)); + res.json({ success: true }); + })); + + // Recommendations + router.get('/intelligence/recommendations', asyncHandler(async (req, res) => { + const { brandBusinessId, buyerBusinessId, recommendationType, status, limit, offset } = req.query; + const result = await intelligence.getRecommendations({ + brandBusinessId: brandBusinessId ? parseInt(brandBusinessId as string) : undefined, + buyerBusinessId: buyerBusinessId ? parseInt(buyerBusinessId as string) : undefined, + recommendationType: recommendationType as string, + status: status as string, + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? parseInt(offset as string) : undefined, + }); + res.json({ success: true, data: result }); + })); + + router.post('/intelligence/recommendations/:recId/accept', asyncHandler(async (req, res) => { + const recId = parseInt(req.params.recId); + const userId = (req as any).user?.id || 1; + const rec = await intelligence.acceptRecommendation(recId, userId); + res.json({ success: true, data: rec }); + })); + + router.post('/intelligence/recommendations/:recId/reject', asyncHandler(async (req, res) => { + await intelligence.rejectRecommendation(parseInt(req.params.recId)); + res.json({ success: true }); + })); + + // Summaries + router.get('/intelligence/summaries', asyncHandler(async (req, res) => { + const { brandBusinessId, buyerBusinessId, summaryType, limit, offset } = req.query; + const summaries = await intelligence.getSummaries({ + brandBusinessId: brandBusinessId ? parseInt(brandBusinessId as string) : undefined, + buyerBusinessId: buyerBusinessId ? parseInt(buyerBusinessId as string) : undefined, + summaryType: summaryType as 'daily' | 'weekly' | 'monthly', + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? parseInt(offset as string) : undefined, + }); + res.json({ success: true, data: summaries }); + })); + + router.post('/intelligence/summaries/generate/:brandBusinessId', asyncHandler(async (req, res) => { + const brandBusinessId = parseInt(req.params.brandBusinessId); + const summary = await intelligence.generateDailySummary(brandBusinessId); + res.json({ success: true, data: summary }); + })); + + // Rules + router.get('/intelligence/rules', asyncHandler(async (req, res) => { + const { brandBusinessId, buyerBusinessId, ruleType, limit, offset } = req.query; + const rules = await intelligence.getRules({ + brandBusinessId: brandBusinessId ? parseInt(brandBusinessId as string) : undefined, + buyerBusinessId: buyerBusinessId ? parseInt(buyerBusinessId as string) : undefined, + ruleType: ruleType as 'alert' | 'recommendation', + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? 
parseInt(offset as string) : undefined, + }); + res.json({ success: true, data: rules }); + })); + + router.post('/intelligence/rules', asyncHandler(async (req, res) => { + const rule = await intelligence.createRule(req.body); + res.status(201).json({ success: true, data: rule }); + })); + + router.patch('/intelligence/rules/:ruleId', asyncHandler(async (req, res) => { + const rule = await intelligence.updateRule(parseInt(req.params.ruleId), req.body); + res.json({ success: true, data: rule }); + })); + + router.delete('/intelligence/rules/:ruleId', asyncHandler(async (req, res) => { + await intelligence.deleteRule(parseInt(req.params.ruleId)); + res.json({ success: true }); + })); + + // ============================================================================ + // MESSAGING ROUTES + // ============================================================================ + + // Threads + router.get('/messaging/threads', asyncHandler(async (req, res) => { + const { brandBusinessId, buyerBusinessId, threadType, status, userId, limit, offset } = req.query; + const result = await messaging.getThreads({ + brandBusinessId: brandBusinessId ? parseInt(brandBusinessId as string) : undefined, + buyerBusinessId: buyerBusinessId ? parseInt(buyerBusinessId as string) : undefined, + threadType: threadType as string, + status: status as string, + userId: userId ? parseInt(userId as string) : undefined, + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? parseInt(offset as string) : undefined, + }); + res.json({ success: true, data: result }); + })); + + router.post('/messaging/threads', asyncHandler(async (req, res) => { + const thread = await messaging.createThread(req.body); + res.status(201).json({ success: true, data: thread }); + })); + + router.get('/messaging/threads/:threadId', asyncHandler(async (req, res) => { + const thread = await messaging.getThread(parseInt(req.params.threadId)); + if (!thread) { + return res.status(404).json({ success: false, error: 'Thread not found' }); + } + res.json({ success: true, data: thread }); + })); + + router.patch('/messaging/threads/:threadId/status', asyncHandler(async (req, res) => { + const { status } = req.body; + await messaging.updateThreadStatus(parseInt(req.params.threadId), status); + res.json({ success: true }); + })); + + // Messages + router.get('/messaging/threads/:threadId/messages', asyncHandler(async (req, res) => { + const threadId = parseInt(req.params.threadId); + const { limit, offset } = req.query; + const result = await messaging.getMessages(threadId, { + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? 
parseInt(offset as string) : undefined, + }); + res.json({ success: true, data: result }); + })); + + router.post('/messaging/threads/:threadId/messages', asyncHandler(async (req, res) => { + const threadId = parseInt(req.params.threadId); + const { senderId, senderType, content, attachments } = req.body; + const message = await messaging.sendMessage(threadId, senderId, senderType, content, attachments); + res.status(201).json({ success: true, data: message }); + })); + + router.post('/messaging/threads/:threadId/read', asyncHandler(async (req, res) => { + const threadId = parseInt(req.params.threadId); + const userId = (req as any).user?.id || req.body.userId; + const count = await messaging.markMessagesAsRead(threadId, userId); + res.json({ success: true, data: { markedRead: count } }); + })); + + // Notifications + router.get('/notifications', asyncHandler(async (req, res) => { + const userId = (req as any).user?.id || parseInt(req.query.userId as string); + const { status, limit, offset } = req.query; + const result = await messaging.getNotifications(userId, { + status: status as string, + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? parseInt(offset as string) : undefined, + }); + res.json({ success: true, data: result }); + })); + + router.get('/notifications/unread-count', asyncHandler(async (req, res) => { + const userId = (req as any).user?.id || parseInt(req.query.userId as string); + const count = await messaging.getUnreadCount(userId); + res.json({ success: true, data: { count } }); + })); + + router.post('/notifications/:notificationId/read', asyncHandler(async (req, res) => { + const notificationId = parseInt(req.params.notificationId); + const userId = (req as any).user?.id || req.body.userId; + await messaging.markNotificationRead(notificationId, userId); + res.json({ success: true }); + })); + + router.post('/notifications/read-all', asyncHandler(async (req, res) => { + const userId = (req as any).user?.id || req.body.userId; + const count = await messaging.markAllNotificationsRead(userId); + res.json({ success: true, data: { markedRead: count } }); + })); + + // ============================================================================ + // ORDER ROUTES + // ============================================================================ + + // Orders list + router.get('/orders', asyncHandler(async (req, res) => { + const { buyerBusinessId, sellerBrandBusinessId, orderStatus, state, sortBy, sortDir, dateFrom, dateTo, limit, offset } = req.query; + const result = await orders.getOrders({ + buyerBusinessId: buyerBusinessId ? parseInt(buyerBusinessId as string) : undefined, + sellerBrandBusinessId: sellerBrandBusinessId ? parseInt(sellerBrandBusinessId as string) : undefined, + orderStatus: orderStatus as any, + state: state as string, + sortBy: sortBy as string, + sortDir: sortDir as 'asc' | 'desc', + dateFrom: dateFrom ? new Date(dateFrom as string) : undefined, + dateTo: dateTo ? new Date(dateTo as string) : undefined, + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? 
parseInt(offset as string) : undefined, + }); + res.json({ success: true, data: result }); + })); + + // Create order + router.post('/orders', asyncHandler(async (req, res) => { + const order = await orders.createOrder(req.body); + res.status(201).json({ success: true, data: order }); + })); + + router.post('/orders/from-cart/:cartId', asyncHandler(async (req, res) => { + const cartId = parseInt(req.params.cartId); + const order = await orders.createOrderFromCart(cartId, req.body); + res.status(201).json({ success: true, data: order }); + })); + + // Order details + router.get('/orders/:orderId', asyncHandler(async (req, res) => { + const order = await orders.getOrder(parseInt(req.params.orderId)); + if (!order) { + return res.status(404).json({ success: false, error: 'Order not found' }); + } + res.json({ success: true, data: order }); + })); + + router.get('/orders/:orderId/items', asyncHandler(async (req, res) => { + const items = await orders.getOrderItems(parseInt(req.params.orderId)); + res.json({ success: true, data: items }); + })); + + router.post('/orders/:orderId/items', asyncHandler(async (req, res) => { + const orderId = parseInt(req.params.orderId); + const item = await orders.addOrderItem(orderId, req.body); + res.status(201).json({ success: true, data: item }); + })); + + router.get('/orders/:orderId/history', asyncHandler(async (req, res) => { + const history = await orders.getStatusHistory(parseInt(req.params.orderId)); + res.json({ success: true, data: history }); + })); + + router.get('/orders/:orderId/documents', asyncHandler(async (req, res) => { + const docs = await orders.getOrderDocuments(parseInt(req.params.orderId)); + res.json({ success: true, data: docs }); + })); + + router.post('/orders/:orderId/documents', asyncHandler(async (req, res) => { + const orderId = parseInt(req.params.orderId); + const doc = await orders.addDocument(orderId, req.body); + res.status(201).json({ success: true, data: doc }); + })); + + // Order status transitions + router.post('/orders/:orderId/submit', asyncHandler(async (req, res) => { + const orderId = parseInt(req.params.orderId); + const userId = (req as any).user?.id; + const order = await orders.submitOrder(orderId, userId); + res.json({ success: true, data: order }); + })); + + router.post('/orders/:orderId/accept', asyncHandler(async (req, res) => { + const orderId = parseInt(req.params.orderId); + const userId = (req as any).user?.id; + const order = await orders.acceptOrder(orderId, userId, req.body.sellerNotes); + res.json({ success: true, data: order }); + })); + + router.post('/orders/:orderId/reject', asyncHandler(async (req, res) => { + const orderId = parseInt(req.params.orderId); + const userId = (req as any).user?.id; + const order = await orders.rejectOrder(orderId, userId, req.body.reason); + res.json({ success: true, data: order }); + })); + + router.post('/orders/:orderId/cancel', asyncHandler(async (req, res) => { + const orderId = parseInt(req.params.orderId); + const userId = (req as any).user?.id; + const order = await orders.cancelOrder(orderId, userId, req.body.reason); + res.json({ success: true, data: order }); + })); + + router.post('/orders/:orderId/process', asyncHandler(async (req, res) => { + const orderId = parseInt(req.params.orderId); + const userId = (req as any).user?.id; + const order = await orders.startProcessing(orderId, userId); + res.json({ success: true, data: order }); + })); + + router.post('/orders/:orderId/pack', asyncHandler(async (req, res) => { + const orderId = 
parseInt(req.params.orderId); + const userId = (req as any).user?.id; + const order = await orders.markPacked(orderId, userId); + res.json({ success: true, data: order }); + })); + + router.post('/orders/:orderId/ship', asyncHandler(async (req, res) => { + const orderId = parseInt(req.params.orderId); + const userId = (req as any).user?.id; + const order = await orders.markShipped(orderId, req.body, userId); + res.json({ success: true, data: order }); + })); + + router.post('/orders/:orderId/deliver', asyncHandler(async (req, res) => { + const orderId = parseInt(req.params.orderId); + const userId = (req as any).user?.id; + const order = await orders.markDelivered(orderId, userId); + res.json({ success: true, data: order }); + })); + + // ============================================================================ + // INVENTORY ROUTES + // ============================================================================ + + router.get('/inventory', asyncHandler(async (req, res) => { + const { brandId, state, inventoryStatus, lowStockOnly, sortBy, sortDir, limit, offset } = req.query; + const result = await inventory.getInventory({ + brandId: brandId ? parseInt(brandId as string) : undefined, + state: state as string, + inventoryStatus: inventoryStatus as string, + lowStockOnly: lowStockOnly === 'true', + sortBy: sortBy as string, + sortDir: sortDir as 'asc' | 'desc', + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? parseInt(offset as string) : undefined, + }); + res.json({ success: true, data: result }); + })); + + router.get('/inventory/:inventoryId', asyncHandler(async (req, res) => { + const item = await inventory.getInventoryItem(parseInt(req.params.inventoryId)); + if (!item) { + return res.status(404).json({ success: false, error: 'Inventory item not found' }); + } + res.json({ success: true, data: item }); + })); + + router.post('/inventory', asyncHandler(async (req, res) => { + const { brandId, catalogItemId, state, ...data } = req.body; + const item = await inventory.upsertInventory(brandId, catalogItemId, state, data); + res.json({ success: true, data: item }); + })); + + router.post('/inventory/:inventoryId/adjust', asyncHandler(async (req, res) => { + const inventoryId = parseInt(req.params.inventoryId); + const { quantityChange, changeType, orderId, reason } = req.body; + const changedBy = (req as any).user?.id; + const item = await inventory.adjustInventory(inventoryId, quantityChange, changeType, { orderId, reason, changedBy }); + res.json({ success: true, data: item }); + })); + + router.get('/inventory/:inventoryId/history', asyncHandler(async (req, res) => { + const inventoryId = parseInt(req.params.inventoryId); + const { limit, offset } = req.query; + const result = await inventory.getInventoryHistory(inventoryId, { + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? 
parseInt(offset as string) : undefined, + }); + res.json({ success: true, data: result }); + })); + + router.get('/inventory/brand/:brandId/summary', asyncHandler(async (req, res) => { + const brandId = parseInt(req.params.brandId); + const summary = await inventory.getInventorySummary(brandId); + res.json({ success: true, data: summary }); + })); + + router.get('/inventory/brand/:brandId/low-stock', asyncHandler(async (req, res) => { + const brandId = parseInt(req.params.brandId); + const { state } = req.query; + const items = await inventory.getLowStockItems(brandId, state as string); + res.json({ success: true, data: items }); + })); + + router.get('/inventory/brand/:brandId/out-of-stock', asyncHandler(async (req, res) => { + const brandId = parseInt(req.params.brandId); + const { state } = req.query; + const items = await inventory.getOutOfStockItems(brandId, state as string); + res.json({ success: true, data: items }); + })); + + // Sync + router.get('/inventory/sync/:brandBusinessId/history', asyncHandler(async (req, res) => { + const brandBusinessId = parseInt(req.params.brandBusinessId); + const { limit, offset } = req.query; + const logs = await inventory.getSyncHistory(brandBusinessId, { + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? parseInt(offset as string) : undefined, + }); + res.json({ success: true, data: logs }); + })); + + // ============================================================================ + // PRICING ROUTES + // ============================================================================ + + // Rules + router.get('/pricing/rules', asyncHandler(async (req, res) => { + const { brandBusinessId, ruleType, state, category, limit, offset } = req.query; + const result = await pricing.getRules({ + brandBusinessId: brandBusinessId ? parseInt(brandBusinessId as string) : undefined, + ruleType: ruleType as string, + state: state as string, + category: category as string, + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? parseInt(offset as string) : undefined, + }); + res.json({ success: true, data: result }); + })); + + router.post('/pricing/rules', asyncHandler(async (req, res) => { + const rule = await pricing.createRule(req.body); + res.status(201).json({ success: true, data: rule }); + })); + + router.get('/pricing/rules/:ruleId', asyncHandler(async (req, res) => { + const rule = await pricing.getRule(parseInt(req.params.ruleId)); + if (!rule) { + return res.status(404).json({ success: false, error: 'Pricing rule not found' }); + } + res.json({ success: true, data: rule }); + })); + + router.patch('/pricing/rules/:ruleId', asyncHandler(async (req, res) => { + const rule = await pricing.updateRule(parseInt(req.params.ruleId), req.body); + res.json({ success: true, data: rule }); + })); + + router.delete('/pricing/rules/:ruleId', asyncHandler(async (req, res) => { + await pricing.deleteRule(parseInt(req.params.ruleId)); + res.json({ success: true }); + })); + + router.post('/pricing/rules/:ruleId/toggle', asyncHandler(async (req, res) => { + const { enabled } = req.body; + await pricing.toggleRule(parseInt(req.params.ruleId), enabled); + res.json({ success: true }); + })); + + // Suggestions + router.get('/pricing/suggestions', asyncHandler(async (req, res) => { + const { brandBusinessId, suggestionStatus, state, category, limit, offset } = req.query; + const result = await pricing.getSuggestions({ + brandBusinessId: brandBusinessId ? 
parseInt(brandBusinessId as string) : undefined, + suggestionStatus: suggestionStatus as string, + state: state as string, + category: category as string, + limit: limit ? parseInt(limit as string) : undefined, + offset: offset ? parseInt(offset as string) : undefined, + }); + res.json({ success: true, data: result }); + })); + + router.post('/pricing/suggestions/:suggestionId/accept', asyncHandler(async (req, res) => { + const suggestionId = parseInt(req.params.suggestionId); + const userId = (req as any).user?.id || 1; + const suggestion = await pricing.acceptSuggestion(suggestionId, userId, req.body.notes); + res.json({ success: true, data: suggestion }); + })); + + router.post('/pricing/suggestions/:suggestionId/reject', asyncHandler(async (req, res) => { + const suggestionId = parseInt(req.params.suggestionId); + const userId = (req as any).user?.id || 1; + await pricing.rejectSuggestion(suggestionId, userId, req.body.notes); + res.json({ success: true }); + })); + + // History + router.get('/pricing/history/:catalogItemId', asyncHandler(async (req, res) => { + const catalogItemId = parseInt(req.params.catalogItemId); + const { limit, state } = req.query; + const history = await pricing.getPricingHistory(catalogItemId, { + limit: limit ? parseInt(limit as string) : undefined, + state: state as string, + }); + res.json({ success: true, data: history }); + })); + + // Competitive analysis + router.get('/pricing/competitive/:brandBusinessId', asyncHandler(async (req, res) => { + const brandBusinessId = parseInt(req.params.brandBusinessId); + const { state } = req.query; + if (!state) { + return res.status(400).json({ success: false, error: 'State is required' }); + } + const analysis = await pricing.analyzeCompetitivePricing(brandBusinessId, state as string); + res.json({ success: true, data: analysis }); + })); + + // Rule evaluation + router.post('/pricing/evaluate/:catalogItemId', asyncHandler(async (req, res) => { + const catalogItemId = parseInt(req.params.catalogItemId); + const { state } = req.body; + if (!state) { + return res.status(400).json({ success: false, error: 'State is required' }); + } + const suggestions = await pricing.evaluateRulesForProduct(catalogItemId, state); + res.json({ success: true, data: suggestions }); + })); + + return router; +} diff --git a/backend/src/portals/services/brand-portal.ts b/backend/src/portals/services/brand-portal.ts new file mode 100644 index 00000000..9fc5d297 --- /dev/null +++ b/backend/src/portals/services/brand-portal.ts @@ -0,0 +1,788 @@ +/** + * Brand Portal Service + * Phase 6: Brand Portal Dashboard, Catalog Management, Analytics + */ + +import { Pool } from 'pg'; +import { + BrandBusiness, + BrandCatalogItem, + BrandCatalogDistribution, + BrandDashboardMetrics, + PortalQueryOptions, + TopPerformer, +} from '../types'; + +export class BrandPortalService { + constructor(private pool: Pool) {} + + // ============================================================================ + // BUSINESS MANAGEMENT + // ============================================================================ + + async getBrandBusiness(brandBusinessId: number): Promise { + const result = await this.pool.query( + `SELECT + bb.id, + bb.brand_id AS "brandId", + b.name AS "brandName", + bb.company_name AS "companyName", + bb.contact_email AS "contactEmail", + bb.contact_phone AS "contactPhone", + bb.billing_address AS "billingAddress", + bb.onboarded_at AS "onboardedAt", + bb.subscription_tier AS "subscriptionTier", + bb.subscription_expires_at AS 
"subscriptionExpiresAt", + bb.settings, + bb.states, + bb.is_active AS "isActive", + bb.created_at AS "createdAt", + bb.updated_at AS "updatedAt" + FROM brand_businesses bb + JOIN brands b ON bb.brand_id = b.id + WHERE bb.id = $1`, + [brandBusinessId] + ); + return result.rows[0] || null; + } + + async getBrandBusinessByBrandId(brandId: number): Promise { + const result = await this.pool.query( + `SELECT + bb.id, + bb.brand_id AS "brandId", + b.name AS "brandName", + bb.company_name AS "companyName", + bb.contact_email AS "contactEmail", + bb.contact_phone AS "contactPhone", + bb.billing_address AS "billingAddress", + bb.onboarded_at AS "onboardedAt", + bb.subscription_tier AS "subscriptionTier", + bb.subscription_expires_at AS "subscriptionExpiresAt", + bb.settings, + bb.states, + bb.is_active AS "isActive", + bb.created_at AS "createdAt", + bb.updated_at AS "updatedAt" + FROM brand_businesses bb + JOIN brands b ON bb.brand_id = b.id + WHERE bb.brand_id = $1`, + [brandId] + ); + return result.rows[0] || null; + } + + async updateBrandBusiness( + brandBusinessId: number, + updates: Partial + ): Promise { + const setClauses: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + if (updates.companyName !== undefined) { + setClauses.push(`company_name = $${paramIndex++}`); + values.push(updates.companyName); + } + if (updates.contactEmail !== undefined) { + setClauses.push(`contact_email = $${paramIndex++}`); + values.push(updates.contactEmail); + } + if (updates.contactPhone !== undefined) { + setClauses.push(`contact_phone = $${paramIndex++}`); + values.push(updates.contactPhone); + } + if (updates.billingAddress !== undefined) { + setClauses.push(`billing_address = $${paramIndex++}`); + values.push(JSON.stringify(updates.billingAddress)); + } + if (updates.settings !== undefined) { + setClauses.push(`settings = $${paramIndex++}`); + values.push(JSON.stringify(updates.settings)); + } + if (updates.states !== undefined) { + setClauses.push(`states = $${paramIndex++}`); + values.push(updates.states); + } + + if (setClauses.length === 0) { + return this.getBrandBusiness(brandBusinessId); + } + + setClauses.push('updated_at = NOW()'); + values.push(brandBusinessId); + + await this.pool.query( + `UPDATE brand_businesses SET ${setClauses.join(', ')} WHERE id = $${paramIndex}`, + values + ); + + return this.getBrandBusiness(brandBusinessId); + } + + // ============================================================================ + // DASHBOARD + // ============================================================================ + + async getDashboardMetrics(brandBusinessId: number): Promise { + // Get brand ID for this business + const businessResult = await this.pool.query( + `SELECT brand_id FROM brand_businesses WHERE id = $1`, + [brandBusinessId] + ); + const brandId = businessResult.rows[0]?.brand_id; + + if (!brandId) { + throw new Error(`Brand business ${brandBusinessId} not found`); + } + + // Execute multiple queries in parallel + const [ + productStats, + orderStats, + presenceStats, + alertStats, + messageStats, + topProducts, + revenueByState, + orderTrend, + ] = await Promise.all([ + // Product stats + this.pool.query( + `SELECT + COUNT(*) AS "totalProducts", + COUNT(*) FILTER (WHERE is_active = TRUE) AS "activeProducts" + FROM brand_catalog_items + WHERE brand_id = $1`, + [brandId] + ), + + // Order stats + this.pool.query( + `SELECT + COUNT(*) AS "totalOrders", + COUNT(*) FILTER (WHERE status IN ('submitted', 'accepted', 'processing')) AS "pendingOrders", + 
COALESCE(SUM(total), 0) AS "totalRevenue", + COALESCE(SUM(total) FILTER (WHERE submitted_at >= DATE_TRUNC('month', NOW())), 0) AS "revenueThisMonth" + FROM orders + WHERE seller_brand_business_id = $1`, + [brandBusinessId] + ), + + // Store presence and states + this.pool.query( + `SELECT + COUNT(DISTINCT sp.dispensary_id) AS "storePresence", + COUNT(DISTINCT d.state) AS "statesCovered" + FROM store_products sp + JOIN dispensaries d ON sp.dispensary_id = d.id + WHERE sp.brand_id = $1`, + [brandId] + ), + + // Alert stats + this.pool.query( + `SELECT + COUNT(*) FILTER (WHERE status = 'new') AS "activeAlerts" + FROM intelligence_alerts + WHERE brand_business_id = $1`, + [brandBusinessId] + ), + + // Message stats + this.pool.query( + `SELECT + COUNT(*) AS "unreadMessages" + FROM messages m + JOIN message_threads t ON m.thread_id = t.id + WHERE t.brand_business_id = $1 AND m.is_read = FALSE AND m.sender_type != 'brand'`, + [brandBusinessId] + ), + + // Top products + this.pool.query( + `SELECT + sp.id, + sp.name, + COUNT(DISTINCT sp.dispensary_id) AS value + FROM store_products sp + WHERE sp.brand_id = $1 + GROUP BY sp.id, sp.name + ORDER BY value DESC + LIMIT 5`, + [brandId] + ), + + // Revenue by state + this.pool.query( + `SELECT + o.state, + COALESCE(SUM(o.total), 0) AS revenue + FROM orders o + WHERE o.seller_brand_business_id = $1 AND o.status NOT IN ('cancelled', 'rejected', 'draft') + GROUP BY o.state + ORDER BY revenue DESC`, + [brandBusinessId] + ), + + // Order trend (last 30 days) + this.pool.query( + `SELECT + DATE(submitted_at) AS date, + COUNT(*) AS orders, + COALESCE(SUM(total), 0) AS revenue + FROM orders + WHERE seller_brand_business_id = $1 + AND submitted_at >= NOW() - INTERVAL '30 days' + AND status NOT IN ('cancelled', 'rejected', 'draft') + GROUP BY DATE(submitted_at) + ORDER BY date`, + [brandBusinessId] + ), + ]); + + // Get low stock and pending price suggestions + const [lowStockResult, priceSuggResult] = await Promise.all([ + this.pool.query( + `SELECT COUNT(*) AS count + FROM brand_inventory bi + WHERE bi.brand_id = $1 AND bi.quantity_on_hand <= bi.reorder_point`, + [brandId] + ), + this.pool.query( + `SELECT COUNT(*) AS count + FROM pricing_suggestions ps + WHERE ps.brand_business_id = $1 AND ps.status = 'pending'`, + [brandBusinessId] + ), + ]); + + const topProductsList: TopPerformer[] = topProducts.rows.map((row: any, idx: number) => ({ + type: 'product' as const, + id: row.id, + name: row.name, + value: parseInt(row.value), + rank: idx + 1, + })); + + return { + totalProducts: parseInt(productStats.rows[0]?.totalProducts || '0'), + activeProducts: parseInt(productStats.rows[0]?.activeProducts || '0'), + totalOrders: parseInt(orderStats.rows[0]?.totalOrders || '0'), + pendingOrders: parseInt(orderStats.rows[0]?.pendingOrders || '0'), + totalRevenue: parseFloat(orderStats.rows[0]?.totalRevenue || '0'), + revenueThisMonth: parseFloat(orderStats.rows[0]?.revenueThisMonth || '0'), + storePresence: parseInt(presenceStats.rows[0]?.storePresence || '0'), + statesCovered: parseInt(presenceStats.rows[0]?.statesCovered || '0'), + lowStockAlerts: parseInt(lowStockResult.rows[0]?.count || '0'), + pendingPriceSuggestions: parseInt(priceSuggResult.rows[0]?.count || '0'), + unreadMessages: parseInt(messageStats.rows[0]?.unreadMessages || '0'), + activeAlerts: parseInt(alertStats.rows[0]?.activeAlerts || '0'), + topProducts: topProductsList, + revenueByState: revenueByState.rows.map((row: any) => ({ + state: row.state, + revenue: parseFloat(row.revenue), + })), + orderTrend: 
orderTrend.rows.map((row: any) => ({ + date: row.date, + orders: parseInt(row.orders), + revenue: parseFloat(row.revenue), + })), + }; + } + + // ============================================================================ + // CATALOG MANAGEMENT + // ============================================================================ + + async getCatalogItems( + brandBusinessId: number, + options: PortalQueryOptions = {} + ): Promise<{ items: BrandCatalogItem[]; total: number }> { + const { limit = 50, offset = 0, sortBy = 'name', sortDir = 'asc', category, search } = options; + + // Get brand ID + const businessResult = await this.pool.query( + `SELECT brand_id FROM brand_businesses WHERE id = $1`, + [brandBusinessId] + ); + const brandId = businessResult.rows[0]?.brand_id; + + if (!brandId) { + return { items: [], total: 0 }; + } + + const conditions: string[] = ['brand_id = $1']; + const values: any[] = [brandId]; + let paramIndex = 2; + + if (category) { + conditions.push(`category = $${paramIndex++}`); + values.push(category); + } + + if (search) { + conditions.push(`(name ILIKE $${paramIndex} OR sku ILIKE $${paramIndex})`); + values.push(`%${search}%`); + paramIndex++; + } + + const whereClause = conditions.join(' AND '); + const validSortColumns = ['name', 'sku', 'category', 'msrp', 'created_at', 'updated_at']; + const sortColumn = validSortColumns.includes(sortBy) ? sortBy : 'name'; + const sortDirection = sortDir === 'desc' ? 'DESC' : 'ASC'; + + const [itemsResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + id, + brand_id AS "brandId", + sku, + name, + description, + category, + subcategory, + thc_content AS "thcContent", + cbd_content AS "cbdContent", + terpene_profile AS "terpeneProfile", + strain_type AS "strainType", + weight, + weight_unit AS "weightUnit", + image_url AS "imageUrl", + additional_images AS "additionalImages", + msrp, + wholesale_price AS "wholesalePrice", + cogs, + is_active AS "isActive", + available_states AS "availableStates", + created_at AS "createdAt", + updated_at AS "updatedAt" + FROM brand_catalog_items + WHERE ${whereClause} + ORDER BY ${sortColumn} ${sortDirection} + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + [...values, limit, offset] + ), + this.pool.query( + `SELECT COUNT(*) AS total FROM brand_catalog_items WHERE ${whereClause}`, + values + ), + ]); + + return { + items: itemsResult.rows, + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + async getCatalogItem(catalogItemId: number): Promise { + const result = await this.pool.query( + `SELECT + id, + brand_id AS "brandId", + sku, + name, + description, + category, + subcategory, + thc_content AS "thcContent", + cbd_content AS "cbdContent", + terpene_profile AS "terpeneProfile", + strain_type AS "strainType", + weight, + weight_unit AS "weightUnit", + image_url AS "imageUrl", + additional_images AS "additionalImages", + msrp, + wholesale_price AS "wholesalePrice", + cogs, + is_active AS "isActive", + available_states AS "availableStates", + created_at AS "createdAt", + updated_at AS "updatedAt" + FROM brand_catalog_items + WHERE id = $1`, + [catalogItemId] + ); + return result.rows[0] || null; + } + + async createCatalogItem( + brandBusinessId: number, + item: Omit + ): Promise { + // Get brand ID + const businessResult = await this.pool.query( + `SELECT brand_id FROM brand_businesses WHERE id = $1`, + [brandBusinessId] + ); + const brandId = businessResult.rows[0]?.brand_id; + + if (!brandId) { + throw new Error(`Brand business ${brandBusinessId} not 
found`); + } + + const result = await this.pool.query( + `INSERT INTO brand_catalog_items ( + brand_id, sku, name, description, category, subcategory, + thc_content, cbd_content, terpene_profile, strain_type, + weight, weight_unit, image_url, additional_images, + msrp, wholesale_price, cogs, is_active, available_states + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19) + RETURNING + id, + brand_id AS "brandId", + sku, name, description, category, subcategory, + thc_content AS "thcContent", + cbd_content AS "cbdContent", + terpene_profile AS "terpeneProfile", + strain_type AS "strainType", + weight, weight_unit AS "weightUnit", + image_url AS "imageUrl", + additional_images AS "additionalImages", + msrp, wholesale_price AS "wholesalePrice", cogs, + is_active AS "isActive", + available_states AS "availableStates", + created_at AS "createdAt", + updated_at AS "updatedAt"`, + [ + brandId, + item.sku, + item.name, + item.description, + item.category, + item.subcategory, + item.thcContent, + item.cbdContent, + item.terpeneProfile ? JSON.stringify(item.terpeneProfile) : null, + item.strainType, + item.weight, + item.weightUnit, + item.imageUrl, + item.additionalImages || [], + item.msrp, + item.wholesalePrice, + item.cogs, + item.isActive ?? true, + item.availableStates || [], + ] + ); + + return result.rows[0]; + } + + async updateCatalogItem( + catalogItemId: number, + updates: Partial + ): Promise { + const setClauses: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + const fieldMap: Record = { + sku: 'sku', + name: 'name', + description: 'description', + category: 'category', + subcategory: 'subcategory', + thcContent: 'thc_content', + cbdContent: 'cbd_content', + terpeneProfile: 'terpene_profile', + strainType: 'strain_type', + weight: 'weight', + weightUnit: 'weight_unit', + imageUrl: 'image_url', + additionalImages: 'additional_images', + msrp: 'msrp', + wholesalePrice: 'wholesale_price', + cogs: 'cogs', + isActive: 'is_active', + availableStates: 'available_states', + }; + + for (const [key, dbField] of Object.entries(fieldMap)) { + if ((updates as any)[key] !== undefined) { + setClauses.push(`${dbField} = $${paramIndex++}`); + let value = (updates as any)[key]; + if (key === 'terpeneProfile' && value) { + value = JSON.stringify(value); + } + values.push(value); + } + } + + if (setClauses.length === 0) { + return this.getCatalogItem(catalogItemId); + } + + setClauses.push('updated_at = NOW()'); + values.push(catalogItemId); + + await this.pool.query( + `UPDATE brand_catalog_items SET ${setClauses.join(', ')} WHERE id = $${paramIndex}`, + values + ); + + return this.getCatalogItem(catalogItemId); + } + + async deleteCatalogItem(catalogItemId: number): Promise { + const result = await this.pool.query( + `DELETE FROM brand_catalog_items WHERE id = $1`, + [catalogItemId] + ); + return (result.rowCount ?? 
0) > 0; + } + + // ============================================================================ + // CATALOG DISTRIBUTION + // ============================================================================ + + async getDistributionByItem(catalogItemId: number): Promise { + const result = await this.pool.query( + `SELECT + id, + catalog_item_id AS "catalogItemId", + state, + is_available AS "isAvailable", + custom_msrp AS "customMsrp", + custom_wholesale AS "customWholesale", + min_order_qty AS "minOrderQty", + lead_time_days AS "leadTimeDays", + notes, + created_at AS "createdAt", + updated_at AS "updatedAt" + FROM brand_catalog_distribution + WHERE catalog_item_id = $1 + ORDER BY state`, + [catalogItemId] + ); + return result.rows; + } + + async upsertDistribution( + catalogItemId: number, + state: string, + distribution: Partial + ): Promise { + const result = await this.pool.query( + `INSERT INTO brand_catalog_distribution ( + catalog_item_id, state, is_available, custom_msrp, custom_wholesale, + min_order_qty, lead_time_days, notes + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8) + ON CONFLICT (catalog_item_id, state) DO UPDATE SET + is_available = EXCLUDED.is_available, + custom_msrp = EXCLUDED.custom_msrp, + custom_wholesale = EXCLUDED.custom_wholesale, + min_order_qty = EXCLUDED.min_order_qty, + lead_time_days = EXCLUDED.lead_time_days, + notes = EXCLUDED.notes, + updated_at = NOW() + RETURNING + id, + catalog_item_id AS "catalogItemId", + state, + is_available AS "isAvailable", + custom_msrp AS "customMsrp", + custom_wholesale AS "customWholesale", + min_order_qty AS "minOrderQty", + lead_time_days AS "leadTimeDays", + notes, + created_at AS "createdAt", + updated_at AS "updatedAt"`, + [ + catalogItemId, + state, + distribution.isAvailable ?? true, + distribution.customMsrp, + distribution.customWholesale, + distribution.minOrderQty ?? 1, + distribution.leadTimeDays ?? 7, + distribution.notes, + ] + ); + return result.rows[0]; + } + + // ============================================================================ + // STORE PRESENCE ANALYTICS + // ============================================================================ + + async getStorePresence( + brandBusinessId: number, + options: PortalQueryOptions = {} + ): Promise<{ + stores: { + dispensaryId: number; + dispensaryName: string; + state: string; + city: string; + productCount: number; + lastSeenAt: Date | null; + }[]; + total: number; + }> { + const { limit = 50, offset = 0, state, sortBy = 'productCount', sortDir = 'desc' } = options; + + // Get brand ID + const businessResult = await this.pool.query( + `SELECT brand_id FROM brand_businesses WHERE id = $1`, + [brandBusinessId] + ); + const brandId = businessResult.rows[0]?.brand_id; + + if (!brandId) { + return { stores: [], total: 0 }; + } + + const conditions: string[] = ['sp.brand_id = $1']; + const values: any[] = [brandId]; + let paramIndex = 2; + + if (state) { + conditions.push(`d.state = $${paramIndex++}`); + values.push(state); + } + + const whereClause = conditions.join(' AND '); + const validSortColumns = ['productCount', 'dispensaryName', 'state', 'lastSeenAt']; + const sortColumn = validSortColumns.includes(sortBy) ? + (sortBy === 'productCount' ? 'product_count' : + sortBy === 'dispensaryName' ? 'd.name' : + sortBy === 'lastSeenAt' ? 'last_seen_at' : 'd.state') : 'product_count'; + const sortDirection = sortDir === 'asc' ? 
'ASC' : 'DESC'; + + const [storesResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + d.id AS "dispensaryId", + d.name AS "dispensaryName", + d.state, + d.city, + COUNT(sp.id) AS "productCount", + MAX(sp.updated_at) AS "lastSeenAt" + FROM store_products sp + JOIN dispensaries d ON sp.dispensary_id = d.id + WHERE ${whereClause} + GROUP BY d.id, d.name, d.state, d.city + ORDER BY ${sortColumn} ${sortDirection} + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + [...values, limit, offset] + ), + this.pool.query( + `SELECT COUNT(DISTINCT d.id) AS total + FROM store_products sp + JOIN dispensaries d ON sp.dispensary_id = d.id + WHERE ${whereClause}`, + values + ), + ]); + + return { + stores: storesResult.rows.map((row: any) => ({ + dispensaryId: row.dispensaryId, + dispensaryName: row.dispensaryName, + state: row.state, + city: row.city, + productCount: parseInt(row.productCount), + lastSeenAt: row.lastSeenAt, + })), + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + // ============================================================================ + // COMPETITIVE ANALYSIS + // ============================================================================ + + async getCompetitorAnalysis( + brandBusinessId: number, + options: { state?: string; category?: string } = {} + ): Promise<{ + competitors: { + brandId: number; + brandName: string; + storeCount: number; + productCount: number; + avgPrice: number | null; + overlap: number; + }[]; + }> { + // Get brand ID + const businessResult = await this.pool.query( + `SELECT brand_id FROM brand_businesses WHERE id = $1`, + [brandBusinessId] + ); + const brandId = businessResult.rows[0]?.brand_id; + + if (!brandId) { + return { competitors: [] }; + } + + const conditions: string[] = ['sp.brand_id != $1']; + const values: any[] = [brandId]; + let paramIndex = 2; + + if (options.state) { + conditions.push(`d.state = $${paramIndex++}`); + values.push(options.state); + } + + if (options.category) { + conditions.push(`sp.category = $${paramIndex++}`); + values.push(options.category); + } + + // Get stores where our brand is present + const ourStoresResult = await this.pool.query( + `SELECT DISTINCT sp.dispensary_id + FROM store_products sp + JOIN dispensaries d ON sp.dispensary_id = d.id + WHERE sp.brand_id = $1 ${options.state ? `AND d.state = $2` : ''}`, + options.state ? [brandId, options.state] : [brandId] + ); + const ourStoreIds = ourStoresResult.rows.map((r: any) => r.dispensary_id); + + const whereClause = conditions.join(' AND '); + + const result = await this.pool.query( + `SELECT + b.id AS "brandId", + b.name AS "brandName", + COUNT(DISTINCT sp.dispensary_id) AS "storeCount", + COUNT(sp.id) AS "productCount", + AVG(sp.price_rec) AS "avgPrice" + FROM store_products sp + JOIN dispensaries d ON sp.dispensary_id = d.id + JOIN brands b ON sp.brand_id = b.id + WHERE ${whereClause} + GROUP BY b.id, b.name + ORDER BY "storeCount" DESC + LIMIT 20`, + values + ); + + // Calculate overlap for each competitor + const competitors = await Promise.all( + result.rows.map(async (row: any) => { + const overlapResult = await this.pool.query( + `SELECT COUNT(DISTINCT dispensary_id) AS overlap + FROM store_products + WHERE brand_id = $1 AND dispensary_id = ANY($2)`, + [row.brandId, ourStoreIds] + ); + + return { + brandId: row.brandId, + brandName: row.brandName, + storeCount: parseInt(row.storeCount), + productCount: parseInt(row.productCount), + avgPrice: row.avgPrice ? 
parseFloat(row.avgPrice) : null, + overlap: parseInt(overlapResult.rows[0]?.overlap || '0'), + }; + }) + ); + + return { competitors }; + } +} diff --git a/backend/src/portals/services/buyer-portal.ts b/backend/src/portals/services/buyer-portal.ts new file mode 100644 index 00000000..1d86edd9 --- /dev/null +++ b/backend/src/portals/services/buyer-portal.ts @@ -0,0 +1,711 @@ +/** + * Buyer Portal Service + * Phase 6: Buyer Dashboard, Discovery Feed, Cart Management + */ + +import { Pool } from 'pg'; +import { + BuyerBusiness, + BuyerDashboardMetrics, + BuyerCart, + CartItem, + DiscoveryFeedItem, + BrandCatalogItem, + Order, + IntelligenceAlert, + PortalQueryOptions, +} from '../types'; + +export class BuyerPortalService { + constructor(private pool: Pool) {} + + // ============================================================================ + // BUSINESS MANAGEMENT + // ============================================================================ + + async getBuyerBusiness(buyerBusinessId: number): Promise { + const result = await this.pool.query( + `SELECT + bb.id, + bb.dispensary_id AS "dispensaryId", + d.name AS "dispensaryName", + bb.company_name AS "companyName", + bb.contact_email AS "contactEmail", + bb.contact_phone AS "contactPhone", + bb.billing_address AS "billingAddress", + bb.shipping_addresses AS "shippingAddresses", + bb.license_number AS "licenseNumber", + bb.license_expires_at AS "licenseExpiresAt", + bb.onboarded_at AS "onboardedAt", + bb.subscription_tier AS "subscriptionTier", + bb.settings, + bb.states, + bb.is_active AS "isActive", + bb.created_at AS "createdAt", + bb.updated_at AS "updatedAt" + FROM buyer_businesses bb + JOIN dispensaries d ON bb.dispensary_id = d.id + WHERE bb.id = $1`, + [buyerBusinessId] + ); + return result.rows[0] || null; + } + + async getBuyerBusinessByDispensaryId(dispensaryId: number): Promise { + const result = await this.pool.query( + `SELECT + bb.id, + bb.dispensary_id AS "dispensaryId", + d.name AS "dispensaryName", + bb.company_name AS "companyName", + bb.contact_email AS "contactEmail", + bb.contact_phone AS "contactPhone", + bb.billing_address AS "billingAddress", + bb.shipping_addresses AS "shippingAddresses", + bb.license_number AS "licenseNumber", + bb.license_expires_at AS "licenseExpiresAt", + bb.onboarded_at AS "onboardedAt", + bb.subscription_tier AS "subscriptionTier", + bb.settings, + bb.states, + bb.is_active AS "isActive", + bb.created_at AS "createdAt", + bb.updated_at AS "updatedAt" + FROM buyer_businesses bb + JOIN dispensaries d ON bb.dispensary_id = d.id + WHERE bb.dispensary_id = $1`, + [dispensaryId] + ); + return result.rows[0] || null; + } + + async updateBuyerBusiness( + buyerBusinessId: number, + updates: Partial + ): Promise { + const setClauses: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + if (updates.companyName !== undefined) { + setClauses.push(`company_name = $${paramIndex++}`); + values.push(updates.companyName); + } + if (updates.contactEmail !== undefined) { + setClauses.push(`contact_email = $${paramIndex++}`); + values.push(updates.contactEmail); + } + if (updates.contactPhone !== undefined) { + setClauses.push(`contact_phone = $${paramIndex++}`); + values.push(updates.contactPhone); + } + if (updates.billingAddress !== undefined) { + setClauses.push(`billing_address = $${paramIndex++}`); + values.push(JSON.stringify(updates.billingAddress)); + } + if (updates.shippingAddresses !== undefined) { + setClauses.push(`shipping_addresses = $${paramIndex++}`); + 
values.push(JSON.stringify(updates.shippingAddresses)); + } + if (updates.licenseNumber !== undefined) { + setClauses.push(`license_number = $${paramIndex++}`); + values.push(updates.licenseNumber); + } + if (updates.licenseExpiresAt !== undefined) { + setClauses.push(`license_expires_at = $${paramIndex++}`); + values.push(updates.licenseExpiresAt); + } + if (updates.settings !== undefined) { + setClauses.push(`settings = $${paramIndex++}`); + values.push(JSON.stringify(updates.settings)); + } + if (updates.states !== undefined) { + setClauses.push(`states = $${paramIndex++}`); + values.push(updates.states); + } + + if (setClauses.length === 0) { + return this.getBuyerBusiness(buyerBusinessId); + } + + setClauses.push('updated_at = NOW()'); + values.push(buyerBusinessId); + + await this.pool.query( + `UPDATE buyer_businesses SET ${setClauses.join(', ')} WHERE id = $${paramIndex}`, + values + ); + + return this.getBuyerBusiness(buyerBusinessId); + } + + // ============================================================================ + // DASHBOARD + // ============================================================================ + + async getDashboardMetrics(buyerBusinessId: number): Promise { + const [ + orderStats, + cartStats, + messageStats, + discoveryStats, + recentOrders, + pricingAlerts, + ] = await Promise.all([ + // Order stats + this.pool.query( + `SELECT + COUNT(*) AS "totalOrders", + COUNT(*) FILTER (WHERE status IN ('submitted', 'accepted', 'processing', 'packed', 'shipped')) AS "pendingOrders", + COALESCE(SUM(total), 0) AS "totalSpent", + COALESCE(SUM(total) FILTER (WHERE submitted_at >= DATE_TRUNC('month', NOW())), 0) AS "spentThisMonth", + COALESCE(SUM(discount_amount), 0) AS "savedAmount" + FROM orders + WHERE buyer_business_id = $1 AND status NOT IN ('cancelled', 'rejected', 'draft')`, + [buyerBusinessId] + ), + + // Cart stats + this.pool.query( + `SELECT + COALESCE(SUM(ci.quantity), 0) AS "cartItems", + COALESCE(SUM(ci.quantity * ci.unit_price), 0) AS "cartValue" + FROM buyer_carts bc + JOIN cart_items ci ON bc.id = ci.cart_id + WHERE bc.buyer_business_id = $1 AND bc.status = 'active'`, + [buyerBusinessId] + ), + + // Message stats + this.pool.query( + `SELECT COUNT(*) AS "unreadMessages" + FROM messages m + JOIN message_threads t ON m.thread_id = t.id + WHERE t.buyer_business_id = $1 AND m.is_read = FALSE AND m.sender_type != 'buyer'`, + [buyerBusinessId] + ), + + // Discovery stats + this.pool.query( + `SELECT COUNT(*) AS "newItems" + FROM discovery_feed_items + WHERE is_active = TRUE + AND (target_buyer_business_ids IS NULL OR $1 = ANY(target_buyer_business_ids)) + AND created_at >= NOW() - INTERVAL '7 days'`, + [buyerBusinessId] + ), + + // Recent orders + this.pool.query( + `SELECT + id, order_number AS "orderNumber", + seller_brand_business_id AS "sellerBrandBusinessId", + state, total, status, + submitted_at AS "submittedAt", + created_at AS "createdAt" + FROM orders + WHERE buyer_business_id = $1 + ORDER BY created_at DESC + LIMIT 5`, + [buyerBusinessId] + ), + + // Pricing alerts + this.pool.query( + `SELECT + id, alert_type AS "alertType", severity, title, description, + data, status, created_at AS "createdAt" + FROM intelligence_alerts + WHERE buyer_business_id = $1 AND status = 'new' AND alert_type LIKE 'price_%' + ORDER BY created_at DESC + LIMIT 5`, + [buyerBusinessId] + ), + ]); + + // Get followed brands count + const business = await this.getBuyerBusiness(buyerBusinessId); + const brandsFollowed = business?.settings?.preferredBrands?.length || 0; + + 
return { + totalOrders: parseInt(orderStats.rows[0]?.totalOrders || '0'), + pendingOrders: parseInt(orderStats.rows[0]?.pendingOrders || '0'), + totalSpent: parseFloat(orderStats.rows[0]?.totalSpent || '0'), + spentThisMonth: parseFloat(orderStats.rows[0]?.spentThisMonth || '0'), + savedAmount: parseFloat(orderStats.rows[0]?.savedAmount || '0'), + brandsFollowed, + cartItems: parseInt(cartStats.rows[0]?.cartItems || '0'), + cartValue: parseFloat(cartStats.rows[0]?.cartValue || '0'), + unreadMessages: parseInt(messageStats.rows[0]?.unreadMessages || '0'), + newDiscoveryItems: parseInt(discoveryStats.rows[0]?.newItems || '0'), + recentOrders: recentOrders.rows, + recommendedProducts: [], // TODO: Implement recommendation engine + pricingAlerts: pricingAlerts.rows, + }; + } + + // ============================================================================ + // DISCOVERY FEED + // ============================================================================ + + async getDiscoveryFeed( + buyerBusinessId: number, + options: PortalQueryOptions = {} + ): Promise<{ items: DiscoveryFeedItem[]; total: number }> { + const { limit = 20, offset = 0, category } = options; + + // Get buyer's state + const buyerResult = await this.pool.query( + `SELECT states FROM buyer_businesses WHERE id = $1`, + [buyerBusinessId] + ); + const buyerStates = buyerResult.rows[0]?.states || []; + + const conditions: string[] = [ + 'is_active = TRUE', + 'starts_at <= NOW()', + '(expires_at IS NULL OR expires_at > NOW())', + `(target_buyer_business_ids IS NULL OR $1 = ANY(target_buyer_business_ids))`, + ]; + const values: any[] = [buyerBusinessId]; + let paramIndex = 2; + + if (buyerStates.length > 0) { + conditions.push(`state = ANY($${paramIndex++})`); + values.push(buyerStates); + } + + if (category) { + conditions.push(`(category = $${paramIndex} OR $${paramIndex} = ANY(target_categories))`); + values.push(category); + paramIndex++; + } + + const whereClause = conditions.join(' AND '); + + const [itemsResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + id, + item_type AS "itemType", + state, + brand_id AS "brandId", + catalog_item_id AS "catalogItemId", + category, + title, + description, + image_url AS "imageUrl", + data, + priority, + is_featured AS "isFeatured", + cta_text AS "ctaText", + cta_url AS "ctaUrl", + created_at AS "createdAt" + FROM discovery_feed_items + WHERE ${whereClause} + ORDER BY is_featured DESC, priority DESC, created_at DESC + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + [...values, limit, offset] + ), + this.pool.query( + `SELECT COUNT(*) AS total FROM discovery_feed_items WHERE ${whereClause}`, + values + ), + ]); + + return { + items: itemsResult.rows, + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + // ============================================================================ + // BRAND CATALOG BROWSING + // ============================================================================ + + async browseCatalog( + buyerBusinessId: number, + options: PortalQueryOptions & { brandId?: number } = {} + ): Promise<{ items: BrandCatalogItem[]; total: number }> { + const { limit = 50, offset = 0, brandId, category, search, sortBy = 'name', sortDir = 'asc' } = options; + + // Get buyer's states + const buyerResult = await this.pool.query( + `SELECT states FROM buyer_businesses WHERE id = $1`, + [buyerBusinessId] + ); + const buyerStates = buyerResult.rows[0]?.states || []; + + const conditions: string[] = ['bci.is_active = TRUE']; + const values: any[] = []; + 
let paramIndex = 1; + + // Filter by buyer's states (must have distribution in at least one) + if (buyerStates.length > 0) { + conditions.push(`bci.available_states && $${paramIndex++}`); + values.push(buyerStates); + } + + if (brandId) { + conditions.push(`bci.brand_id = $${paramIndex++}`); + values.push(brandId); + } + + if (category) { + conditions.push(`bci.category = $${paramIndex++}`); + values.push(category); + } + + if (search) { + conditions.push(`(bci.name ILIKE $${paramIndex} OR bci.sku ILIKE $${paramIndex} OR b.name ILIKE $${paramIndex})`); + values.push(`%${search}%`); + paramIndex++; + } + + const whereClause = conditions.join(' AND '); + const validSortColumns = ['name', 'msrp', 'category', 'created_at']; + const sortColumn = validSortColumns.includes(sortBy) ? `bci.${sortBy}` : 'bci.name'; + const sortDirection = sortDir === 'desc' ? 'DESC' : 'ASC'; + + const [itemsResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + bci.id, + bci.brand_id AS "brandId", + b.name AS "brandName", + bci.sku, + bci.name, + bci.description, + bci.category, + bci.subcategory, + bci.thc_content AS "thcContent", + bci.cbd_content AS "cbdContent", + bci.strain_type AS "strainType", + bci.weight, + bci.weight_unit AS "weightUnit", + bci.image_url AS "imageUrl", + bci.msrp, + bci.wholesale_price AS "wholesalePrice", + bci.available_states AS "availableStates" + FROM brand_catalog_items bci + JOIN brands b ON bci.brand_id = b.id + WHERE ${whereClause} + ORDER BY ${sortColumn} ${sortDirection} + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + [...values, limit, offset] + ), + this.pool.query( + `SELECT COUNT(*) AS total + FROM brand_catalog_items bci + JOIN brands b ON bci.brand_id = b.id + WHERE ${whereClause}`, + values + ), + ]); + + return { + items: itemsResult.rows, + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + async getBrandsForBuyer(buyerBusinessId: number): Promise<{ + brands: { brandId: number; brandName: string; productCount: number; imageUrl: string | null }[]; + }> { + // Get buyer's states + const buyerResult = await this.pool.query( + `SELECT states FROM buyer_businesses WHERE id = $1`, + [buyerBusinessId] + ); + const buyerStates = buyerResult.rows[0]?.states || []; + + let query = ` + SELECT + b.id AS "brandId", + b.name AS "brandName", + COUNT(bci.id) AS "productCount", + b.image_url AS "imageUrl" + FROM brands b + JOIN brand_catalog_items bci ON b.id = bci.brand_id + WHERE bci.is_active = TRUE`; + + const values: any[] = []; + + if (buyerStates.length > 0) { + query += ` AND bci.available_states && $1`; + values.push(buyerStates); + } + + query += ` + GROUP BY b.id, b.name, b.image_url + ORDER BY "productCount" DESC`; + + const result = await this.pool.query(query, values); + + return { + brands: result.rows.map((row: any) => ({ + brandId: row.brandId, + brandName: row.brandName, + productCount: parseInt(row.productCount), + imageUrl: row.imageUrl, + })), + }; + } + + // ============================================================================ + // CART MANAGEMENT + // ============================================================================ + + async getOrCreateCart( + buyerBusinessId: number, + sellerBrandBusinessId: number, + state: string + ): Promise { + // Try to get existing active cart + const existingResult = await this.pool.query( + `SELECT + id, + buyer_business_id AS "buyerBusinessId", + seller_brand_business_id AS "sellerBrandBusinessId", + state, + status, + converted_to_order_id AS "convertedToOrderId", + 
last_activity_at AS "lastActivityAt", + expires_at AS "expiresAt", + created_at AS "createdAt", + updated_at AS "updatedAt" + FROM buyer_carts + WHERE buyer_business_id = $1 + AND seller_brand_business_id = $2 + AND state = $3 + AND status = 'active'`, + [buyerBusinessId, sellerBrandBusinessId, state] + ); + + if (existingResult.rows[0]) { + // Update last activity + await this.pool.query( + `UPDATE buyer_carts SET last_activity_at = NOW(), updated_at = NOW() WHERE id = $1`, + [existingResult.rows[0].id] + ); + return existingResult.rows[0]; + } + + // Create new cart + const result = await this.pool.query( + `INSERT INTO buyer_carts (buyer_business_id, seller_brand_business_id, state) + VALUES ($1, $2, $3) + RETURNING + id, + buyer_business_id AS "buyerBusinessId", + seller_brand_business_id AS "sellerBrandBusinessId", + state, + status, + converted_to_order_id AS "convertedToOrderId", + last_activity_at AS "lastActivityAt", + expires_at AS "expiresAt", + created_at AS "createdAt", + updated_at AS "updatedAt"`, + [buyerBusinessId, sellerBrandBusinessId, state] + ); + + return result.rows[0]; + } + + async getCartItems(cartId: number): Promise { + const result = await this.pool.query( + `SELECT + ci.id, + ci.cart_id AS "cartId", + ci.catalog_item_id AS "catalogItemId", + ci.quantity, + ci.unit_price AS "unitPrice", + ci.notes, + ci.created_at AS "createdAt", + ci.updated_at AS "updatedAt", + bci.name AS "productName", + bci.sku, + bci.image_url AS "imageUrl" + FROM cart_items ci + JOIN brand_catalog_items bci ON ci.catalog_item_id = bci.id + WHERE ci.cart_id = $1 + ORDER BY ci.created_at`, + [cartId] + ); + return result.rows; + } + + async addToCart( + cartId: number, + catalogItemId: number, + quantity: number, + notes?: string + ): Promise { + // Get the item price + const itemResult = await this.pool.query( + `SELECT COALESCE(wholesale_price, msrp) AS price FROM brand_catalog_items WHERE id = $1`, + [catalogItemId] + ); + const unitPrice = itemResult.rows[0]?.price || 0; + + const result = await this.pool.query( + `INSERT INTO cart_items (cart_id, catalog_item_id, quantity, unit_price, notes) + VALUES ($1, $2, $3, $4, $5) + ON CONFLICT (cart_id, catalog_item_id) DO UPDATE SET + quantity = cart_items.quantity + EXCLUDED.quantity, + notes = COALESCE(EXCLUDED.notes, cart_items.notes), + updated_at = NOW() + RETURNING + id, + cart_id AS "cartId", + catalog_item_id AS "catalogItemId", + quantity, + unit_price AS "unitPrice", + notes, + created_at AS "createdAt", + updated_at AS "updatedAt"`, + [cartId, catalogItemId, quantity, unitPrice, notes] + ); + + // Update cart activity + await this.pool.query( + `UPDATE buyer_carts SET last_activity_at = NOW(), updated_at = NOW() WHERE id = $1`, + [cartId] + ); + + return result.rows[0]; + } + + async updateCartItem(cartItemId: number, quantity: number, notes?: string): Promise { + if (quantity <= 0) { + await this.removeFromCart(cartItemId); + return null; + } + + const result = await this.pool.query( + `UPDATE cart_items + SET quantity = $2, notes = COALESCE($3, notes), updated_at = NOW() + WHERE id = $1 + RETURNING + id, + cart_id AS "cartId", + catalog_item_id AS "catalogItemId", + quantity, + unit_price AS "unitPrice", + notes, + created_at AS "createdAt", + updated_at AS "updatedAt"`, + [cartItemId, quantity, notes] + ); + + return result.rows[0] || null; + } + + async removeFromCart(cartItemId: number): Promise { + const result = await this.pool.query( + `DELETE FROM cart_items WHERE id = $1`, + [cartItemId] + ); + return (result.rowCount ?? 
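A sketch of the cart flow these methods support, again assuming the `BuyerPortalService` export; the catalog item IDs, quantities, and state code are made up. Re-adding an item relies on the `ON CONFLICT (cart_id, catalog_item_id)` upsert, which adds to the existing quantity rather than replacing it.

```typescript
import { Pool } from 'pg';
// Assumed export name and path for the buyer portal service shown above.
import { BuyerPortalService } from '../services/buyer-portal';

export async function buildSampleCart(
  pool: Pool,
  buyerBusinessId: number,
  sellerBrandBusinessId: number
) {
  const portal = new BuyerPortalService(pool);

  // Reuses the buyer's active cart for this brand/state, or inserts a new one.
  const cart = await portal.getOrCreateCart(buyerBusinessId, sellerBrandBusinessId, 'AZ');

  // Unit price is resolved server-side from wholesale_price (falling back to msrp).
  await portal.addToCart(cart.id, 101, 12, 'Initial order'); // hypothetical catalog item IDs
  await portal.addToCart(cart.id, 101, 12);                  // same SKU again: quantity becomes 24
  await portal.addToCart(cart.id, 205, 6);

  return portal.getCartItems(cart.id);
}
```

Callers do not need a separate delete path for zeroed lines: `updateCartItem` removes the row itself when the new quantity is zero or negative.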
0) > 0; + } + + async clearCart(cartId: number): Promise { + await this.pool.query(`DELETE FROM cart_items WHERE cart_id = $1`, [cartId]); + } + + async getActiveCarts(buyerBusinessId: number): Promise<{ + carts: (BuyerCart & { itemCount: number; totalValue: number; brandName: string })[]; + }> { + const result = await this.pool.query( + `SELECT + bc.id, + bc.buyer_business_id AS "buyerBusinessId", + bc.seller_brand_business_id AS "sellerBrandBusinessId", + bc.state, + bc.status, + bc.last_activity_at AS "lastActivityAt", + bc.created_at AS "createdAt", + COALESCE(SUM(ci.quantity), 0) AS "itemCount", + COALESCE(SUM(ci.quantity * ci.unit_price), 0) AS "totalValue", + b.name AS "brandName" + FROM buyer_carts bc + LEFT JOIN cart_items ci ON bc.id = ci.cart_id + JOIN brand_businesses bb ON bc.seller_brand_business_id = bb.id + JOIN brands b ON bb.brand_id = b.id + WHERE bc.buyer_business_id = $1 AND bc.status = 'active' + GROUP BY bc.id, bc.buyer_business_id, bc.seller_brand_business_id, bc.state, bc.status, + bc.last_activity_at, bc.created_at, b.name + ORDER BY bc.last_activity_at DESC`, + [buyerBusinessId] + ); + + return { + carts: result.rows.map((row: any) => ({ + ...row, + itemCount: parseInt(row.itemCount), + totalValue: parseFloat(row.totalValue), + })), + }; + } + + // ============================================================================ + // PRICE ALERTS + // ============================================================================ + + async getPriceAlerts( + buyerBusinessId: number, + options: PortalQueryOptions = {} + ): Promise<{ alerts: IntelligenceAlert[]; total: number }> { + const { limit = 20, offset = 0, status = 'new' } = options; + + const [alertsResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + id, + alert_type AS "alertType", + severity, + title, + description, + data, + state, + category, + product_id AS "productId", + brand_id AS "brandId", + is_actionable AS "isActionable", + suggested_action AS "suggestedAction", + status, + created_at AS "createdAt" + FROM intelligence_alerts + WHERE buyer_business_id = $1 + AND alert_type LIKE 'price_%' + AND ($2::text IS NULL OR status = $2) + ORDER BY created_at DESC + LIMIT $3 OFFSET $4`, + [buyerBusinessId, status === 'all' ? null : status, limit, offset] + ), + this.pool.query( + `SELECT COUNT(*) AS total + FROM intelligence_alerts + WHERE buyer_business_id = $1 + AND alert_type LIKE 'price_%' + AND ($2::text IS NULL OR status = $2)`, + [buyerBusinessId, status === 'all' ? null : status] + ), + ]); + + return { + alerts: alertsResult.rows, + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + async dismissAlert(alertId: number, buyerBusinessId: number): Promise { + const result = await this.pool.query( + `UPDATE intelligence_alerts + SET status = 'dismissed', acknowledged_at = NOW() + WHERE id = $1 AND buyer_business_id = $2`, + [alertId, buyerBusinessId] + ); + return (result.rowCount ?? 
0) > 0; + } +} diff --git a/backend/src/portals/services/intelligence.ts b/backend/src/portals/services/intelligence.ts new file mode 100644 index 00000000..3c380d83 --- /dev/null +++ b/backend/src/portals/services/intelligence.ts @@ -0,0 +1,860 @@ +/** + * Intelligence Engine Service + * Phase 6: AI-driven alerts, recommendations, and summaries + */ + +import { Pool } from 'pg'; +import { + IntelligenceAlert, + IntelligenceRecommendation, + IntelligenceSummary, + IntelligenceRule, + PortalQueryOptions, + SummaryHighlight, + SummaryMetrics, + SummaryTrend, + TopPerformer, + AreaOfConcern, +} from '../types'; + +export class IntelligenceEngineService { + constructor(private pool: Pool) {} + + // ============================================================================ + // ALERTS + // ============================================================================ + + async getAlerts( + options: PortalQueryOptions & { + brandBusinessId?: number; + buyerBusinessId?: number; + alertType?: string; + severity?: string; + } = {} + ): Promise<{ alerts: IntelligenceAlert[]; total: number }> { + const { + limit = 50, + offset = 0, + brandBusinessId, + buyerBusinessId, + alertType, + severity, + status = 'new', + state, + } = options; + + const conditions: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + if (brandBusinessId) { + conditions.push(`brand_business_id = $${paramIndex++}`); + values.push(brandBusinessId); + } + + if (buyerBusinessId) { + conditions.push(`buyer_business_id = $${paramIndex++}`); + values.push(buyerBusinessId); + } + + if (alertType) { + conditions.push(`alert_type = $${paramIndex++}`); + values.push(alertType); + } + + if (severity) { + conditions.push(`severity = $${paramIndex++}`); + values.push(severity); + } + + if (status && status !== 'all') { + conditions.push(`status = $${paramIndex++}`); + values.push(status); + } + + if (state) { + conditions.push(`state = $${paramIndex++}`); + values.push(state); + } + + const whereClause = conditions.length > 0 ? 
`WHERE ${conditions.join(' AND ')}` : ''; + + const [alertsResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + id, + brand_business_id AS "brandBusinessId", + buyer_business_id AS "buyerBusinessId", + alert_type AS "alertType", + severity, + title, + description, + data, + state, + category, + product_id AS "productId", + brand_id AS "brandId", + is_actionable AS "isActionable", + suggested_action AS "suggestedAction", + status, + acknowledged_at AS "acknowledgedAt", + acknowledged_by AS "acknowledgedBy", + resolved_at AS "resolvedAt", + expires_at AS "expiresAt", + created_at AS "createdAt" + FROM intelligence_alerts + ${whereClause} + ORDER BY + CASE severity WHEN 'critical' THEN 1 WHEN 'warning' THEN 2 ELSE 3 END, + created_at DESC + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + [...values, limit, offset] + ), + this.pool.query( + `SELECT COUNT(*) AS total FROM intelligence_alerts ${whereClause}`, + values + ), + ]); + + return { + alerts: alertsResult.rows, + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + async createAlert(alert: Omit): Promise { + const result = await this.pool.query( + `INSERT INTO intelligence_alerts ( + brand_business_id, buyer_business_id, alert_type, severity, + title, description, data, state, category, product_id, brand_id, + is_actionable, suggested_action, status, expires_at + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15) + RETURNING + id, + brand_business_id AS "brandBusinessId", + buyer_business_id AS "buyerBusinessId", + alert_type AS "alertType", + severity, title, description, data, state, category, + product_id AS "productId", + brand_id AS "brandId", + is_actionable AS "isActionable", + suggested_action AS "suggestedAction", + status, + created_at AS "createdAt"`, + [ + alert.brandBusinessId, + alert.buyerBusinessId, + alert.alertType, + alert.severity, + alert.title, + alert.description, + JSON.stringify(alert.data), + alert.state, + alert.category, + alert.productId, + alert.brandId, + alert.isActionable, + alert.suggestedAction, + alert.status || 'new', + alert.expiresAt, + ] + ); + return result.rows[0]; + } + + async acknowledgeAlert(alertId: number, userId: number): Promise { + const result = await this.pool.query( + `UPDATE intelligence_alerts + SET status = 'acknowledged', acknowledged_at = NOW(), acknowledged_by = $2 + WHERE id = $1 + RETURNING + id, brand_business_id AS "brandBusinessId", buyer_business_id AS "buyerBusinessId", + alert_type AS "alertType", severity, title, description, data, status, + acknowledged_at AS "acknowledgedAt", acknowledged_by AS "acknowledgedBy", + created_at AS "createdAt"`, + [alertId, userId] + ); + return result.rows[0] || null; + } + + async resolveAlert(alertId: number): Promise { + const result = await this.pool.query( + `UPDATE intelligence_alerts + SET status = 'resolved', resolved_at = NOW() + WHERE id = $1 + RETURNING id, status, resolved_at AS "resolvedAt"`, + [alertId] + ); + return result.rows[0] || null; + } + + async dismissAlert(alertId: number): Promise { + const result = await this.pool.query( + `UPDATE intelligence_alerts SET status = 'dismissed' WHERE id = $1`, + [alertId] + ); + return (result.rowCount ?? 
0) > 0; + } + + // ============================================================================ + // RECOMMENDATIONS + // ============================================================================ + + async getRecommendations( + options: PortalQueryOptions & { + brandBusinessId?: number; + buyerBusinessId?: number; + recommendationType?: string; + } = {} + ): Promise<{ recommendations: IntelligenceRecommendation[]; total: number }> { + const { limit = 20, offset = 0, brandBusinessId, buyerBusinessId, recommendationType, status = 'pending' } = options; + + const conditions: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + if (brandBusinessId) { + conditions.push(`brand_business_id = $${paramIndex++}`); + values.push(brandBusinessId); + } + + if (buyerBusinessId) { + conditions.push(`buyer_business_id = $${paramIndex++}`); + values.push(buyerBusinessId); + } + + if (recommendationType) { + conditions.push(`recommendation_type = $${paramIndex++}`); + values.push(recommendationType); + } + + if (status && status !== 'all') { + conditions.push(`status = $${paramIndex++}`); + values.push(status); + } + + // Filter out expired recommendations + conditions.push(`(expires_at IS NULL OR expires_at > NOW())`); + + const whereClause = conditions.length > 0 ? `WHERE ${conditions.join(' AND ')}` : ''; + + const [recsResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + id, + brand_business_id AS "brandBusinessId", + buyer_business_id AS "buyerBusinessId", + recommendation_type AS "recommendationType", + title, + description, + rationale, + data, + priority, + potential_impact AS "potentialImpact", + status, + accepted_at AS "acceptedAt", + accepted_by AS "acceptedBy", + implemented_at AS "implementedAt", + expires_at AS "expiresAt", + created_at AS "createdAt" + FROM intelligence_recommendations + ${whereClause} + ORDER BY priority DESC, created_at DESC + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + [...values, limit, offset] + ), + this.pool.query( + `SELECT COUNT(*) AS total FROM intelligence_recommendations ${whereClause}`, + values + ), + ]); + + return { + recommendations: recsResult.rows, + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + async createRecommendation( + rec: Omit + ): Promise { + const result = await this.pool.query( + `INSERT INTO intelligence_recommendations ( + brand_business_id, buyer_business_id, recommendation_type, + title, description, rationale, data, priority, potential_impact, + status, expires_at + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) + RETURNING + id, + brand_business_id AS "brandBusinessId", + buyer_business_id AS "buyerBusinessId", + recommendation_type AS "recommendationType", + title, description, rationale, data, priority, + potential_impact AS "potentialImpact", + status, + created_at AS "createdAt"`, + [ + rec.brandBusinessId, + rec.buyerBusinessId, + rec.recommendationType, + rec.title, + rec.description, + rec.rationale, + JSON.stringify(rec.data), + rec.priority, + JSON.stringify(rec.potentialImpact), + rec.status || 'pending', + rec.expiresAt, + ] + ); + return result.rows[0]; + } + + async acceptRecommendation(recId: number, userId: number): Promise { + const result = await this.pool.query( + `UPDATE intelligence_recommendations + SET status = 'accepted', accepted_at = NOW(), accepted_by = $2 + WHERE id = $1 + RETURNING id, status, accepted_at AS "acceptedAt", accepted_by AS "acceptedBy"`, + [recId, userId] + ); + return result.rows[0] || null; + } + + async 
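A sketch of the read-and-accept side of recommendations. The `recommendationType` value is a placeholder; `getRecommendations` defaults to `status = 'pending'` and silently drops expired rows.

```typescript
import { Pool } from 'pg';
import { IntelligenceEngineService } from '../services/intelligence';

export async function acceptPendingPricingRecs(pool: Pool, brandBusinessId: number, userId: number) {
  const intel = new IntelligenceEngineService(pool);

  // Pending, unexpired recommendations, highest priority first.
  const { recommendations } = await intel.getRecommendations({
    brandBusinessId,
    recommendationType: 'pricing', // placeholder type value
    limit: 10,
  });

  for (const rec of recommendations) {
    await intel.acceptRecommendation(rec.id, userId);
  }

  return recommendations.map((rec) => rec.id);
}
```

`rejectRecommendation` and `markImplemented` (defined next) cover the remaining transitions of the pending, accepted, implemented lifecycle.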
rejectRecommendation(recId: number): Promise { + const result = await this.pool.query( + `UPDATE intelligence_recommendations SET status = 'rejected' WHERE id = $1`, + [recId] + ); + return (result.rowCount ?? 0) > 0; + } + + async markImplemented(recId: number): Promise { + const result = await this.pool.query( + `UPDATE intelligence_recommendations + SET status = 'implemented', implemented_at = NOW() + WHERE id = $1 + RETURNING id, status, implemented_at AS "implementedAt"`, + [recId] + ); + return result.rows[0] || null; + } + + // ============================================================================ + // SUMMARIES + // ============================================================================ + + async getSummaries( + options: PortalQueryOptions & { + brandBusinessId?: number; + buyerBusinessId?: number; + summaryType?: 'daily' | 'weekly' | 'monthly'; + } = {} + ): Promise { + const { limit = 10, offset = 0, brandBusinessId, buyerBusinessId, summaryType } = options; + + const conditions: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + if (brandBusinessId) { + conditions.push(`brand_business_id = $${paramIndex++}`); + values.push(brandBusinessId); + } + + if (buyerBusinessId) { + conditions.push(`buyer_business_id = $${paramIndex++}`); + values.push(buyerBusinessId); + } + + if (summaryType) { + conditions.push(`summary_type = $${paramIndex++}`); + values.push(summaryType); + } + + const whereClause = conditions.length > 0 ? `WHERE ${conditions.join(' AND ')}` : ''; + + const result = await this.pool.query( + `SELECT + id, + brand_business_id AS "brandBusinessId", + buyer_business_id AS "buyerBusinessId", + summary_type AS "summaryType", + period_start AS "periodStart", + period_end AS "periodEnd", + highlights, + metrics, + trends, + top_performers AS "topPerformers", + areas_of_concern AS "areasOfConcern", + generated_at AS "generatedAt" + FROM intelligence_summaries + ${whereClause} + ORDER BY period_end DESC + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + [...values, limit, offset] + ); + + return result.rows; + } + + async generateDailySummary(brandBusinessId: number): Promise { + const now = new Date(); + const periodStart = new Date(now); + periodStart.setHours(0, 0, 0, 0); + periodStart.setDate(periodStart.getDate() - 1); + + const periodEnd = new Date(periodStart); + periodEnd.setDate(periodEnd.getDate() + 1); + + // Get brand ID + const brandResult = await this.pool.query( + `SELECT brand_id FROM brand_businesses WHERE id = $1`, + [brandBusinessId] + ); + const brandId = brandResult.rows[0]?.brand_id; + + if (!brandId) { + throw new Error(`Brand business ${brandBusinessId} not found`); + } + + // Gather metrics from the past day + const [orderStats, productStats, alertStats] = await Promise.all([ + // Order metrics + this.pool.query( + `SELECT + COUNT(*) AS total_orders, + COALESCE(SUM(total), 0) AS total_revenue, + COALESCE(AVG(total), 0) AS avg_order_value + FROM orders + WHERE seller_brand_business_id = $1 + AND submitted_at >= $2 AND submitted_at < $3 + AND status NOT IN ('cancelled', 'rejected', 'draft')`, + [brandBusinessId, periodStart, periodEnd] + ), + + // Product metrics + this.pool.query( + `SELECT + COUNT(DISTINCT dispensary_id) AS stores_with_brand, + COUNT(*) AS products_listed + FROM store_products + WHERE brand_id = $1 + AND updated_at >= $2`, + [brandId, periodStart] + ), + + // Alert counts + this.pool.query( + `SELECT + COUNT(*) AS total_alerts, + COUNT(*) FILTER (WHERE severity = 'critical') AS critical_alerts + FROM 
intelligence_alerts + WHERE brand_business_id = $1 + AND created_at >= $2 AND created_at < $3`, + [brandBusinessId, periodStart, periodEnd] + ), + ]); + + const metrics: SummaryMetrics = { + totalRevenue: parseFloat(orderStats.rows[0]?.total_revenue || '0'), + totalOrders: parseInt(orderStats.rows[0]?.total_orders || '0'), + avgOrderValue: parseFloat(orderStats.rows[0]?.avg_order_value || '0'), + storesWithBrand: parseInt(productStats.rows[0]?.stores_with_brand || '0'), + productsListed: parseInt(productStats.rows[0]?.products_listed || '0'), + }; + + const highlights: SummaryHighlight[] = [ + { + type: 'revenue', + title: 'Daily Revenue', + value: `$${metrics.totalRevenue?.toFixed(2) || '0.00'}`, + }, + { + type: 'orders', + title: 'Orders Received', + value: metrics.totalOrders || 0, + }, + { + type: 'presence', + title: 'Store Presence', + value: metrics.storesWithBrand || 0, + }, + ]; + + const areasOfConcern: AreaOfConcern[] = []; + if (parseInt(alertStats.rows[0]?.critical_alerts || '0') > 0) { + areasOfConcern.push({ + type: 'alerts', + title: 'Critical Alerts', + description: `${alertStats.rows[0].critical_alerts} critical alerts require attention`, + severity: 'high', + suggestedAction: 'Review alerts in the intelligence dashboard', + }); + } + + // Insert summary + const result = await this.pool.query( + `INSERT INTO intelligence_summaries ( + brand_business_id, summary_type, period_start, period_end, + highlights, metrics, trends, top_performers, areas_of_concern + ) VALUES ($1, 'daily', $2, $3, $4, $5, $6, $7, $8) + RETURNING + id, + brand_business_id AS "brandBusinessId", + summary_type AS "summaryType", + period_start AS "periodStart", + period_end AS "periodEnd", + highlights, metrics, trends, + top_performers AS "topPerformers", + areas_of_concern AS "areasOfConcern", + generated_at AS "generatedAt"`, + [ + brandBusinessId, + periodStart, + periodEnd, + JSON.stringify(highlights), + JSON.stringify(metrics), + JSON.stringify([]), + JSON.stringify([]), + JSON.stringify(areasOfConcern), + ] + ); + + return result.rows[0]; + } + + // ============================================================================ + // INTELLIGENCE RULES + // ============================================================================ + + async getRules( + options: PortalQueryOptions & { + brandBusinessId?: number; + buyerBusinessId?: number; + ruleType?: 'alert' | 'recommendation'; + } = {} + ): Promise { + const { limit = 50, offset = 0, brandBusinessId, buyerBusinessId, ruleType } = options; + + const conditions: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + if (brandBusinessId) { + conditions.push(`brand_business_id = $${paramIndex++}`); + values.push(brandBusinessId); + } + + if (buyerBusinessId) { + conditions.push(`buyer_business_id = $${paramIndex++}`); + values.push(buyerBusinessId); + } + + if (ruleType) { + conditions.push(`rule_type = $${paramIndex++}`); + values.push(ruleType); + } + + const whereClause = conditions.length > 0 ? 
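A sketch of a once-a-day job that calls `generateDailySummary` for a list of brand businesses and then reads back the last week of summaries. The scheduler itself is not part of this diff; the function is written to be called by whatever job runner the project uses.

```typescript
import { Pool } from 'pg';
import { IntelligenceEngineService } from '../services/intelligence';

export async function runDailySummaries(pool: Pool, brandBusinessIds: number[]) {
  const intel = new IntelligenceEngineService(pool);

  for (const brandBusinessId of brandBusinessIds) {
    try {
      // Aggregates yesterday's orders, store presence, and alert counts,
      // then inserts a 'daily' row into intelligence_summaries.
      const summary = await intel.generateDailySummary(brandBusinessId);
      console.log(`Generated summary ${summary.id} for brand business ${brandBusinessId}`);
    } catch (err) {
      // generateDailySummary throws if the brand_businesses row is missing.
      console.error(`Skipping brand business ${brandBusinessId}:`, err);
    }
  }

  // Most recent daily summaries for the first brand, newest period first.
  return intel.getSummaries({
    brandBusinessId: brandBusinessIds[0],
    summaryType: 'daily',
    limit: 7,
  });
}
```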
`WHERE ${conditions.join(' AND ')}` : ''; + + const result = await this.pool.query( + `SELECT + id, + brand_business_id AS "brandBusinessId", + buyer_business_id AS "buyerBusinessId", + rule_name AS "ruleName", + rule_type AS "ruleType", + conditions, + actions, + is_enabled AS "isEnabled", + last_triggered_at AS "lastTriggeredAt", + trigger_count AS "triggerCount", + created_by AS "createdBy", + created_at AS "createdAt", + updated_at AS "updatedAt" + FROM intelligence_rules + ${whereClause} + ORDER BY created_at DESC + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + [...values, limit, offset] + ); + + return result.rows; + } + + async createRule(rule: Omit): Promise { + const result = await this.pool.query( + `INSERT INTO intelligence_rules ( + brand_business_id, buyer_business_id, rule_name, rule_type, + conditions, actions, is_enabled, created_by + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8) + RETURNING + id, + brand_business_id AS "brandBusinessId", + buyer_business_id AS "buyerBusinessId", + rule_name AS "ruleName", + rule_type AS "ruleType", + conditions, actions, + is_enabled AS "isEnabled", + trigger_count AS "triggerCount", + created_by AS "createdBy", + created_at AS "createdAt", + updated_at AS "updatedAt"`, + [ + rule.brandBusinessId, + rule.buyerBusinessId, + rule.ruleName, + rule.ruleType, + JSON.stringify(rule.conditions), + JSON.stringify(rule.actions), + rule.isEnabled ?? true, + rule.createdBy, + ] + ); + return result.rows[0]; + } + + async updateRule(ruleId: number, updates: Partial): Promise { + const setClauses: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + if (updates.ruleName !== undefined) { + setClauses.push(`rule_name = $${paramIndex++}`); + values.push(updates.ruleName); + } + if (updates.conditions !== undefined) { + setClauses.push(`conditions = $${paramIndex++}`); + values.push(JSON.stringify(updates.conditions)); + } + if (updates.actions !== undefined) { + setClauses.push(`actions = $${paramIndex++}`); + values.push(JSON.stringify(updates.actions)); + } + if (updates.isEnabled !== undefined) { + setClauses.push(`is_enabled = $${paramIndex++}`); + values.push(updates.isEnabled); + } + + if (setClauses.length === 0) { + return null; + } + + setClauses.push('updated_at = NOW()'); + values.push(ruleId); + + await this.pool.query( + `UPDATE intelligence_rules SET ${setClauses.join(', ')} WHERE id = $${paramIndex}`, + values + ); + + const result = await this.pool.query( + `SELECT + id, rule_name AS "ruleName", rule_type AS "ruleType", + conditions, actions, is_enabled AS "isEnabled", + updated_at AS "updatedAt" + FROM intelligence_rules WHERE id = $1`, + [ruleId] + ); + + return result.rows[0] || null; + } + + async deleteRule(ruleId: number): Promise { + const result = await this.pool.query( + `DELETE FROM intelligence_rules WHERE id = $1`, + [ruleId] + ); + return (result.rowCount ?? 0) > 0; + } + + async toggleRule(ruleId: number, enabled: boolean): Promise { + const result = await this.pool.query( + `UPDATE intelligence_rules SET is_enabled = $2, updated_at = NOW() WHERE id = $1`, + [ruleId, enabled] + ); + return (result.rowCount ?? 
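A sketch of creating and pausing a rule. The `conditions` and `actions` payloads are stored as JSON and interpreted by whatever evaluates the rules, so the object shapes below are placeholders, not a documented schema; the field list mirrors the columns `createRule` inserts.

```typescript
import { Pool } from 'pg';
import { IntelligenceEngineService } from '../services/intelligence';

export async function createPriceDropRule(pool: Pool, brandBusinessId: number, userId: number) {
  const intel = new IntelligenceEngineService(pool);

  const rule = await intel.createRule({
    brandBusinessId,
    buyerBusinessId: null,
    ruleName: 'Alert on >15% price drops',
    ruleType: 'alert',
    conditions: { metric: 'price_change_pct', operator: '<=', value: -15 }, // placeholder shape
    actions: { createAlert: { severity: 'warning' } },                      // placeholder shape
    isEnabled: true,
    createdBy: userId,
  });

  // Rules can be paused and re-enabled without deleting them.
  await intel.toggleRule(rule.id, false);

  return rule.id;
}
```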
0) > 0; + } + + // ============================================================================ + // AUTOMATED ALERT GENERATION + // ============================================================================ + + async generatePriceChangeAlerts(brandId: number): Promise { + // Find significant price changes in the last 24 hours + const priceChanges = await this.pool.query( + `SELECT + sp.id AS product_id, + sp.name AS product_name, + sp.dispensary_id, + d.name AS dispensary_name, + d.state, + sps.price_rec AS current_price, + LAG(sps.price_rec) OVER (PARTITION BY sp.id ORDER BY sps.snapshot_time) AS previous_price, + (sps.price_rec - LAG(sps.price_rec) OVER (PARTITION BY sp.id ORDER BY sps.snapshot_time)) / + NULLIF(LAG(sps.price_rec) OVER (PARTITION BY sp.id ORDER BY sps.snapshot_time), 0) * 100 AS change_pct + FROM store_products sp + JOIN store_product_snapshots sps ON sp.id = sps.store_product_id + JOIN dispensaries d ON sp.dispensary_id = d.id + WHERE sp.brand_id = $1 + AND sps.snapshot_time >= NOW() - INTERVAL '24 hours' + ORDER BY sp.id, sps.snapshot_time`, + [brandId] + ); + + let alertsCreated = 0; + const significantThreshold = 10; // 10% change + + for (const row of priceChanges.rows) { + if (row.change_pct && Math.abs(row.change_pct) >= significantThreshold) { + const alertType = row.change_pct > 0 ? 'price_increase' : 'price_drop'; + const severity = Math.abs(row.change_pct) >= 20 ? 'warning' : 'info'; + + // Get brand business ID + const bbResult = await this.pool.query( + `SELECT id FROM brand_businesses WHERE brand_id = $1`, + [brandId] + ); + const brandBusinessId = bbResult.rows[0]?.id; + + if (brandBusinessId) { + await this.createAlert({ + brandBusinessId, + buyerBusinessId: null, + alertType, + severity, + title: `Price ${row.change_pct > 0 ? 
'increase' : 'drop'} detected`, + description: `${row.product_name} at ${row.dispensary_name} changed by ${Math.abs(row.change_pct).toFixed(1)}%`, + data: { + productId: row.product_id, + dispensaryId: row.dispensary_id, + previousPrice: row.previous_price, + currentPrice: row.current_price, + changePercent: row.change_pct, + }, + state: row.state, + category: null, + productId: row.product_id, + brandId, + isActionable: true, + suggestedAction: 'Review pricing strategy for this product', + status: 'new', + acknowledgedAt: null, + acknowledgedBy: null, + resolvedAt: null, + expiresAt: new Date(Date.now() + 7 * 24 * 60 * 60 * 1000), // 7 days + }); + alertsCreated++; + } + } + } + + return alertsCreated; + } + + async generateLowStockAlerts(brandBusinessId: number): Promise { + const lowStockItems = await this.pool.query( + `SELECT + bi.id AS inventory_id, + bi.catalog_item_id, + bci.name AS product_name, + bci.sku, + bi.state, + bi.quantity_on_hand, + bi.reorder_point, + bi.quantity_available + FROM brand_inventory bi + JOIN brand_catalog_items bci ON bi.catalog_item_id = bci.id + JOIN brand_businesses bb ON bci.brand_id = bb.brand_id + WHERE bb.id = $1 + AND bi.quantity_on_hand <= bi.reorder_point + AND bi.quantity_on_hand > 0`, + [brandBusinessId] + ); + + let alertsCreated = 0; + + for (const item of lowStockItems.rows) { + await this.createAlert({ + brandBusinessId, + buyerBusinessId: null, + alertType: 'stock_low', + severity: 'warning', + title: `Low stock: ${item.product_name}`, + description: `Only ${item.quantity_on_hand} units remaining in ${item.state} (reorder point: ${item.reorder_point})`, + data: { + inventoryId: item.inventory_id, + catalogItemId: item.catalog_item_id, + sku: item.sku, + quantityOnHand: item.quantity_on_hand, + reorderPoint: item.reorder_point, + }, + state: item.state, + category: null, + productId: null, + brandId: null, + isActionable: true, + suggestedAction: 'Reorder inventory to avoid stockouts', + status: 'new', + acknowledgedAt: null, + acknowledgedBy: null, + resolvedAt: null, + expiresAt: null, + }); + alertsCreated++; + } + + return alertsCreated; + } + + async generateOutOfStockAlerts(brandBusinessId: number): Promise { + const oosItems = await this.pool.query( + `SELECT + bi.id AS inventory_id, + bi.catalog_item_id, + bci.name AS product_name, + bci.sku, + bi.state + FROM brand_inventory bi + JOIN brand_catalog_items bci ON bi.catalog_item_id = bci.id + JOIN brand_businesses bb ON bci.brand_id = bb.brand_id + WHERE bb.id = $1 + AND bi.quantity_on_hand = 0 + AND bi.inventory_status != 'discontinued'`, + [brandBusinessId] + ); + + let alertsCreated = 0; + + for (const item of oosItems.rows) { + await this.createAlert({ + brandBusinessId, + buyerBusinessId: null, + alertType: 'stock_out', + severity: 'critical', + title: `Out of stock: ${item.product_name}`, + description: `${item.product_name} (${item.sku}) is out of stock in ${item.state}`, + data: { + inventoryId: item.inventory_id, + catalogItemId: item.catalog_item_id, + sku: item.sku, + }, + state: item.state, + category: null, + productId: null, + brandId: null, + isActionable: true, + suggestedAction: 'Immediate restocking required', + status: 'new', + acknowledgedAt: null, + acknowledgedBy: null, + resolvedAt: null, + expiresAt: null, + }); + alertsCreated++; + } + + return alertsCreated; + } +} diff --git a/backend/src/portals/services/inventory.ts b/backend/src/portals/services/inventory.ts new file mode 100644 index 00000000..4f28637c --- /dev/null +++ 
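A sketch that runs the three generators above in one pass for a single brand. Note the parameter asymmetry in this diff: price-change alerts key off `brand_id`, while the stock alerts key off `brand_business_id`, so the caller needs both IDs.

```typescript
import { Pool } from 'pg';
import { IntelligenceEngineService } from '../services/intelligence';

export async function generateAlertsForBrand(
  pool: Pool,
  brandId: number,
  brandBusinessId: number
) {
  const intel = new IntelligenceEngineService(pool);

  const [priceAlerts, lowStockAlerts, outOfStockAlerts] = await Promise.all([
    intel.generatePriceChangeAlerts(brandId),        // >= 10% swings over the last 24 hours
    intel.generateLowStockAlerts(brandBusinessId),   // on-hand at or below the reorder point
    intel.generateOutOfStockAlerts(brandBusinessId), // on-hand = 0, excluding discontinued SKUs
  ]);

  // Each generator returns the number of alerts it created.
  return { priceAlerts, lowStockAlerts, outOfStockAlerts };
}
```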
b/backend/src/portals/services/inventory.ts @@ -0,0 +1,636 @@ +/** + * Inventory Service + * Phase 7: Inventory tracking, sync, reservations, and alerts + */ + +import { Pool } from 'pg'; +import { + BrandInventory, + InventoryHistory, + InventorySyncLog, + InventoryQueryOptions, +} from '../types'; + +export class InventoryService { + constructor(private pool: Pool) {} + + // ============================================================================ + // INVENTORY QUERIES + // ============================================================================ + + async getInventory(options: InventoryQueryOptions = {}): Promise<{ items: BrandInventory[]; total: number }> { + const { + limit = 50, + offset = 0, + brandId, + state, + inventoryStatus, + lowStockOnly, + sortBy = 'name', + sortDir = 'asc', + } = options; + + const conditions: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + if (brandId) { + conditions.push(`bi.brand_id = $${paramIndex++}`); + values.push(brandId); + } + + if (state) { + conditions.push(`bi.state = $${paramIndex++}`); + values.push(state); + } + + if (inventoryStatus) { + conditions.push(`bi.inventory_status = $${paramIndex++}`); + values.push(inventoryStatus); + } + + if (lowStockOnly) { + conditions.push(`bi.quantity_on_hand <= bi.reorder_point`); + } + + const whereClause = conditions.length > 0 ? `WHERE ${conditions.join(' AND ')}` : ''; + const validSortColumns = ['name', 'sku', 'quantity_on_hand', 'state', 'inventory_status']; + const sortColumn = validSortColumns.includes(sortBy) ? + (sortBy === 'name' ? 'bci.name' : sortBy === 'sku' ? 'bci.sku' : `bi.${sortBy}`) : 'bci.name'; + const sortDirection = sortDir === 'desc' ? 'DESC' : 'ASC'; + + const [inventoryResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + bi.id, + bi.brand_id AS "brandId", + bi.catalog_item_id AS "catalogItemId", + bci.sku, + bci.name AS "productName", + bci.category, + bi.state, + bi.quantity_on_hand AS "quantityOnHand", + bi.quantity_reserved AS "quantityReserved", + bi.quantity_available AS "quantityAvailable", + bi.reorder_point AS "reorderPoint", + bi.inventory_status AS "inventoryStatus", + bi.available_date AS "availableDate", + bi.last_sync_source AS "lastSyncSource", + bi.last_sync_at AS "lastSyncAt", + bi.created_at AS "createdAt", + bi.updated_at AS "updatedAt" + FROM brand_inventory bi + JOIN brand_catalog_items bci ON bi.catalog_item_id = bci.id + ${whereClause} + ORDER BY ${sortColumn} ${sortDirection} + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + [...values, limit, offset] + ), + this.pool.query( + `SELECT COUNT(*) AS total + FROM brand_inventory bi + JOIN brand_catalog_items bci ON bi.catalog_item_id = bci.id + ${whereClause}`, + values + ), + ]); + + return { + items: inventoryResult.rows, + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + async getInventoryItem(inventoryId: number): Promise { + const result = await this.pool.query( + `SELECT + bi.id, + bi.brand_id AS "brandId", + bi.catalog_item_id AS "catalogItemId", + bci.sku, + bci.name AS "productName", + bi.state, + bi.quantity_on_hand AS "quantityOnHand", + bi.quantity_reserved AS "quantityReserved", + bi.quantity_available AS "quantityAvailable", + bi.reorder_point AS "reorderPoint", + bi.inventory_status AS "inventoryStatus", + bi.available_date AS "availableDate", + bi.last_sync_source AS "lastSyncSource", + bi.last_sync_at AS "lastSyncAt", + bi.created_at AS "createdAt", + bi.updated_at AS "updatedAt" + FROM brand_inventory bi + JOIN 
brand_catalog_items bci ON bi.catalog_item_id = bci.id + WHERE bi.id = $1`, + [inventoryId] + ); + return result.rows[0] || null; + } + + async getInventoryByCatalogItem(catalogItemId: number, state?: string): Promise { + const conditions = ['bi.catalog_item_id = $1']; + const values: any[] = [catalogItemId]; + + if (state) { + conditions.push('bi.state = $2'); + values.push(state); + } + + const result = await this.pool.query( + `SELECT + bi.id, + bi.brand_id AS "brandId", + bi.catalog_item_id AS "catalogItemId", + bi.state, + bi.quantity_on_hand AS "quantityOnHand", + bi.quantity_reserved AS "quantityReserved", + bi.quantity_available AS "quantityAvailable", + bi.reorder_point AS "reorderPoint", + bi.inventory_status AS "inventoryStatus", + bi.available_date AS "availableDate", + bi.last_sync_at AS "lastSyncAt" + FROM brand_inventory bi + WHERE ${conditions.join(' AND ')} + ORDER BY bi.state`, + values + ); + return result.rows; + } + + // ============================================================================ + // INVENTORY MANAGEMENT + // ============================================================================ + + async upsertInventory( + brandId: number, + catalogItemId: number, + state: string, + data: { + quantityOnHand: number; + reorderPoint?: number; + inventoryStatus?: string; + availableDate?: Date; + syncSource?: string; + } + ): Promise { + const result = await this.pool.query( + `INSERT INTO brand_inventory ( + brand_id, catalog_item_id, state, quantity_on_hand, reorder_point, + inventory_status, available_date, last_sync_source, last_sync_at + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, NOW()) + ON CONFLICT (catalog_item_id, state) DO UPDATE SET + quantity_on_hand = EXCLUDED.quantity_on_hand, + reorder_point = COALESCE(EXCLUDED.reorder_point, brand_inventory.reorder_point), + inventory_status = COALESCE(EXCLUDED.inventory_status, brand_inventory.inventory_status), + available_date = EXCLUDED.available_date, + last_sync_source = EXCLUDED.last_sync_source, + last_sync_at = NOW(), + updated_at = NOW() + RETURNING + id, + brand_id AS "brandId", + catalog_item_id AS "catalogItemId", + state, + quantity_on_hand AS "quantityOnHand", + quantity_reserved AS "quantityReserved", + quantity_available AS "quantityAvailable", + reorder_point AS "reorderPoint", + inventory_status AS "inventoryStatus"`, + [ + brandId, + catalogItemId, + state, + data.quantityOnHand, + data.reorderPoint, + data.inventoryStatus || 'in_stock', + data.availableDate, + data.syncSource || 'manual', + ] + ); + + return result.rows[0]; + } + + async adjustInventory( + inventoryId: number, + quantityChange: number, + changeType: 'adjustment' | 'order_reserve' | 'order_fulfill' | 'sync' | 'restock' | 'write_off', + options: { orderId?: number; reason?: string; changedBy?: number } = {} + ): Promise { + // Get current inventory + const currentResult = await this.pool.query( + `SELECT quantity_on_hand FROM brand_inventory WHERE id = $1`, + [inventoryId] + ); + + if (!currentResult.rows[0]) return null; + + const quantityBefore = currentResult.rows[0].quantity_on_hand; + const quantityAfter = quantityBefore + quantityChange; + + // Update inventory + const inventoryResult = await this.pool.query( + `UPDATE brand_inventory SET + quantity_on_hand = $2, + inventory_status = CASE + WHEN $2 = 0 THEN 'oos' + WHEN $2 <= reorder_point THEN 'low' + ELSE 'in_stock' + END, + updated_at = NOW() + WHERE id = $1 + RETURNING + id, + brand_id AS "brandId", + catalog_item_id AS "catalogItemId", + state, + quantity_on_hand AS 
"quantityOnHand", + quantity_reserved AS "quantityReserved", + quantity_available AS "quantityAvailable", + inventory_status AS "inventoryStatus"`, + [inventoryId, quantityAfter] + ); + + // Record history + await this.pool.query( + `INSERT INTO inventory_history ( + brand_inventory_id, change_type, quantity_change, + quantity_before, quantity_after, order_id, reason, changed_by + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)`, + [ + inventoryId, + changeType, + quantityChange, + quantityBefore, + quantityAfter, + options.orderId, + options.reason, + options.changedBy, + ] + ); + + return inventoryResult.rows[0]; + } + + // ============================================================================ + // RESERVATIONS (for orders) + // ============================================================================ + + async reserveInventory( + inventoryId: number, + quantity: number, + orderId: number + ): Promise { + // Check availability + const checkResult = await this.pool.query( + `SELECT quantity_available FROM brand_inventory WHERE id = $1`, + [inventoryId] + ); + + const available = checkResult.rows[0]?.quantity_available || 0; + if (available < quantity) { + throw new Error(`Insufficient inventory: requested ${quantity}, available ${available}`); + } + + // Update reservation + await this.pool.query( + `UPDATE brand_inventory SET + quantity_reserved = quantity_reserved + $2, + updated_at = NOW() + WHERE id = $1`, + [inventoryId, quantity] + ); + + // Record history + await this.pool.query( + `INSERT INTO inventory_history ( + brand_inventory_id, change_type, quantity_change, + quantity_before, quantity_after, order_id, reason + ) + SELECT + $1, 'order_reserve', $2, + quantity_reserved - $2, quantity_reserved, + $3, 'Reserved for order' + FROM brand_inventory WHERE id = $1`, + [inventoryId, quantity, orderId] + ); + + return true; + } + + async releaseReservation( + inventoryId: number, + quantity: number, + orderId: number + ): Promise { + await this.pool.query( + `UPDATE brand_inventory SET + quantity_reserved = GREATEST(0, quantity_reserved - $2), + updated_at = NOW() + WHERE id = $1`, + [inventoryId, quantity] + ); + + // Record history + await this.pool.query( + `INSERT INTO inventory_history ( + brand_inventory_id, change_type, quantity_change, + quantity_before, quantity_after, order_id, reason + ) + SELECT + $1, 'order_reserve', -$2, + quantity_reserved + $2, quantity_reserved, + $3, 'Released reservation' + FROM brand_inventory WHERE id = $1`, + [inventoryId, quantity, orderId] + ); + + return true; + } + + async fulfillReservation( + inventoryId: number, + quantity: number, + orderId: number + ): Promise { + // Decrease both on_hand and reserved + const result = await this.pool.query( + `UPDATE brand_inventory SET + quantity_on_hand = quantity_on_hand - $2, + quantity_reserved = GREATEST(0, quantity_reserved - $2), + inventory_status = CASE + WHEN quantity_on_hand - $2 = 0 THEN 'oos' + WHEN quantity_on_hand - $2 <= reorder_point THEN 'low' + ELSE 'in_stock' + END, + updated_at = NOW() + WHERE id = $1 + RETURNING quantity_on_hand AS "quantityOnHand"`, + [inventoryId, quantity] + ); + + // Record history + await this.pool.query( + `INSERT INTO inventory_history ( + brand_inventory_id, change_type, quantity_change, + quantity_before, quantity_after, order_id, reason + ) + SELECT + $1, 'order_fulfill', -$2, + $3 + $2, $3, + $4, 'Fulfilled for order' + FROM brand_inventory WHERE id = $1`, + [inventoryId, quantity, result.rows[0]?.quantityOnHand || 0, orderId] + ); + + return true; + } 
+ + // ============================================================================ + // INVENTORY HISTORY + // ============================================================================ + + async getInventoryHistory( + inventoryId: number, + options: { limit?: number; offset?: number } = {} + ): Promise<{ history: InventoryHistory[]; total: number }> { + const { limit = 50, offset = 0 } = options; + + const [historyResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + ih.id, + ih.brand_inventory_id AS "brandInventoryId", + ih.change_type AS "changeType", + ih.quantity_change AS "quantityChange", + ih.quantity_before AS "quantityBefore", + ih.quantity_after AS "quantityAfter", + ih.order_id AS "orderId", + ih.reason, + ih.changed_by AS "changedBy", + u.name AS "changedByName", + ih.created_at AS "createdAt" + FROM inventory_history ih + LEFT JOIN users u ON ih.changed_by = u.id + WHERE ih.brand_inventory_id = $1 + ORDER BY ih.created_at DESC + LIMIT $2 OFFSET $3`, + [inventoryId, limit, offset] + ), + this.pool.query( + `SELECT COUNT(*) AS total FROM inventory_history WHERE brand_inventory_id = $1`, + [inventoryId] + ), + ]); + + return { + history: historyResult.rows, + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + // ============================================================================ + // BULK SYNC + // ============================================================================ + + async startSync(brandBusinessId: number, syncSource: string): Promise { + const result = await this.pool.query( + `INSERT INTO inventory_sync_log (brand_business_id, sync_source, status) + VALUES ($1, $2, 'pending') + RETURNING + id, + brand_business_id AS "brandBusinessId", + sync_source AS "syncSource", + status, + items_synced AS "itemsSynced", + items_failed AS "itemsFailed", + started_at AS "startedAt"`, + [brandBusinessId, syncSource] + ); + return result.rows[0]; + } + + async updateSyncProgress( + syncLogId: number, + itemsSynced: number, + itemsFailed: number + ): Promise { + await this.pool.query( + `UPDATE inventory_sync_log SET + status = 'processing', + items_synced = $2, + items_failed = $3 + WHERE id = $1`, + [syncLogId, itemsSynced, itemsFailed] + ); + } + + async completeSync( + syncLogId: number, + itemsSynced: number, + itemsFailed: number, + errorMessage?: string + ): Promise { + const status = itemsFailed > 0 && itemsSynced === 0 ? 
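A sketch of the reserve, then fulfill-or-release flow around an order, using the reservation helpers above; the lookup by catalog item and state and the error handling are illustrative. `reserveInventory` throws when `quantity_available` is insufficient.

```typescript
import { Pool } from 'pg';
import { InventoryService } from '../services/inventory';

export async function reserveAndShip(
  pool: Pool,
  catalogItemId: number,
  state: string,
  quantity: number,
  orderId: number
) {
  const inventory = new InventoryService(pool);

  const rows = await inventory.getInventoryByCatalogItem(catalogItemId, state);
  const row = rows[0];
  if (!row) {
    throw new Error(`No inventory row for catalog item ${catalogItemId} in ${state}`);
  }

  // Throws "Insufficient inventory: ..." if quantity_available < quantity.
  await inventory.reserveInventory(row.id, quantity, orderId);

  try {
    // On shipment: decrements on-hand and reserved, recalculates inventory_status,
    // and writes an 'order_fulfill' history row.
    await inventory.fulfillReservation(row.id, quantity, orderId);
  } catch (err) {
    // If shipping falls through, hand the reserved units back.
    await inventory.releaseReservation(row.id, quantity, orderId);
    throw err;
  }
}
```

These helpers issue separate statements rather than running inside a transaction, so a caller that needs the update-plus-history pair to be atomic would have to add its own transaction handling.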
'failed' : 'completed'; + + const result = await this.pool.query( + `UPDATE inventory_sync_log SET + status = $2, + items_synced = $3, + items_failed = $4, + error_message = $5, + completed_at = NOW() + WHERE id = $1 + RETURNING + id, + brand_business_id AS "brandBusinessId", + sync_source AS "syncSource", + status, + items_synced AS "itemsSynced", + items_failed AS "itemsFailed", + error_message AS "errorMessage", + started_at AS "startedAt", + completed_at AS "completedAt"`, + [syncLogId, status, itemsSynced, itemsFailed, errorMessage] + ); + + return result.rows[0]; + } + + async getSyncHistory( + brandBusinessId: number, + options: { limit?: number; offset?: number } = {} + ): Promise { + const { limit = 20, offset = 0 } = options; + + const result = await this.pool.query( + `SELECT + id, + brand_business_id AS "brandBusinessId", + sync_source AS "syncSource", + status, + items_synced AS "itemsSynced", + items_failed AS "itemsFailed", + error_message AS "errorMessage", + started_at AS "startedAt", + completed_at AS "completedAt" + FROM inventory_sync_log + WHERE brand_business_id = $1 + ORDER BY started_at DESC + LIMIT $2 OFFSET $3`, + [brandBusinessId, limit, offset] + ); + + return result.rows; + } + + // ============================================================================ + // ALERTS & REPORTS + // ============================================================================ + + async getLowStockItems(brandId: number, state?: string): Promise { + const conditions = ['bi.brand_id = $1', 'bi.quantity_on_hand <= bi.reorder_point']; + const values: any[] = [brandId]; + + if (state) { + conditions.push('bi.state = $2'); + values.push(state); + } + + const result = await this.pool.query( + `SELECT + bi.id, + bi.catalog_item_id AS "catalogItemId", + bci.sku, + bci.name AS "productName", + bi.state, + bi.quantity_on_hand AS "quantityOnHand", + bi.reorder_point AS "reorderPoint", + bi.inventory_status AS "inventoryStatus" + FROM brand_inventory bi + JOIN brand_catalog_items bci ON bi.catalog_item_id = bci.id + WHERE ${conditions.join(' AND ')} + ORDER BY bi.quantity_on_hand ASC`, + values + ); + + return result.rows; + } + + async getOutOfStockItems(brandId: number, state?: string): Promise { + const conditions = ['bi.brand_id = $1', 'bi.quantity_on_hand = 0', "bi.inventory_status != 'discontinued'"]; + const values: any[] = [brandId]; + + if (state) { + conditions.push('bi.state = $2'); + values.push(state); + } + + const result = await this.pool.query( + `SELECT + bi.id, + bi.catalog_item_id AS "catalogItemId", + bci.sku, + bci.name AS "productName", + bi.state, + bi.inventory_status AS "inventoryStatus" + FROM brand_inventory bi + JOIN brand_catalog_items bci ON bi.catalog_item_id = bci.id + WHERE ${conditions.join(' AND ')} + ORDER BY bci.name`, + values + ); + + return result.rows; + } + + async getInventorySummary(brandId: number): Promise<{ + totalSkus: number; + inStock: number; + lowStock: number; + outOfStock: number; + totalUnits: number; + byState: { state: string; skus: number; units: number }[]; + }> { + const [summaryResult, byStateResult] = await Promise.all([ + this.pool.query( + `SELECT + COUNT(DISTINCT catalog_item_id) AS "totalSkus", + COUNT(*) FILTER (WHERE inventory_status = 'in_stock') AS "inStock", + COUNT(*) FILTER (WHERE inventory_status = 'low') AS "lowStock", + COUNT(*) FILTER (WHERE inventory_status = 'oos') AS "outOfStock", + COALESCE(SUM(quantity_on_hand), 0) AS "totalUnits" + FROM brand_inventory + WHERE brand_id = $1`, + [brandId] + ), + 
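A sketch of a bulk sync run wired through the sync log: start a log row, upsert each incoming count, report progress periodically, and close the log. The `SyncRow` shape and the `'csv_import'` source label are illustrative.

```typescript
import { Pool } from 'pg';
import { InventoryService } from '../services/inventory';

interface SyncRow {
  catalogItemId: number;
  state: string;
  quantityOnHand: number;
}

export async function syncInventoryCounts(
  pool: Pool,
  brandBusinessId: number,
  brandId: number,
  rows: SyncRow[]
) {
  const inventory = new InventoryService(pool);
  const log = await inventory.startSync(brandBusinessId, 'csv_import');

  let synced = 0;
  let failed = 0;

  for (const row of rows) {
    try {
      // Upsert keyed on (catalog_item_id, state); stamps last_sync_source/last_sync_at.
      await inventory.upsertInventory(brandId, row.catalogItemId, row.state, {
        quantityOnHand: row.quantityOnHand,
        syncSource: 'csv_import',
      });
      synced++;
    } catch {
      failed++;
    }

    if ((synced + failed) % 100 === 0) {
      await inventory.updateSyncProgress(log.id, synced, failed);
    }
  }

  // Marked 'failed' only when nothing synced and at least one row failed.
  return inventory.completeSync(log.id, synced, failed);
}
```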
this.pool.query( + `SELECT + state, + COUNT(DISTINCT catalog_item_id) AS skus, + COALESCE(SUM(quantity_on_hand), 0) AS units + FROM brand_inventory + WHERE brand_id = $1 + GROUP BY state + ORDER BY state`, + [brandId] + ), + ]); + + const summary = summaryResult.rows[0]; + + return { + totalSkus: parseInt(summary?.totalSkus || '0'), + inStock: parseInt(summary?.inStock || '0'), + lowStock: parseInt(summary?.lowStock || '0'), + outOfStock: parseInt(summary?.outOfStock || '0'), + totalUnits: parseInt(summary?.totalUnits || '0'), + byState: byStateResult.rows.map((row: any) => ({ + state: row.state, + skus: parseInt(row.skus), + units: parseInt(row.units), + })), + }; + } +} diff --git a/backend/src/portals/services/messaging.ts b/backend/src/portals/services/messaging.ts new file mode 100644 index 00000000..a800ff0e --- /dev/null +++ b/backend/src/portals/services/messaging.ts @@ -0,0 +1,601 @@ +/** + * Messaging Service + * Phase 6: Message threads, messages, attachments, notifications + */ + +import { Pool } from 'pg'; +import { + MessageThread, + Message, + MessageAttachment, + ThreadParticipant, + Notification, + NotificationType, + UserNotificationPreference, + PortalQueryOptions, +} from '../types'; + +export class MessagingService { + constructor(private pool: Pool) {} + + // ============================================================================ + // THREADS + // ============================================================================ + + async getThreads( + options: PortalQueryOptions & { + brandBusinessId?: number; + buyerBusinessId?: number; + threadType?: string; + userId?: number; + } = {} + ): Promise<{ threads: MessageThread[]; total: number }> { + const { limit = 20, offset = 0, brandBusinessId, buyerBusinessId, threadType, status = 'open', userId } = options; + + const conditions: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + if (brandBusinessId) { + conditions.push(`t.brand_business_id = $${paramIndex++}`); + values.push(brandBusinessId); + } + + if (buyerBusinessId) { + conditions.push(`t.buyer_business_id = $${paramIndex++}`); + values.push(buyerBusinessId); + } + + if (threadType) { + conditions.push(`t.thread_type = $${paramIndex++}`); + values.push(threadType); + } + + if (status && status !== 'all') { + conditions.push(`t.status = $${paramIndex++}`); + values.push(status); + } + + if (userId) { + conditions.push(`EXISTS (SELECT 1 FROM thread_participants tp WHERE tp.thread_id = t.id AND tp.user_id = $${paramIndex++})`); + values.push(userId); + } + + const whereClause = conditions.length > 0 ? 
`WHERE ${conditions.join(' AND ')}` : ''; + + const [threadsResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + t.id, + t.subject, + t.thread_type AS "threadType", + t.order_id AS "orderId", + t.brand_business_id AS "brandBusinessId", + t.buyer_business_id AS "buyerBusinessId", + t.status, + t.last_message_at AS "lastMessageAt", + ( + SELECT COUNT(*) + FROM messages m + WHERE m.thread_id = t.id AND m.is_read = FALSE + ) AS "unreadCount", + t.created_at AS "createdAt", + t.updated_at AS "updatedAt" + FROM message_threads t + ${whereClause} + ORDER BY t.last_message_at DESC NULLS LAST + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + [...values, limit, offset] + ), + this.pool.query( + `SELECT COUNT(*) AS total FROM message_threads t ${whereClause}`, + values + ), + ]); + + return { + threads: threadsResult.rows.map((row: any) => ({ + ...row, + unreadCount: parseInt(row.unreadCount || '0'), + })), + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + async getThread(threadId: number): Promise { + const result = await this.pool.query( + `SELECT + id, + subject, + thread_type AS "threadType", + order_id AS "orderId", + brand_business_id AS "brandBusinessId", + buyer_business_id AS "buyerBusinessId", + status, + last_message_at AS "lastMessageAt", + created_at AS "createdAt", + updated_at AS "updatedAt" + FROM message_threads + WHERE id = $1`, + [threadId] + ); + return result.rows[0] || null; + } + + async createThread(thread: Omit): Promise { + const result = await this.pool.query( + `INSERT INTO message_threads ( + subject, thread_type, order_id, brand_business_id, buyer_business_id, status + ) VALUES ($1, $2, $3, $4, $5, $6) + RETURNING + id, + subject, + thread_type AS "threadType", + order_id AS "orderId", + brand_business_id AS "brandBusinessId", + buyer_business_id AS "buyerBusinessId", + status, + last_message_at AS "lastMessageAt", + created_at AS "createdAt", + updated_at AS "updatedAt"`, + [ + thread.subject, + thread.threadType, + thread.orderId, + thread.brandBusinessId, + thread.buyerBusinessId, + thread.status || 'open', + ] + ); + return { ...result.rows[0], unreadCount: 0 }; + } + + async updateThreadStatus(threadId: number, status: 'open' | 'closed' | 'archived'): Promise { + const result = await this.pool.query( + `UPDATE message_threads SET status = $2, updated_at = NOW() WHERE id = $1`, + [threadId, status] + ); + return (result.rowCount ?? 
0) > 0; + } + + // ============================================================================ + // MESSAGES + // ============================================================================ + + async getMessages(threadId: number, options: PortalQueryOptions = {}): Promise<{ messages: Message[]; total: number }> { + const { limit = 50, offset = 0 } = options; + + const [messagesResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + m.id, + m.thread_id AS "threadId", + m.sender_id AS "senderId", + m.sender_type AS "senderType", + m.content, + m.is_read AS "isRead", + m.read_at AS "readAt", + m.created_at AS "createdAt", + COALESCE( + (SELECT json_agg(json_build_object( + 'id', ma.id, + 'filename', ma.filename, + 'fileUrl', ma.file_url, + 'fileSize', ma.file_size, + 'mimeType', ma.mime_type + )) + FROM message_attachments ma + WHERE ma.message_id = m.id), + '[]' + ) AS attachments + FROM messages m + WHERE m.thread_id = $1 + ORDER BY m.created_at ASC + LIMIT $2 OFFSET $3`, + [threadId, limit, offset] + ), + this.pool.query( + `SELECT COUNT(*) AS total FROM messages WHERE thread_id = $1`, + [threadId] + ), + ]); + + return { + messages: messagesResult.rows, + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + async sendMessage( + threadId: number, + senderId: number, + senderType: 'brand' | 'buyer' | 'system', + content: string, + attachments?: { filename: string; fileUrl: string; fileSize?: number; mimeType?: string }[] + ): Promise { + // Insert message + const messageResult = await this.pool.query( + `INSERT INTO messages (thread_id, sender_id, sender_type, content) + VALUES ($1, $2, $3, $4) + RETURNING + id, + thread_id AS "threadId", + sender_id AS "senderId", + sender_type AS "senderType", + content, + is_read AS "isRead", + read_at AS "readAt", + created_at AS "createdAt"`, + [threadId, senderId, senderType, content] + ); + + const message = messageResult.rows[0]; + message.attachments = []; + + // Insert attachments if any + if (attachments && attachments.length > 0) { + for (const att of attachments) { + const attResult = await this.pool.query( + `INSERT INTO message_attachments (message_id, filename, file_url, file_size, mime_type) + VALUES ($1, $2, $3, $4, $5) + RETURNING id, filename, file_url AS "fileUrl", file_size AS "fileSize", mime_type AS "mimeType"`, + [message.id, att.filename, att.fileUrl, att.fileSize, att.mimeType] + ); + message.attachments.push(attResult.rows[0]); + } + } + + // Update thread's last_message_at + await this.pool.query( + `UPDATE message_threads SET last_message_at = NOW(), updated_at = NOW() WHERE id = $1`, + [threadId] + ); + + return message; + } + + async markMessagesAsRead(threadId: number, userId: number): Promise { + const result = await this.pool.query( + `UPDATE messages + SET is_read = TRUE, read_at = NOW() + WHERE thread_id = $1 AND sender_id != $2 AND is_read = FALSE`, + [threadId, userId] + ); + return result.rowCount ?? 
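A sketch of opening an order thread from the buyer side and posting the first message. The subject text, `threadType` value, user ID, and attachment URL are illustrative, and the field list passed to `createThread` mirrors the columns it inserts; attachment files are assumed to have been written to storage elsewhere before `sendMessage` records their URLs.

```typescript
import { Pool } from 'pg';
import { MessagingService } from '../services/messaging';

export async function openOrderThread(
  pool: Pool,
  brandBusinessId: number,
  buyerBusinessId: number,
  orderId: number,
  buyerUserId: number
) {
  const messaging = new MessagingService(pool);

  const thread = await messaging.createThread({
    subject: `Question about order #${orderId}`,
    threadType: 'order', // assumed thread_type value
    orderId,
    brandBusinessId,
    buyerBusinessId,
    status: 'open',
  });

  // Inserts the message, attaches the file record, and bumps last_message_at.
  await messaging.sendMessage(thread.id, buyerUserId, 'buyer',
    'Can you confirm the ship date for this order?',
    [{ filename: 'po.pdf', fileUrl: '/storage/attachments/po.pdf', mimeType: 'application/pdf' }]
  );

  // When the buyer reopens the thread, clear unread flags on messages from the other side.
  await messaging.markMessagesAsRead(thread.id, buyerUserId);

  return messaging.getMessages(thread.id, { limit: 50 });
}
```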
0; + } + + // ============================================================================ + // THREAD PARTICIPANTS + // ============================================================================ + + async addParticipant( + threadId: number, + userId: number, + role: 'owner' | 'participant' | 'viewer' = 'participant' + ): Promise { + const result = await this.pool.query( + `INSERT INTO thread_participants (thread_id, user_id, role) + VALUES ($1, $2, $3) + ON CONFLICT (thread_id, user_id) DO UPDATE SET role = EXCLUDED.role + RETURNING + thread_id AS "threadId", + user_id AS "userId", + role, + last_read_at AS "lastReadAt", + is_subscribed AS "isSubscribed"`, + [threadId, userId, role] + ); + return result.rows[0]; + } + + async removeParticipant(threadId: number, userId: number): Promise { + const result = await this.pool.query( + `DELETE FROM thread_participants WHERE thread_id = $1 AND user_id = $2`, + [threadId, userId] + ); + return (result.rowCount ?? 0) > 0; + } + + async getParticipants(threadId: number): Promise { + const result = await this.pool.query( + `SELECT + tp.thread_id AS "threadId", + tp.user_id AS "userId", + tp.role, + tp.last_read_at AS "lastReadAt", + tp.is_subscribed AS "isSubscribed", + u.name AS "userName", + u.email AS "userEmail" + FROM thread_participants tp + JOIN users u ON tp.user_id = u.id + WHERE tp.thread_id = $1`, + [threadId] + ); + return result.rows; + } + + // ============================================================================ + // NOTIFICATIONS + // ============================================================================ + + async getNotifications( + userId: number, + options: PortalQueryOptions = {} + ): Promise<{ notifications: Notification[]; total: number }> { + const { limit = 20, offset = 0, status } = options; + + const conditions: string[] = ['user_id = $1', '(expires_at IS NULL OR expires_at > NOW())']; + const values: any[] = [userId]; + let paramIndex = 2; + + if (status === 'unread') { + conditions.push('is_read = FALSE'); + } else if (status === 'read') { + conditions.push('is_read = TRUE'); + } + + const whereClause = conditions.join(' AND '); + + const [notificationsResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + n.id, + n.user_id AS "userId", + n.notification_type_id AS "notificationTypeId", + n.title, + n.body, + n.data, + n.priority, + n.is_read AS "isRead", + n.read_at AS "readAt", + n.action_url AS "actionUrl", + n.expires_at AS "expiresAt", + n.created_at AS "createdAt", + nt.name AS "typeName", + nt.category AS "typeCategory" + FROM notifications n + JOIN notification_types nt ON n.notification_type_id = nt.id + WHERE ${whereClause} + ORDER BY + CASE n.priority WHEN 'urgent' THEN 1 WHEN 'high' THEN 2 WHEN 'normal' THEN 3 ELSE 4 END, + n.created_at DESC + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + [...values, limit, offset] + ), + this.pool.query( + `SELECT COUNT(*) AS total FROM notifications WHERE ${whereClause}`, + values + ), + ]); + + return { + notifications: notificationsResult.rows, + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + async createNotification(notification: Omit): Promise { + const result = await this.pool.query( + `INSERT INTO notifications ( + user_id, notification_type_id, title, body, data, priority, action_url, expires_at + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8) + RETURNING + id, + user_id AS "userId", + notification_type_id AS "notificationTypeId", + title, body, data, priority, + is_read AS "isRead", + read_at AS "readAt", + action_url 
AS "actionUrl", + expires_at AS "expiresAt", + created_at AS "createdAt"`, + [ + notification.userId, + notification.notificationTypeId, + notification.title, + notification.body, + JSON.stringify(notification.data), + notification.priority || 'normal', + notification.actionUrl, + notification.expiresAt, + ] + ); + return result.rows[0]; + } + + async markNotificationRead(notificationId: number, userId: number): Promise { + const result = await this.pool.query( + `UPDATE notifications + SET is_read = TRUE, read_at = NOW() + WHERE id = $1 AND user_id = $2`, + [notificationId, userId] + ); + return (result.rowCount ?? 0) > 0; + } + + async markAllNotificationsRead(userId: number): Promise { + const result = await this.pool.query( + `UPDATE notifications + SET is_read = TRUE, read_at = NOW() + WHERE user_id = $1 AND is_read = FALSE`, + [userId] + ); + return result.rowCount ?? 0; + } + + async deleteNotification(notificationId: number, userId: number): Promise { + const result = await this.pool.query( + `DELETE FROM notifications WHERE id = $1 AND user_id = $2`, + [notificationId, userId] + ); + return (result.rowCount ?? 0) > 0; + } + + async getUnreadCount(userId: number): Promise { + const result = await this.pool.query( + `SELECT COUNT(*) AS count + FROM notifications + WHERE user_id = $1 AND is_read = FALSE AND (expires_at IS NULL OR expires_at > NOW())`, + [userId] + ); + return parseInt(result.rows[0]?.count || '0'); + } + + // ============================================================================ + // NOTIFICATION PREFERENCES + // ============================================================================ + + async getNotificationPreferences(userId: number): Promise { + const result = await this.pool.query( + `SELECT + unp.user_id AS "userId", + unp.notification_type_id AS "notificationTypeId", + unp.email_enabled AS "emailEnabled", + unp.in_app_enabled AS "inAppEnabled", + unp.push_enabled AS "pushEnabled", + nt.name AS "typeName", + nt.display_name AS "typeDisplayName", + nt.category AS "typeCategory" + FROM user_notification_preferences unp + JOIN notification_types nt ON unp.notification_type_id = nt.id + WHERE unp.user_id = $1`, + [userId] + ); + return result.rows; + } + + async updateNotificationPreference( + userId: number, + notificationTypeId: number, + preferences: Partial> + ): Promise { + const result = await this.pool.query( + `INSERT INTO user_notification_preferences (user_id, notification_type_id, email_enabled, in_app_enabled, push_enabled) + VALUES ($1, $2, $3, $4, $5) + ON CONFLICT (user_id, notification_type_id) DO UPDATE SET + email_enabled = COALESCE($3, user_notification_preferences.email_enabled), + in_app_enabled = COALESCE($4, user_notification_preferences.in_app_enabled), + push_enabled = COALESCE($5, user_notification_preferences.push_enabled) + RETURNING + user_id AS "userId", + notification_type_id AS "notificationTypeId", + email_enabled AS "emailEnabled", + in_app_enabled AS "inAppEnabled", + push_enabled AS "pushEnabled"`, + [ + userId, + notificationTypeId, + preferences.emailEnabled, + preferences.inAppEnabled, + preferences.pushEnabled, + ] + ); + return result.rows[0]; + } + + // ============================================================================ + // NOTIFICATION TYPES + // ============================================================================ + + async getNotificationTypes(): Promise { + const result = await this.pool.query( + `SELECT + id, + name, + display_name AS "displayName", + description, + category, + 
default_enabled AS "defaultEnabled", + template + FROM notification_types + ORDER BY category, display_name` + ); + return result.rows; + } + + // ============================================================================ + // HELPER: SEND NOTIFICATION TO USER + // ============================================================================ + + async notifyUser( + userId: number, + typeName: string, + title: string, + body: string, + data: Record = {}, + actionUrl?: string + ): Promise { + // Get notification type + const typeResult = await this.pool.query( + `SELECT id FROM notification_types WHERE name = $1`, + [typeName] + ); + + if (!typeResult.rows[0]) { + console.warn(`Notification type ${typeName} not found`); + return null; + } + + const notificationTypeId = typeResult.rows[0].id; + + // Check user preferences + const prefResult = await this.pool.query( + `SELECT in_app_enabled FROM user_notification_preferences + WHERE user_id = $1 AND notification_type_id = $2`, + [userId, notificationTypeId] + ); + + // Use default if no preference set + const inAppEnabled = prefResult.rows[0]?.in_app_enabled ?? true; + + if (!inAppEnabled) { + return null; + } + + return this.createNotification({ + userId, + notificationTypeId, + title, + body, + data, + priority: 'normal', + isRead: false, + readAt: null, + actionUrl: actionUrl || null, + expiresAt: null, + }); + } + + // ============================================================================ + // HELPER: NOTIFY THREAD PARTICIPANTS + // ============================================================================ + + async notifyThreadParticipants( + threadId: number, + excludeUserId: number, + title: string, + body: string, + actionUrl?: string + ): Promise { + const participants = await this.pool.query( + `SELECT user_id FROM thread_participants + WHERE thread_id = $1 AND user_id != $2 AND is_subscribed = TRUE`, + [threadId, excludeUserId] + ); + + let notified = 0; + for (const p of participants.rows) { + const notification = await this.notifyUser( + p.user_id, + 'new_message', + title, + body, + { threadId }, + actionUrl + ); + if (notification) notified++; + } + + return notified; + } +} diff --git a/backend/src/portals/services/orders.ts b/backend/src/portals/services/orders.ts new file mode 100644 index 00000000..67b459cf --- /dev/null +++ b/backend/src/portals/services/orders.ts @@ -0,0 +1,694 @@ +/** + * Orders Service + * Phase 7: Order lifecycle management, workflow, status transitions + */ + +import { Pool } from 'pg'; +import { + Order, + OrderItem, + OrderStatus, + OrderStatusHistory, + OrderDocument, + OrderQueryOptions, + BuyerCart, +} from '../types'; + +const VALID_TRANSITIONS: Record = { + draft: ['submitted', 'cancelled'], + submitted: ['accepted', 'rejected', 'cancelled'], + accepted: ['processing', 'cancelled'], + rejected: [], + processing: ['packed', 'cancelled'], + packed: ['shipped', 'cancelled'], + shipped: ['delivered'], + delivered: [], + cancelled: [], +}; + +export class OrdersService { + constructor(private pool: Pool) {} + + // ============================================================================ + // ORDER QUERIES + // ============================================================================ + + async getOrders(options: OrderQueryOptions = {}): Promise<{ orders: Order[]; total: number }> { + const { + limit = 50, + offset = 0, + buyerBusinessId, + sellerBrandBusinessId, + orderStatus, + state, + sortBy = 'created_at', + sortDir = 'desc', + dateFrom, + dateTo, + } = options; + + const 
conditions: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + if (buyerBusinessId) { + conditions.push(`buyer_business_id = $${paramIndex++}`); + values.push(buyerBusinessId); + } + + if (sellerBrandBusinessId) { + conditions.push(`seller_brand_business_id = $${paramIndex++}`); + values.push(sellerBrandBusinessId); + } + + if (orderStatus) { + if (Array.isArray(orderStatus)) { + conditions.push(`status = ANY($${paramIndex++})`); + values.push(orderStatus); + } else { + conditions.push(`status = $${paramIndex++}`); + values.push(orderStatus); + } + } + + if (state) { + conditions.push(`state = $${paramIndex++}`); + values.push(state); + } + + if (dateFrom) { + conditions.push(`created_at >= $${paramIndex++}`); + values.push(dateFrom); + } + + if (dateTo) { + conditions.push(`created_at <= $${paramIndex++}`); + values.push(dateTo); + } + + const whereClause = conditions.length > 0 ? `WHERE ${conditions.join(' AND ')}` : ''; + const validSortColumns = ['created_at', 'submitted_at', 'total', 'status', 'order_number']; + const sortColumn = validSortColumns.includes(sortBy) ? sortBy : 'created_at'; + const sortDirection = sortDir === 'asc' ? 'ASC' : 'DESC'; + + const [ordersResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + id, + order_number AS "orderNumber", + buyer_business_id AS "buyerBusinessId", + seller_brand_business_id AS "sellerBrandBusinessId", + state, + shipping_address AS "shippingAddress", + subtotal, + tax_amount AS "taxAmount", + discount_amount AS "discountAmount", + shipping_cost AS "shippingCost", + total, + currency, + status, + submitted_at AS "submittedAt", + accepted_at AS "acceptedAt", + rejected_at AS "rejectedAt", + processing_at AS "processingAt", + packed_at AS "packedAt", + shipped_at AS "shippedAt", + delivered_at AS "deliveredAt", + cancelled_at AS "cancelledAt", + tracking_number AS "trackingNumber", + carrier, + estimated_delivery_date AS "estimatedDeliveryDate", + buyer_notes AS "buyerNotes", + seller_notes AS "sellerNotes", + po_number AS "poNumber", + manifest_number AS "manifestNumber", + metadata, + created_by AS "createdBy", + created_at AS "createdAt", + updated_at AS "updatedAt" + FROM orders + ${whereClause} + ORDER BY ${sortColumn} ${sortDirection} + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + [...values, limit, offset] + ), + this.pool.query( + `SELECT COUNT(*) AS total FROM orders ${whereClause}`, + values + ), + ]); + + return { + orders: ordersResult.rows, + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + async getOrder(orderId: number): Promise { + const result = await this.pool.query( + `SELECT + id, + order_number AS "orderNumber", + buyer_business_id AS "buyerBusinessId", + seller_brand_business_id AS "sellerBrandBusinessId", + state, + shipping_address AS "shippingAddress", + subtotal, + tax_amount AS "taxAmount", + discount_amount AS "discountAmount", + shipping_cost AS "shippingCost", + total, + currency, + status, + submitted_at AS "submittedAt", + accepted_at AS "acceptedAt", + rejected_at AS "rejectedAt", + processing_at AS "processingAt", + packed_at AS "packedAt", + shipped_at AS "shippedAt", + delivered_at AS "deliveredAt", + cancelled_at AS "cancelledAt", + tracking_number AS "trackingNumber", + carrier, + estimated_delivery_date AS "estimatedDeliveryDate", + buyer_notes AS "buyerNotes", + seller_notes AS "sellerNotes", + internal_notes AS "internalNotes", + po_number AS "poNumber", + manifest_number AS "manifestNumber", + metadata, + created_by AS "createdBy", + 
created_at AS "createdAt", + updated_at AS "updatedAt" + FROM orders + WHERE id = $1`, + [orderId] + ); + return result.rows[0] || null; + } + + async getOrderByNumber(orderNumber: string): Promise { + const result = await this.pool.query( + `SELECT + id, + order_number AS "orderNumber", + buyer_business_id AS "buyerBusinessId", + seller_brand_business_id AS "sellerBrandBusinessId", + state, + status, + total, + created_at AS "createdAt" + FROM orders + WHERE order_number = $1`, + [orderNumber] + ); + return result.rows[0] || null; + } + + // ============================================================================ + // ORDER ITEMS + // ============================================================================ + + async getOrderItems(orderId: number): Promise { + const result = await this.pool.query( + `SELECT + oi.id, + oi.order_id AS "orderId", + oi.catalog_item_id AS "catalogItemId", + oi.store_product_id AS "storeProductId", + oi.sku, + oi.name, + oi.category, + oi.quantity, + oi.unit_price AS "unitPrice", + oi.discount_percent AS "discountPercent", + oi.discount_amount AS "discountAmount", + oi.line_total AS "lineTotal", + oi.quantity_fulfilled AS "quantityFulfilled", + oi.fulfillment_status AS "fulfillmentStatus", + oi.notes, + oi.created_at AS "createdAt", + bci.image_url AS "imageUrl" + FROM order_items oi + LEFT JOIN brand_catalog_items bci ON oi.catalog_item_id = bci.id + WHERE oi.order_id = $1 + ORDER BY oi.created_at`, + [orderId] + ); + return result.rows; + } + + // ============================================================================ + // ORDER CREATION + // ============================================================================ + + async createOrder(order: { + buyerBusinessId: number; + sellerBrandBusinessId: number; + state: string; + shippingAddress?: any; + buyerNotes?: string; + poNumber?: string; + createdBy?: number; + }): Promise { + // Generate order number + const orderNumber = await this.generateOrderNumber(); + + const result = await this.pool.query( + `INSERT INTO orders ( + order_number, buyer_business_id, seller_brand_business_id, state, + shipping_address, buyer_notes, po_number, created_by, status + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, 'draft') + RETURNING + id, + order_number AS "orderNumber", + buyer_business_id AS "buyerBusinessId", + seller_brand_business_id AS "sellerBrandBusinessId", + state, + shipping_address AS "shippingAddress", + subtotal, total, status, + created_at AS "createdAt"`, + [ + orderNumber, + order.buyerBusinessId, + order.sellerBrandBusinessId, + order.state, + order.shippingAddress ? JSON.stringify(order.shippingAddress) : null, + order.buyerNotes, + order.poNumber, + order.createdBy, + ] + ); + + return result.rows[0]; + } + + async addOrderItem(orderId: number, item: { + catalogItemId?: number; + storeProductId?: number; + sku: string; + name: string; + category?: string; + quantity: number; + unitPrice: number; + discountPercent?: number; + notes?: string; + }): Promise { + const discountAmount = item.discountPercent + ? 
(item.quantity * item.unitPrice * item.discountPercent) / 100
+      : 0;
+    const lineTotal = item.quantity * item.unitPrice - discountAmount;
+
+    const result = await this.pool.query(
+      `INSERT INTO order_items (
+        order_id, catalog_item_id, store_product_id, sku, name, category,
+        quantity, unit_price, discount_percent, discount_amount, line_total, notes
+      ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
+      RETURNING
+        id,
+        order_id AS "orderId",
+        catalog_item_id AS "catalogItemId",
+        sku, name, category, quantity,
+        unit_price AS "unitPrice",
+        discount_percent AS "discountPercent",
+        discount_amount AS "discountAmount",
+        line_total AS "lineTotal",
+        notes,
+        created_at AS "createdAt"`,
+      [
+        orderId,
+        item.catalogItemId,
+        item.storeProductId,
+        item.sku,
+        item.name,
+        item.category,
+        item.quantity,
+        item.unitPrice,
+        item.discountPercent || 0,
+        discountAmount,
+        lineTotal,
+        item.notes,
+      ]
+    );
+
+    // Recalculate order totals
+    await this.recalculateOrderTotals(orderId);
+
+    return result.rows[0];
+  }
+
+  async updateOrderItem(itemId: number, updates: {
+    quantity?: number;
+    discountPercent?: number;
+    notes?: string;
+  }): Promise<OrderItem | null> {
+    // Get current item, including quantity and discount so partial updates keep existing values
+    const currentResult = await this.pool.query(
+      `SELECT order_id, unit_price, quantity, discount_percent FROM order_items WHERE id = $1`,
+      [itemId]
+    );
+
+    if (!currentResult.rows[0]) return null;
+
+    const { order_id: orderId, unit_price: unitPrice } = currentResult.rows[0];
+    const quantity = updates.quantity ?? currentResult.rows[0].quantity;
+    const discountPercent = updates.discountPercent ?? currentResult.rows[0].discount_percent;
+    const discountAmount = (quantity * unitPrice * discountPercent) / 100;
+    const lineTotal = quantity * unitPrice - discountAmount;
+
+    const result = await this.pool.query(
+      `UPDATE order_items SET
+        quantity = COALESCE($2, quantity),
+        discount_percent = COALESCE($3, discount_percent),
+        discount_amount = $4,
+        line_total = $5,
+        notes = COALESCE($6, notes)
+      WHERE id = $1
+      RETURNING
+        id, order_id AS "orderId", sku, name, quantity,
+        unit_price AS "unitPrice",
+        discount_percent AS "discountPercent",
+        line_total AS "lineTotal"`,
+      [itemId, updates.quantity, updates.discountPercent, discountAmount, lineTotal, updates.notes]
+    );
+
+    // Recalculate order totals
+    await this.recalculateOrderTotals(orderId);
+
+    return result.rows[0] || null;
+  }
+
+  async removeOrderItem(itemId: number): Promise<boolean> {
+    // Get order ID first
+    const orderResult = await this.pool.query(
+      `SELECT order_id FROM order_items WHERE id = $1`,
+      [itemId]
+    );
+
+    if (!orderResult.rows[0]) return false;
+
+    const orderId = orderResult.rows[0].order_id;
+
+    const result = await this.pool.query(
+      `DELETE FROM order_items WHERE id = $1`,
+      [itemId]
+    );
+
+    // Recalculate order totals
+    await this.recalculateOrderTotals(orderId);
+
+    return (result.rowCount ??
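+      // pg can report rowCount as null, so treat null as "no rows deleted"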
0) > 0; + } + + // ============================================================================ + // ORDER FROM CART + // ============================================================================ + + async createOrderFromCart(cartId: number, options: { + shippingAddress?: any; + buyerNotes?: string; + poNumber?: string; + createdBy?: number; + } = {}): Promise { + // Get cart details + const cartResult = await this.pool.query( + `SELECT + bc.id, bc.buyer_business_id, bc.seller_brand_business_id, bc.state, + json_agg(json_build_object( + 'catalogItemId', ci.catalog_item_id, + 'quantity', ci.quantity, + 'unitPrice', ci.unit_price, + 'notes', ci.notes, + 'name', bci.name, + 'sku', bci.sku, + 'category', bci.category + )) AS items + FROM buyer_carts bc + JOIN cart_items ci ON bc.id = ci.cart_id + JOIN brand_catalog_items bci ON ci.catalog_item_id = bci.id + WHERE bc.id = $1 AND bc.status = 'active' + GROUP BY bc.id`, + [cartId] + ); + + if (!cartResult.rows[0]) { + throw new Error(`Cart ${cartId} not found or not active`); + } + + const cart = cartResult.rows[0]; + + // Create order + const order = await this.createOrder({ + buyerBusinessId: cart.buyer_business_id, + sellerBrandBusinessId: cart.seller_brand_business_id, + state: cart.state, + shippingAddress: options.shippingAddress, + buyerNotes: options.buyerNotes, + poNumber: options.poNumber, + createdBy: options.createdBy, + }); + + // Add items + for (const item of cart.items) { + await this.addOrderItem(order.id, { + catalogItemId: item.catalogItemId, + sku: item.sku, + name: item.name, + category: item.category, + quantity: item.quantity, + unitPrice: item.unitPrice, + notes: item.notes, + }); + } + + // Mark cart as converted + await this.pool.query( + `UPDATE buyer_carts SET status = 'converted', converted_to_order_id = $2, updated_at = NOW() WHERE id = $1`, + [cartId, order.id] + ); + + return this.getOrder(order.id) as Promise; + } + + // ============================================================================ + // STATUS TRANSITIONS + // ============================================================================ + + async transitionStatus( + orderId: number, + newStatus: OrderStatus, + options: { changedBy?: number; reason?: string; metadata?: Record } = {} + ): Promise { + const order = await this.getOrder(orderId); + if (!order) return null; + + const currentStatus = order.status as OrderStatus; + + // Validate transition + if (!VALID_TRANSITIONS[currentStatus]?.includes(newStatus)) { + throw new Error(`Invalid status transition: ${currentStatus} -> ${newStatus}`); + } + + // Determine timestamp column to update + const timestampColumn = `${newStatus}_at`; + const validTimestampColumns = [ + 'submitted_at', 'accepted_at', 'rejected_at', 'processing_at', + 'packed_at', 'shipped_at', 'delivered_at', 'cancelled_at' + ]; + + let updateQuery = `UPDATE orders SET status = $2, updated_at = NOW()`; + const values: any[] = [orderId, newStatus]; + + if (validTimestampColumns.includes(timestampColumn)) { + updateQuery += `, ${timestampColumn} = NOW()`; + } + + updateQuery += ` WHERE id = $1`; + + await this.pool.query(updateQuery, values); + + // Record status history + await this.pool.query( + `INSERT INTO order_status_history (order_id, from_status, to_status, changed_by, reason, metadata) + VALUES ($1, $2, $3, $4, $5, $6)`, + [orderId, currentStatus, newStatus, options.changedBy, options.reason, JSON.stringify(options.metadata || {})] + ); + + return this.getOrder(orderId); + } + + async submitOrder(orderId: number, userId?: 
number): Promise { + return this.transitionStatus(orderId, 'submitted', { changedBy: userId }); + } + + async acceptOrder(orderId: number, userId?: number, sellerNotes?: string): Promise { + if (sellerNotes) { + await this.pool.query( + `UPDATE orders SET seller_notes = $2 WHERE id = $1`, + [orderId, sellerNotes] + ); + } + return this.transitionStatus(orderId, 'accepted', { changedBy: userId }); + } + + async rejectOrder(orderId: number, userId?: number, reason?: string): Promise { + return this.transitionStatus(orderId, 'rejected', { changedBy: userId, reason }); + } + + async cancelOrder(orderId: number, userId?: number, reason?: string): Promise { + return this.transitionStatus(orderId, 'cancelled', { changedBy: userId, reason }); + } + + async startProcessing(orderId: number, userId?: number): Promise { + return this.transitionStatus(orderId, 'processing', { changedBy: userId }); + } + + async markPacked(orderId: number, userId?: number): Promise { + return this.transitionStatus(orderId, 'packed', { changedBy: userId }); + } + + async markShipped( + orderId: number, + trackingInfo: { trackingNumber: string; carrier: string; estimatedDeliveryDate?: Date }, + userId?: number + ): Promise { + await this.pool.query( + `UPDATE orders SET + tracking_number = $2, + carrier = $3, + estimated_delivery_date = $4 + WHERE id = $1`, + [orderId, trackingInfo.trackingNumber, trackingInfo.carrier, trackingInfo.estimatedDeliveryDate] + ); + return this.transitionStatus(orderId, 'shipped', { changedBy: userId, metadata: trackingInfo }); + } + + async markDelivered(orderId: number, userId?: number): Promise { + return this.transitionStatus(orderId, 'delivered', { changedBy: userId }); + } + + // ============================================================================ + // STATUS HISTORY + // ============================================================================ + + async getStatusHistory(orderId: number): Promise { + const result = await this.pool.query( + `SELECT + osh.id, + osh.order_id AS "orderId", + osh.from_status AS "fromStatus", + osh.to_status AS "toStatus", + osh.changed_by AS "changedBy", + u.name AS "changedByName", + osh.reason, + osh.metadata, + osh.created_at AS "createdAt" + FROM order_status_history osh + LEFT JOIN users u ON osh.changed_by = u.id + WHERE osh.order_id = $1 + ORDER BY osh.created_at`, + [orderId] + ); + return result.rows; + } + + // ============================================================================ + // DOCUMENTS + // ============================================================================ + + async getOrderDocuments(orderId: number): Promise { + const result = await this.pool.query( + `SELECT + id, + order_id AS "orderId", + document_type AS "documentType", + filename, + file_url AS "fileUrl", + file_size AS "fileSize", + mime_type AS "mimeType", + uploaded_by AS "uploadedBy", + created_at AS "createdAt" + FROM order_documents + WHERE order_id = $1 + ORDER BY created_at`, + [orderId] + ); + return result.rows; + } + + async addDocument(orderId: number, document: { + documentType: 'po' | 'invoice' | 'manifest' | 'packing_slip' | 'other'; + filename: string; + fileUrl: string; + fileSize?: number; + mimeType?: string; + uploadedBy?: number; + }): Promise { + const result = await this.pool.query( + `INSERT INTO order_documents (order_id, document_type, filename, file_url, file_size, mime_type, uploaded_by) + VALUES ($1, $2, $3, $4, $5, $6, $7) + RETURNING + id, + order_id AS "orderId", + document_type AS "documentType", + filename, + file_url AS 
"fileUrl", + file_size AS "fileSize", + mime_type AS "mimeType", + uploaded_by AS "uploadedBy", + created_at AS "createdAt"`, + [orderId, document.documentType, document.filename, document.fileUrl, document.fileSize, document.mimeType, document.uploadedBy] + ); + return result.rows[0]; + } + + async deleteDocument(documentId: number): Promise { + const result = await this.pool.query( + `DELETE FROM order_documents WHERE id = $1`, + [documentId] + ); + return (result.rowCount ?? 0) > 0; + } + + // ============================================================================ + // HELPERS + // ============================================================================ + + private async generateOrderNumber(): Promise { + const result = await this.pool.query(`SELECT generate_order_number() AS order_number`); + return result.rows[0]?.order_number || `ORD-${Date.now()}`; + } + + private async recalculateOrderTotals(orderId: number): Promise { + await this.pool.query( + `UPDATE orders SET + subtotal = (SELECT COALESCE(SUM(line_total), 0) FROM order_items WHERE order_id = $1), + total = (SELECT COALESCE(SUM(line_total), 0) FROM order_items WHERE order_id = $1) + COALESCE(tax_amount, 0) - COALESCE(discount_amount, 0) + COALESCE(shipping_cost, 0), + updated_at = NOW() + WHERE id = $1`, + [orderId] + ); + } + + // ============================================================================ + // FULFILLMENT + // ============================================================================ + + async updateItemFulfillment(itemId: number, quantityFulfilled: number): Promise { + const result = await this.pool.query( + `UPDATE order_items SET + quantity_fulfilled = $2, + fulfillment_status = CASE + WHEN $2 = 0 THEN 'pending' + WHEN $2 >= quantity THEN 'complete' + ELSE 'partial' + END + WHERE id = $1 + RETURNING + id, order_id AS "orderId", quantity, quantity_fulfilled AS "quantityFulfilled", + fulfillment_status AS "fulfillmentStatus"`, + [itemId, quantityFulfilled] + ); + return result.rows[0] || null; + } +} diff --git a/backend/src/portals/services/pricing.ts b/backend/src/portals/services/pricing.ts new file mode 100644 index 00000000..c8010d86 --- /dev/null +++ b/backend/src/portals/services/pricing.ts @@ -0,0 +1,826 @@ +/** + * Pricing Automation Service + * Phase 7: Pricing rules, suggestions, competitive analysis, automated adjustments + */ + +import { Pool } from 'pg'; +import { + PricingRule, + PricingSuggestion, + PricingHistory, + PricingQueryOptions, + PricingConditions, + PricingActions, +} from '../types'; + +export class PricingAutomationService { + constructor(private pool: Pool) {} + + // ============================================================================ + // PRICING RULES + // ============================================================================ + + async getRules(options: PricingQueryOptions = {}): Promise<{ rules: PricingRule[]; total: number }> { + const { limit = 50, offset = 0, brandBusinessId, ruleType, state, category } = options; + + const conditions: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + if (brandBusinessId) { + conditions.push(`brand_business_id = $${paramIndex++}`); + values.push(brandBusinessId); + } + + if (ruleType) { + conditions.push(`rule_type = $${paramIndex++}`); + values.push(ruleType); + } + + if (state) { + conditions.push(`(state = $${paramIndex} OR state IS NULL)`); + values.push(state); + paramIndex++; + } + + if (category) { + conditions.push(`(category = $${paramIndex} OR category IS NULL)`); + 
values.push(category); + paramIndex++; + } + + const whereClause = conditions.length > 0 ? `WHERE ${conditions.join(' AND ')}` : ''; + + const [rulesResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + id, + brand_business_id AS "brandBusinessId", + name, + description, + state, + category, + catalog_item_id AS "catalogItemId", + rule_type AS "ruleType", + conditions, + actions, + min_price AS "minPrice", + max_price AS "maxPrice", + max_adjustment_percent AS "maxAdjustmentPercent", + priority, + is_enabled AS "isEnabled", + requires_approval AS "requiresApproval", + cooldown_hours AS "cooldownHours", + last_triggered_at AS "lastTriggeredAt", + created_by AS "createdBy", + created_at AS "createdAt", + updated_at AS "updatedAt" + FROM pricing_rules + ${whereClause} + ORDER BY priority DESC, created_at DESC + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + [...values, limit, offset] + ), + this.pool.query( + `SELECT COUNT(*) AS total FROM pricing_rules ${whereClause}`, + values + ), + ]); + + return { + rules: rulesResult.rows, + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + async getRule(ruleId: number): Promise { + const result = await this.pool.query( + `SELECT + id, + brand_business_id AS "brandBusinessId", + name, + description, + state, + category, + catalog_item_id AS "catalogItemId", + rule_type AS "ruleType", + conditions, + actions, + min_price AS "minPrice", + max_price AS "maxPrice", + max_adjustment_percent AS "maxAdjustmentPercent", + priority, + is_enabled AS "isEnabled", + requires_approval AS "requiresApproval", + cooldown_hours AS "cooldownHours", + last_triggered_at AS "lastTriggeredAt", + created_by AS "createdBy", + created_at AS "createdAt", + updated_at AS "updatedAt" + FROM pricing_rules + WHERE id = $1`, + [ruleId] + ); + return result.rows[0] || null; + } + + async createRule(rule: Omit): Promise { + const result = await this.pool.query( + `INSERT INTO pricing_rules ( + brand_business_id, name, description, state, category, catalog_item_id, + rule_type, conditions, actions, min_price, max_price, max_adjustment_percent, + priority, is_enabled, requires_approval, cooldown_hours, created_by + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17) + RETURNING + id, + brand_business_id AS "brandBusinessId", + name, description, state, category, + catalog_item_id AS "catalogItemId", + rule_type AS "ruleType", + conditions, actions, + min_price AS "minPrice", + max_price AS "maxPrice", + max_adjustment_percent AS "maxAdjustmentPercent", + priority, + is_enabled AS "isEnabled", + requires_approval AS "requiresApproval", + cooldown_hours AS "cooldownHours", + created_by AS "createdBy", + created_at AS "createdAt"`, + [ + rule.brandBusinessId, + rule.name, + rule.description, + rule.state, + rule.category, + rule.catalogItemId, + rule.ruleType, + JSON.stringify(rule.conditions), + JSON.stringify(rule.actions), + rule.minPrice, + rule.maxPrice, + rule.maxAdjustmentPercent ?? 15, + rule.priority ?? 0, + rule.isEnabled ?? true, + rule.requiresApproval ?? false, + rule.cooldownHours ?? 
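+        // defaults when omitted: 15% max adjustment, priority 0, enabled, no approval, 24h cooldown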
24, + rule.createdBy, + ] + ); + return result.rows[0]; + } + + async updateRule(ruleId: number, updates: Partial): Promise { + const setClauses: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + const fieldMap: Record = { + name: 'name', + description: 'description', + state: 'state', + category: 'category', + catalogItemId: 'catalog_item_id', + conditions: 'conditions', + actions: 'actions', + minPrice: 'min_price', + maxPrice: 'max_price', + maxAdjustmentPercent: 'max_adjustment_percent', + priority: 'priority', + isEnabled: 'is_enabled', + requiresApproval: 'requires_approval', + cooldownHours: 'cooldown_hours', + }; + + for (const [key, dbField] of Object.entries(fieldMap)) { + if ((updates as any)[key] !== undefined) { + setClauses.push(`${dbField} = $${paramIndex++}`); + let value = (updates as any)[key]; + if (key === 'conditions' || key === 'actions') { + value = JSON.stringify(value); + } + values.push(value); + } + } + + if (setClauses.length === 0) { + return this.getRule(ruleId); + } + + setClauses.push('updated_at = NOW()'); + values.push(ruleId); + + await this.pool.query( + `UPDATE pricing_rules SET ${setClauses.join(', ')} WHERE id = $${paramIndex}`, + values + ); + + return this.getRule(ruleId); + } + + async deleteRule(ruleId: number): Promise { + const result = await this.pool.query( + `DELETE FROM pricing_rules WHERE id = $1`, + [ruleId] + ); + return (result.rowCount ?? 0) > 0; + } + + async toggleRule(ruleId: number, enabled: boolean): Promise { + const result = await this.pool.query( + `UPDATE pricing_rules SET is_enabled = $2, updated_at = NOW() WHERE id = $1`, + [ruleId, enabled] + ); + return (result.rowCount ?? 0) > 0; + } + + // ============================================================================ + // PRICING SUGGESTIONS + // ============================================================================ + + async getSuggestions(options: PricingQueryOptions = {}): Promise<{ suggestions: PricingSuggestion[]; total: number }> { + const { limit = 50, offset = 0, brandBusinessId, suggestionStatus, state, category } = options; + + const conditions: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + if (brandBusinessId) { + conditions.push(`ps.brand_business_id = $${paramIndex++}`); + values.push(brandBusinessId); + } + + if (suggestionStatus && suggestionStatus !== 'all') { + conditions.push(`ps.status = $${paramIndex++}`); + values.push(suggestionStatus); + } + + if (state) { + conditions.push(`ps.state = $${paramIndex++}`); + values.push(state); + } + + if (category) { + conditions.push(`bci.category = $${paramIndex++}`); + values.push(category); + } + + // Filter out expired + conditions.push(`(ps.expires_at IS NULL OR ps.expires_at > NOW())`); + + const whereClause = conditions.length > 0 ? 
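+      // the expiry filter pushed above guarantees at least one condition here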
`WHERE ${conditions.join(' AND ')}` : ''; + + const [suggestionsResult, countResult] = await Promise.all([ + this.pool.query( + `SELECT + ps.id, + ps.catalog_item_id AS "catalogItemId", + bci.sku, + bci.name AS "productName", + bci.category, + ps.brand_business_id AS "brandBusinessId", + ps.state, + ps.current_price AS "currentPrice", + ps.suggested_price AS "suggestedPrice", + ps.price_change_amount AS "priceChangeAmount", + ps.price_change_percent AS "priceChangePercent", + ps.suggestion_type AS "suggestionType", + ps.rationale, + ps.supporting_data AS "supportingData", + ps.projected_revenue_impact AS "projectedRevenueImpact", + ps.projected_margin_impact AS "projectedMarginImpact", + ps.confidence_score AS "confidenceScore", + ps.triggered_by_rule_id AS "triggeredByRuleId", + ps.status, + ps.decision_at AS "decisionAt", + ps.decision_by AS "decisionBy", + ps.decision_notes AS "decisionNotes", + ps.expires_at AS "expiresAt", + ps.created_at AS "createdAt" + FROM pricing_suggestions ps + JOIN brand_catalog_items bci ON ps.catalog_item_id = bci.id + ${whereClause} + ORDER BY ps.created_at DESC + LIMIT $${paramIndex} OFFSET $${paramIndex + 1}`, + [...values, limit, offset] + ), + this.pool.query( + `SELECT COUNT(*) AS total + FROM pricing_suggestions ps + JOIN brand_catalog_items bci ON ps.catalog_item_id = bci.id + ${whereClause}`, + values + ), + ]); + + return { + suggestions: suggestionsResult.rows, + total: parseInt(countResult.rows[0]?.total || '0'), + }; + } + + async createSuggestion(suggestion: Omit): Promise { + const priceChangeAmount = suggestion.suggestedPrice - suggestion.currentPrice; + const priceChangePercent = (priceChangeAmount / suggestion.currentPrice) * 100; + + const result = await this.pool.query( + `INSERT INTO pricing_suggestions ( + catalog_item_id, brand_business_id, state, current_price, suggested_price, + price_change_amount, price_change_percent, suggestion_type, rationale, + supporting_data, projected_revenue_impact, projected_margin_impact, + confidence_score, triggered_by_rule_id, status, expires_at + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16) + RETURNING + id, + catalog_item_id AS "catalogItemId", + brand_business_id AS "brandBusinessId", + state, + current_price AS "currentPrice", + suggested_price AS "suggestedPrice", + price_change_amount AS "priceChangeAmount", + price_change_percent AS "priceChangePercent", + suggestion_type AS "suggestionType", + rationale, + supporting_data AS "supportingData", + status, + created_at AS "createdAt"`, + [ + suggestion.catalogItemId, + suggestion.brandBusinessId, + suggestion.state, + suggestion.currentPrice, + suggestion.suggestedPrice, + priceChangeAmount, + priceChangePercent, + suggestion.suggestionType, + suggestion.rationale, + JSON.stringify(suggestion.supportingData), + suggestion.projectedRevenueImpact, + suggestion.projectedMarginImpact, + suggestion.confidenceScore, + suggestion.triggeredByRuleId, + suggestion.status || 'pending', + suggestion.expiresAt, + ] + ); + + return result.rows[0]; + } + + async acceptSuggestion(suggestionId: number, userId: number, notes?: string): Promise { + // Get suggestion details + const suggestion = await this.pool.query( + `SELECT catalog_item_id, state, current_price, suggested_price + FROM pricing_suggestions WHERE id = $1`, + [suggestionId] + ); + + if (!suggestion.rows[0]) return null; + + const { catalog_item_id, state, current_price, suggested_price } = suggestion.rows[0]; + + // Update suggestion status + await this.pool.query( 
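+      // accepting a suggestion: mark it accepted, apply the price to the catalog item, then log it in pricing_history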
+ `UPDATE pricing_suggestions SET + status = 'accepted', + decision_at = NOW(), + decision_by = $2, + decision_notes = $3 + WHERE id = $1`, + [suggestionId, userId, notes] + ); + + // Apply the price change (update catalog item) + await this.pool.query( + `UPDATE brand_catalog_items SET msrp = $2, updated_at = NOW() WHERE id = $1`, + [catalog_item_id, suggested_price] + ); + + // Record in pricing history + await this.pool.query( + `INSERT INTO pricing_history ( + catalog_item_id, state, field_changed, old_value, new_value, + change_percent, change_source, suggestion_id, changed_by + ) VALUES ($1, $2, 'msrp', $3, $4, $5, 'suggestion_accepted', $6, $7)`, + [ + catalog_item_id, + state, + current_price, + suggested_price, + ((suggested_price - current_price) / current_price) * 100, + suggestionId, + userId, + ] + ); + + // Return updated suggestion + const result = await this.pool.query( + `SELECT id, status, decision_at AS "decisionAt", decision_by AS "decisionBy" + FROM pricing_suggestions WHERE id = $1`, + [suggestionId] + ); + + return result.rows[0]; + } + + async rejectSuggestion(suggestionId: number, userId: number, notes?: string): Promise { + const result = await this.pool.query( + `UPDATE pricing_suggestions SET + status = 'rejected', + decision_at = NOW(), + decision_by = $2, + decision_notes = $3 + WHERE id = $1`, + [suggestionId, userId, notes] + ); + return (result.rowCount ?? 0) > 0; + } + + async expirePendingSuggestions(): Promise { + const result = await this.pool.query( + `UPDATE pricing_suggestions SET status = 'expired' + WHERE status = 'pending' AND expires_at IS NOT NULL AND expires_at < NOW()` + ); + return result.rowCount ?? 0; + } + + // ============================================================================ + // PRICING HISTORY + // ============================================================================ + + async getPricingHistory( + catalogItemId: number, + options: { limit?: number; state?: string } = {} + ): Promise { + const { limit = 50, state } = options; + + const conditions = ['catalog_item_id = $1']; + const values: any[] = [catalogItemId]; + + if (state) { + conditions.push('state = $2'); + values.push(state); + } + + const result = await this.pool.query( + `SELECT + ph.id, + ph.catalog_item_id AS "catalogItemId", + ph.state, + ph.field_changed AS "fieldChanged", + ph.old_value AS "oldValue", + ph.new_value AS "newValue", + ph.change_percent AS "changePercent", + ph.change_source AS "changeSource", + ph.suggestion_id AS "suggestionId", + ph.rule_id AS "ruleId", + ph.changed_by AS "changedBy", + u.name AS "changedByName", + ph.created_at AS "createdAt" + FROM pricing_history ph + LEFT JOIN users u ON ph.changed_by = u.id + WHERE ${conditions.join(' AND ')} + ORDER BY ph.created_at DESC + LIMIT $${values.length + 1}`, + [...values, limit] + ); + + return result.rows; + } + + async recordPriceChange( + catalogItemId: number, + state: string | null, + fieldChanged: 'msrp' | 'wholesale_price', + oldValue: number | null, + newValue: number, + changeSource: 'manual' | 'rule_auto' | 'suggestion_accepted' | 'sync', + options: { suggestionId?: number; ruleId?: number; changedBy?: number } = {} + ): Promise { + const changePercent = oldValue ? 
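+      // percent change is undefined when there was no previous value to compare against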
((newValue - oldValue) / oldValue) * 100 : null; + + const result = await this.pool.query( + `INSERT INTO pricing_history ( + catalog_item_id, state, field_changed, old_value, new_value, + change_percent, change_source, suggestion_id, rule_id, changed_by + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) + RETURNING + id, + catalog_item_id AS "catalogItemId", + state, + field_changed AS "fieldChanged", + old_value AS "oldValue", + new_value AS "newValue", + change_percent AS "changePercent", + change_source AS "changeSource", + created_at AS "createdAt"`, + [ + catalogItemId, + state, + fieldChanged, + oldValue, + newValue, + changePercent, + changeSource, + options.suggestionId, + options.ruleId, + options.changedBy, + ] + ); + + return result.rows[0]; + } + + // ============================================================================ + // COMPETITIVE PRICING ANALYSIS + // ============================================================================ + + async analyzeCompetitivePricing( + brandBusinessId: number, + state: string + ): Promise<{ + products: { + catalogItemId: number; + sku: string; + name: string; + ourPrice: number; + avgCompetitorPrice: number; + minCompetitorPrice: number; + maxCompetitorPrice: number; + priceDiff: number; + priceDiffPercent: number; + competitorCount: number; + }[]; + }> { + // Get brand ID + const brandResult = await this.pool.query( + `SELECT brand_id FROM brand_businesses WHERE id = $1`, + [brandBusinessId] + ); + const brandId = brandResult.rows[0]?.brand_id; + + if (!brandId) { + return { products: [] }; + } + + // Compare catalog items with store products from other brands + const result = await this.pool.query( + `WITH our_products AS ( + SELECT + bci.id AS catalog_item_id, + bci.sku, + bci.name, + bci.msrp AS our_price, + bci.category + FROM brand_catalog_items bci + WHERE bci.brand_id = $1 AND bci.is_active = TRUE + ), + competitor_prices AS ( + SELECT + sp.category, + sp.name, + AVG(sp.price_rec) AS avg_price, + MIN(sp.price_rec) AS min_price, + MAX(sp.price_rec) AS max_price, + COUNT(DISTINCT sp.brand_id) AS competitor_count + FROM store_products sp + JOIN dispensaries d ON sp.dispensary_id = d.id + WHERE d.state = $2 + AND sp.brand_id != $1 + AND sp.price_rec IS NOT NULL + AND sp.price_rec > 0 + GROUP BY sp.category, sp.name + ) + SELECT + op.catalog_item_id AS "catalogItemId", + op.sku, + op.name, + op.our_price AS "ourPrice", + COALESCE(cp.avg_price, 0) AS "avgCompetitorPrice", + COALESCE(cp.min_price, 0) AS "minCompetitorPrice", + COALESCE(cp.max_price, 0) AS "maxCompetitorPrice", + op.our_price - COALESCE(cp.avg_price, op.our_price) AS "priceDiff", + CASE WHEN cp.avg_price > 0 + THEN ((op.our_price - cp.avg_price) / cp.avg_price) * 100 + ELSE 0 + END AS "priceDiffPercent", + COALESCE(cp.competitor_count, 0)::int AS "competitorCount" + FROM our_products op + LEFT JOIN competitor_prices cp ON + op.category = cp.category AND + similarity(op.name, cp.name) > 0.4 + WHERE op.our_price IS NOT NULL + ORDER BY ABS(op.our_price - COALESCE(cp.avg_price, op.our_price)) DESC + LIMIT 100`, + [brandId, state] + ); + + return { products: result.rows }; + } + + // ============================================================================ + // RULE EXECUTION ENGINE + // ============================================================================ + + async evaluateRulesForProduct( + catalogItemId: number, + state: string + ): Promise { + // Get product details + const productResult = await this.pool.query( + `SELECT + bci.id, + bci.brand_id, + bb.id 
AS brand_business_id, + bci.sku, + bci.name, + bci.category, + bci.msrp, + bci.wholesale_price, + bci.cogs + FROM brand_catalog_items bci + JOIN brand_businesses bb ON bci.brand_id = bb.brand_id + WHERE bci.id = $1`, + [catalogItemId] + ); + + const product = productResult.rows[0]; + if (!product) return []; + + // Get applicable rules + const rulesResult = await this.pool.query( + `SELECT * + FROM pricing_rules + WHERE brand_business_id = $1 + AND is_enabled = TRUE + AND (catalog_item_id IS NULL OR catalog_item_id = $2) + AND (state IS NULL OR state = $3) + AND (category IS NULL OR category = $4) + AND (last_triggered_at IS NULL OR last_triggered_at < NOW() - INTERVAL '1 hour' * cooldown_hours) + ORDER BY priority DESC`, + [product.brand_business_id, catalogItemId, state, product.category] + ); + + const suggestions: PricingSuggestion[] = []; + + for (const rule of rulesResult.rows) { + const conditions = rule.conditions as PricingConditions; + const actions = rule.actions as PricingActions; + + // Evaluate conditions and calculate suggested price + let shouldTrigger = false; + let suggestedPrice = product.msrp; + let rationale = ''; + + // Competitive pricing rule + if (rule.rule_type === 'competitive') { + const competitorData = await this.getCompetitorPriceData(catalogItemId, state); + if (competitorData.avgPrice && conditions.competitorPriceBelow) { + if (product.msrp > competitorData.avgPrice + conditions.competitorPriceBelow) { + shouldTrigger = true; + suggestedPrice = competitorData.avgPrice; + rationale = `Your price ($${product.msrp}) is $${(product.msrp - competitorData.avgPrice).toFixed(2)} above market average ($${competitorData.avgPrice.toFixed(2)})`; + } + } + } + + // Floor pricing rule + if (rule.rule_type === 'floor') { + if (product.msrp < (rule.min_price || 0)) { + shouldTrigger = true; + suggestedPrice = rule.min_price; + rationale = `Price is below minimum floor of $${rule.min_price}`; + } + } + + // Ceiling pricing rule + if (rule.rule_type === 'ceiling') { + if (product.msrp > (rule.max_price || Infinity)) { + shouldTrigger = true; + suggestedPrice = rule.max_price; + rationale = `Price exceeds maximum ceiling of $${rule.max_price}`; + } + } + + // Margin rule + if (rule.rule_type === 'margin' && product.cogs) { + const currentMargin = ((product.msrp - product.cogs) / product.msrp) * 100; + if (conditions.marginBelow && currentMargin < conditions.marginBelow) { + shouldTrigger = true; + // Calculate price needed for target margin + const targetMargin = conditions.marginBelow; + suggestedPrice = product.cogs / (1 - targetMargin / 100); + rationale = `Current margin (${currentMargin.toFixed(1)}%) is below target (${targetMargin}%)`; + } + } + + if (shouldTrigger) { + // Apply adjustment constraints + const maxAdjustment = (product.msrp * rule.max_adjustment_percent) / 100; + const priceDiff = suggestedPrice - product.msrp; + + if (Math.abs(priceDiff) > maxAdjustment) { + suggestedPrice = product.msrp + (priceDiff > 0 ? 
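+            // clamp the move to max_adjustment_percent of the current price, in either direction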
maxAdjustment : -maxAdjustment); + } + + // Respect floor/ceiling constraints + if (rule.min_price && suggestedPrice < rule.min_price) { + suggestedPrice = rule.min_price; + } + if (rule.max_price && suggestedPrice > rule.max_price) { + suggestedPrice = rule.max_price; + } + + // Create suggestion + const suggestion = await this.createSuggestion({ + catalogItemId, + brandBusinessId: product.brand_business_id, + state, + currentPrice: product.msrp, + suggestedPrice, + priceChangeAmount: null, + priceChangePercent: null, + suggestionType: rule.rule_type, + rationale, + supportingData: { ruleId: rule.id, ruleName: rule.name }, + projectedRevenueImpact: null, + projectedMarginImpact: null, + confidenceScore: 75, + triggeredByRuleId: rule.id, + status: rule.requires_approval ? 'pending' : 'auto_applied', + decisionAt: null, + decisionBy: null, + decisionNotes: null, + expiresAt: new Date(Date.now() + 7 * 24 * 60 * 60 * 1000), + }); + + suggestions.push(suggestion); + + // Update rule's last triggered timestamp + await this.pool.query( + `UPDATE pricing_rules SET last_triggered_at = NOW() WHERE id = $1`, + [rule.id] + ); + + // Auto-apply if rule doesn't require approval + if (!rule.requires_approval) { + await this.pool.query( + `UPDATE brand_catalog_items SET msrp = $2, updated_at = NOW() WHERE id = $1`, + [catalogItemId, suggestedPrice] + ); + + await this.recordPriceChange( + catalogItemId, + state, + 'msrp', + product.msrp, + suggestedPrice, + 'rule_auto', + { ruleId: rule.id, suggestionId: suggestion.id } + ); + } + } + } + + return suggestions; + } + + private async getCompetitorPriceData( + catalogItemId: number, + state: string + ): Promise<{ avgPrice: number | null; minPrice: number | null; maxPrice: number | null }> { + // Get product name and category + const productResult = await this.pool.query( + `SELECT name, category FROM brand_catalog_items WHERE id = $1`, + [catalogItemId] + ); + + if (!productResult.rows[0]) { + return { avgPrice: null, minPrice: null, maxPrice: null }; + } + + const { name, category } = productResult.rows[0]; + + // Find similar products from competitors + const result = await this.pool.query( + `SELECT + AVG(sp.price_rec) AS avg_price, + MIN(sp.price_rec) AS min_price, + MAX(sp.price_rec) AS max_price + FROM store_products sp + JOIN dispensaries d ON sp.dispensary_id = d.id + WHERE d.state = $1 + AND sp.category = $2 + AND similarity(sp.name, $3) > 0.4 + AND sp.price_rec IS NOT NULL + AND sp.price_rec > 0`, + [state, category, name] + ); + + return { + avgPrice: result.rows[0]?.avg_price ? parseFloat(result.rows[0].avg_price) : null, + minPrice: result.rows[0]?.min_price ? parseFloat(result.rows[0].min_price) : null, + maxPrice: result.rows[0]?.max_price ? 
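+        // pg returns numeric aggregates as strings, hence the parseFloat conversions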
parseFloat(result.rows[0].max_price) : null, + }; + } +} diff --git a/backend/src/portals/types.ts b/backend/src/portals/types.ts new file mode 100644 index 00000000..353018da --- /dev/null +++ b/backend/src/portals/types.ts @@ -0,0 +1,778 @@ +/** + * Portal Module Types + * Phase 6: Brand Portal + Buyer Portal + Intelligence Engine + * Phase 7: Orders + Inventory + Pricing Automation + */ + +// ============================================================================ +// ROLES & PERMISSIONS +// ============================================================================ + +export interface Role { + id: number; + name: string; + displayName: string; + description: string | null; + isSystemRole: boolean; + createdAt: Date; +} + +export interface Permission { + id: number; + name: string; + displayName: string; + description: string | null; + category: string; + createdAt: Date; +} + +export interface UserRole { + userId: number; + roleId: number; + assignedAt: Date; + assignedBy: number | null; +} + +export interface UserWithRoles { + id: number; + email: string; + name: string; + roles: Role[]; + permissions: string[]; +} + +// ============================================================================ +// BUSINESSES +// ============================================================================ + +export interface BrandBusiness { + id: number; + brandId: number; + brandName: string; + companyName: string; + contactEmail: string | null; + contactPhone: string | null; + billingAddress: BillingAddress | null; + onboardedAt: Date | null; + subscriptionTier: 'free' | 'basic' | 'pro' | 'enterprise'; + subscriptionExpiresAt: Date | null; + settings: BrandSettings; + states: string[]; + isActive: boolean; + createdAt: Date; + updatedAt: Date; +} + +export interface BuyerBusiness { + id: number; + dispensaryId: number; + dispensaryName: string; + companyName: string; + contactEmail: string | null; + contactPhone: string | null; + billingAddress: BillingAddress | null; + shippingAddresses: ShippingAddress[]; + licenseNumber: string | null; + licenseExpiresAt: Date | null; + onboardedAt: Date | null; + subscriptionTier: 'free' | 'basic' | 'pro'; + settings: BuyerSettings; + states: string[]; + isActive: boolean; + createdAt: Date; + updatedAt: Date; +} + +export interface BillingAddress { + street: string; + city: string; + state: string; + zip: string; + country: string; +} + +export interface ShippingAddress extends BillingAddress { + name: string; + isDefault: boolean; +} + +export interface BrandSettings { + notificationPreferences: NotificationPreferences; + catalogVisibility: 'public' | 'buyers_only' | 'hidden'; + autoApproveOrders: boolean; + minOrderAmount: number | null; +} + +export interface BuyerSettings { + notificationPreferences: NotificationPreferences; + preferredCategories: string[]; + preferredBrands: number[]; +} + +export interface NotificationPreferences { + email: boolean; + inApp: boolean; + orderUpdates: boolean; + priceAlerts: boolean; + newProducts: boolean; + weeklyDigest: boolean; +} + +// ============================================================================ +// MESSAGING +// ============================================================================ + +export interface MessageThread { + id: number; + subject: string; + threadType: 'order' | 'support' | 'general' | 'rfq'; + orderId: number | null; + brandBusinessId: number | null; + buyerBusinessId: number | null; + status: 'open' | 'closed' | 'archived'; + lastMessageAt: Date | null; + unreadCount: number; 
+ createdAt: Date; + updatedAt: Date; +} + +export interface Message { + id: number; + threadId: number; + senderId: number; + senderType: 'brand' | 'buyer' | 'system'; + content: string; + attachments: MessageAttachment[]; + isRead: boolean; + readAt: Date | null; + createdAt: Date; +} + +export interface MessageAttachment { + id: number; + filename: string; + fileUrl: string; + fileSize: number; + mimeType: string; +} + +export interface ThreadParticipant { + threadId: number; + userId: number; + role: 'owner' | 'participant' | 'viewer'; + lastReadAt: Date | null; + isSubscribed: boolean; +} + +// ============================================================================ +// NOTIFICATIONS +// ============================================================================ + +export interface NotificationType { + id: number; + name: string; + displayName: string; + description: string | null; + category: 'order' | 'inventory' | 'pricing' | 'message' | 'system' | 'intelligence'; + defaultEnabled: boolean; + template: NotificationTemplate; +} + +export interface NotificationTemplate { + title: string; + body: string; + emailSubject?: string; + emailBody?: string; +} + +export interface Notification { + id: number; + userId: number; + notificationTypeId: number; + title: string; + body: string; + data: Record; + priority: 'low' | 'normal' | 'high' | 'urgent'; + isRead: boolean; + readAt: Date | null; + actionUrl: string | null; + expiresAt: Date | null; + createdAt: Date; +} + +export interface UserNotificationPreference { + userId: number; + notificationTypeId: number; + emailEnabled: boolean; + inAppEnabled: boolean; + pushEnabled: boolean; +} + +// ============================================================================ +// INTELLIGENCE ENGINE +// ============================================================================ + +export interface IntelligenceAlert { + id: number; + brandBusinessId: number | null; + buyerBusinessId: number | null; + alertType: 'price_drop' | 'price_increase' | 'new_competitor' | 'stock_low' | 'stock_out' | 'trend_change' | 'market_opportunity'; + severity: 'info' | 'warning' | 'critical'; + title: string; + description: string; + data: AlertData; + state: string | null; + category: string | null; + productId: number | null; + brandId: number | null; + isActionable: boolean; + suggestedAction: string | null; + status: 'new' | 'acknowledged' | 'resolved' | 'dismissed'; + acknowledgedAt: Date | null; + acknowledgedBy: number | null; + resolvedAt: Date | null; + expiresAt: Date | null; + createdAt: Date; +} + +export interface AlertData { + previousValue?: number; + currentValue?: number; + changePercent?: number; + affectedProducts?: number[]; + affectedStores?: number[]; + competitorBrandId?: number; + threshold?: number; + [key: string]: any; +} + +export interface IntelligenceRecommendation { + id: number; + brandBusinessId: number | null; + buyerBusinessId: number | null; + recommendationType: 'pricing' | 'inventory' | 'expansion' | 'product_mix' | 'promotion'; + title: string; + description: string; + rationale: string; + data: RecommendationData; + priority: number; + potentialImpact: PotentialImpact; + status: 'pending' | 'accepted' | 'rejected' | 'implemented'; + acceptedAt: Date | null; + acceptedBy: number | null; + implementedAt: Date | null; + expiresAt: Date | null; + createdAt: Date; +} + +export interface RecommendationData { + targetProducts?: number[]; + targetStates?: string[]; + suggestedPrice?: number; + suggestedQuantity?: number; + 
competitorData?: Record; + [key: string]: any; +} + +export interface PotentialImpact { + revenueChange?: number; + marginChange?: number; + marketShareChange?: number; + confidenceScore: number; +} + +export interface IntelligenceSummary { + id: number; + brandBusinessId: number | null; + buyerBusinessId: number | null; + summaryType: 'daily' | 'weekly' | 'monthly'; + periodStart: Date; + periodEnd: Date; + highlights: SummaryHighlight[]; + metrics: SummaryMetrics; + trends: SummaryTrend[]; + topPerformers: TopPerformer[]; + areasOfConcern: AreaOfConcern[]; + generatedAt: Date; +} + +export interface SummaryHighlight { + type: string; + title: string; + value: string | number; + change?: number; + changeDirection?: 'up' | 'down' | 'stable'; +} + +export interface SummaryMetrics { + totalRevenue?: number; + totalOrders?: number; + avgOrderValue?: number; + productsSold?: number; + newCustomers?: number; + [key: string]: number | undefined; +} + +export interface SummaryTrend { + metric: string; + direction: 'up' | 'down' | 'stable'; + changePercent: number; + dataPoints: { date: string; value: number }[]; +} + +export interface TopPerformer { + type: 'product' | 'category' | 'store' | 'state'; + id: number | string; + name: string; + value: number; + rank: number; +} + +export interface AreaOfConcern { + type: string; + title: string; + description: string; + severity: 'low' | 'medium' | 'high'; + suggestedAction?: string; +} + +export interface IntelligenceRule { + id: number; + brandBusinessId: number | null; + buyerBusinessId: number | null; + ruleName: string; + ruleType: 'alert' | 'recommendation'; + conditions: RuleConditions; + actions: RuleActions; + isEnabled: boolean; + lastTriggeredAt: Date | null; + triggerCount: number; + createdBy: number | null; + createdAt: Date; + updatedAt: Date; +} + +export interface RuleConditions { + metric: string; + operator: 'gt' | 'lt' | 'eq' | 'gte' | 'lte' | 'change_gt' | 'change_lt'; + threshold: number; + state?: string; + category?: string; + productId?: number; + timeWindow?: string; +} + +export interface RuleActions { + alertType?: string; + alertSeverity?: string; + notifyUsers?: number[]; + autoResolve?: boolean; + [key: string]: any; +} + +// ============================================================================ +// BRAND CATALOG +// ============================================================================ + +export interface BrandCatalogItem { + id: number; + brandId: number; + sku: string; + name: string; + description: string | null; + category: string; + subcategory: string | null; + thcContent: number | null; + cbdContent: number | null; + terpeneProfile: Record | null; + strainType: 'indica' | 'sativa' | 'hybrid' | null; + weight: number | null; + weightUnit: string | null; + imageUrl: string | null; + additionalImages: string[]; + msrp: number | null; + wholesalePrice: number | null; + cogs: number | null; + isActive: boolean; + availableStates: string[]; + createdAt: Date; + updatedAt: Date; +} + +export interface BrandCatalogDistribution { + id: number; + catalogItemId: number; + state: string; + isAvailable: boolean; + customMsrp: number | null; + customWholesale: number | null; + minOrderQty: number; + leadTimeDays: number; + notes: string | null; + createdAt: Date; + updatedAt: Date; +} + +// ============================================================================ +// ORDERS +// ============================================================================ + +export type OrderStatus = + | 'draft' + | 'submitted' + 
| 'accepted' + | 'rejected' + | 'processing' + | 'packed' + | 'shipped' + | 'delivered' + | 'cancelled'; + +export interface Order { + id: number; + orderNumber: string; + buyerBusinessId: number; + sellerBrandBusinessId: number; + state: string; + shippingAddress: ShippingAddress | null; + subtotal: number; + taxAmount: number; + discountAmount: number; + shippingCost: number; + total: number; + currency: string; + status: OrderStatus; + submittedAt: Date | null; + acceptedAt: Date | null; + rejectedAt: Date | null; + processingAt: Date | null; + packedAt: Date | null; + shippedAt: Date | null; + deliveredAt: Date | null; + cancelledAt: Date | null; + trackingNumber: string | null; + carrier: string | null; + estimatedDeliveryDate: Date | null; + buyerNotes: string | null; + sellerNotes: string | null; + internalNotes: string | null; + poNumber: string | null; + manifestNumber: string | null; + metadata: Record; + createdBy: number | null; + createdAt: Date; + updatedAt: Date; +} + +export interface OrderItem { + id: number; + orderId: number; + catalogItemId: number | null; + storeProductId: number | null; + sku: string; + name: string; + category: string | null; + quantity: number; + unitPrice: number; + discountPercent: number; + discountAmount: number; + lineTotal: number; + quantityFulfilled: number; + fulfillmentStatus: 'pending' | 'partial' | 'complete' | 'cancelled'; + notes: string | null; + createdAt: Date; +} + +export interface OrderStatusHistory { + id: number; + orderId: number; + fromStatus: OrderStatus | null; + toStatus: OrderStatus; + changedBy: number | null; + reason: string | null; + metadata: Record; + createdAt: Date; +} + +export interface OrderDocument { + id: number; + orderId: number; + documentType: 'po' | 'invoice' | 'manifest' | 'packing_slip' | 'other'; + filename: string; + fileUrl: string; + fileSize: number | null; + mimeType: string | null; + uploadedBy: number | null; + createdAt: Date; +} + +// ============================================================================ +// INVENTORY +// ============================================================================ + +export interface BrandInventory { + id: number; + brandId: number; + catalogItemId: number; + state: string; + quantityOnHand: number; + quantityReserved: number; + quantityAvailable: number; + reorderPoint: number; + inventoryStatus: 'in_stock' | 'low' | 'oos' | 'preorder' | 'discontinued'; + availableDate: Date | null; + lastSyncSource: string | null; + lastSyncAt: Date | null; + createdAt: Date; + updatedAt: Date; +} + +export interface InventoryHistory { + id: number; + brandInventoryId: number; + changeType: 'adjustment' | 'order_reserve' | 'order_fulfill' | 'sync' | 'restock' | 'write_off'; + quantityChange: number; + quantityBefore: number; + quantityAfter: number; + orderId: number | null; + reason: string | null; + changedBy: number | null; + createdAt: Date; +} + +export interface InventorySyncLog { + id: number; + brandBusinessId: number; + syncSource: 'api' | 'webhook' | 'manual' | 'erp'; + status: 'pending' | 'processing' | 'completed' | 'failed'; + itemsSynced: number; + itemsFailed: number; + errorMessage: string | null; + requestPayload: Record | null; + responseSummary: Record | null; + startedAt: Date; + completedAt: Date | null; +} + +// ============================================================================ +// PRICING +// ============================================================================ + +export interface PricingRule { + id: number; + brandBusinessId: 
number; + name: string; + description: string | null; + state: string | null; + category: string | null; + catalogItemId: number | null; + ruleType: 'floor' | 'ceiling' | 'competitive' | 'margin' | 'velocity'; + conditions: PricingConditions; + actions: PricingActions; + minPrice: number | null; + maxPrice: number | null; + maxAdjustmentPercent: number; + priority: number; + isEnabled: boolean; + requiresApproval: boolean; + cooldownHours: number; + lastTriggeredAt: Date | null; + createdBy: number | null; + createdAt: Date; + updatedAt: Date; +} + +export interface PricingConditions { + competitorPriceBelow?: number; + competitorPriceAbove?: number; + velocityAbove?: number; + velocityBelow?: number; + marginBelow?: number; + marginAbove?: number; + daysInStock?: number; + [key: string]: number | undefined; +} + +export interface PricingActions { + adjustPriceByPercent?: number; + adjustPriceByAmount?: number; + setPrice?: number; + matchCompetitor?: boolean; + matchCompetitorOffset?: number; + [key: string]: number | boolean | undefined; +} + +export interface PricingSuggestion { + id: number; + catalogItemId: number; + brandBusinessId: number; + state: string; + currentPrice: number; + suggestedPrice: number; + priceChangeAmount: number | null; + priceChangePercent: number | null; + suggestionType: 'competitive' | 'margin' | 'velocity' | 'promotional'; + rationale: string | null; + supportingData: PricingSupportingData; + projectedRevenueImpact: number | null; + projectedMarginImpact: number | null; + confidenceScore: number | null; + triggeredByRuleId: number | null; + status: 'pending' | 'accepted' | 'rejected' | 'expired' | 'auto_applied'; + decisionAt: Date | null; + decisionBy: number | null; + decisionNotes: string | null; + expiresAt: Date | null; + createdAt: Date; +} + +export interface PricingSupportingData { + competitorPrices?: { brandId: number; brandName: string; price: number }[]; + velocityData?: { period: string; unitsSold: number }[]; + marginData?: { currentMargin: number; targetMargin: number }[]; + [key: string]: any; +} + +export interface PricingHistory { + id: number; + catalogItemId: number; + state: string | null; + fieldChanged: 'msrp' | 'wholesale_price'; + oldValue: number | null; + newValue: number | null; + changePercent: number | null; + changeSource: 'manual' | 'rule_auto' | 'suggestion_accepted' | 'sync'; + suggestionId: number | null; + ruleId: number | null; + changedBy: number | null; + createdAt: Date; +} + +// ============================================================================ +// BUYER CARTS +// ============================================================================ + +export interface BuyerCart { + id: number; + buyerBusinessId: number; + sellerBrandBusinessId: number; + state: string; + status: 'active' | 'abandoned' | 'converted'; + convertedToOrderId: number | null; + lastActivityAt: Date; + expiresAt: Date | null; + createdAt: Date; + updatedAt: Date; +} + +export interface CartItem { + id: number; + cartId: number; + catalogItemId: number; + quantity: number; + unitPrice: number; + notes: string | null; + createdAt: Date; + updatedAt: Date; +} + +// ============================================================================ +// DISCOVERY FEED +// ============================================================================ + +export interface DiscoveryFeedItem { + id: number; + itemType: 'new_brand' | 'new_sku' | 'trending' | 'recommendation' | 'expansion'; + state: string; + brandId: number | null; + catalogItemId: number | 
null; + category: string | null; + title: string; + description: string | null; + imageUrl: string | null; + data: Record; + targetBuyerBusinessIds: number[] | null; + targetCategories: string[] | null; + priority: number; + isFeatured: boolean; + ctaText: string | null; + ctaUrl: string | null; + isActive: boolean; + startsAt: Date; + expiresAt: Date | null; + createdAt: Date; +} + +// ============================================================================ +// QUERY OPTIONS +// ============================================================================ + +export interface PortalQueryOptions { + limit?: number; + offset?: number; + sortBy?: string; + sortDir?: 'asc' | 'desc'; + state?: string; + states?: string[]; + category?: string; + status?: string; + dateFrom?: Date; + dateTo?: Date; + search?: string; +} + +export interface OrderQueryOptions extends PortalQueryOptions { + buyerBusinessId?: number; + sellerBrandBusinessId?: number; + orderStatus?: OrderStatus | OrderStatus[]; +} + +export interface InventoryQueryOptions extends PortalQueryOptions { + brandId?: number; + inventoryStatus?: string; + lowStockOnly?: boolean; +} + +export interface PricingQueryOptions extends PortalQueryOptions { + brandBusinessId?: number; + suggestionStatus?: string; + ruleType?: string; +} + +// ============================================================================ +// DASHBOARD METRICS +// ============================================================================ + +export interface BrandDashboardMetrics { + totalProducts: number; + activeProducts: number; + totalOrders: number; + pendingOrders: number; + totalRevenue: number; + revenueThisMonth: number; + storePresence: number; + statesCovered: number; + lowStockAlerts: number; + pendingPriceSuggestions: number; + unreadMessages: number; + activeAlerts: number; + topProducts: TopPerformer[]; + revenueByState: { state: string; revenue: number }[]; + orderTrend: { date: string; orders: number; revenue: number }[]; +} + +export interface BuyerDashboardMetrics { + totalOrders: number; + pendingOrders: number; + totalSpent: number; + spentThisMonth: number; + savedAmount: number; + brandsFollowed: number; + cartItems: number; + cartValue: number; + unreadMessages: number; + newDiscoveryItems: number; + recentOrders: Order[]; + recommendedProducts: BrandCatalogItem[]; + pricingAlerts: IntelligenceAlert[]; +} diff --git a/backend/src/routes/admin.ts b/backend/src/routes/admin.ts new file mode 100644 index 00000000..e81823c4 --- /dev/null +++ b/backend/src/routes/admin.ts @@ -0,0 +1,53 @@ +/** + * Admin Routes + * + * Top-level admin/operator actions (crawl triggers, health checks, etc.) + * + * Route semantics: + * /api/admin/... = Admin/operator actions + * /api/az/... = Arizona data slice (stores, products, metrics) + */ + +import { Router, Request, Response } from 'express'; +import { getDispensaryById, crawlSingleDispensary } from '../dutchie-az'; + +const router = Router(); + +// ============================================================ +// CRAWL TRIGGER +// ============================================================ + +/** + * POST /api/admin/crawl/:dispensaryId + * + * Trigger a crawl for a specific dispensary. + * This is the CANONICAL endpoint for triggering crawls. 
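+ *
+ * Illustrative request (the dispensary id 123 is a placeholder; the body fields
+ * and their defaults are documented under "Request body" below):
+ *
+ *   POST /api/admin/crawl/123
+ *   { "pricingType": "rec", "useBothModes": true }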
+ * + * Request body (optional): + * - pricingType: 'rec' | 'med' (default: 'rec') + * - useBothModes: boolean (default: true) + * + * Response: + * - On success: crawl result with product counts + * - On 404: dispensary not found + * - On 500: crawl error + */ +router.post('/crawl/:dispensaryId', async (req: Request, res: Response) => { + try { + const { dispensaryId } = req.params; + const { pricingType = 'rec', useBothModes = true } = req.body; + + // Fetch the dispensary first + const dispensary = await getDispensaryById(parseInt(dispensaryId, 10)); + if (!dispensary) { + return res.status(404).json({ error: 'Dispensary not found' }); + } + + const result = await crawlSingleDispensary(dispensary, pricingType, { useBothModes }); + res.json(result); + } catch (error: any) { + res.status(500).json({ error: error.message }); + } +}); + +export default router; diff --git a/backend/src/routes/analytics-v2.ts b/backend/src/routes/analytics-v2.ts new file mode 100644 index 00000000..a808e989 --- /dev/null +++ b/backend/src/routes/analytics-v2.ts @@ -0,0 +1,587 @@ +/** + * Analytics V2 API Routes + * + * Enhanced analytics endpoints using the canonical schema with + * rec/med state segmentation and comprehensive market analysis. + * + * Routes are prefixed with /api/analytics/v2 + * + * Phase 3: Analytics Engine + Rec/Med by State + */ + +import { Router, Request, Response } from 'express'; +import { Pool } from 'pg'; +import { PriceAnalyticsService } from '../services/analytics/PriceAnalyticsService'; +import { BrandPenetrationService } from '../services/analytics/BrandPenetrationService'; +import { CategoryAnalyticsService } from '../services/analytics/CategoryAnalyticsService'; +import { StoreAnalyticsService } from '../services/analytics/StoreAnalyticsService'; +import { StateAnalyticsService } from '../services/analytics/StateAnalyticsService'; +import { TimeWindow, LegalType } from '../services/analytics/types'; + +function parseTimeWindow(window?: string): TimeWindow { + if (window === '7d' || window === '30d' || window === '90d' || window === 'custom') { + return window; + } + return '30d'; +} + +function parseLegalType(legalType?: string): LegalType { + if (legalType === 'recreational' || legalType === 'medical_only' || legalType === 'no_program') { + return legalType; + } + return 'all'; +} + +export function createAnalyticsV2Router(pool: Pool): Router { + const router = Router(); + + // Initialize services + const priceService = new PriceAnalyticsService(pool); + const brandService = new BrandPenetrationService(pool); + const categoryService = new CategoryAnalyticsService(pool); + const storeService = new StoreAnalyticsService(pool); + const stateService = new StateAnalyticsService(pool); + + // ============================================================ + // PRICE ANALYTICS + // ============================================================ + + /** + * GET /price/product/:id + * Get price trends for a specific store product + */ + router.get('/price/product/:id', async (req: Request, res: Response) => { + try { + const storeProductId = parseInt(req.params.id); + const window = parseTimeWindow(req.query.window as string); + + const result = await priceService.getPriceTrendsForStoreProduct(storeProductId, { window }); + if (!result) { + return res.status(404).json({ error: 'Product not found' }); + } + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Price product error:', error); + res.status(500).json({ error: 'Failed to fetch product price trend' }); + } + }); + 
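A minimal sketch of how this factory might be wired up, assuming the mounting happens in the backend's Express entry point, that the shared export from `src/db/pool` is a pg `Pool`, and using the `/api/analytics/v2` prefix stated in the file header (the import paths below are illustrative):

```typescript
import express from 'express';
import { pool } from './db/pool';                          // shared runtime pool
import { createAnalyticsV2Router } from './routes/analytics-v2';

const app = express();
app.use(express.json());

// The factory receives the Pool as an argument, so tests can inject an
// isolated pool while the server reuses the shared runtime connection.
app.use('/api/analytics/v2', createAnalyticsV2Router(pool));

app.listen(3010);                                          // local backend port
```

Passing the pool in keeps the analytics services constructible against any database handle, which matches how the five service classes above receive it in their constructors.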
+ /** + * GET /price/category/:category + * Get price statistics for a category by state + */ + router.get('/price/category/:category', async (req: Request, res: Response) => { + try { + const category = decodeURIComponent(req.params.category); + const stateCode = req.query.state as string | undefined; + + const result = await priceService.getCategoryPriceByState(category, { stateCode }); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Price category error:', error); + res.status(500).json({ error: 'Failed to fetch category price stats' }); + } + }); + + /** + * GET /price/brand/:brand + * Get price statistics for a brand by state + */ + router.get('/price/brand/:brand', async (req: Request, res: Response) => { + try { + const brandName = decodeURIComponent(req.params.brand); + const stateCode = req.query.state as string | undefined; + + const result = await priceService.getBrandPriceByState(brandName, { stateCode }); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Price brand error:', error); + res.status(500).json({ error: 'Failed to fetch brand price stats' }); + } + }); + + /** + * GET /price/volatile + * Get most volatile products (frequent price changes) + */ + router.get('/price/volatile', async (req: Request, res: Response) => { + try { + const window = parseTimeWindow(req.query.window as string); + const limit = req.query.limit ? parseInt(req.query.limit as string) : 50; + const stateCode = req.query.state as string | undefined; + const category = req.query.category as string | undefined; + + const result = await priceService.getMostVolatileProducts({ + window, + limit, + stateCode, + category, + }); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Price volatile error:', error); + res.status(500).json({ error: 'Failed to fetch volatile products' }); + } + }); + + /** + * GET /price/rec-vs-med + * Get rec vs med price comparison by category + */ + router.get('/price/rec-vs-med', async (req: Request, res: Response) => { + try { + const category = req.query.category as string | undefined; + const result = await priceService.getCategoryRecVsMedPrices(category); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Price rec vs med error:', error); + res.status(500).json({ error: 'Failed to fetch rec vs med prices' }); + } + }); + + // ============================================================ + // BRAND PENETRATION + // ============================================================ + + /** + * GET /brand/:name/penetration + * Get brand penetration metrics + */ + router.get('/brand/:name/penetration', async (req: Request, res: Response) => { + try { + const brandName = decodeURIComponent(req.params.name); + const window = parseTimeWindow(req.query.window as string); + + const result = await brandService.getBrandPenetration(brandName, { window }); + if (!result) { + return res.status(404).json({ error: 'Brand not found' }); + } + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Brand penetration error:', error); + res.status(500).json({ error: 'Failed to fetch brand penetration' }); + } + }); + + /** + * GET /brand/:name/market-position + * Get brand market position within categories + */ + router.get('/brand/:name/market-position', async (req: Request, res: Response) => { + try { + const brandName = decodeURIComponent(req.params.name); + const category = req.query.category as string | undefined; + const stateCode = req.query.state as string | undefined; + + const result = await 
brandService.getBrandMarketPosition(brandName, { category, stateCode }); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Brand market position error:', error); + res.status(500).json({ error: 'Failed to fetch brand market position' }); + } + }); + + /** + * GET /brand/:name/rec-vs-med + * Get brand presence in rec vs med-only states + */ + router.get('/brand/:name/rec-vs-med', async (req: Request, res: Response) => { + try { + const brandName = decodeURIComponent(req.params.name); + const result = await brandService.getBrandRecVsMedFootprint(brandName); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Brand rec vs med error:', error); + res.status(500).json({ error: 'Failed to fetch brand rec vs med footprint' }); + } + }); + + /** + * GET /brand/top + * Get top brands by penetration + */ + router.get('/brand/top', async (req: Request, res: Response) => { + try { + const limit = req.query.limit ? parseInt(req.query.limit as string) : 25; + const stateCode = req.query.state as string | undefined; + const category = req.query.category as string | undefined; + + const result = await brandService.getTopBrandsByPenetration({ limit, stateCode, category }); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Top brands error:', error); + res.status(500).json({ error: 'Failed to fetch top brands' }); + } + }); + + /** + * GET /brand/expansion-contraction + * Get brands that have expanded or contracted + */ + router.get('/brand/expansion-contraction', async (req: Request, res: Response) => { + try { + const window = parseTimeWindow(req.query.window as string); + const limit = req.query.limit ? parseInt(req.query.limit as string) : 25; + + const result = await brandService.getBrandExpansionContraction({ window, limit }); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Brand expansion error:', error); + res.status(500).json({ error: 'Failed to fetch brand expansion/contraction' }); + } + }); + + // ============================================================ + // CATEGORY ANALYTICS + // ============================================================ + + /** + * GET /category/:name/growth + * Get category growth metrics + */ + router.get('/category/:name/growth', async (req: Request, res: Response) => { + try { + const category = decodeURIComponent(req.params.name); + const window = parseTimeWindow(req.query.window as string); + + const result = await categoryService.getCategoryGrowth(category, { window }); + if (!result) { + return res.status(404).json({ error: 'Category not found' }); + } + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Category growth error:', error); + res.status(500).json({ error: 'Failed to fetch category growth' }); + } + }); + + /** + * GET /category/:name/trend + * Get category growth trend over time + */ + router.get('/category/:name/trend', async (req: Request, res: Response) => { + try { + const category = decodeURIComponent(req.params.name); + const window = parseTimeWindow(req.query.window as string); + + const result = await categoryService.getCategoryGrowthTrend(category, { window }); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Category trend error:', error); + res.status(500).json({ error: 'Failed to fetch category trend' }); + } + }); + + /** + * GET /category/:name/top-brands + * Get top brands within a category + */ + router.get('/category/:name/top-brands', async (req: Request, res: Response) => { + try { + const category = 
decodeURIComponent(req.params.name); + const limit = req.query.limit ? parseInt(req.query.limit as string) : 25; + const stateCode = req.query.state as string | undefined; + + const result = await categoryService.getTopBrandsInCategory(category, { limit, stateCode }); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Category top brands error:', error); + res.status(500).json({ error: 'Failed to fetch top brands in category' }); + } + }); + + /** + * GET /category/all + * Get all categories with metrics + */ + router.get('/category/all', async (req: Request, res: Response) => { + try { + const stateCode = req.query.state as string | undefined; + const limit = req.query.limit ? parseInt(req.query.limit as string) : 50; + + const result = await categoryService.getAllCategories({ stateCode, limit }); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] All categories error:', error); + res.status(500).json({ error: 'Failed to fetch categories' }); + } + }); + + /** + * GET /category/rec-vs-med + * Get category comparison between rec and med-only states + */ + router.get('/category/rec-vs-med', async (req: Request, res: Response) => { + try { + const category = req.query.category as string | undefined; + const result = await categoryService.getCategoryRecVsMedComparison(category); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Category rec vs med error:', error); + res.status(500).json({ error: 'Failed to fetch category rec vs med comparison' }); + } + }); + + /** + * GET /category/fastest-growing + * Get fastest growing categories + */ + router.get('/category/fastest-growing', async (req: Request, res: Response) => { + try { + const window = parseTimeWindow(req.query.window as string); + const limit = req.query.limit ? parseInt(req.query.limit as string) : 25; + + const result = await categoryService.getFastestGrowingCategories({ window, limit }); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Fastest growing error:', error); + res.status(500).json({ error: 'Failed to fetch fastest growing categories' }); + } + }); + + // ============================================================ + // STORE ANALYTICS + // ============================================================ + + /** + * GET /store/:id/summary + * Get change summary for a store + */ + router.get('/store/:id/summary', async (req: Request, res: Response) => { + try { + const dispensaryId = parseInt(req.params.id); + const window = parseTimeWindow(req.query.window as string); + + const result = await storeService.getStoreChangeSummary(dispensaryId, { window }); + if (!result) { + return res.status(404).json({ error: 'Store not found' }); + } + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Store summary error:', error); + res.status(500).json({ error: 'Failed to fetch store summary' }); + } + }); + + /** + * GET /store/:id/events + * Get recent product change events for a store + */ + router.get('/store/:id/events', async (req: Request, res: Response) => { + try { + const dispensaryId = parseInt(req.params.id); + const window = parseTimeWindow(req.query.window as string); + const limit = req.query.limit ? 
parseInt(req.query.limit as string) : 100; + + const result = await storeService.getProductChangeEvents(dispensaryId, { window, limit }); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Store events error:', error); + res.status(500).json({ error: 'Failed to fetch store events' }); + } + }); + + /** + * GET /store/:id/changes + * Alias for /store/:id/events - matches Analytics V2 spec naming + * Returns list of detected changes (new products, price drops, new brands) + */ + router.get('/store/:id/changes', async (req: Request, res: Response) => { + try { + const dispensaryId = parseInt(req.params.id); + const window = parseTimeWindow(req.query.window as string); + const limit = req.query.limit ? parseInt(req.query.limit as string) : 100; + + const result = await storeService.getProductChangeEvents(dispensaryId, { window, limit }); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Store changes error:', error); + res.status(500).json({ error: 'Failed to fetch store changes' }); + } + }); + + /** + * GET /store/:id/inventory + * Get store inventory composition + */ + router.get('/store/:id/inventory', async (req: Request, res: Response) => { + try { + const dispensaryId = parseInt(req.params.id); + const result = await storeService.getStoreInventoryComposition(dispensaryId); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Store inventory error:', error); + res.status(500).json({ error: 'Failed to fetch store inventory' }); + } + }); + + /** + * GET /store/:id/price-position + * Get store price positioning vs market + */ + router.get('/store/:id/price-position', async (req: Request, res: Response) => { + try { + const dispensaryId = parseInt(req.params.id); + const result = await storeService.getStorePricePositioning(dispensaryId); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Store price position error:', error); + res.status(500).json({ error: 'Failed to fetch store price positioning' }); + } + }); + + /** + * GET /store/most-active + * Get stores with most changes + */ + router.get('/store/most-active', async (req: Request, res: Response) => { + try { + const window = parseTimeWindow(req.query.window as string); + const limit = req.query.limit ? 
parseInt(req.query.limit as string) : 25; + const stateCode = req.query.state as string | undefined; + + const result = await storeService.getMostActiveStores({ window, limit, stateCode }); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Most active stores error:', error); + res.status(500).json({ error: 'Failed to fetch most active stores' }); + } + }); + + // ============================================================ + // STATE ANALYTICS + // ============================================================ + + /** + * GET /state/:code/summary + * Get market summary for a specific state + */ + router.get('/state/:code/summary', async (req: Request, res: Response) => { + try { + const stateCode = req.params.code.toUpperCase(); + const result = await stateService.getStateMarketSummary(stateCode); + if (!result) { + return res.status(404).json({ error: 'State not found' }); + } + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] State summary error:', error); + res.status(500).json({ error: 'Failed to fetch state summary' }); + } + }); + + /** + * GET /state/all + * Get all states with coverage metrics + */ + router.get('/state/all', async (_req: Request, res: Response) => { + try { + const result = await stateService.getAllStatesWithCoverage(); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] All states error:', error); + res.status(500).json({ error: 'Failed to fetch states' }); + } + }); + + /** + * GET /state/legal-breakdown + * Get breakdown by legal status (rec, med-only, no program) + */ + router.get('/state/legal-breakdown', async (_req: Request, res: Response) => { + try { + const result = await stateService.getLegalStateBreakdown(); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Legal breakdown error:', error); + res.status(500).json({ error: 'Failed to fetch legal breakdown' }); + } + }); + + /** + * GET /state/rec-vs-med-pricing + * Get rec vs med price comparison by category + */ + router.get('/state/rec-vs-med-pricing', async (req: Request, res: Response) => { + try { + const category = req.query.category as string | undefined; + const result = await stateService.getRecVsMedPriceComparison(category); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Rec vs med pricing error:', error); + res.status(500).json({ error: 'Failed to fetch rec vs med pricing' }); + } + }); + + /** + * GET /state/coverage-gaps + * Get states with coverage gaps + */ + router.get('/state/coverage-gaps', async (_req: Request, res: Response) => { + try { + const result = await stateService.getStateCoverageGaps(); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] Coverage gaps error:', error); + res.status(500).json({ error: 'Failed to fetch coverage gaps' }); + } + }); + + /** + * GET /state/price-comparison + * Get pricing comparison across all states + */ + router.get('/state/price-comparison', async (_req: Request, res: Response) => { + try { + const result = await stateService.getStatePricingComparison(); + res.json(result); + } catch (error) { + console.error('[AnalyticsV2] State price comparison error:', error); + res.status(500).json({ error: 'Failed to fetch state price comparison' }); + } + }); + + /** + * GET /state/recreational + * Get list of recreational state codes + */ + router.get('/state/recreational', async (_req: Request, res: Response) => { + try { + const result = await stateService.getRecreationalStates(); + res.json({ legal_type: 'recreational', states: 
result, count: result.length }); + } catch (error) { + console.error('[AnalyticsV2] Recreational states error:', error); + res.status(500).json({ error: 'Failed to fetch recreational states' }); + } + }); + + /** + * GET /state/medical-only + * Get list of medical-only state codes (not recreational) + */ + router.get('/state/medical-only', async (_req: Request, res: Response) => { + try { + const result = await stateService.getMedicalOnlyStates(); + res.json({ legal_type: 'medical_only', states: result, count: result.length }); + } catch (error) { + console.error('[AnalyticsV2] Medical-only states error:', error); + res.status(500).json({ error: 'Failed to fetch medical-only states' }); + } + }); + + /** + * GET /state/no-program + * Get list of states with no cannabis program + */ + router.get('/state/no-program', async (_req: Request, res: Response) => { + try { + const result = await stateService.getNoProgramStates(); + res.json({ legal_type: 'no_program', states: result, count: result.length }); + } catch (error) { + console.error('[AnalyticsV2] No-program states error:', error); + res.status(500).json({ error: 'Failed to fetch no-program states' }); + } + }); + + return router; +} diff --git a/backend/src/routes/analytics.ts b/backend/src/routes/analytics.ts index ff03e908..5c2e2ea3 100755 --- a/backend/src/routes/analytics.ts +++ b/backend/src/routes/analytics.ts @@ -1,6 +1,6 @@ import { Router } from 'express'; import { authMiddleware } from '../auth/middleware'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; const router = Router(); router.use(authMiddleware); diff --git a/backend/src/routes/api-permissions.ts b/backend/src/routes/api-permissions.ts index 9784c943..022b5148 100644 --- a/backend/src/routes/api-permissions.ts +++ b/backend/src/routes/api-permissions.ts @@ -1,6 +1,6 @@ import { Router } from 'express'; import { authMiddleware, requireRole } from '../auth/middleware'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import crypto from 'crypto'; const router = Router(); diff --git a/backend/src/routes/api-tokens.ts b/backend/src/routes/api-tokens.ts index d08b64ad..6fa71239 100644 --- a/backend/src/routes/api-tokens.ts +++ b/backend/src/routes/api-tokens.ts @@ -1,6 +1,6 @@ import { Router } from 'express'; import { authMiddleware, requireRole } from '../auth/middleware'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import crypto from 'crypto'; const router = Router(); diff --git a/backend/src/routes/campaigns.ts b/backend/src/routes/campaigns.ts index 93299466..91fb1fb4 100755 --- a/backend/src/routes/campaigns.ts +++ b/backend/src/routes/campaigns.ts @@ -1,6 +1,6 @@ import { Router } from 'express'; import { authMiddleware, requireRole } from '../auth/middleware'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; const router = Router(); router.use(authMiddleware); diff --git a/backend/src/routes/categories.ts b/backend/src/routes/categories.ts index 7d61e918..86c9db20 100644 --- a/backend/src/routes/categories.ts +++ b/backend/src/routes/categories.ts @@ -1,6 +1,6 @@ import { Router } from 'express'; import { authMiddleware } from '../auth/middleware'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; const router = Router(); router.use(authMiddleware); diff --git a/backend/src/routes/changes.ts b/backend/src/routes/changes.ts index 57daf652..fdcb9e1d 100644 --- a/backend/src/routes/changes.ts +++ b/backend/src/routes/changes.ts @@ -1,6 
+1,6 @@ import { Router } from 'express'; import { authMiddleware } from '../auth/middleware'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; const router = Router(); router.use(authMiddleware); diff --git a/backend/src/routes/crawler-sandbox.ts b/backend/src/routes/crawler-sandbox.ts index e1b6fb8e..b1f7d16b 100644 --- a/backend/src/routes/crawler-sandbox.ts +++ b/backend/src/routes/crawler-sandbox.ts @@ -5,7 +5,7 @@ */ import express from 'express'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { authMiddleware, requireRole } from '../auth/middleware'; import { logger } from '../services/logger'; import { diff --git a/backend/src/routes/dispensaries.ts b/backend/src/routes/dispensaries.ts index 1e07b00a..68b029d3 100644 --- a/backend/src/routes/dispensaries.ts +++ b/backend/src/routes/dispensaries.ts @@ -1,6 +1,6 @@ import { Router } from 'express'; import { authMiddleware } from '../auth/middleware'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; const router = Router(); router.use(authMiddleware); diff --git a/backend/src/routes/orchestrator-admin.ts b/backend/src/routes/orchestrator-admin.ts new file mode 100644 index 00000000..90da6ca6 --- /dev/null +++ b/backend/src/routes/orchestrator-admin.ts @@ -0,0 +1,430 @@ +/** + * Orchestrator Admin Routes + * + * Read-only admin API endpoints for the CannaiQ Orchestrator Dashboard. + * Provides OBSERVABILITY ONLY - no state changes. + */ + +import { Router, Request, Response } from 'express'; +import { pool } from '../db/pool'; +import { getLatestTrace, getTracesForDispensary, getTraceById } from '../services/orchestrator-trace'; +import { getProviderDisplayName } from '../utils/provider-display'; +import * as fs from 'fs'; +import * as path from 'path'; + +const router = Router(); + +// ============================================================ +// ORCHESTRATOR METRICS +// ============================================================ + +/** + * GET /api/admin/orchestrator/metrics + * Returns nationwide metrics for the orchestrator dashboard + */ +router.get('/metrics', async (_req: Request, res: Response) => { + try { + // Get aggregate metrics + const { rows: metrics } = await pool.query(` + SELECT + (SELECT COUNT(*) FROM dutchie_products) as total_products, + (SELECT COUNT(DISTINCT brand_name) FROM dutchie_products WHERE brand_name IS NOT NULL) as total_brands, + (SELECT COUNT(*) FROM dispensaries WHERE state = 'AZ') as total_stores, + ( + SELECT COUNT(*) + FROM dispensary_crawler_profiles dcp + WHERE dcp.enabled = true + AND (dcp.status = 'production' OR (dcp.config->>'status')::text = 'production') + ) as healthy_count, + ( + SELECT COUNT(*) + FROM dispensary_crawler_profiles dcp + WHERE dcp.enabled = true + AND (dcp.status = 'sandbox' OR (dcp.config->>'status')::text = 'sandbox') + ) as sandbox_count, + ( + SELECT COUNT(*) + FROM dispensary_crawler_profiles dcp + WHERE dcp.enabled = true + AND (dcp.status = 'needs_manual' OR (dcp.config->>'status')::text = 'needs_manual') + ) as needs_manual_count, + ( + SELECT COUNT(*) + FROM dispensary_crawler_profiles dcp + JOIN dispensaries d ON d.id = dcp.dispensary_id + WHERE d.state = 'AZ' + AND dcp.status = 'needs_manual' + ) as failing_count + `); + + const row = metrics[0] || {}; + + res.json({ + total_products: parseInt(row.total_products || '0', 10), + total_brands: parseInt(row.total_brands || '0', 10), + total_stores: parseInt(row.total_stores || '0', 10), + // Placeholder sentiment values - these 
would come from actual analytics + market_sentiment: 'neutral', + market_direction: 'stable', + // Health counts + healthy_count: parseInt(row.healthy_count || '0', 10), + sandbox_count: parseInt(row.sandbox_count || '0', 10), + needs_manual_count: parseInt(row.needs_manual_count || '0', 10), + failing_count: parseInt(row.failing_count || '0', 10), + }); + } catch (error: any) { + console.error('[OrchestratorAdmin] Error fetching metrics:', error.message); + res.status(500).json({ error: error.message }); + } +}); + +// ============================================================ +// STATES LIST +// ============================================================ + +/** + * GET /api/admin/orchestrator/states + * Returns array of states with at least one known dispensary + */ +router.get('/states', async (_req: Request, res: Response) => { + try { + const { rows } = await pool.query(` + SELECT DISTINCT state, COUNT(*) as store_count + FROM dispensaries + WHERE state IS NOT NULL + GROUP BY state + ORDER BY state + `); + + res.json({ + states: rows.map((r: any) => ({ + state: r.state, + storeCount: parseInt(r.store_count || '0', 10), + })), + }); + } catch (error: any) { + console.error('[OrchestratorAdmin] Error fetching states:', error.message); + res.status(500).json({ error: error.message }); + } +}); + +// ============================================================ +// STORES LIST +// ============================================================ + +/** + * GET /api/admin/orchestrator/stores + * Returns list of stores with orchestrator status info + * Query params: + * - state: Filter by state (e.g., "AZ") + * - limit: Max results (default 100) + * - offset: Pagination offset + */ +router.get('/stores', async (req: Request, res: Response) => { + try { + const { state, limit = '100', offset = '0' } = req.query; + + let whereClause = 'WHERE 1=1'; + const params: any[] = []; + let paramIndex = 1; + + if (state && state !== 'all') { + whereClause += ` AND d.state = $${paramIndex}`; + params.push(state); + paramIndex++; + } + + params.push(parseInt(limit as string, 10), parseInt(offset as string, 10)); + + const { rows } = await pool.query(` + SELECT + d.id, + d.name, + d.city, + d.state, + d.menu_type as provider, + d.platform_dispensary_id, + d.last_crawl_at, + dcp.id as profile_id, + dcp.profile_key, + COALESCE(dcp.status, dcp.config->>'status', 'legacy') as crawler_status, + ( + SELECT MAX(cot.completed_at) + FROM crawl_orchestration_traces cot + WHERE cot.dispensary_id = d.id AND cot.success = true + ) as last_success_at, + ( + SELECT MAX(cot.completed_at) + FROM crawl_orchestration_traces cot + WHERE cot.dispensary_id = d.id AND cot.success = false + ) as last_failure_at, + ( + SELECT COUNT(*) + FROM dutchie_products dp + WHERE dp.dispensary_id = d.id + ) as product_count + FROM dispensaries d + LEFT JOIN dispensary_crawler_profiles dcp + ON dcp.dispensary_id = d.id AND dcp.enabled = true + ${whereClause} + ORDER BY d.name + LIMIT $${paramIndex} OFFSET $${paramIndex + 1} + `, params); + + // Get total count + const { rows: countRows } = await pool.query( + `SELECT COUNT(*) as total FROM dispensaries d ${whereClause}`, + params.slice(0, -2) + ); + + res.json({ + stores: rows.map((r: any) => ({ + id: r.id, + name: r.name, + city: r.city, + state: r.state, + provider: r.provider || 'unknown', + provider_raw: r.provider || null, + provider_display: getProviderDisplayName(r.provider), + platformDispensaryId: r.platform_dispensary_id, + status: r.crawler_status || (r.platform_dispensary_id ? 
'legacy' : 'pending'), + profileId: r.profile_id, + profileKey: r.profile_key, + lastCrawlAt: r.last_crawl_at, + lastSuccessAt: r.last_success_at, + lastFailureAt: r.last_failure_at, + productCount: parseInt(r.product_count || '0', 10), + })), + total: parseInt(countRows[0]?.total || '0', 10), + limit: parseInt(limit as string, 10), + offset: parseInt(offset as string, 10), + }); + } catch (error: any) { + console.error('[OrchestratorAdmin] Error fetching stores:', error.message); + res.status(500).json({ error: error.message }); + } +}); + +// ============================================================ +// DISPENSARY TRACE (already exists but adding here for clarity) +// ============================================================ + +/** + * GET /api/admin/dispensaries/:id/crawl-trace/latest + * Returns the latest orchestrator trace for a dispensary + */ +router.get('/dispensaries/:id/crawl-trace/latest', async (req: Request, res: Response) => { + try { + const { id } = req.params; + const trace = await getLatestTrace(parseInt(id, 10)); + + if (!trace) { + return res.status(404).json({ error: 'No trace found for this dispensary' }); + } + + res.json(trace); + } catch (error: any) { + console.error('[OrchestratorAdmin] Error fetching trace:', error.message); + res.status(500).json({ error: error.message }); + } +}); + +/** + * GET /api/admin/dispensaries/:id/crawl-traces + * Returns paginated list of traces for a dispensary + */ +router.get('/dispensaries/:id/crawl-traces', async (req: Request, res: Response) => { + try { + const { id } = req.params; + const { limit = '20', offset = '0' } = req.query; + + const result = await getTracesForDispensary( + parseInt(id, 10), + parseInt(limit as string, 10), + parseInt(offset as string, 10) + ); + + res.json(result); + } catch (error: any) { + console.error('[OrchestratorAdmin] Error fetching traces:', error.message); + res.status(500).json({ error: error.message }); + } +}); + +// ============================================================ +// DISPENSARY PROFILE +// ============================================================ + +/** + * GET /api/admin/dispensaries/:id/profile + * Returns the crawler profile for a dispensary + */ +router.get('/dispensaries/:id/profile', async (req: Request, res: Response) => { + try { + const { id } = req.params; + + const { rows } = await pool.query(` + SELECT + dcp.id, + dcp.dispensary_id, + dcp.profile_key, + dcp.profile_name, + dcp.platform, + dcp.version, + dcp.status, + dcp.config, + dcp.enabled, + dcp.sandbox_attempt_count, + dcp.next_retry_at, + dcp.created_at, + dcp.updated_at, + d.name as dispensary_name, + d.active_crawler_profile_id + FROM dispensary_crawler_profiles dcp + JOIN dispensaries d ON d.id = dcp.dispensary_id + WHERE dcp.dispensary_id = $1 AND dcp.enabled = true + ORDER BY dcp.updated_at DESC + LIMIT 1 + `, [parseInt(id, 10)]); + + if (rows.length === 0) { + // Return basic dispensary info even if no profile + const { rows: dispRows } = await pool.query(` + SELECT id, name, active_crawler_profile_id, menu_type, platform_dispensary_id + FROM dispensaries WHERE id = $1 + `, [parseInt(id, 10)]); + + if (dispRows.length === 0) { + return res.status(404).json({ error: 'Dispensary not found' }); + } + + return res.json({ + dispensaryId: dispRows[0].id, + dispensaryName: dispRows[0].name, + hasProfile: false, + activeProfileId: dispRows[0].active_crawler_profile_id, + menuType: dispRows[0].menu_type, + platformDispensaryId: dispRows[0].platform_dispensary_id, + }); + } + + const profile = 
rows[0]; + res.json({ + dispensaryId: profile.dispensary_id, + dispensaryName: profile.dispensary_name, + hasProfile: true, + activeProfileId: profile.active_crawler_profile_id, + profile: { + id: profile.id, + profileKey: profile.profile_key, + profileName: profile.profile_name, + platform: profile.platform, + version: profile.version, + status: profile.status || profile.config?.status || 'unknown', + config: profile.config, + enabled: profile.enabled, + sandboxAttemptCount: profile.sandbox_attempt_count, + nextRetryAt: profile.next_retry_at, + createdAt: profile.created_at, + updatedAt: profile.updated_at, + }, + }); + } catch (error: any) { + console.error('[OrchestratorAdmin] Error fetching profile:', error.message); + res.status(500).json({ error: error.message }); + } +}); + +// ============================================================ +// CRAWLER MODULE PREVIEW +// ============================================================ + +/** + * GET /api/admin/dispensaries/:id/crawler-module + * Returns the raw .ts file content for the per-store crawler + */ +router.get('/dispensaries/:id/crawler-module', async (req: Request, res: Response) => { + try { + const { id } = req.params; + + // Get the profile key for this dispensary + const { rows } = await pool.query(` + SELECT profile_key, platform + FROM dispensary_crawler_profiles + WHERE dispensary_id = $1 AND enabled = true + ORDER BY updated_at DESC + LIMIT 1 + `, [parseInt(id, 10)]); + + if (rows.length === 0 || !rows[0].profile_key) { + return res.status(404).json({ + error: 'No per-store crawler module found for this dispensary', + hasModule: false, + }); + } + + const profileKey = rows[0].profile_key; + const platform = rows[0].platform || 'dutchie'; + + // Construct file path + const modulePath = path.join( + __dirname, + '..', + 'crawlers', + platform, + 'stores', + `${profileKey}.ts` + ); + + // Check if file exists + if (!fs.existsSync(modulePath)) { + return res.status(404).json({ + error: `Crawler module file not found: ${profileKey}.ts`, + hasModule: false, + expectedPath: `crawlers/${platform}/stores/${profileKey}.ts`, + }); + } + + // Read file content + const content = fs.readFileSync(modulePath, 'utf-8'); + + res.json({ + hasModule: true, + profileKey, + platform, + fileName: `${profileKey}.ts`, + filePath: `crawlers/${platform}/stores/${profileKey}.ts`, + content, + lines: content.split('\n').length, + }); + } catch (error: any) { + console.error('[OrchestratorAdmin] Error fetching crawler module:', error.message); + res.status(500).json({ error: error.message }); + } +}); + +// ============================================================ +// TRACE BY ID +// ============================================================ + +/** + * GET /api/admin/crawl-traces/:traceId + * Returns a specific trace by ID + */ +router.get('/crawl-traces/:traceId', async (req: Request, res: Response) => { + try { + const { traceId } = req.params; + const trace = await getTraceById(parseInt(traceId, 10)); + + if (!trace) { + return res.status(404).json({ error: 'Trace not found' }); + } + + res.json(trace); + } catch (error: any) { + console.error('[OrchestratorAdmin] Error fetching trace:', error.message); + res.status(500).json({ error: error.message }); + } +}); + +export default router; diff --git a/backend/src/routes/parallel-scrape.ts b/backend/src/routes/parallel-scrape.ts index aa057a81..e92d1ceb 100644 --- a/backend/src/routes/parallel-scrape.ts +++ b/backend/src/routes/parallel-scrape.ts @@ -1,5 +1,5 @@ import { Router } from 'express'; 
-import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { getActiveProxy, putProxyInTimeout, isBotDetectionError } from '../services/proxy'; import { authMiddleware } from '../auth/middleware'; diff --git a/backend/src/routes/products.ts b/backend/src/routes/products.ts index 298cbd65..443e39a8 100755 --- a/backend/src/routes/products.ts +++ b/backend/src/routes/products.ts @@ -1,6 +1,6 @@ import { Router } from 'express'; import { authMiddleware } from '../auth/middleware'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { getImageUrl } from '../utils/minio'; const router = Router(); diff --git a/backend/src/routes/proxies.ts b/backend/src/routes/proxies.ts index 67812a76..36d33468 100755 --- a/backend/src/routes/proxies.ts +++ b/backend/src/routes/proxies.ts @@ -1,6 +1,6 @@ import { Router } from 'express'; import { authMiddleware, requireRole } from '../auth/middleware'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { testProxy, addProxy, addProxiesFromList } from '../services/proxy'; import { createProxyTestJob, getProxyTestJob, getActiveProxyTestJob, cancelProxyTestJob } from '../services/proxyTestQueue'; diff --git a/backend/src/routes/public-api.ts b/backend/src/routes/public-api.ts index 489fe735..acabce04 100644 --- a/backend/src/routes/public-api.ts +++ b/backend/src/routes/public-api.ts @@ -8,7 +8,7 @@ */ import { Router, Request, Response, NextFunction } from 'express'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { query as dutchieAzQuery } from '../dutchie-az/db/connection'; import ipaddr from 'ipaddr.js'; import { diff --git a/backend/src/routes/schedule.ts b/backend/src/routes/schedule.ts index 6b414924..af46addc 100644 --- a/backend/src/routes/schedule.ts +++ b/backend/src/routes/schedule.ts @@ -26,7 +26,7 @@ import { getDispensariesDueForOrchestration, ensureAllDispensariesHaveSchedules, } from '../services/dispensary-orchestrator'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { resolveDispensaryId } from '../dutchie-az/services/graphql-client'; const router = Router(); @@ -376,7 +376,7 @@ router.get('/dispensaries', async (req: Request, res: Response) => { } if (search) { - conditions.push(`(d.name ILIKE $${paramIndex} OR d.slug ILIKE $${paramIndex} OR d.dba_name ILIKE $${paramIndex})`); + conditions.push(`(d.name ILIKE $${paramIndex} OR d.slug ILIKE $${paramIndex})`); params.push(`%${search}%`); paramIndex++; } @@ -386,7 +386,7 @@ router.get('/dispensaries', async (req: Request, res: Response) => { const query = ` SELECT d.id AS dispensary_id, - COALESCE(d.dba_name, d.name) AS dispensary_name, + d.name AS dispensary_name, d.slug AS dispensary_slug, d.city, d.state, @@ -436,7 +436,7 @@ router.get('/dispensaries', async (req: Request, res: Response) => { LIMIT 1 ) j ON true ${whereClause} - ORDER BY cs.priority DESC NULLS LAST, COALESCE(d.dba_name, d.name) + ORDER BY cs.priority DESC NULLS LAST, d.name `; const result = await pool.query(query, params); diff --git a/backend/src/routes/scraper-monitor.ts b/backend/src/routes/scraper-monitor.ts index 9a6f85a5..8f0025e8 100644 --- a/backend/src/routes/scraper-monitor.ts +++ b/backend/src/routes/scraper-monitor.ts @@ -1,6 +1,6 @@ import { Router } from 'express'; import { authMiddleware } from '../auth/middleware'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; const router = Router(); router.use(authMiddleware); @@ -76,7 +76,7 @@ 
router.get('/history', async (req, res) => { let query = ` SELECT d.id as dispensary_id, - COALESCE(d.dba_name, d.name) as dispensary_name, + d.name as dispensary_name, d.city, d.state, dcj.id as job_id, @@ -245,7 +245,7 @@ router.get('/jobs/active', async (req, res) => { SELECT dcj.id, dcj.dispensary_id, - COALESCE(d.dba_name, d.name) as dispensary_name, + d.name as dispensary_name, dcj.job_type, dcj.status, dcj.worker_id, @@ -298,7 +298,7 @@ router.get('/jobs/recent', async (req, res) => { SELECT dcj.id, dcj.dispensary_id, - COALESCE(d.dba_name, d.name) as dispensary_name, + d.name as dispensary_name, dcj.job_type, dcj.status, dcj.worker_id, diff --git a/backend/src/routes/settings.ts b/backend/src/routes/settings.ts index e620198e..ecc6242e 100755 --- a/backend/src/routes/settings.ts +++ b/backend/src/routes/settings.ts @@ -1,6 +1,6 @@ import { Router } from 'express'; import { authMiddleware, requireRole } from '../auth/middleware'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { restartScheduler } from '../services/scheduler'; const router = Router(); diff --git a/backend/src/routes/states.ts b/backend/src/routes/states.ts new file mode 100644 index 00000000..4b5b8b43 --- /dev/null +++ b/backend/src/routes/states.ts @@ -0,0 +1,318 @@ +/** + * States API Routes + * + * Endpoints for querying cannabis legalization status by state. + * + * Routes: + * GET /api/states - All states with dispensary counts + * GET /api/states/legal - States with rec/med flags & years + * GET /api/states/targets - Legal states prioritized for crawling + * GET /api/states/summary - Summary statistics + * GET /api/states/:code - Single state by code (e.g., AZ, CA) + * GET /api/states/recreational - Recreational states only + * GET /api/states/medical-only - Medical-only states (no rec) + * GET /api/states/no-program - States with no cannabis programs + */ + +import { Router, Request, Response } from 'express'; +import { Pool } from 'pg'; +import { LegalStateService } from '../services/LegalStateService'; + +export function createStatesRouter(pool: Pool): Router { + const router = Router(); + const service = new LegalStateService(pool); + + /** + * GET /api/states + * Get all states with dispensary counts + */ + router.get('/', async (_req: Request, res: Response) => { + try { + const states = await service.getStateSummaries(); + res.json({ + success: true, + count: states.length, + states, + }); + } catch (error: any) { + console.error('[States API] Error fetching states:', error); + res.status(500).json({ + success: false, + error: error.message, + }); + } + }); + + /** + * GET /api/states/legal + * Get all states with cannabis programs (rec or medical) + */ + router.get('/legal', async (_req: Request, res: Response) => { + try { + const allStates = await service.getAllStatesWithDispensaryCounts(); + const legalStates = allStates.filter( + (s) => s.recreational_legal === true || s.medical_legal === true + ); + + const formatted = legalStates.map((s) => ({ + code: s.code, + name: s.name, + recreational: { + legal: s.recreational_legal || false, + year: s.rec_year, + }, + medical: { + legal: s.medical_legal || false, + year: s.med_year, + }, + dispensary_count: s.dispensary_count, + })); + + res.json({ + success: true, + count: formatted.length, + states: formatted, + }); + } catch (error: any) { + console.error('[States API] Error fetching legal states:', error); + res.status(500).json({ + success: false, + error: error.message, + }); + } + }); + + /** + * GET 
/api/states/targets + * Get legal states prioritized for crawling + * Returns: rec states with no dispensaries, med-only states with no dispensaries + */ + router.get('/targets', async (_req: Request, res: Response) => { + try { + const [recNoDisp, medOnlyNoDisp, summary] = await Promise.all([ + service.getRecreationalStatesWithNoDispensaries(), + service.getMedicalOnlyStatesWithNoDispensaries(), + service.getLegalStatusSummary(), + ]); + + res.json({ + success: true, + summary: { + total_legal_states: summary.recreational_states + summary.medical_only_states, + states_with_dispensaries: summary.states_with_dispensaries, + states_needing_data: summary.legal_states_without_dispensaries, + }, + recreational_states_no_dispensaries: { + count: recNoDisp.length, + states: recNoDisp.map((s) => ({ + code: s.code, + name: s.name, + rec_year: s.rec_year, + med_year: s.med_year, + priority_score: s.priority_score, + })), + }, + medical_only_states_no_dispensaries: { + count: medOnlyNoDisp.length, + states: medOnlyNoDisp.map((s) => ({ + code: s.code, + name: s.name, + med_year: s.med_year, + priority_score: s.priority_score, + })), + }, + }); + } catch (error: any) { + console.error('[States API] Error fetching target states:', error); + res.status(500).json({ + success: false, + error: error.message, + }); + } + }); + + /** + * GET /api/states/summary + * Get summary statistics about state legalization + */ + router.get('/summary', async (_req: Request, res: Response) => { + try { + const summary = await service.getLegalStatusSummary(); + res.json({ + success: true, + summary: { + recreational_states: summary.recreational_states, + medical_only_states: summary.medical_only_states, + no_program_states: summary.no_program_states, + total_states: summary.total_states, + states_with_dispensaries: summary.states_with_dispensaries, + legal_states_without_dispensaries: summary.legal_states_without_dispensaries, + }, + }); + } catch (error: any) { + console.error('[States API] Error fetching summary:', error); + res.status(500).json({ + success: false, + error: error.message, + }); + } + }); + + /** + * GET /api/states/recreational + * Get all recreational states + */ + router.get('/recreational', async (_req: Request, res: Response) => { + try { + const states = await service.getRecreationalStates(); + res.json({ + success: true, + count: states.length, + states: states.map((s) => ({ + code: s.code, + name: s.name, + rec_year: s.rec_year, + med_year: s.med_year, + })), + }); + } catch (error: any) { + console.error('[States API] Error fetching recreational states:', error); + res.status(500).json({ + success: false, + error: error.message, + }); + } + }); + + /** + * GET /api/states/medical-only + * Get medical-only states (no recreational) + */ + router.get('/medical-only', async (_req: Request, res: Response) => { + try { + const states = await service.getMedicalOnlyStates(); + res.json({ + success: true, + count: states.length, + states: states.map((s) => ({ + code: s.code, + name: s.name, + med_year: s.med_year, + })), + }); + } catch (error: any) { + console.error('[States API] Error fetching medical-only states:', error); + res.status(500).json({ + success: false, + error: error.message, + }); + } + }); + + /** + * GET /api/states/no-program + * Get states with no cannabis programs + */ + router.get('/no-program', async (_req: Request, res: Response) => { + try { + const states = await service.getIllegalStates(); + res.json({ + success: true, + count: states.length, + states: states.map((s) => ({ + 
code: s.code, + name: s.name, + })), + }); + } catch (error: any) { + console.error('[States API] Error fetching no-program states:', error); + res.status(500).json({ + success: false, + error: error.message, + }); + } + }); + + /** + * GET /api/states/priorities + * Get all legal states ranked by crawl priority + */ + router.get('/priorities', async (_req: Request, res: Response) => { + try { + const states = await service.getTargetStates(); + res.json({ + success: true, + count: states.length, + states: states.map((s) => ({ + code: s.code, + name: s.name, + legal_type: s.legal_type, + rec_year: s.rec_year, + med_year: s.med_year, + dispensary_count: s.dispensary_count, + priority_score: s.priority_score, + })), + }); + } catch (error: any) { + console.error('[States API] Error fetching priorities:', error); + res.status(500).json({ + success: false, + error: error.message, + }); + } + }); + + /** + * GET /api/states/:code + * Get a single state by code + */ + router.get('/:code', async (req: Request, res: Response) => { + try { + const { code } = req.params; + + if (!code || code.length !== 2) { + return res.status(400).json({ + success: false, + error: 'Invalid state code. Must be 2 characters (e.g., AZ, CA).', + }); + } + + const state = await service.getStateByCode(code); + + if (!state) { + return res.status(404).json({ + success: false, + error: `State not found: ${code.toUpperCase()}`, + }); + } + + res.json({ + success: true, + state: { + code: state.code, + name: state.name, + timezone: state.timezone, + recreational: { + legal: state.recreational_legal || false, + year: state.rec_year, + }, + medical: { + legal: state.medical_legal || false, + year: state.med_year, + }, + dispensary_count: state.dispensary_count, + }, + }); + } catch (error: any) { + console.error('[States API] Error fetching state:', error); + res.status(500).json({ + success: false, + error: error.message, + }); + } + }); + + return router; +} + +export default createStatesRouter; diff --git a/backend/src/routes/stores.ts b/backend/src/routes/stores.ts index 4ce4439a..e70bfe55 100755 --- a/backend/src/routes/stores.ts +++ b/backend/src/routes/stores.ts @@ -1,6 +1,6 @@ import { Router } from 'express'; import { authMiddleware, requireRole } from '../auth/middleware'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { scrapeStore, scrapeCategory, discoverCategories } from '../scraper-v2'; const router = Router(); diff --git a/backend/src/routes/users.ts b/backend/src/routes/users.ts index d900350b..be803ffb 100644 --- a/backend/src/routes/users.ts +++ b/backend/src/routes/users.ts @@ -1,6 +1,6 @@ import { Router } from 'express'; import bcrypt from 'bcrypt'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { authMiddleware, requireRole, AuthRequest } from '../auth/middleware'; const router = Router(); diff --git a/backend/src/scraper-v2/engine.ts b/backend/src/scraper-v2/engine.ts index cda2f375..2bf194f2 100644 --- a/backend/src/scraper-v2/engine.ts +++ b/backend/src/scraper-v2/engine.ts @@ -4,7 +4,7 @@ import { MiddlewareEngine, UserAgentMiddleware, ProxyMiddleware, RateLimitMiddle import { PipelineEngine, ValidationPipeline, SanitizationPipeline, DeduplicationPipeline, ImagePipeline, DatabasePipeline, StatsPipeline } from './pipelines'; import { ScraperRequest, ScraperResponse, ParseResult, Product, ScraperStats } from './types'; import { logger } from '../services/logger'; -import { pool } from '../db/migrate'; +import { pool } from 
'../db/pool'; /** * Main Scraper Engine - orchestrates the entire scraping process diff --git a/backend/src/scraper-v2/middlewares.ts b/backend/src/scraper-v2/middlewares.ts index 62b36eed..49743270 100644 --- a/backend/src/scraper-v2/middlewares.ts +++ b/backend/src/scraper-v2/middlewares.ts @@ -1,6 +1,6 @@ import { Middleware, ScraperRequest, ScraperResponse, ScraperError, ErrorType, ProxyConfig } from './types'; import { logger } from '../services/logger'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { getActiveProxy, putProxyInTimeout, isBotDetectionError } from '../services/proxy'; // Diverse, realistic user agents - updated for 2024/2025 diff --git a/backend/src/scraper-v2/navigation.ts b/backend/src/scraper-v2/navigation.ts index 1c56c0ad..d9e96302 100644 --- a/backend/src/scraper-v2/navigation.ts +++ b/backend/src/scraper-v2/navigation.ts @@ -1,4 +1,4 @@ -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { logger } from '../services/logger'; import { Downloader } from './downloader'; import { ScraperRequest } from './types'; diff --git a/backend/src/scraper-v2/pipelines.ts b/backend/src/scraper-v2/pipelines.ts index 37f1163a..caef6411 100644 --- a/backend/src/scraper-v2/pipelines.ts +++ b/backend/src/scraper-v2/pipelines.ts @@ -1,6 +1,6 @@ import { ItemPipeline, Product } from './types'; import { logger } from '../services/logger'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { uploadImageFromUrl } from '../utils/minio'; import { normalizeProductName, normalizeBrandName } from '../utils/product-normalizer'; diff --git a/backend/src/scripts/backfill-legacy-to-canonical.ts b/backend/src/scripts/backfill-legacy-to-canonical.ts new file mode 100644 index 00000000..a001a087 --- /dev/null +++ b/backend/src/scripts/backfill-legacy-to-canonical.ts @@ -0,0 +1,1038 @@ +#!/usr/bin/env npx tsx +/** + * Backfill Legacy Dutchie Data to Canonical Schema + * + * Migrates data from dutchie_products (+ raw payload) into: + * - store_products (upsert with enriched data) + * - store_product_snapshots (insert if not exists) + * - crawl_runs (create backfill runs per dispensary) + * + * Usage: + * npx tsx src/scripts/backfill-legacy-to-canonical.ts + * npx tsx src/scripts/backfill-legacy-to-canonical.ts --since=2025-12-01 + * npx tsx src/scripts/backfill-legacy-to-canonical.ts --store-id=73 + * npx tsx src/scripts/backfill-legacy-to-canonical.ts --limit=1000 + * npx tsx src/scripts/backfill-legacy-to-canonical.ts --dry-run + * + * Options: + * --since=YYYY-MM-DD Only process products created/updated since this date + * --store-id=N Only process products for this dispensary ID + * --limit=N Limit number of products to process + * --dry-run Show what would be done without making changes + * --batch-size=N Number of products per batch (default: 100) + */ + +import { Pool } from 'pg'; + +const DB_URL = process.env.DATABASE_URL || process.env.CANNAIQ_DB_URL || + 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'; + +// ============================================================================ +// Types +// ============================================================================ + +interface BackfillOptions { + since?: Date; + storeId?: number; + limit?: number; + dryRun: boolean; + batchSize: number; +} + +interface LegacyProduct { + id: number; + dispensary_id: number; + external_product_id: string; + platform: string; + name: string; + brand_name: string | null; + type: string 
| null; + subcategory: string | null; + primary_image_url: string | null; + stock_status: string | null; + thc: number | null; + cbd: number | null; + created_at: Date; + updated_at: Date; + latest_raw_payload: any; + // Additional legacy columns (actual columns in dutchie_products) + strain_type: string | null; + brand_logo_url: string | null; + platform_dispensary_id: string | null; + weights: any | null; + terpenes: any | null; + effects: any | null; + cannabinoids_v2: any | null; + description: string | null; + medical_only: boolean | null; + rec_only: boolean | null; + featured: boolean | null; + is_below_threshold: boolean | null; + is_below_kiosk_threshold: boolean | null; + price_rec: number | null; + price_med: number | null; +} + +interface BackfillStats { + productsProcessed: number; + productsUpserted: number; + snapshotsCreated: number; + crawlRunsCreated: number; + priceOverflows: number; + errors: string[]; + startTime: Date; + endTime?: Date; +} + +// ============================================================================ +// CLI Argument Parsing +// ============================================================================ + +function parseArgs(): BackfillOptions { + const args = process.argv.slice(2); + const options: BackfillOptions = { + dryRun: false, + batchSize: 100, + }; + + for (const arg of args) { + if (arg === '--dry-run') { + options.dryRun = true; + } else if (arg.startsWith('--since=')) { + const dateStr = arg.substring('--since='.length); + options.since = new Date(dateStr); + if (isNaN(options.since.getTime())) { + console.error(`Invalid date: ${dateStr}`); + process.exit(1); + } + } else if (arg.startsWith('--store-id=')) { + options.storeId = parseInt(arg.substring('--store-id='.length), 10); + } else if (arg.startsWith('--limit=')) { + options.limit = parseInt(arg.substring('--limit='.length), 10); + } else if (arg.startsWith('--batch-size=')) { + options.batchSize = parseInt(arg.substring('--batch-size='.length), 10); + } + } + + return options; +} + +// ============================================================================ +// Data Extraction Helpers +// ============================================================================ + +// Maximum sane price threshold to prevent numeric overflow +const MAX_SANE_PRICE = 10000; + +/** + * Sanitize a price value - returns null if invalid or exceeds threshold + */ +function sanitizePrice(price: number | null | undefined): { value: number | null; overflow: boolean; raw: number | null } { + if (price === null || price === undefined) { + return { value: null, overflow: false, raw: null }; + } + + const numPrice = typeof price === 'number' ? 
price : parseFloat(String(price));
+
+  if (isNaN(numPrice)) {
+    return { value: null, overflow: false, raw: null };
+  }
+
+  // Check if price exceeds sane threshold (would cause numeric overflow in DB)
+  if (numPrice > MAX_SANE_PRICE || numPrice < -MAX_SANE_PRICE) {
+    return { value: null, overflow: true, raw: numPrice };
+  }
+
+  return { value: numPrice, overflow: false, raw: null };
+}
+
+/**
+ * Extract pricing from raw payload (or product columns)
+ * Returns overflow info for storage in provider_data
+ */
+function extractPricing(payload: any, product?: LegacyProduct): {
+  priceRec: number | null;
+  priceMed: number | null;
+  priceRecSpecial: number | null;
+  priceMedSpecial: number | null;
+  isOnSpecial: boolean;
+  stockQuantity: number | null;
+  overflowPrices: Record<string, number> | null;
+} {
+  const overflowPrices: Record<string, number> = {};
+
+  // First try to use product columns if available
+  let rawPriceRec = product?.price_rec ?? null;
+  let rawPriceMed = product?.price_med ?? null;
+
+  if (!payload && !rawPriceRec && !rawPriceMed) {
+    return {
+      priceRec: null,
+      priceMed: null,
+      priceRecSpecial: null,
+      priceMedSpecial: null,
+      isOnSpecial: false,
+      stockQuantity: null,
+      overflowPrices: null,
+    };
+  }
+
+  // If no price from columns, extract from payload arrays
+  if (rawPriceRec === null && payload) {
+    const recPrices = payload.recPrices || payload.Prices || [];
+    rawPriceRec = recPrices.length > 0 ? parseFloat(recPrices[0]) : null;
+  }
+
+  if (rawPriceMed === null && payload) {
+    const medPrices = payload.medPrices || [];
+    rawPriceMed = medPrices.length > 0 ? parseFloat(medPrices[0]) : null;
+  }
+
+  // Sanitize prices - handle overflow
+  const recResult = sanitizePrice(rawPriceRec);
+  const medResult = sanitizePrice(rawPriceMed);
+
+  if (recResult.overflow && recResult.raw !== null) {
+    overflowPrices.raw_legacy_price_rec = recResult.raw;
+  }
+  if (medResult.overflow && medResult.raw !== null) {
+    overflowPrices.raw_legacy_price_med = medResult.raw;
+  }
+
+  // Check for special pricing
+  const isOnSpecial = payload?.special === true;
+
+  // Try to get special prices from POSMetaData
+  let rawPriceRecSpecial: number | null = null;
+  let rawPriceMedSpecial: number | null = null;
+  let stockQuantity: number | null = null;
+
+  if (payload?.POSMetaData?.children?.length > 0) {
+    const firstChild = payload.POSMetaData.children[0];
+    if (firstChild.specialPrice && isOnSpecial) {
+      rawPriceRecSpecial = parseFloat(firstChild.specialPrice);
+    }
+    if (firstChild.medSpecialPrice && isOnSpecial) {
+      rawPriceMedSpecial = parseFloat(firstChild.medSpecialPrice);
+    }
+    if (firstChild.quantity != null) {
+      stockQuantity = parseInt(firstChild.quantity, 10);
+    }
+  }
+
+  // Sanitize special prices
+  const recSpecialResult = sanitizePrice(rawPriceRecSpecial);
+  const medSpecialResult = sanitizePrice(rawPriceMedSpecial);
+
+  if (recSpecialResult.overflow && recSpecialResult.raw !== null) {
+    overflowPrices.raw_legacy_price_rec_special = recSpecialResult.raw;
+  }
+  if (medSpecialResult.overflow && medSpecialResult.raw !== null) {
+    overflowPrices.raw_legacy_price_med_special = medSpecialResult.raw;
+  }
+
+  return {
+    priceRec: recResult.value,
+    priceMed: medResult.value,
+    priceRecSpecial: recSpecialResult.value,
+    priceMedSpecial: medSpecialResult.value,
+    isOnSpecial,
+    stockQuantity: stockQuantity && !isNaN(stockQuantity) ? stockQuantity : null,
+    overflowPrices: Object.keys(overflowPrices).length > 0 ? overflowPrices : null,
+  };
+}
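+
+// Illustrative sketch only (example values, not from the original script): how
+// sanitizePrice treats an out-of-range legacy value. A price above MAX_SANE_PRICE is
+// dropped from the numeric column and surfaced via `overflow`/`raw` so callers can
+// park it in provider_data instead:
+//
+//   sanitizePrice(39.99);   // => { value: 39.99, overflow: false, raw: null }
+//   sanitizePrice(249999);  // => { value: null, overflow: true, raw: 249999 }
+//   sanitizePrice(null);    // => { value: null, overflow: false, raw: null }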
+
+// Maximum sane potency percent (THC/CBD are usually 0-100%, but some strains can be higher)
+// Anything above this is likely in milligrams (for edibles) not percent
+const MAX_SANE_POTENCY_PERCENT = 100;
+
+/**
+ * Extract THC/CBD from raw payload
+ * Note: Some edibles report THC in mg (e.g. 1000mg), not percent.
+ * We detect this by checking if value > 100 and store raw value in overflowPotency.
+ */
+function extractPotency(payload: any, product?: LegacyProduct): {
+  thcPercent: number | null;
+  cbdPercent: number | null;
+  overflowPotency: Record<string, any> | null;
+} {
+  const overflowPotency: Record<string, any> = {};
+
+  if (!payload) {
+    return { thcPercent: null, cbdPercent: null, overflowPotency: null };
+  }
+
+  let rawThc: number | null = null;
+  let rawCbd: number | null = null;
+  let thcUnit: string | null = null;
+  let cbdUnit: string | null = null;
+
+  // Try THCContent.range - may have unit info
+  if (payload.THCContent?.range?.length > 0) {
+    rawThc = parseFloat(payload.THCContent.range[0]);
+    thcUnit = payload.THCContent.unit || null;
+  } else if (product?.thc != null) {
+    rawThc = product.thc;
+  } else if (payload.THC != null) {
+    rawThc = parseFloat(payload.THC);
+  }
+
+  // Try CBDContent.range - may have unit info
+  if (payload.CBDContent?.range?.length > 0) {
+    rawCbd = parseFloat(payload.CBDContent.range[0]);
+    cbdUnit = payload.CBDContent.unit || null;
+  } else if (product?.cbd != null) {
+    rawCbd = product.cbd;
+  } else if (payload.CBD != null) {
+    rawCbd = parseFloat(payload.CBD);
+  }
+
+  // Sanitize THC - if unit is MILLIGRAMS or value > 100, it's not a percentage
+  let thcPercent: number | null = null;
+  if (rawThc !== null && !isNaN(rawThc)) {
+    const isMg = thcUnit === 'MILLIGRAMS' || rawThc > MAX_SANE_POTENCY_PERCENT;
+    if (isMg) {
+      // Store raw value and unit info, don't use as percentage
+      overflowPotency.raw_thc_mg = rawThc;
+      if (thcUnit) overflowPotency.thc_unit = thcUnit;
+    } else {
+      thcPercent = rawThc;
+    }
+  }
+
+  // Sanitize CBD - if unit is MILLIGRAMS or value > 100, it's not a percentage
+  let cbdPercent: number | null = null;
+  if (rawCbd !== null && !isNaN(rawCbd)) {
+    const isMg = cbdUnit === 'MILLIGRAMS' || rawCbd > MAX_SANE_POTENCY_PERCENT;
+    if (isMg) {
+      // Store raw value and unit info, don't use as percentage
+      overflowPotency.raw_cbd_mg = rawCbd;
+      if (cbdUnit) overflowPotency.cbd_unit = cbdUnit;
+    } else {
+      cbdPercent = rawCbd;
+    }
+  }
+
+  return {
+    thcPercent,
+    cbdPercent,
+    overflowPotency: Object.keys(overflowPotency).length > 0 ? overflowPotency : null,
+  };
+}
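+
+// Illustrative example only (hypothetical payload values): a 1000mg edible reported
+// via THCContent is kept out of thc_percent and preserved as raw milligram data:
+//
+//   extractPotency({ THCContent: { range: [1000], unit: 'MILLIGRAMS' } });
+//   // => { thcPercent: null, cbdPercent: null,
+//   //      overflowPotency: { raw_thc_mg: 1000, thc_unit: 'MILLIGRAMS' } }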
+
+/**
+ * Map stock status from legacy to canonical
+ */
+function mapStockStatus(legacyStatus: string | null, payload: any): {
+  isInStock: boolean;
+  stockStatus: string;
+} {
+  // Check payload Status field
+  const payloadStatus = payload?.Status;
+
+  if (payloadStatus === 'Active' || legacyStatus === 'in_stock') {
+    return { isInStock: true, stockStatus: 'in_stock' };
+  } else if (payloadStatus === 'Inactive' || legacyStatus === 'out_of_stock') {
+    return { isInStock: false, stockStatus: 'out_of_stock' };
+  }
+
+  // Default to in_stock if we have the product
+  return { isInStock: true, stockStatus: 'in_stock' };
+}
+
+/**
+ * Extract hybrid model fields for store_products
+ * (strain_type, medical_only, rec_only, brand_logo_url, platform_dispensary_id)
+ */
+function extractProductHybridFields(product: LegacyProduct, payload: any): {
+  strainType: string | null;
+  medicalOnly: boolean;
+  recOnly: boolean;
+  brandLogoUrl: string | null;
+  platformDispensaryId: string | null;
+} {
+  // strain_type from legacy column or raw payload
+  const strainType = product.strain_type || payload?.strainType || null;
+
+  // medical_only / rec_only from legacy column or raw payload
+  const medicalOnly = product.medical_only === true || payload?.medicalOnly === true;
+  const recOnly = product.rec_only === true || payload?.recOnly === true;
+
+  // brand_logo_url from legacy column or raw payload
+  const brandLogoUrl = product.brand_logo_url || payload?.brandLogo || null;
+
+  // platform_dispensary_id from legacy column
+  const platformDispensaryId = product.platform_dispensary_id || null;
+
+  return {
+    strainType,
+    medicalOnly,
+    recOnly,
+    brandLogoUrl,
+    platformDispensaryId,
+  };
+}
+
+/**
+ * Extract hybrid model fields for store_product_snapshots
+ * (featured, is_below_threshold, is_below_kiosk_threshold)
+ */
+function extractSnapshotHybridFields(product: LegacyProduct, payload: any): {
+  featured: boolean;
+  isBelowThreshold: boolean;
+  isBelowKioskThreshold: boolean;
+} {
+  // Featured flag from legacy column or raw payload
+  const featured = product.featured === true || payload?.featured === true;
+
+  // Threshold flags from legacy column or raw payload
+  const isBelowThreshold = product.is_below_threshold === true || payload?.isBelowThreshold === true || payload?.belowThreshold === true;
+  const isBelowKioskThreshold = product.is_below_kiosk_threshold === true || payload?.isBelowKioskThreshold === true || payload?.belowKioskThreshold === true;
+
+  return {
+    featured,
+    isBelowThreshold,
+    isBelowKioskThreshold,
+  };
+}
+
+/**
+ * Build provider_data JSONB for store_products
+ * Contains ALL fields not mapped to canonical columns
+ */
+function buildProductProviderData(product: LegacyProduct, payload: any): Record<string, any> {
+  const providerData: Record<string, any> = {};
+
+  // === From dutchie_products columns ===
+  if (product.weights) providerData.weights = product.weights;
+  if (product.terpenes) providerData.terpenes = product.terpenes;
+  if (product.effects) providerData.effects = product.effects;
+  if (product.cannabinoids_v2) providerData.cannabinoids_v2 = product.cannabinoids_v2;
+  if (product.description) providerData.description = product.description;
+
+  // === From latest_raw_payload (fields not already mapped) ===
+  if (payload) {
+    // Weight/options/variants
+    if (payload.Options) providerData.Options = payload.Options;
+    if (payload.Weights) providerData.Weights = payload.Weights;
+    if (payload.weightOptions) providerData.weightOptions = payload.weightOptions;
+    // Terpenes and effects (from payload if not in product)
+    if (!product.terpenes && payload.terpenes) providerData.terpenes = payload.terpenes;
+    if (!product.effects && payload.effects) providerData.effects = payload.effects;
+
+    // Cannabinoids (extended)
+    if (payload.cannabinoids) providerData.cannabinoids = payload.cannabinoids;
+    if (payload.cannabinoidsV2) providerData.cannabinoidsV2 = payload.cannabinoidsV2;
+    if (payload.THCContent) providerData.THCContent = payload.THCContent;
+    if (payload.CBDContent) providerData.CBDContent = payload.CBDContent;
+
+    // Pricing arrays (full data)
+    if (payload.recPrices) providerData.recPrices = payload.recPrices;
+    if (payload.medPrices) providerData.medPrices = payload.medPrices;
+    if (payload.Prices) providerData.Prices = payload.Prices;
+
+    // POS metadata
+    if (payload.POSMetaData) providerData.POSMetaData = payload.POSMetaData;
+
+    // Product metadata
+    if (payload.slug) providerData.slug = payload.slug;
+    if (payload.enterprise) providerData.enterprise = payload.enterprise;
+    if (payload.enterpriseProductId) providerData.enterpriseProductId = payload.enterpriseProductId;
+    if (payload.posId) providerData.posId = payload.posId;
+    if (payload.cName) providerData.cName = payload.cName;
+    if (payload.dispensaryId) providerData.dispensaryId = payload.dispensaryId;
+
+    // Additional flags
+    if (payload.isMixAndMatch != null) providerData.isMixAndMatch = payload.isMixAndMatch;
+    if (payload.isStaffPick != null) providerData.isStaffPick = payload.isStaffPick;
+    if (payload.customerLimit != null) providerData.customerLimit = payload.customerLimit;
+    if (payload.purchaseLimit != null) providerData.purchaseLimit = payload.purchaseLimit;
+
+    // Category/type details
+    if (payload.category) providerData.rawCategory = payload.category;
+    if (payload.subcategory) providerData.rawSubcategory = payload.subcategory;
+    if (payload.type) providerData.rawType = payload.type;
+  }
+
+  // Only return if we have data
+  return Object.keys(providerData).length > 0 ? providerData : {};
+}
+
+/**
+ * Build provider_data JSONB for store_product_snapshots
+ * Contains ALL snapshot-specific fields not mapped to canonical columns
+ */
+function buildSnapshotProviderData(payload: any): Record<string, any> {
+  const providerData: Record<string, any> = {};
+
+  if (!payload) return providerData;
+
+  // Snapshot-specific option data (pricing tiers, etc.)
+  if (payload.Options) providerData.Options = payload.Options;
+  if (payload.POSMetaData?.children) {
+    providerData.posChildren = payload.POSMetaData.children;
+  }
+
+  // Kiosk-specific fields
+  if (payload.kioskPrices) providerData.kioskPrices = payload.kioskPrices;
+  if (payload.kioskData) providerData.kioskData = payload.kioskData;
+
+  // Quantity/inventory details beyond simple count
+  if (payload.quantityAvailable != null) providerData.quantityAvailable = payload.quantityAvailable;
+  if (payload.inventoryByLocation) providerData.inventoryByLocation = payload.inventoryByLocation;
+
+  return Object.keys(providerData).length > 0 ?
providerData : {}; +} + +// ============================================================================ +// Database Operations +// ============================================================================ + +/** + * Get or create a backfill crawl_run for a dispensary + */ +async function getOrCreateBackfillCrawlRun( + pool: Pool, + dispensaryId: number, + capturedAt: Date, + dryRun: boolean +): Promise { + // Normalize to start of day for grouping + const dayStart = new Date(capturedAt); + dayStart.setUTCHours(0, 0, 0, 0); + + // Check if a backfill run already exists for this dispensary/day + const existingResult = await pool.query(` + SELECT id FROM crawl_runs + WHERE dispensary_id = $1 + AND trigger_type = 'backfill' + AND DATE(started_at) = DATE($2) + LIMIT 1 + `, [dispensaryId, dayStart]); + + if (existingResult.rows.length > 0) { + return existingResult.rows[0].id; + } + + if (dryRun) { + console.log(` [DRY RUN] Would create crawl_run for dispensary ${dispensaryId} on ${dayStart.toISOString().split('T')[0]}`); + return null; + } + + // Create a new backfill crawl run + const insertResult = await pool.query(` + INSERT INTO crawl_runs ( + dispensary_id, + provider, + started_at, + finished_at, + duration_ms, + status, + trigger_type, + metadata + ) VALUES ( + $1, + 'dutchie', + $2, + $2, + 0, + 'success', + 'backfill', + $3 + ) + RETURNING id + `, [ + dispensaryId, + dayStart, + JSON.stringify({ source: 'legacy_backfill', backfill_date: new Date().toISOString() }) + ]); + + return insertResult.rows[0].id; +} + +/** + * Upsert a store_product from legacy data + */ +async function upsertStoreProduct( + pool: Pool, + product: LegacyProduct, + dryRun: boolean +): Promise<{ id: number | null; hadOverflow: boolean }> { + const payload = product.latest_raw_payload; + const pricing = extractPricing(payload, product); + const potency = extractPotency(payload, product); + const stockInfo = mapStockStatus(product.stock_status, payload); + const hybridFields = extractProductHybridFields(product, payload); + const providerData = buildProductProviderData(product, payload); + + // Merge overflow prices into provider_data if any + let hadOverflow = false; + if (pricing.overflowPrices) { + hadOverflow = true; + Object.assign(providerData, pricing.overflowPrices); + console.log(` [WARN] Product ${product.id} "${product.name.substring(0, 40)}": price overflow, storing in provider_data`); + } + + // Merge overflow potency (THC/CBD in mg) into provider_data if any + if (potency.overflowPotency) { + hadOverflow = true; + Object.assign(providerData, potency.overflowPotency); + console.log(` [WARN] Product ${product.id} "${product.name.substring(0, 40)}": potency in mg, storing in provider_data`); + } + + if (dryRun) { + console.log(` [DRY RUN] Would upsert store_product: ${product.name.substring(0, 50)}...`); + return { id: null, hadOverflow }; + } + + const result = await pool.query(` + INSERT INTO store_products ( + dispensary_id, + provider, + provider_product_id, + name_raw, + brand_name_raw, + category_raw, + subcategory_raw, + image_url, + price_rec, + price_med, + price_rec_special, + price_med_special, + is_on_special, + is_in_stock, + stock_quantity, + stock_status, + thc_percent, + cbd_percent, + first_seen_at, + last_seen_at, + -- Hybrid model columns + strain_type, + medical_only, + rec_only, + brand_logo_url, + platform_dispensary_id, + provider_data, + updated_at + ) VALUES ( + $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, + $21, $22, $23, $24, 
$25, $26, NOW() + ) + ON CONFLICT (dispensary_id, provider, provider_product_id) + DO UPDATE SET + name_raw = COALESCE(EXCLUDED.name_raw, store_products.name_raw), + brand_name_raw = COALESCE(EXCLUDED.brand_name_raw, store_products.brand_name_raw), + category_raw = COALESCE(EXCLUDED.category_raw, store_products.category_raw), + subcategory_raw = COALESCE(EXCLUDED.subcategory_raw, store_products.subcategory_raw), + image_url = COALESCE(EXCLUDED.image_url, store_products.image_url), + price_rec = COALESCE(EXCLUDED.price_rec, store_products.price_rec), + price_med = COALESCE(EXCLUDED.price_med, store_products.price_med), + price_rec_special = COALESCE(EXCLUDED.price_rec_special, store_products.price_rec_special), + price_med_special = COALESCE(EXCLUDED.price_med_special, store_products.price_med_special), + is_on_special = COALESCE(EXCLUDED.is_on_special, store_products.is_on_special), + is_in_stock = COALESCE(EXCLUDED.is_in_stock, store_products.is_in_stock), + stock_quantity = COALESCE(EXCLUDED.stock_quantity, store_products.stock_quantity), + stock_status = COALESCE(EXCLUDED.stock_status, store_products.stock_status), + thc_percent = COALESCE(EXCLUDED.thc_percent, store_products.thc_percent), + cbd_percent = COALESCE(EXCLUDED.cbd_percent, store_products.cbd_percent), + first_seen_at = LEAST(store_products.first_seen_at, EXCLUDED.first_seen_at), + last_seen_at = GREATEST(store_products.last_seen_at, EXCLUDED.last_seen_at), + -- Hybrid model columns + strain_type = COALESCE(EXCLUDED.strain_type, store_products.strain_type), + medical_only = COALESCE(EXCLUDED.medical_only, store_products.medical_only), + rec_only = COALESCE(EXCLUDED.rec_only, store_products.rec_only), + brand_logo_url = COALESCE(EXCLUDED.brand_logo_url, store_products.brand_logo_url), + platform_dispensary_id = COALESCE(EXCLUDED.platform_dispensary_id, store_products.platform_dispensary_id), + provider_data = COALESCE(EXCLUDED.provider_data, store_products.provider_data), + updated_at = NOW() + RETURNING id + `, [ + product.dispensary_id, + product.platform || 'dutchie', + product.external_product_id, + product.name, + product.brand_name, + product.type, + product.subcategory, + product.primary_image_url, + pricing.priceRec, + pricing.priceMed, + pricing.priceRecSpecial, + pricing.priceMedSpecial, + pricing.isOnSpecial, + stockInfo.isInStock, + pricing.stockQuantity, + stockInfo.stockStatus, + potency.thcPercent, + potency.cbdPercent, + product.created_at, + product.updated_at, + // Hybrid model columns + hybridFields.strainType, + hybridFields.medicalOnly, + hybridFields.recOnly, + hybridFields.brandLogoUrl, + hybridFields.platformDispensaryId, + Object.keys(providerData).length > 0 ? 
JSON.stringify(providerData) : null, + ]); + + return { id: result.rows[0].id, hadOverflow }; +} + +/** + * Create a snapshot if one doesn't exist for this product+crawl_run + */ +async function createSnapshotIfNotExists( + pool: Pool, + product: LegacyProduct, + storeProductId: number, + crawlRunId: number, + capturedAt: Date, + dryRun: boolean +): Promise { + // Check if snapshot already exists + const existingResult = await pool.query(` + SELECT id FROM store_product_snapshots + WHERE store_product_id = $1 AND crawl_run_id = $2 + LIMIT 1 + `, [storeProductId, crawlRunId]); + + if (existingResult.rows.length > 0) { + return false; // Already exists + } + + if (dryRun) { + console.log(` [DRY RUN] Would create snapshot for store_product ${storeProductId}`); + return true; + } + + const payload = product.latest_raw_payload; + const pricing = extractPricing(payload, product); + const potency = extractPotency(payload, product); + const stockInfo = mapStockStatus(product.stock_status, payload); + const snapshotHybrid = extractSnapshotHybridFields(product, payload); + const snapshotProviderData = buildSnapshotProviderData(payload); + + // Merge overflow prices into provider_data if any + if (pricing.overflowPrices) { + Object.assign(snapshotProviderData, pricing.overflowPrices); + } + + // Merge overflow potency (THC/CBD in mg) into provider_data if any + if (potency.overflowPotency) { + Object.assign(snapshotProviderData, potency.overflowPotency); + } + + await pool.query(` + INSERT INTO store_product_snapshots ( + dispensary_id, + store_product_id, + provider, + provider_product_id, + crawl_run_id, + captured_at, + name_raw, + brand_name_raw, + category_raw, + subcategory_raw, + price_rec, + price_med, + price_rec_special, + price_med_special, + is_on_special, + is_in_stock, + stock_quantity, + stock_status, + thc_percent, + cbd_percent, + image_url, + raw_data, + -- Hybrid model columns + featured, + is_below_threshold, + is_below_kiosk_threshold, + provider_data + ) VALUES ( + $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, + $23, $24, $25, $26 + ) + ON CONFLICT DO NOTHING + `, [ + product.dispensary_id, + storeProductId, + product.platform || 'dutchie', + product.external_product_id, + crawlRunId, + capturedAt, + product.name, + product.brand_name, + product.type, + product.subcategory, + pricing.priceRec, + pricing.priceMed, + pricing.priceRecSpecial, + pricing.priceMedSpecial, + pricing.isOnSpecial, + stockInfo.isInStock, + pricing.stockQuantity, + stockInfo.stockStatus, + potency.thcPercent, + potency.cbdPercent, + product.primary_image_url, + payload, + // Hybrid model columns + snapshotHybrid.featured, + snapshotHybrid.isBelowThreshold, + snapshotHybrid.isBelowKioskThreshold, + Object.keys(snapshotProviderData).length > 0 ? 
JSON.stringify(snapshotProviderData) : null, + ]); + + return true; +} + +// ============================================================================ +// Main Backfill Logic +// ============================================================================ + +async function backfillProducts( + pool: Pool, + options: BackfillOptions +): Promise { + const stats: BackfillStats = { + productsProcessed: 0, + productsUpserted: 0, + snapshotsCreated: 0, + crawlRunsCreated: 0, + priceOverflows: 0, + errors: [], + startTime: new Date(), + }; + + // Build query for legacy products + let whereConditions: string[] = []; + let params: any[] = []; + let paramIndex = 1; + + if (options.since) { + whereConditions.push(`dp.updated_at >= $${paramIndex}`); + params.push(options.since); + paramIndex++; + } + + if (options.storeId) { + whereConditions.push(`dp.dispensary_id = $${paramIndex}`); + params.push(options.storeId); + paramIndex++; + } + + const whereClause = whereConditions.length > 0 + ? `WHERE ${whereConditions.join(' AND ')}` + : ''; + + // Count total products + const countResult = await pool.query(` + SELECT COUNT(*) as count FROM dutchie_products dp ${whereClause} + `, params); + const totalProducts = parseInt(countResult.rows[0].count, 10); + + console.log(`\nFound ${totalProducts} legacy products to process`); + if (options.dryRun) { + console.log('DRY RUN MODE - No changes will be made\n'); + } + + // Track crawl runs we've created + const crawlRunCache = new Map(); + + // Process in batches + let offset = 0; + while (true) { + const batchResult = await pool.query(` + SELECT + dp.id, + dp.dispensary_id, + dp.external_product_id, + dp.platform, + dp.name, + dp.brand_name, + dp.type, + dp.subcategory, + dp.primary_image_url, + dp.stock_status, + dp.thc, + dp.cbd, + dp.created_at, + dp.updated_at, + dp.latest_raw_payload, + -- Additional legacy columns for hybrid model + dp.strain_type, + dp.brand_logo_url, + dp.platform_dispensary_id, + dp.weights, + dp.terpenes, + dp.effects, + dp.cannabinoids_v2, + dp.description, + dp.medical_only, + dp.rec_only, + dp.featured, + dp.is_below_threshold, + dp.is_below_kiosk_threshold, + dp.price_rec, + dp.price_med + FROM dutchie_products dp + ${whereClause} + ORDER BY dp.dispensary_id, dp.id + OFFSET ${offset} + LIMIT ${options.batchSize} + `, params); + + if (batchResult.rows.length === 0) { + break; + } + + console.log(`Processing batch: ${offset + 1} to ${offset + batchResult.rows.length} of ${options.limit || totalProducts}`); + + for (const row of batchResult.rows) { + try { + const product: LegacyProduct = row; + stats.productsProcessed++; + + // Get or create crawl run for this dispensary/day + const capturedAt = product.updated_at || product.created_at || new Date(); + const dayKey = `${product.dispensary_id}:${capturedAt.toISOString().split('T')[0]}`; + + let crawlRunId = crawlRunCache.get(dayKey); + if (!crawlRunId && !options.dryRun) { + crawlRunId = await getOrCreateBackfillCrawlRun( + pool, + product.dispensary_id, + capturedAt, + options.dryRun + ); + if (crawlRunId) { + crawlRunCache.set(dayKey, crawlRunId); + stats.crawlRunsCreated++; + } + } + + // Upsert store_product + const result = await upsertStoreProduct(pool, product, options.dryRun); + if (result.hadOverflow) { + stats.priceOverflows++; + } + if (result.id) { + stats.productsUpserted++; + + // Create snapshot if crawl run exists + if (crawlRunId) { + const created = await createSnapshotIfNotExists( + pool, + product, + result.id, + crawlRunId, + capturedAt, + options.dryRun + ); + 
if (created) { + stats.snapshotsCreated++; + } + } + } else if (options.dryRun) { + // In dry run mode, result.id is null but we still count it + stats.productsUpserted++; + } + } catch (error: any) { + const errorMsg = `Product ${row.id}: ${error.message}`; + stats.errors.push(errorMsg); + if (stats.errors.length <= 10) { + console.error(` Error: ${errorMsg}`); + } + } + } + + offset += batchResult.rows.length; + + // Progress update + const progress = Math.round((stats.productsProcessed / (options.limit || totalProducts)) * 100); + console.log(` Progress: ${progress}% (${stats.productsUpserted} upserted, ${stats.snapshotsCreated} snapshots)`); + + // Check if we've hit the limit + if (options.limit && stats.productsProcessed >= options.limit) { + break; + } + } + + stats.endTime = new Date(); + return stats; +} + +// ============================================================================ +// Main Entry Point +// ============================================================================ + +async function main() { + const options = parseArgs(); + + console.log('========================================================='); + console.log(' Backfill Legacy Dutchie Data to Canonical Schema'); + console.log('========================================================='); + console.log(`\nDatabase: ${DB_URL.replace(/:[^:@]+@/, ':****@')}`); + console.log(`Options:`); + if (options.since) console.log(` - Since: ${options.since.toISOString()}`); + if (options.storeId) console.log(` - Store ID: ${options.storeId}`); + if (options.limit) console.log(` - Limit: ${options.limit}`); + console.log(` - Batch size: ${options.batchSize}`); + console.log(` - Dry run: ${options.dryRun}`); + + const pool = new Pool({ connectionString: DB_URL }); + + try { + // Test connection + const { rows } = await pool.query('SELECT NOW() as time'); + console.log(`\nConnected at: ${rows[0].time}`); + + // Run backfill + const stats = await backfillProducts(pool, options); + + // Print summary + const duration = stats.endTime + ? (stats.endTime.getTime() - stats.startTime.getTime()) / 1000 + : 0; + + console.log('\n========================================================='); + console.log(' SUMMARY'); + console.log('========================================================='); + console.log(` Products processed: ${stats.productsProcessed}`); + console.log(` Products upserted: ${stats.productsUpserted}`); + console.log(` Snapshots created: ${stats.snapshotsCreated}`); + console.log(` Crawl runs created: ${stats.crawlRunsCreated}`); + console.log(` Price overflows: ${stats.priceOverflows}` + (stats.priceOverflows > 0 ? 
' (stored in provider_data)' : '')); + console.log(` Errors: ${stats.errors.length}`); + console.log(` Duration: ${duration.toFixed(1)}s`); + + if (stats.errors.length > 10) { + console.log(`\n (${stats.errors.length - 10} more errors not shown)`); + } + + if (options.dryRun) { + console.log('\n [DRY RUN] No changes were made to the database'); + } + + if (stats.errors.length > 0) { + console.log('\n Completed with errors'); + process.exit(1); + } + + console.log('\n Backfill completed successfully'); + process.exit(0); + } catch (error: any) { + console.error('\n Backfill failed:', error.message); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/scripts/backfill-store-dispensary.ts b/backend/src/scripts/backfill-store-dispensary.ts index 45a1afb6..98d567ca 100644 --- a/backend/src/scripts/backfill-store-dispensary.ts +++ b/backend/src/scripts/backfill-store-dispensary.ts @@ -11,7 +11,7 @@ * npx tsx src/scripts/backfill-store-dispensary.ts --verbose # Show all match details */ -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { logger } from '../services/logger'; const args = process.argv.slice(2); diff --git a/backend/src/scripts/bootstrap-discovery.ts b/backend/src/scripts/bootstrap-discovery.ts index 2aa2a00c..86dbe642 100644 --- a/backend/src/scripts/bootstrap-discovery.ts +++ b/backend/src/scripts/bootstrap-discovery.ts @@ -14,7 +14,7 @@ * npx tsx src/scripts/bootstrap-discovery.ts --status # Show current status only */ -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { ensureAllDispensariesHaveSchedules, runDispensaryOrchestrator, diff --git a/backend/src/scripts/bootstrap-local-admin.ts b/backend/src/scripts/bootstrap-local-admin.ts new file mode 100644 index 00000000..a42629cf --- /dev/null +++ b/backend/src/scripts/bootstrap-local-admin.ts @@ -0,0 +1,101 @@ +/** + * LOCAL-ONLY Admin Bootstrap Script + * + * Creates or resets a local admin user for development. + * This script is ONLY for local development - never use in production. 
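+ *
+ * Password handling sketch (illustrative only; the values below are examples):
+ * the admin password is stored as a bcrypt hash with 10 rounds, so it can be
+ * verified with bcrypt.compare, e.g.
+ *
+ *   const hash = await bcrypt.hash('admin123', 10);
+ *   const ok = await bcrypt.compare('admin123', hash); // => true for the local admin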
+ * + * Usage: + * cd backend + * npx tsx src/scripts/bootstrap-local-admin.ts + * + * Default credentials: + * Email: admin@local.test + * Password: admin123 + */ + +import bcrypt from 'bcrypt'; +import { query, closePool } from '../dutchie-az/db/connection'; + +// Local admin credentials - deterministic for dev +const LOCAL_ADMIN_EMAIL = 'admin@local.test'; +const LOCAL_ADMIN_PASSWORD = 'admin123'; +const LOCAL_ADMIN_ROLE = 'admin'; // Match existing schema (admin, not superadmin) + +async function bootstrapLocalAdmin(): Promise { + console.log('='.repeat(60)); + console.log('LOCAL ADMIN BOOTSTRAP'); + console.log('='.repeat(60)); + console.log(''); + console.log('This script creates/resets a local admin user for development.'); + console.log(''); + + try { + // Hash the password with bcrypt (10 rounds, matching existing code) + const passwordHash = await bcrypt.hash(LOCAL_ADMIN_PASSWORD, 10); + + // Check if user exists + const existing = await query<{ id: number; email: string }>( + 'SELECT id, email FROM users WHERE email = $1', + [LOCAL_ADMIN_EMAIL] + ); + + if (existing.rows.length > 0) { + // User exists - update password and role + console.log(`User "${LOCAL_ADMIN_EMAIL}" already exists (id=${existing.rows[0].id})`); + console.log('Resetting password and ensuring admin role...'); + + await query( + `UPDATE users + SET password_hash = $1, + role = $2, + updated_at = NOW() + WHERE email = $3`, + [passwordHash, LOCAL_ADMIN_ROLE, LOCAL_ADMIN_EMAIL] + ); + + console.log('User updated successfully.'); + } else { + // User doesn't exist - create new + console.log(`Creating new admin user: ${LOCAL_ADMIN_EMAIL}`); + + const result = await query<{ id: number }>( + `INSERT INTO users (email, password_hash, role, created_at, updated_at) + VALUES ($1, $2, $3, NOW(), NOW()) + RETURNING id`, + [LOCAL_ADMIN_EMAIL, passwordHash, LOCAL_ADMIN_ROLE] + ); + + console.log(`User created successfully (id=${result.rows[0].id})`); + } + + console.log(''); + console.log('='.repeat(60)); + console.log('LOCAL ADMIN READY'); + console.log('='.repeat(60)); + console.log(''); + console.log('Login credentials:'); + console.log(` Email: ${LOCAL_ADMIN_EMAIL}`); + console.log(` Password: ${LOCAL_ADMIN_PASSWORD}`); + console.log(''); + console.log('Admin UI: http://localhost:8080/admin'); + console.log(''); + + } catch (error: any) { + console.error(''); + console.error('ERROR: Failed to bootstrap local admin'); + console.error(error.message); + + if (error.message.includes('relation "users" does not exist')) { + console.error(''); + console.error('The "users" table does not exist.'); + console.error('Run migrations first: npm run migrate'); + } + + process.exit(1); + } finally { + await closePool(); + } +} + +// Run the bootstrap +bootstrapLocalAdmin(); diff --git a/backend/src/scripts/discovery-dutchie-cities.ts b/backend/src/scripts/discovery-dutchie-cities.ts new file mode 100644 index 00000000..2215d4df --- /dev/null +++ b/backend/src/scripts/discovery-dutchie-cities.ts @@ -0,0 +1,86 @@ +#!/usr/bin/env npx tsx +/** + * Dutchie City Discovery CLI Runner + * + * Discovers cities from Dutchie's /cities page and upserts to dutchie_discovery_cities. 
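+ *
+ * Example chaining (illustrative only): because the process exits non-zero on errors,
+ * city discovery can gate the follow-up location discovery in a shell pipeline:
+ *
+ *   npm run discovery:dutchie:cities && npm run discovery:dutchie:locations -- --all-enabled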
+ * + * Usage: + * npm run discovery:dutchie:cities + * npx tsx src/scripts/discovery-dutchie-cities.ts + * + * Environment: + * DATABASE_URL - PostgreSQL connection string (required) + */ + +import { Pool } from 'pg'; +import { DutchieCityDiscovery } from '../dutchie-az/discovery/DutchieCityDiscovery'; + +async function main() { + console.log('='.repeat(60)); + console.log('DUTCHIE CITY DISCOVERY'); + console.log('='.repeat(60)); + + // Get database URL from environment + const connectionString = process.env.DATABASE_URL; + if (!connectionString) { + console.error('ERROR: DATABASE_URL environment variable is required'); + console.error(''); + console.error('Usage:'); + console.error(' DATABASE_URL="postgresql://..." npm run discovery:dutchie:cities'); + process.exit(1); + } + + // Create pool + const pool = new Pool({ connectionString }); + + try { + // Test connection + await pool.query('SELECT 1'); + console.log('[CLI] Database connection established'); + + // Run discovery + const discovery = new DutchieCityDiscovery(pool); + const result = await discovery.run(); + + // Print summary + console.log(''); + console.log('='.repeat(60)); + console.log('DISCOVERY COMPLETE'); + console.log('='.repeat(60)); + console.log(`Cities found: ${result.citiesFound}`); + console.log(`Cities inserted: ${result.citiesInserted}`); + console.log(`Cities updated: ${result.citiesUpdated}`); + console.log(`Errors: ${result.errors.length}`); + console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`); + + if (result.errors.length > 0) { + console.log(''); + console.log('Errors:'); + result.errors.forEach((e) => console.log(` - ${e}`)); + } + + // Show stats + console.log(''); + console.log('Current Statistics:'); + const stats = await discovery.getStats(); + console.log(` Total cities: ${stats.total}`); + console.log(` Crawl enabled: ${stats.crawlEnabled}`); + console.log(` Never crawled: ${stats.neverCrawled}`); + console.log(''); + console.log('By Country:'); + stats.byCountry.forEach((c) => console.log(` ${c.countryCode}: ${c.count}`)); + console.log(''); + console.log('By State (top 10):'); + stats.byState.slice(0, 10).forEach((s) => console.log(` ${s.stateCode} (${s.countryCode}): ${s.count}`)); + + process.exit(result.errors.length > 0 ? 1 : 0); + } catch (error: any) { + console.error('FATAL ERROR:', error.message); + console.error(error.stack); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/scripts/discovery-dutchie-locations.ts b/backend/src/scripts/discovery-dutchie-locations.ts new file mode 100644 index 00000000..7567cb01 --- /dev/null +++ b/backend/src/scripts/discovery-dutchie-locations.ts @@ -0,0 +1,189 @@ +#!/usr/bin/env npx tsx +/** + * Dutchie Location Discovery CLI Runner + * + * Discovers store locations for cities and upserts to dutchie_discovery_locations. 
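+ *
+ * Pacing sketch (an assumption about the flow, not taken from DutchieLocationDiscovery):
+ * in --all-enabled mode each city is handled sequentially with a configurable pause so
+ * the upstream site is not hammered, roughly:
+ *
+ *   for (const city of cities) {
+ *     await discovery.discoverForCity(city);
+ *     await new Promise((resolve) => setTimeout(resolve, delayMs)); // --delay option
+ *   }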
+ * + * Usage: + * npm run discovery:dutchie:locations -- --all-enabled + * npm run discovery:dutchie:locations -- --city-slug=phoenix + * npm run discovery:dutchie:locations -- --all-enabled --limit=10 + * + * npx tsx src/scripts/discovery-dutchie-locations.ts --all-enabled + * npx tsx src/scripts/discovery-dutchie-locations.ts --city-slug=phoenix + * + * Options: + * --city-slug= Run for a single city by its slug + * --all-enabled Run for all cities where crawl_enabled = TRUE + * --limit= Limit the number of cities to process + * --delay= Delay between cities in ms (default: 2000) + * + * Environment: + * DATABASE_URL - PostgreSQL connection string (required) + */ + +import { Pool } from 'pg'; +import { DutchieLocationDiscovery } from '../dutchie-az/discovery/DutchieLocationDiscovery'; + +// Parse command line arguments +function parseArgs(): { + citySlug: string | null; + allEnabled: boolean; + limit: number | undefined; + delay: number; +} { + const args = process.argv.slice(2); + let citySlug: string | null = null; + let allEnabled = false; + let limit: number | undefined = undefined; + let delay = 2000; + + for (const arg of args) { + if (arg.startsWith('--city-slug=')) { + citySlug = arg.split('=')[1]; + } else if (arg === '--all-enabled') { + allEnabled = true; + } else if (arg.startsWith('--limit=')) { + limit = parseInt(arg.split('=')[1], 10); + } else if (arg.startsWith('--delay=')) { + delay = parseInt(arg.split('=')[1], 10); + } + } + + return { citySlug, allEnabled, limit, delay }; +} + +function printUsage() { + console.log(` +Dutchie Location Discovery CLI + +Usage: + npx tsx src/scripts/discovery-dutchie-locations.ts [options] + +Options: + --city-slug= Run for a single city by its slug + --all-enabled Run for all cities where crawl_enabled = TRUE + --limit= Limit the number of cities to process + --delay= Delay between cities in ms (default: 2000) + +Examples: + npx tsx src/scripts/discovery-dutchie-locations.ts --all-enabled + npx tsx src/scripts/discovery-dutchie-locations.ts --city-slug=phoenix + npx tsx src/scripts/discovery-dutchie-locations.ts --all-enabled --limit=5 + +Environment: + DATABASE_URL - PostgreSQL connection string (required) +`); +} + +async function main() { + const { citySlug, allEnabled, limit, delay } = parseArgs(); + + if (!citySlug && !allEnabled) { + console.error('ERROR: Must specify either --city-slug= or --all-enabled'); + printUsage(); + process.exit(1); + } + + console.log('='.repeat(60)); + console.log('DUTCHIE LOCATION DISCOVERY'); + console.log('='.repeat(60)); + + if (citySlug) { + console.log(`Mode: Single city (${citySlug})`); + } else { + console.log(`Mode: All enabled cities${limit ? ` (limit: ${limit})` : ''}`); + } + console.log(`Delay between cities: ${delay}ms`); + console.log(''); + + // Get database URL from environment + const connectionString = process.env.DATABASE_URL; + if (!connectionString) { + console.error('ERROR: DATABASE_URL environment variable is required'); + console.error(''); + console.error('Usage:'); + console.error(' DATABASE_URL="postgresql://..." 
npx tsx src/scripts/discovery-dutchie-locations.ts --all-enabled'); + process.exit(1); + } + + // Create pool + const pool = new Pool({ connectionString }); + + try { + // Test connection + await pool.query('SELECT 1'); + console.log('[CLI] Database connection established'); + + const discovery = new DutchieLocationDiscovery(pool); + + if (citySlug) { + // Single city mode + const city = await discovery.getCityBySlug(citySlug); + if (!city) { + console.error(`ERROR: City not found: ${citySlug}`); + console.error(''); + console.error('Make sure you have run city discovery first:'); + console.error(' npm run discovery:dutchie:cities'); + process.exit(1); + } + + const result = await discovery.discoverForCity(city); + + console.log(''); + console.log('='.repeat(60)); + console.log('DISCOVERY COMPLETE'); + console.log('='.repeat(60)); + console.log(`City: ${city.cityName}, ${city.stateCode}`); + console.log(`Locations found: ${result.locationsFound}`); + console.log(`Inserted: ${result.locationsInserted}`); + console.log(`Updated: ${result.locationsUpdated}`); + console.log(`Skipped (protected): ${result.locationsSkipped}`); + console.log(`Errors: ${result.errors.length}`); + console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`); + + if (result.errors.length > 0) { + console.log(''); + console.log('Errors:'); + result.errors.forEach((e) => console.log(` - ${e}`)); + } + + process.exit(result.errors.length > 0 ? 1 : 0); + } else { + // All enabled cities mode + const result = await discovery.discoverAllEnabled({ limit, delayMs: delay }); + + console.log(''); + console.log('='.repeat(60)); + console.log('DISCOVERY COMPLETE'); + console.log('='.repeat(60)); + console.log(`Total cities processed: ${result.totalCities}`); + console.log(`Total locations found: ${result.totalLocationsFound}`); + console.log(`Total inserted: ${result.totalInserted}`); + console.log(`Total updated: ${result.totalUpdated}`); + console.log(`Total skipped: ${result.totalSkipped}`); + console.log(`Total errors: ${result.errors.length}`); + console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`); + + if (result.errors.length > 0 && result.errors.length <= 20) { + console.log(''); + console.log('Errors:'); + result.errors.forEach((e) => console.log(` - ${e}`)); + } else if (result.errors.length > 20) { + console.log(''); + console.log(`First 20 of ${result.errors.length} errors:`); + result.errors.slice(0, 20).forEach((e) => console.log(` - ${e}`)); + } + + process.exit(result.errors.length > 0 ? 1 : 0); + } + } catch (error: any) { + console.error('FATAL ERROR:', error.message); + console.error(error.stack); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/scripts/etl/042_legacy_import.ts b/backend/src/scripts/etl/042_legacy_import.ts new file mode 100644 index 00000000..1d41ced6 --- /dev/null +++ b/backend/src/scripts/etl/042_legacy_import.ts @@ -0,0 +1,833 @@ +/** + * ETL Script: 042 Legacy Import + * + * Copies data from legacy dutchie_legacy database into canonical CannaiQ tables + * in the dutchie_menus database. 
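+ *
+ * Pattern sketch (illustrative only; the concrete statements live in the steps below):
+ * each step reads via legacyPool and/or writes via cannaiqPool, always with an
+ * idempotent insert, e.g.
+ *
+ *   const { rows } = await legacyPool.query('SELECT ... FROM dutchie_products');
+ *   await cannaiqPool.query(
+ *     'INSERT INTO chains (name, slug, website_url) VALUES ($1, $2, $3) ON CONFLICT (slug) DO NOTHING',
+ *     params
+ *   );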
+ * + * CRITICAL DATABASE ARCHITECTURE: + * - SOURCE (READ-ONLY): dutchie_legacy - Contains legacy dutchie_* tables + * - DESTINATION (WRITE): dutchie_menus - Contains canonical CannaiQ tables + * + * IMPORTANT: + * - This script is INSERT-ONLY and IDEMPOTENT + * - Uses ON CONFLICT DO NOTHING for all inserts + * - NO deletes, NO truncates, NO schema changes + * - Legacy database is READ-ONLY - never modified + * + * Run manually with: + * cd backend + * npx tsx src/scripts/etl/042_legacy_import.ts + * + * Prerequisites: + * - Migration 041_cannaiq_canonical_schema.sql must be run on dutchie_menus FIRST + * - Both CANNAIQ_DB_* and LEGACY_DB_* env vars must be set + */ + +import { Pool } from 'pg'; + +// ===================================================== +// DATABASE CONNECTIONS - DUAL POOL ARCHITECTURE +// ===================================================== + +/** + * Get connection string for CannaiQ database (dutchie_menus). + * This is the DESTINATION - where we WRITE canonical data. + */ +function getCannaiqConnectionString(): string { + if (process.env.CANNAIQ_DB_URL) { + return process.env.CANNAIQ_DB_URL; + } + + const required = ['CANNAIQ_DB_HOST', 'CANNAIQ_DB_PORT', 'CANNAIQ_DB_NAME', 'CANNAIQ_DB_USER', 'CANNAIQ_DB_PASS']; + const missing = required.filter((key) => !process.env[key]); + + if (missing.length > 0) { + throw new Error( + `[042_legacy_import] Missing required CannaiQ env vars: ${missing.join(', ')}\n` + + `Set either CANNAIQ_DB_URL or all of: CANNAIQ_DB_HOST, CANNAIQ_DB_PORT, CANNAIQ_DB_NAME, CANNAIQ_DB_USER, CANNAIQ_DB_PASS` + ); + } + + const host = process.env.CANNAIQ_DB_HOST!; + const port = process.env.CANNAIQ_DB_PORT!; + const name = process.env.CANNAIQ_DB_NAME!; + const user = process.env.CANNAIQ_DB_USER!; + const pass = process.env.CANNAIQ_DB_PASS!; + + return `postgresql://${user}:${pass}@${host}:${port}/${name}`; +} + +/** + * Get connection string for Legacy database (dutchie_legacy). + * This is the SOURCE - where we READ legacy data (READ-ONLY). 
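+ *
+ * Example invocation (values are placeholders, not real credentials):
+ *
+ *   CANNAIQ_DB_URL="postgresql://user:pass@localhost:54320/dutchie_menus" \
+ *   LEGACY_DB_URL="postgresql://user:pass@localhost:54320/dutchie_legacy" \
+ *   npx tsx src/scripts/etl/042_legacy_import.ts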
+ */ +function getLegacyConnectionString(): string { + if (process.env.LEGACY_DB_URL) { + return process.env.LEGACY_DB_URL; + } + + const required = ['LEGACY_DB_HOST', 'LEGACY_DB_PORT', 'LEGACY_DB_NAME', 'LEGACY_DB_USER', 'LEGACY_DB_PASS']; + const missing = required.filter((key) => !process.env[key]); + + if (missing.length > 0) { + throw new Error( + `[042_legacy_import] Missing required Legacy env vars: ${missing.join(', ')}\n` + + `Set either LEGACY_DB_URL or all of: LEGACY_DB_HOST, LEGACY_DB_PORT, LEGACY_DB_NAME, LEGACY_DB_USER, LEGACY_DB_PASS` + ); + } + + const host = process.env.LEGACY_DB_HOST!; + const port = process.env.LEGACY_DB_PORT!; + const name = process.env.LEGACY_DB_NAME!; + const user = process.env.LEGACY_DB_USER!; + const pass = process.env.LEGACY_DB_PASS!; + + return `postgresql://${user}:${pass}@${host}:${port}/${name}`; +} + +// Create both pools +const cannaiqPool = new Pool({ connectionString: getCannaiqConnectionString() }); +const legacyPool = new Pool({ connectionString: getLegacyConnectionString() }); + +// ===================================================== +// LOGGING HELPERS +// ===================================================== +interface Stats { + read: number; + inserted: number; + skipped: number; +} + +interface StoreProductStats extends Stats { + skipped_missing_store: number; + skipped_duplicate: number; +} + +function log(message: string) { + console.log(`[042_legacy_import] ${message}`); +} + +function logStats(table: string, stats: Stats) { + log(` ${table}: read=${stats.read}, inserted=${stats.inserted}, skipped=${stats.skipped}`); +} + +function logStoreProductStats(stats: StoreProductStats) { + log(` store_products: read=${stats.read}, inserted=${stats.inserted}, skipped_missing_store=${stats.skipped_missing_store}, skipped_duplicate=${stats.skipped_duplicate}`); +} + +// ===================================================== +// CATEGORY NORMALIZATION HELPER +// ===================================================== +// Legacy dutchie_products has only 'subcategory', not 'category'. +// We derive a canonical category from the subcategory value. 
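+// Expected behavior of the mapping and deriveCategory() defined below
+// (illustrative examples only):
+//
+//   deriveCategory('pre-rolls');        // => 'Flower'       (direct lookup)
+//   deriveCategory('Live Resin Carts'); // => 'Concentrates' (partial match on 'live resin')
+//   deriveCategory('mystery goods');    // => null           (unmapped)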
+
+const SUBCATEGORY_TO_CATEGORY: Record<string, string> = {
+  // Flower
+  'flower': 'Flower',
+  'pre-rolls': 'Flower',
+  'pre-roll': 'Flower',
+  'preroll': 'Flower',
+  'prerolls': 'Flower',
+  'shake': 'Flower',
+  'smalls': 'Flower',
+  'popcorn': 'Flower',
+
+  // Concentrates
+  'concentrates': 'Concentrates',
+  'concentrate': 'Concentrates',
+  'live resin': 'Concentrates',
+  'live-resin': 'Concentrates',
+  'rosin': 'Concentrates',
+  'shatter': 'Concentrates',
+  'wax': 'Concentrates',
+  'badder': 'Concentrates',
+  'crumble': 'Concentrates',
+  'diamonds': 'Concentrates',
+  'sauce': 'Concentrates',
+  'hash': 'Concentrates',
+  'kief': 'Concentrates',
+  'rso': 'Concentrates',
+  'distillate': 'Concentrates',
+
+  // Edibles
+  'edibles': 'Edibles',
+  'edible': 'Edibles',
+  'gummies': 'Edibles',
+  'gummy': 'Edibles',
+  'chocolates': 'Edibles',
+  'chocolate': 'Edibles',
+  'baked goods': 'Edibles',
+  'beverages': 'Edibles',
+  'drinks': 'Edibles',
+  'candy': 'Edibles',
+  'mints': 'Edibles',
+  'capsules': 'Edibles',
+  'tablets': 'Edibles',
+
+  // Vapes
+  'vapes': 'Vapes',
+  'vape': 'Vapes',
+  'vaporizers': 'Vapes',
+  'cartridges': 'Vapes',
+  'cartridge': 'Vapes',
+  'carts': 'Vapes',
+  'cart': 'Vapes',
+  'pods': 'Vapes',
+  'disposables': 'Vapes',
+  'disposable': 'Vapes',
+  'pax': 'Vapes',
+
+  // Topicals
+  'topicals': 'Topicals',
+  'topical': 'Topicals',
+  'lotions': 'Topicals',
+  'balms': 'Topicals',
+  'salves': 'Topicals',
+  'patches': 'Topicals',
+  'bath': 'Topicals',
+
+  // Tinctures
+  'tinctures': 'Tinctures',
+  'tincture': 'Tinctures',
+  'oils': 'Tinctures',
+  'sublinguals': 'Tinctures',
+
+  // Accessories
+  'accessories': 'Accessories',
+  'gear': 'Accessories',
+  'papers': 'Accessories',
+  'grinders': 'Accessories',
+  'pipes': 'Accessories',
+  'bongs': 'Accessories',
+  'batteries': 'Accessories',
+};
+
+/**
+ * Derive a canonical category from the legacy subcategory field.
+ * Returns null if subcategory is null/empty or cannot be mapped.
+ */
+function deriveCategory(subcategory: string | null | undefined): string | null {
+  if (!subcategory) return null;
+
+  const normalized = subcategory.toLowerCase().trim();
+
+  // Direct lookup
+  if (SUBCATEGORY_TO_CATEGORY[normalized]) {
+    return SUBCATEGORY_TO_CATEGORY[normalized];
+  }
+
+  // Partial match - check if any key is contained in the subcategory
+  for (const [key, category] of Object.entries(SUBCATEGORY_TO_CATEGORY)) {
+    if (normalized.includes(key)) {
+      return category;
+    }
+  }
+
+  // No match - return null; callers fall back to the raw subcategory for category_raw
+  return null;
+}
+
+// =====================================================
+// STEP 1: Backfill dispensaries.state_id (on cannaiq db)
+// =====================================================
+async function backfillStateIds(): Promise<Stats> {
+  log('Step 1: Backfill dispensaries.state_id from states table...');
+
+  const result = await cannaiqPool.query(`
+    UPDATE dispensaries d
+    SET state_id = s.id
+    FROM states s
+    WHERE UPPER(d.state) = s.code
+      AND d.state_id IS NULL
+    RETURNING d.id
+  `);
+
+  const stats: Stats = {
+    read: result.rowCount || 0,
+    inserted: result.rowCount || 0,
+    skipped: 0,
+  };
+
+  logStats('dispensaries.state_id', stats);
+  return stats;
+}
+
+// =====================================================
+// STEP 2: Insert known chains (on cannaiq db)
+// =====================================================
+async function insertChains(): Promise<Stats> {
+  log('Step 2: Insert known chains...');
+
+  const knownChains = [
+    { name: 'Curaleaf', slug: 'curaleaf', website: 'https://curaleaf.com' },
+    { name: 'Trulieve', slug: 'trulieve', website: 'https://trulieve.com' },
+    { name: 'Harvest', slug: 'harvest', website: 'https://harvesthoc.com' },
+    { name: 'Nirvana Center', slug: 'nirvana-center', website: 'https://nirvanacannabis.com' },
+    { name: 'Sol Flower', slug: 'sol-flower', website: 'https://solflower.com' },
+    { name: 'Mint Cannabis', slug: 'mint-cannabis', website: 'https://mintcannabis.com' },
+    { name: 'JARS Cannabis', slug: 'jars-cannabis', website: 'https://jarscannabis.com' },
+    { name: 'Zen Leaf', slug: 'zen-leaf', website: 'https://zenleafdispensaries.com' },
+    { name: "Nature's Medicines", slug: 'natures-medicines', website: 'https://naturesmedicines.com' },
+    { name: 'The Mint', slug: 'the-mint', website: 'https://themintdispensary.com' },
+    { name: 'Giving Tree', slug: 'giving-tree', website: 'https://givingtreeaz.com' },
+    { name: 'Health for Life', slug: 'health-for-life', website: 'https://healthforlifeaz.com' },
+    { name: 'Oasis Cannabis', slug: 'oasis-cannabis', website: 'https://oasiscannabis.com' },
+  ];
+
+  let inserted = 0;
+  for (const chain of knownChains) {
+    const result = await cannaiqPool.query(
+      `
+      INSERT INTO chains (name, slug, website_url)
+      VALUES ($1, $2, $3)
+      ON CONFLICT (slug) DO NOTHING
+      RETURNING id
+      `,
+      [chain.name, chain.slug, chain.website]
+    );
+    if (result.rowCount && result.rowCount > 0) {
+      inserted++;
+    }
+  }
+
+  const stats: Stats = {
+    read: knownChains.length,
+    inserted,
+    skipped: knownChains.length - inserted,
+  };
+
+  logStats('chains', stats);
+  return stats;
+}
+
+// =====================================================
+// STEP 3: Link dispensaries to chains by name pattern (on cannaiq db)
+// =====================================================
+async function linkDispensariesToChains(): Promise<Stats> {
+  log('Step 3: Link dispensaries to chains by name pattern...');
+
+  // Get all chains from cannaiq
+  const chainsResult = await cannaiqPool.query('SELECT 
id, name, slug FROM chains'); + const chains = chainsResult.rows; + + let totalUpdated = 0; + + for (const chain of chains) { + // Match by name pattern (case-insensitive) + const result = await cannaiqPool.query( + ` + UPDATE dispensaries + SET chain_id = $1 + WHERE (name ILIKE $2 OR dba_name ILIKE $2) + AND chain_id IS NULL + RETURNING id + `, + [chain.id, `%${chain.name}%`] + ); + + if (result.rowCount && result.rowCount > 0) { + log(` Linked ${result.rowCount} dispensaries to chain: ${chain.name}`); + totalUpdated += result.rowCount; + } + } + + const stats: Stats = { + read: chains.length, + inserted: totalUpdated, + skipped: 0, + }; + + logStats('dispensaries.chain_id', stats); + return stats; +} + +// ===================================================== +// STEP 4: Insert brands from legacy dutchie_products +// ===================================================== +async function insertBrands(): Promise { + log('Step 4: Insert brands from legacy dutchie_products -> cannaiq brands...'); + + // READ from legacy database + const brandsResult = await legacyPool.query(` + SELECT DISTINCT TRIM(brand_name) AS brand_name + FROM dutchie_products + WHERE brand_name IS NOT NULL + AND TRIM(brand_name) != '' + ORDER BY brand_name + `); + + const stats: Stats = { + read: brandsResult.rowCount || 0, + inserted: 0, + skipped: 0, + }; + + const BATCH_SIZE = 100; + const brands = brandsResult.rows; + + for (let i = 0; i < brands.length; i += BATCH_SIZE) { + const batch = brands.slice(i, i + BATCH_SIZE); + + for (const row of batch) { + const brandName = row.brand_name.trim(); + // Create slug: lowercase, replace non-alphanumeric with hyphens, collapse multiple hyphens + const slug = brandName + .toLowerCase() + .replace(/[^a-z0-9]+/g, '-') + .replace(/^-+|-+$/g, '') + .substring(0, 250); + + if (!slug) continue; + + // WRITE to cannaiq database + const result = await cannaiqPool.query( + ` + INSERT INTO brands (name, slug) + VALUES ($1, $2) + ON CONFLICT (slug) DO NOTHING + RETURNING id + `, + [brandName, slug] + ); + + if (result.rowCount && result.rowCount > 0) { + stats.inserted++; + } else { + stats.skipped++; + } + } + + log(` Processed ${Math.min(i + BATCH_SIZE, brands.length)}/${brands.length} brands...`); + } + + logStats('brands', stats); + return stats; +} + +// ===================================================== +// STEP 5: Insert store_products from legacy dutchie_products +// ===================================================== +async function insertStoreProducts(): Promise { + log('Step 5: Insert store_products from legacy dutchie_products -> cannaiq store_products...'); + + // Step 5a: Preload valid dispensary IDs from canonical database + log(' Loading valid dispensary IDs from canonical database...'); + const dispensaryResult = await cannaiqPool.query('SELECT id FROM dispensaries'); + const validDispensaryIds = new Set(dispensaryResult.rows.map((r) => r.id)); + log(` Found ${validDispensaryIds.size} valid dispensaries in canonical database`); + + // Count total in legacy + const countResult = await legacyPool.query('SELECT COUNT(*) FROM dutchie_products'); + const totalCount = parseInt(countResult.rows[0].count, 10); + + const stats: StoreProductStats = { + read: totalCount, + inserted: 0, + skipped: 0, + skipped_missing_store: 0, + skipped_duplicate: 0, + }; + + const BATCH_SIZE = 200; + let offset = 0; + + while (offset < totalCount) { + // READ batch from legacy database + // ONLY use columns that actually exist in dutchie_products: + // id, dispensary_id, external_product_id, 
name, brand_name, + // subcategory, stock_status, primary_image_url, created_at + // Missing columns: category, first_seen_at, last_seen_at, updated_at, thc_content, cbd_content + const batchResult = await legacyPool.query( + ` + SELECT + dp.id, + dp.dispensary_id, + dp.external_product_id, + dp.name, + dp.brand_name, + dp.subcategory, + dp.stock_status, + dp.primary_image_url, + dp.created_at + FROM dutchie_products dp + ORDER BY dp.id + LIMIT $1 OFFSET $2 + `, + [BATCH_SIZE, offset] + ); + + for (const row of batchResult.rows) { + // Skip if dispensary_id is missing or not in canonical database + if (!row.dispensary_id || !validDispensaryIds.has(row.dispensary_id)) { + stats.skipped_missing_store++; + stats.skipped++; + continue; + } + + // Derive category from subcategory in TypeScript + const categoryRaw = deriveCategory(row.subcategory) || row.subcategory || null; + + // Use created_at as first_seen_at if available, otherwise NOW() + const timestamp = row.created_at || new Date(); + + // WRITE to cannaiq database + try { + const result = await cannaiqPool.query( + ` + INSERT INTO store_products ( + dispensary_id, + provider, + provider_product_id, + name_raw, + brand_name_raw, + category_raw, + subcategory_raw, + stock_status, + is_in_stock, + image_url, + first_seen_at, + last_seen_at, + created_at, + updated_at + ) VALUES ( + $1, 'dutchie', $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13 + ) + ON CONFLICT (dispensary_id, provider, provider_product_id) DO NOTHING + RETURNING id + `, + [ + row.dispensary_id, + row.external_product_id, + row.name, + row.brand_name, + categoryRaw, + row.subcategory || null, + row.stock_status || 'in_stock', + row.stock_status !== 'out_of_stock', + row.primary_image_url || null, + timestamp, // first_seen_at = created_at or NOW() + timestamp, // last_seen_at = created_at or NOW() + timestamp, // created_at + timestamp, // updated_at + ] + ); + + if (result.rowCount && result.rowCount > 0) { + stats.inserted++; + } else { + stats.skipped_duplicate++; + stats.skipped++; + } + } catch (err: any) { + // If somehow we still hit an FK error, skip gracefully + if (err.code === '23503') { + // FK violation + stats.skipped_missing_store++; + stats.skipped++; + } else { + throw err; // Re-throw unexpected errors + } + } + } + + offset += BATCH_SIZE; + log(` Processed ${Math.min(offset, totalCount)}/${totalCount} products...`); + } + + logStoreProductStats(stats); + return stats; +} + +// ===================================================== +// STEP 6: Link store_products to brands (on cannaiq db) +// ===================================================== +async function linkStoreProductsToBrands(): Promise { + log('Step 6: Link store_products to brands by brand_name_raw...'); + + const result = await cannaiqPool.query(` + UPDATE store_products sp + SET brand_id = b.id + FROM brands b + WHERE LOWER(TRIM(sp.brand_name_raw)) = LOWER(b.name) + AND sp.brand_id IS NULL + RETURNING sp.id + `); + + const stats: Stats = { + read: result.rowCount || 0, + inserted: result.rowCount || 0, + skipped: 0, + }; + + logStats('store_products.brand_id', stats); + return stats; +} + +// ===================================================== +// STEP 7: Insert store_product_snapshots from legacy dutchie_product_snapshots +// ===================================================== +async function insertStoreProductSnapshots(): Promise { + log('Step 7: Insert store_product_snapshots from legacy -> cannaiq...'); + + // Step 7a: Preload valid dispensary IDs from canonical database + log(' 
Loading valid dispensary IDs from canonical database...'); + const dispensaryResult = await cannaiqPool.query('SELECT id FROM dispensaries'); + const validDispensaryIds = new Set(dispensaryResult.rows.map((r) => r.id)); + log(` Found ${validDispensaryIds.size} valid dispensaries in canonical database`); + + // Count total in legacy + const countResult = await legacyPool.query('SELECT COUNT(*) FROM dutchie_product_snapshots'); + const totalCount = parseInt(countResult.rows[0].count, 10); + + const stats: StoreProductStats = { + read: totalCount, + inserted: 0, + skipped: 0, + skipped_missing_store: 0, + skipped_duplicate: 0, + }; + + if (totalCount === 0) { + log(' No snapshots to migrate.'); + return stats; + } + + const BATCH_SIZE = 500; + let offset = 0; + + while (offset < totalCount) { + // READ batch from legacy with join to get provider_product_id from dutchie_products + // ONLY use columns that actually exist in dutchie_product_snapshots: + // id, dispensary_id, dutchie_product_id, crawled_at, created_at + // Missing columns: raw_product_data + // We join to dutchie_products for: external_product_id, name, brand_name, subcategory, primary_image_url + const batchResult = await legacyPool.query( + ` + SELECT + dps.id, + dps.dispensary_id, + dp.external_product_id AS provider_product_id, + dp.name, + dp.brand_name, + dp.subcategory, + dp.primary_image_url, + dps.crawled_at, + dps.created_at + FROM dutchie_product_snapshots dps + JOIN dutchie_products dp ON dp.id = dps.dutchie_product_id + ORDER BY dps.id + LIMIT $1 OFFSET $2 + `, + [BATCH_SIZE, offset] + ); + + for (const row of batchResult.rows) { + // Skip if dispensary_id is missing or not in canonical database + if (!row.dispensary_id || !validDispensaryIds.has(row.dispensary_id)) { + stats.skipped_missing_store++; + stats.skipped++; + continue; + } + + // Derive category from subcategory in TypeScript + const categoryRaw = deriveCategory(row.subcategory) || row.subcategory || null; + + // Pricing/THC/CBD/stock data not available (raw_product_data doesn't exist in legacy) + // These will be NULL for legacy snapshots - future crawls will populate them + const timestamp = row.crawled_at || row.created_at || new Date(); + + // WRITE to cannaiq database + try { + const result = await cannaiqPool.query( + ` + INSERT INTO store_product_snapshots ( + dispensary_id, + provider, + provider_product_id, + captured_at, + name_raw, + brand_name_raw, + category_raw, + subcategory_raw, + price_rec, + price_med, + price_rec_special, + is_on_special, + is_in_stock, + stock_status, + thc_percent, + cbd_percent, + image_url, + raw_data, + created_at + ) VALUES ( + $1, 'dutchie', $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18 + ) + ON CONFLICT DO NOTHING + RETURNING id + `, + [ + row.dispensary_id, + row.provider_product_id, + timestamp, // captured_at + row.name, + row.brand_name, + categoryRaw, + row.subcategory || null, + null, // price_rec - not available + null, // price_med - not available + null, // price_rec_special - not available + false, // is_on_special - default false + true, // is_in_stock - default true (unknown) + 'unknown', // stock_status - unknown for legacy + null, // thc_percent - not available + null, // cbd_percent - not available + row.primary_image_url || null, // image_url from legacy product + null, // raw_data - not available + row.created_at || timestamp, + ] + ); + + if (result.rowCount && result.rowCount > 0) { + stats.inserted++; + } else { + stats.skipped_duplicate++; + stats.skipped++; + } 
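+        // A rowCount of 0 here means ON CONFLICT DO NOTHING suppressed an insert
+        // that collided with an existing snapshot row, so it is tallied as
+        // skipped_duplicate rather than being overwritten.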
+ } catch (err: any) { + // If somehow we still hit an FK error, skip gracefully + if (err.code === '23503') { + // FK violation + stats.skipped_missing_store++; + stats.skipped++; + } else { + throw err; // Re-throw unexpected errors + } + } + } + + offset += BATCH_SIZE; + log(` Processed ${Math.min(offset, totalCount)}/${totalCount} snapshots...`); + } + + logStoreProductStats(stats); + return stats; +} + +// ===================================================== +// STEP 8: Link store_product_snapshots to store_products (on cannaiq db) +// ===================================================== +async function linkSnapshotsToStoreProducts(): Promise { + log('Step 8: Link store_product_snapshots to store_products...'); + + const result = await cannaiqPool.query(` + UPDATE store_product_snapshots sps + SET store_product_id = sp.id + FROM store_products sp + WHERE sps.dispensary_id = sp.dispensary_id + AND sps.provider = sp.provider + AND sps.provider_product_id = sp.provider_product_id + AND sps.store_product_id IS NULL + RETURNING sps.id + `); + + const stats: Stats = { + read: result.rowCount || 0, + inserted: result.rowCount || 0, + skipped: 0, + }; + + logStats('store_product_snapshots.store_product_id', stats); + return stats; +} + +// ===================================================== +// MAIN +// ===================================================== +async function main() { + log('='.repeat(60)); + log('CannaiQ Legacy Import ETL'); + log('='.repeat(60)); + log(''); + log('This script migrates data from dutchie_legacy -> dutchie_menus.'); + log('All operations are INSERT-ONLY and IDEMPOTENT.'); + log(''); + + try { + // Test both connections and show which databases we're connected to + const cannaiqInfo = await cannaiqPool.query('SELECT current_database() as db, current_user as user'); + const legacyInfo = await legacyPool.query('SELECT current_database() as db, current_user as user'); + + log(`DESTINATION (cannaiq): ${cannaiqInfo.rows[0].user}@${cannaiqInfo.rows[0].db}`); + log(`SOURCE (legacy): ${legacyInfo.rows[0].user}@${legacyInfo.rows[0].db}`); + log(''); + + // Verify we're not writing to legacy + if (legacyInfo.rows[0].db === cannaiqInfo.rows[0].db) { + throw new Error( + 'SAFETY CHECK FAILED: Source and destination are the same database!\n' + + 'CANNAIQ_DB_NAME must be different from LEGACY_DB_NAME.' 
+ ); + } + + // Run steps + await backfillStateIds(); + log(''); + + await insertChains(); + log(''); + + await linkDispensariesToChains(); + log(''); + + await insertBrands(); + log(''); + + await insertStoreProducts(); + log(''); + + await linkStoreProductsToBrands(); + log(''); + + await insertStoreProductSnapshots(); + log(''); + + await linkSnapshotsToStoreProducts(); + log(''); + + // Final summary (from cannaiq db) + log('='.repeat(60)); + log('SUMMARY (from dutchie_menus)'); + log('='.repeat(60)); + + const summaryQueries = [ + { table: 'states', query: 'SELECT COUNT(*) FROM states' }, + { table: 'chains', query: 'SELECT COUNT(*) FROM chains' }, + { table: 'brands', query: 'SELECT COUNT(*) FROM brands' }, + { table: 'dispensaries (with state_id)', query: 'SELECT COUNT(*) FROM dispensaries WHERE state_id IS NOT NULL' }, + { table: 'dispensaries (with chain_id)', query: 'SELECT COUNT(*) FROM dispensaries WHERE chain_id IS NOT NULL' }, + { table: 'store_products', query: 'SELECT COUNT(*) FROM store_products' }, + { table: 'store_products (with brand_id)', query: 'SELECT COUNT(*) FROM store_products WHERE brand_id IS NOT NULL' }, + { table: 'store_product_snapshots', query: 'SELECT COUNT(*) FROM store_product_snapshots' }, + { table: 'store_product_snapshots (with store_product_id)', query: 'SELECT COUNT(*) FROM store_product_snapshots WHERE store_product_id IS NOT NULL' }, + ]; + + for (const sq of summaryQueries) { + const result = await cannaiqPool.query(sq.query); + log(` ${sq.table}: ${result.rows[0].count}`); + } + + log(''); + log('Legacy import complete!'); + } catch (error: any) { + log(`ERROR: ${error.message}`); + console.error(error); + process.exit(1); + } finally { + await cannaiqPool.end(); + await legacyPool.end(); + } +} + +// Run +main(); diff --git a/backend/src/scripts/etl/legacy-import.ts b/backend/src/scripts/etl/legacy-import.ts new file mode 100644 index 00000000..aa94da7c --- /dev/null +++ b/backend/src/scripts/etl/legacy-import.ts @@ -0,0 +1,749 @@ +/** + * Legacy Data Import ETL Script + * + * DEPRECATED: This script assumed a two-database architecture. + * + * CURRENT ARCHITECTURE (Single Database): + * - All data lives in ONE database: cannaiq (cannaiq-postgres container) + * - Legacy tables exist INSIDE this same database with namespaced prefixes (e.g., legacy_*) + * - The only database is: cannaiq (in cannaiq-postgres container) + * + * If you need to import legacy data: + * 1. Import into namespaced tables (legacy_dispensaries, legacy_products, etc.) + * inside the main cannaiq database + * 2. 
Use the canonical connection from src/dutchie-az/db/connection.ts + * + * SAFETY RULES: + * - INSERT-ONLY: No UPDATE, no DELETE, no TRUNCATE + * - ON CONFLICT DO NOTHING: Skip duplicates, never overwrite + * - Batch Processing: 500-1000 rows per batch + * - Manual Invocation Only: Requires explicit user execution + */ + +import { Pool, PoolClient } from 'pg'; + +// ============================================================ +// CONFIGURATION +// ============================================================ + +const BATCH_SIZE = 500; + +interface ETLConfig { + dryRun: boolean; + tables: string[]; +} + +interface ETLStats { + table: string; + read: number; + inserted: number; + skipped: number; + errors: number; + durationMs: number; +} + +// Parse command line arguments +function parseArgs(): ETLConfig { + const args = process.argv.slice(2); + const config: ETLConfig = { + dryRun: false, + tables: ['dispensaries', 'products', 'dutchie_products', 'dutchie_product_snapshots'], + }; + + for (const arg of args) { + if (arg === '--dry-run') { + config.dryRun = true; + } else if (arg.startsWith('--tables=')) { + config.tables = arg.replace('--tables=', '').split(','); + } + } + + return config; +} + +// ============================================================ +// DATABASE CONNECTIONS +// ============================================================ + +// DEPRECATED: Both pools point to the same database (cannaiq) +// Legacy tables exist inside the main database with namespaced prefixes +function createLegacyPool(): Pool { + return new Pool({ + host: process.env.CANNAIQ_DB_HOST || 'localhost', + port: parseInt(process.env.CANNAIQ_DB_PORT || '54320'), + user: process.env.CANNAIQ_DB_USER || 'dutchie', + password: process.env.CANNAIQ_DB_PASS || 'dutchie_local_pass', + database: process.env.CANNAIQ_DB_NAME || 'cannaiq', + max: 5, + }); +} + +function createCannaiqPool(): Pool { + return new Pool({ + host: process.env.CANNAIQ_DB_HOST || 'localhost', + port: parseInt(process.env.CANNAIQ_DB_PORT || '54320'), + user: process.env.CANNAIQ_DB_USER || 'dutchie', + password: process.env.CANNAIQ_DB_PASS || 'dutchie_local_pass', + database: process.env.CANNAIQ_DB_NAME || 'cannaiq', + max: 5, + }); +} + +// ============================================================ +// STAGING TABLE CREATION +// ============================================================ + +const STAGING_TABLES_SQL = ` +-- Staging table for legacy dispensaries +CREATE TABLE IF NOT EXISTS dispensaries_from_legacy ( + id SERIAL PRIMARY KEY, + legacy_id INTEGER NOT NULL, + name VARCHAR(255) NOT NULL, + slug VARCHAR(255) NOT NULL, + city VARCHAR(100) NOT NULL, + state VARCHAR(10) NOT NULL, + postal_code VARCHAR(20), + address TEXT, + latitude DECIMAL(10,7), + longitude DECIMAL(10,7), + menu_url TEXT, + website TEXT, + legacy_metadata JSONB, + imported_at TIMESTAMPTZ DEFAULT NOW(), + UNIQUE(legacy_id) +); + +-- Staging table for legacy products +CREATE TABLE IF NOT EXISTS products_from_legacy ( + id SERIAL PRIMARY KEY, + legacy_product_id INTEGER NOT NULL, + legacy_dispensary_id INTEGER, + external_product_id VARCHAR(255), + name VARCHAR(500) NOT NULL, + brand_name VARCHAR(255), + type VARCHAR(100), + subcategory VARCHAR(100), + strain_type VARCHAR(50), + thc DECIMAL(10,4), + cbd DECIMAL(10,4), + price_cents INTEGER, + original_price_cents INTEGER, + stock_status VARCHAR(20), + weight VARCHAR(100), + primary_image_url TEXT, + first_seen_at TIMESTAMPTZ, + last_seen_at TIMESTAMPTZ, + legacy_raw_payload JSONB, + imported_at TIMESTAMPTZ DEFAULT 
NOW(), + UNIQUE(legacy_product_id) +); + +-- Staging table for legacy price history +CREATE TABLE IF NOT EXISTS price_history_legacy ( + id SERIAL PRIMARY KEY, + legacy_product_id INTEGER NOT NULL, + price_cents INTEGER, + recorded_at TIMESTAMPTZ, + imported_at TIMESTAMPTZ DEFAULT NOW() +); + +-- Index for efficient lookups +CREATE INDEX IF NOT EXISTS idx_disp_legacy_slug ON dispensaries_from_legacy(slug, city, state); +CREATE INDEX IF NOT EXISTS idx_prod_legacy_ext_id ON products_from_legacy(external_product_id); +`; + +async function createStagingTables(cannaiqPool: Pool, dryRun: boolean): Promise { + console.log('[ETL] Creating staging tables...'); + + if (dryRun) { + console.log('[ETL] DRY RUN: Would create staging tables'); + return; + } + + const client = await cannaiqPool.connect(); + try { + await client.query(STAGING_TABLES_SQL); + console.log('[ETL] Staging tables created successfully'); + } finally { + client.release(); + } +} + +// ============================================================ +// ETL FUNCTIONS +// ============================================================ + +async function importDispensaries( + legacyPool: Pool, + cannaiqPool: Pool, + dryRun: boolean +): Promise { + const startTime = Date.now(); + const stats: ETLStats = { + table: 'dispensaries', + read: 0, + inserted: 0, + skipped: 0, + errors: 0, + durationMs: 0, + }; + + console.log('[ETL] Importing dispensaries...'); + + const legacyClient = await legacyPool.connect(); + const cannaiqClient = await cannaiqPool.connect(); + + try { + // Count total rows + const countResult = await legacyClient.query('SELECT COUNT(*) FROM dispensaries'); + const totalRows = parseInt(countResult.rows[0].count); + console.log(`[ETL] Found ${totalRows} dispensaries in legacy database`); + + // Process in batches + let offset = 0; + while (offset < totalRows) { + const batchResult = await legacyClient.query(` + SELECT + id, name, slug, city, state, zip, address, + latitude, longitude, menu_url, website, dba_name, + menu_provider, product_provider, provider_detection_data + FROM dispensaries + ORDER BY id + LIMIT $1 OFFSET $2 + `, [BATCH_SIZE, offset]); + + stats.read += batchResult.rows.length; + + if (dryRun) { + console.log(`[ETL] DRY RUN: Would insert batch of ${batchResult.rows.length} dispensaries`); + stats.inserted += batchResult.rows.length; + } else { + for (const row of batchResult.rows) { + try { + const legacyMetadata = { + dba_name: row.dba_name, + menu_provider: row.menu_provider, + product_provider: row.product_provider, + provider_detection_data: row.provider_detection_data, + }; + + const insertResult = await cannaiqClient.query(` + INSERT INTO dispensaries_from_legacy + (legacy_id, name, slug, city, state, postal_code, address, + latitude, longitude, menu_url, website, legacy_metadata) + VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12) + ON CONFLICT (legacy_id) DO NOTHING + RETURNING id + `, [ + row.id, + row.name, + row.slug, + row.city, + row.state, + row.zip, + row.address, + row.latitude, + row.longitude, + row.menu_url, + row.website, + JSON.stringify(legacyMetadata), + ]); + + if (insertResult.rowCount > 0) { + stats.inserted++; + } else { + stats.skipped++; + } + } catch (err: any) { + stats.errors++; + console.error(`[ETL] Error inserting dispensary ${row.id}:`, err.message); + } + } + } + + offset += BATCH_SIZE; + console.log(`[ETL] Processed ${Math.min(offset, totalRows)}/${totalRows} dispensaries`); + } + } finally { + legacyClient.release(); + cannaiqClient.release(); + } + + 
stats.durationMs = Date.now() - startTime; + return stats; +} + +async function importProducts( + legacyPool: Pool, + cannaiqPool: Pool, + dryRun: boolean +): Promise { + const startTime = Date.now(); + const stats: ETLStats = { + table: 'products', + read: 0, + inserted: 0, + skipped: 0, + errors: 0, + durationMs: 0, + }; + + console.log('[ETL] Importing legacy products...'); + + const legacyClient = await legacyPool.connect(); + const cannaiqClient = await cannaiqPool.connect(); + + try { + const countResult = await legacyClient.query('SELECT COUNT(*) FROM products'); + const totalRows = parseInt(countResult.rows[0].count); + console.log(`[ETL] Found ${totalRows} products in legacy database`); + + let offset = 0; + while (offset < totalRows) { + const batchResult = await legacyClient.query(` + SELECT + id, dispensary_id, dutchie_product_id, name, brand, + subcategory, strain_type, thc_percentage, cbd_percentage, + price, original_price, in_stock, weight, image_url, + first_seen_at, last_seen_at, raw_data + FROM products + ORDER BY id + LIMIT $1 OFFSET $2 + `, [BATCH_SIZE, offset]); + + stats.read += batchResult.rows.length; + + if (dryRun) { + console.log(`[ETL] DRY RUN: Would insert batch of ${batchResult.rows.length} products`); + stats.inserted += batchResult.rows.length; + } else { + for (const row of batchResult.rows) { + try { + const stockStatus = row.in_stock === true ? 'in_stock' : + row.in_stock === false ? 'out_of_stock' : 'unknown'; + const priceCents = row.price ? Math.round(parseFloat(row.price) * 100) : null; + const originalPriceCents = row.original_price ? Math.round(parseFloat(row.original_price) * 100) : null; + + const insertResult = await cannaiqClient.query(` + INSERT INTO products_from_legacy + (legacy_product_id, legacy_dispensary_id, external_product_id, + name, brand_name, subcategory, strain_type, thc, cbd, + price_cents, original_price_cents, stock_status, weight, + primary_image_url, first_seen_at, last_seen_at, legacy_raw_payload) + VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17) + ON CONFLICT (legacy_product_id) DO NOTHING + RETURNING id + `, [ + row.id, + row.dispensary_id, + row.dutchie_product_id, + row.name, + row.brand, + row.subcategory, + row.strain_type, + row.thc_percentage, + row.cbd_percentage, + priceCents, + originalPriceCents, + stockStatus, + row.weight, + row.image_url, + row.first_seen_at, + row.last_seen_at, + row.raw_data ? 
JSON.stringify(row.raw_data) : null, + ]); + + if (insertResult.rowCount > 0) { + stats.inserted++; + } else { + stats.skipped++; + } + } catch (err: any) { + stats.errors++; + console.error(`[ETL] Error inserting product ${row.id}:`, err.message); + } + } + } + + offset += BATCH_SIZE; + console.log(`[ETL] Processed ${Math.min(offset, totalRows)}/${totalRows} products`); + } + } finally { + legacyClient.release(); + cannaiqClient.release(); + } + + stats.durationMs = Date.now() - startTime; + return stats; +} + +async function importDutchieProducts( + legacyPool: Pool, + cannaiqPool: Pool, + dryRun: boolean +): Promise { + const startTime = Date.now(); + const stats: ETLStats = { + table: 'dutchie_products', + read: 0, + inserted: 0, + skipped: 0, + errors: 0, + durationMs: 0, + }; + + console.log('[ETL] Importing dutchie_products...'); + + const legacyClient = await legacyPool.connect(); + const cannaiqClient = await cannaiqPool.connect(); + + try { + const countResult = await legacyClient.query('SELECT COUNT(*) FROM dutchie_products'); + const totalRows = parseInt(countResult.rows[0].count); + console.log(`[ETL] Found ${totalRows} dutchie_products in legacy database`); + + // Note: For dutchie_products, we need to map dispensary_id to the canonical dispensary + // This requires the dispensaries to be imported first + // For now, we'll insert directly since the schema is nearly identical + + let offset = 0; + while (offset < totalRows) { + const batchResult = await legacyClient.query(` + SELECT * + FROM dutchie_products + ORDER BY id + LIMIT $1 OFFSET $2 + `, [BATCH_SIZE, offset]); + + stats.read += batchResult.rows.length; + + if (dryRun) { + console.log(`[ETL] DRY RUN: Would insert batch of ${batchResult.rows.length} dutchie_products`); + stats.inserted += batchResult.rows.length; + } else { + // For each row, attempt insert with ON CONFLICT DO NOTHING + for (const row of batchResult.rows) { + try { + // Check if dispensary exists in canonical table + const dispCheck = await cannaiqClient.query(` + SELECT id FROM dispensaries WHERE id = $1 + `, [row.dispensary_id]); + + if (dispCheck.rows.length === 0) { + stats.skipped++; + continue; // Skip products for dispensaries not yet imported + } + + const insertResult = await cannaiqClient.query(` + INSERT INTO dutchie_products + (dispensary_id, platform, external_product_id, platform_dispensary_id, + c_name, name, brand_name, brand_id, brand_logo_url, + type, subcategory, strain_type, provider, + thc, thc_content, cbd, cbd_content, cannabinoids_v2, effects, + status, medical_only, rec_only, featured, coming_soon, + certificate_of_analysis_enabled, + is_below_threshold, is_below_kiosk_threshold, + options_below_threshold, options_below_kiosk_threshold, + stock_status, total_quantity_available, + primary_image_url, images, measurements, weight, past_c_names, + created_at_dutchie, updated_at_dutchie, latest_raw_payload) + VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38, $39) + ON CONFLICT (dispensary_id, external_product_id) DO NOTHING + RETURNING id + `, [ + row.dispensary_id, + row.platform || 'dutchie', + row.external_product_id, + row.platform_dispensary_id, + row.c_name, + row.name, + row.brand_name, + row.brand_id, + row.brand_logo_url, + row.type, + row.subcategory, + row.strain_type, + row.provider, + row.thc, + row.thc_content, + row.cbd, + row.cbd_content, + row.cannabinoids_v2, + row.effects, + 
row.status, + row.medical_only, + row.rec_only, + row.featured, + row.coming_soon, + row.certificate_of_analysis_enabled, + row.is_below_threshold, + row.is_below_kiosk_threshold, + row.options_below_threshold, + row.options_below_kiosk_threshold, + row.stock_status, + row.total_quantity_available, + row.primary_image_url, + row.images, + row.measurements, + row.weight, + row.past_c_names, + row.created_at_dutchie, + row.updated_at_dutchie, + row.latest_raw_payload, + ]); + + if (insertResult.rowCount > 0) { + stats.inserted++; + } else { + stats.skipped++; + } + } catch (err: any) { + stats.errors++; + if (stats.errors <= 5) { + console.error(`[ETL] Error inserting dutchie_product ${row.id}:`, err.message); + } + } + } + } + + offset += BATCH_SIZE; + console.log(`[ETL] Processed ${Math.min(offset, totalRows)}/${totalRows} dutchie_products`); + } + } finally { + legacyClient.release(); + cannaiqClient.release(); + } + + stats.durationMs = Date.now() - startTime; + return stats; +} + +async function importDutchieSnapshots( + legacyPool: Pool, + cannaiqPool: Pool, + dryRun: boolean +): Promise { + const startTime = Date.now(); + const stats: ETLStats = { + table: 'dutchie_product_snapshots', + read: 0, + inserted: 0, + skipped: 0, + errors: 0, + durationMs: 0, + }; + + console.log('[ETL] Importing dutchie_product_snapshots...'); + + const legacyClient = await legacyPool.connect(); + const cannaiqClient = await cannaiqPool.connect(); + + try { + const countResult = await legacyClient.query('SELECT COUNT(*) FROM dutchie_product_snapshots'); + const totalRows = parseInt(countResult.rows[0].count); + console.log(`[ETL] Found ${totalRows} dutchie_product_snapshots in legacy database`); + + // Build mapping of legacy product IDs to canonical product IDs + console.log('[ETL] Building product ID mapping...'); + const productMapping = new Map(); + const mappingResult = await cannaiqClient.query(` + SELECT id, external_product_id, dispensary_id FROM dutchie_products + `); + // Create a key from dispensary_id + external_product_id + const productByKey = new Map(); + for (const row of mappingResult.rows) { + const key = `${row.dispensary_id}:${row.external_product_id}`; + productByKey.set(key, row.id); + } + + let offset = 0; + while (offset < totalRows) { + const batchResult = await legacyClient.query(` + SELECT * + FROM dutchie_product_snapshots + ORDER BY id + LIMIT $1 OFFSET $2 + `, [BATCH_SIZE, offset]); + + stats.read += batchResult.rows.length; + + if (dryRun) { + console.log(`[ETL] DRY RUN: Would insert batch of ${batchResult.rows.length} snapshots`); + stats.inserted += batchResult.rows.length; + } else { + for (const row of batchResult.rows) { + try { + // Map legacy product ID to canonical product ID + const key = `${row.dispensary_id}:${row.external_product_id}`; + const canonicalProductId = productByKey.get(key); + + if (!canonicalProductId) { + stats.skipped++; + continue; // Skip snapshots for products not yet imported + } + + // Insert snapshot (no conflict handling - all snapshots are historical) + await cannaiqClient.query(` + INSERT INTO dutchie_product_snapshots + (dutchie_product_id, dispensary_id, platform_dispensary_id, + external_product_id, pricing_type, crawl_mode, + status, featured, special, medical_only, rec_only, + is_present_in_feed, stock_status, + rec_min_price_cents, rec_max_price_cents, rec_min_special_price_cents, + med_min_price_cents, med_max_price_cents, med_min_special_price_cents, + wholesale_min_price_cents, + total_quantity_available, 
total_kiosk_quantity_available, + manual_inventory, is_below_threshold, is_below_kiosk_threshold, + options, raw_payload, crawled_at) + VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28) + `, [ + canonicalProductId, + row.dispensary_id, + row.platform_dispensary_id, + row.external_product_id, + row.pricing_type, + row.crawl_mode, + row.status, + row.featured, + row.special, + row.medical_only, + row.rec_only, + row.is_present_in_feed, + row.stock_status, + row.rec_min_price_cents, + row.rec_max_price_cents, + row.rec_min_special_price_cents, + row.med_min_price_cents, + row.med_max_price_cents, + row.med_min_special_price_cents, + row.wholesale_min_price_cents, + row.total_quantity_available, + row.total_kiosk_quantity_available, + row.manual_inventory, + row.is_below_threshold, + row.is_below_kiosk_threshold, + row.options, + row.raw_payload, + row.crawled_at, + ]); + + stats.inserted++; + } catch (err: any) { + stats.errors++; + if (stats.errors <= 5) { + console.error(`[ETL] Error inserting snapshot ${row.id}:`, err.message); + } + } + } + } + + offset += BATCH_SIZE; + console.log(`[ETL] Processed ${Math.min(offset, totalRows)}/${totalRows} snapshots`); + } + } finally { + legacyClient.release(); + cannaiqClient.release(); + } + + stats.durationMs = Date.now() - startTime; + return stats; +} + +// ============================================================ +// MAIN +// ============================================================ + +async function main(): Promise { + console.log('='.repeat(60)); + console.log('LEGACY DATA IMPORT ETL'); + console.log('='.repeat(60)); + + const config = parseArgs(); + + console.log(`Mode: ${config.dryRun ? 'DRY RUN' : 'LIVE'}`); + console.log(`Tables: ${config.tables.join(', ')}`); + console.log(''); + + // Create connection pools + const legacyPool = createLegacyPool(); + const cannaiqPool = createCannaiqPool(); + + try { + // Test connections + console.log('[ETL] Testing database connections...'); + await legacyPool.query('SELECT 1'); + console.log('[ETL] Legacy database connected'); + await cannaiqPool.query('SELECT 1'); + console.log('[ETL] CannaiQ database connected'); + console.log(''); + + // Create staging tables + await createStagingTables(cannaiqPool, config.dryRun); + console.log(''); + + // Run imports + const allStats: ETLStats[] = []; + + if (config.tables.includes('dispensaries')) { + const stats = await importDispensaries(legacyPool, cannaiqPool, config.dryRun); + allStats.push(stats); + console.log(''); + } + + if (config.tables.includes('products')) { + const stats = await importProducts(legacyPool, cannaiqPool, config.dryRun); + allStats.push(stats); + console.log(''); + } + + if (config.tables.includes('dutchie_products')) { + const stats = await importDutchieProducts(legacyPool, cannaiqPool, config.dryRun); + allStats.push(stats); + console.log(''); + } + + if (config.tables.includes('dutchie_product_snapshots')) { + const stats = await importDutchieSnapshots(legacyPool, cannaiqPool, config.dryRun); + allStats.push(stats); + console.log(''); + } + + // Print summary + console.log('='.repeat(60)); + console.log('IMPORT SUMMARY'); + console.log('='.repeat(60)); + console.log(''); + console.log('| Table | Read | Inserted | Skipped | Errors | Duration |'); + console.log('|----------------------------|----------|----------|----------|----------|----------|'); + for (const s of allStats) { + console.log(`| ${s.table.padEnd(26)} | 
${String(s.read).padStart(8)} | ${String(s.inserted).padStart(8)} | ${String(s.skipped).padStart(8)} | ${String(s.errors).padStart(8)} | ${(s.durationMs / 1000).toFixed(1).padStart(7)}s |`); + } + console.log(''); + + const totalInserted = allStats.reduce((sum, s) => sum + s.inserted, 0); + const totalErrors = allStats.reduce((sum, s) => sum + s.errors, 0); + console.log(`Total inserted: ${totalInserted}`); + console.log(`Total errors: ${totalErrors}`); + + if (config.dryRun) { + console.log(''); + console.log('DRY RUN COMPLETE - No data was written'); + console.log('Run without --dry-run to perform actual import'); + } + + } catch (error: any) { + console.error('[ETL] Fatal error:', error.message); + process.exit(1); + } finally { + await legacyPool.end(); + await cannaiqPool.end(); + } + + console.log(''); + console.log('ETL complete'); +} + +main().catch((err) => { + console.error('Unhandled error:', err); + process.exit(1); +}); diff --git a/backend/src/scripts/parallel-scrape.ts b/backend/src/scripts/parallel-scrape.ts index 76a86b41..1eda1cf2 100644 --- a/backend/src/scripts/parallel-scrape.ts +++ b/backend/src/scripts/parallel-scrape.ts @@ -1,4 +1,4 @@ -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { getActiveProxy, putProxyInTimeout, isBotDetectionError } from '../services/proxy'; import puppeteer from 'puppeteer-extra'; import StealthPlugin from 'puppeteer-extra-plugin-stealth'; diff --git a/backend/src/scripts/queue-dispensaries.ts b/backend/src/scripts/queue-dispensaries.ts index 81e8104a..2fdcbc20 100644 --- a/backend/src/scripts/queue-dispensaries.ts +++ b/backend/src/scripts/queue-dispensaries.ts @@ -13,7 +13,7 @@ * npx tsx src/scripts/queue-dispensaries.ts --process # Process queued jobs */ -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { logger } from '../services/logger'; import { runDetectMenuProviderJob, diff --git a/backend/src/scripts/queue-intelligence.ts b/backend/src/scripts/queue-intelligence.ts index 04ede9e9..8666fdba 100644 --- a/backend/src/scripts/queue-intelligence.ts +++ b/backend/src/scripts/queue-intelligence.ts @@ -17,7 +17,7 @@ * npx tsx src/scripts/queue-intelligence.ts --dry-run */ -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { logger } from '../services/logger'; import { detectMultiCategoryProviders, diff --git a/backend/src/scripts/resolve-dutchie-id.ts b/backend/src/scripts/resolve-dutchie-id.ts new file mode 100644 index 00000000..e0d286c2 --- /dev/null +++ b/backend/src/scripts/resolve-dutchie-id.ts @@ -0,0 +1,173 @@ +#!/usr/bin/env npx tsx +/** + * Dutchie Platform ID Resolver + * + * Standalone script to resolve a Dutchie dispensary slug to its platform ID. + * + * USAGE: + * npx tsx src/scripts/resolve-dutchie-id.ts + * npx tsx src/scripts/resolve-dutchie-id.ts hydroman-dispensary + * npx tsx src/scripts/resolve-dutchie-id.ts AZ-Deeply-Rooted + * + * RESOLUTION STRATEGY: + * 1. Navigate to https://dutchie.com/embedded-menu/{slug} via Puppeteer + * 2. Extract window.reactEnv.dispensaryId (preferred - fastest) + * 3. 
If reactEnv fails, call GraphQL GetAddressBasedDispensaryData as fallback + * + * OUTPUT: + * - dispensaryId: The MongoDB ObjectId (e.g., "6405ef617056e8014d79101b") + * - source: "reactEnv" or "graphql" + * - httpStatus: HTTP status from embedded menu page + * - error: Error message if resolution failed + */ + +import { resolveDispensaryIdWithDetails, ResolveDispensaryResult } from '../dutchie-az/services/graphql-client'; + +async function main() { + const args = process.argv.slice(2); + + if (args.length === 0 || args.includes('--help') || args.includes('-h')) { + console.log(` +Dutchie Platform ID Resolver + +Usage: + npx tsx src/scripts/resolve-dutchie-id.ts + +Examples: + npx tsx src/scripts/resolve-dutchie-id.ts hydroman-dispensary + npx tsx src/scripts/resolve-dutchie-id.ts AZ-Deeply-Rooted + npx tsx src/scripts/resolve-dutchie-id.ts mint-cannabis + +Resolution Strategy: + 1. Puppeteer navigates to https://dutchie.com/embedded-menu/{slug} + 2. Extracts window.reactEnv.dispensaryId (preferred) + 3. Falls back to GraphQL GetAddressBasedDispensaryData if needed + +Output Fields: + - dispensaryId: MongoDB ObjectId (e.g., "6405ef617056e8014d79101b") + - source: "reactEnv" (from page) or "graphql" (from API) + - httpStatus: HTTP status code from page load + - error: Error message if resolution failed +`); + process.exit(0); + } + + const slug = args[0]; + + console.log('='.repeat(60)); + console.log('DUTCHIE PLATFORM ID RESOLVER'); + console.log('='.repeat(60)); + console.log(`Slug: ${slug}`); + console.log(`Embedded Menu URL: https://dutchie.com/embedded-menu/${slug}`); + console.log(''); + console.log('Resolving...'); + console.log(''); + + const startTime = Date.now(); + + try { + const result: ResolveDispensaryResult = await resolveDispensaryIdWithDetails(slug); + const duration = Date.now() - startTime; + + console.log('='.repeat(60)); + console.log('RESOLUTION RESULT'); + console.log('='.repeat(60)); + + if (result.dispensaryId) { + console.log(`✓ SUCCESS`); + console.log(''); + console.log(` Dispensary ID: ${result.dispensaryId}`); + console.log(` Source: ${result.source}`); + console.log(` HTTP Status: ${result.httpStatus || 'N/A'}`); + console.log(` Duration: ${duration}ms`); + console.log(''); + + // Show how to use this ID + console.log('='.repeat(60)); + console.log('USAGE'); + console.log('='.repeat(60)); + console.log(''); + console.log('Use this ID in GraphQL FilteredProducts query:'); + console.log(''); + console.log(' POST https://dutchie.com/api-3/graphql'); + console.log(''); + console.log(' Body:'); + console.log(` { + "operationName": "FilteredProducts", + "variables": { + "productsFilter": { + "dispensaryId": "${result.dispensaryId}", + "pricingType": "rec", + "Status": "Active" + }, + "page": 0, + "perPage": 100 + }, + "extensions": { + "persistedQuery": { + "version": 1, + "sha256Hash": "ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0" + } + } + }`); + console.log(''); + + // Output for piping/scripting + console.log('='.repeat(60)); + console.log('JSON OUTPUT'); + console.log('='.repeat(60)); + console.log(JSON.stringify({ + success: true, + slug, + dispensaryId: result.dispensaryId, + source: result.source, + httpStatus: result.httpStatus, + durationMs: duration, + }, null, 2)); + + } else { + console.log(`✗ FAILED`); + console.log(''); + console.log(` Error: ${result.error || 'Unknown error'}`); + console.log(` HTTP Status: ${result.httpStatus || 'N/A'}`); + console.log(` Duration: ${duration}ms`); + console.log(''); + + if (result.httpStatus 
=== 403 || result.httpStatus === 404) { + console.log('NOTE: This store may be removed or not accessible on Dutchie.'); + console.log(' Mark dispensary as not_crawlable in the database.'); + } + + console.log(''); + console.log('JSON OUTPUT:'); + console.log(JSON.stringify({ + success: false, + slug, + error: result.error, + httpStatus: result.httpStatus, + durationMs: duration, + }, null, 2)); + + process.exit(1); + } + + } catch (error: any) { + const duration = Date.now() - startTime; + console.error('='.repeat(60)); + console.error('ERROR'); + console.error('='.repeat(60)); + console.error(`Message: ${error.message}`); + console.error(`Duration: ${duration}ms`); + console.error(''); + + if (error.message.includes('net::ERR_NAME_NOT_RESOLVED')) { + console.error('NOTE: DNS resolution failed. This typically happens when running'); + console.error(' locally due to network restrictions. Try running from the'); + console.error(' Kubernetes pod or a cloud environment.'); + } + + process.exit(1); + } +} + +main(); diff --git a/backend/src/scripts/run-backfill.ts b/backend/src/scripts/run-backfill.ts new file mode 100644 index 00000000..37a8e848 --- /dev/null +++ b/backend/src/scripts/run-backfill.ts @@ -0,0 +1,105 @@ +#!/usr/bin/env npx tsx +/** + * Run Backfill CLI + * + * Import historical payloads from existing data sources. + * + * Usage: + * npx tsx src/scripts/run-backfill.ts [options] + * + * Options: + * --source SOURCE Source to backfill from: + * - dutchie_products (default) + * - snapshots + * - cache_files + * - all + * --dry-run Print changes without modifying DB + * --limit N Max payloads to create (default: unlimited) + * --dispensary ID Only backfill specific dispensary + * --cache-path PATH Path to cache files (default: ./cache/payloads) + */ + +import { Pool } from 'pg'; +import { runBackfill, BackfillOptions } from '../hydration'; + +async function main() { + const args = process.argv.slice(2); + + const dryRun = args.includes('--dry-run'); + + let source: BackfillOptions['source'] = 'dutchie_products'; + const sourceIdx = args.indexOf('--source'); + if (sourceIdx !== -1 && args[sourceIdx + 1]) { + source = args[sourceIdx + 1] as BackfillOptions['source']; + } + + let limit: number | undefined; + const limitIdx = args.indexOf('--limit'); + if (limitIdx !== -1 && args[limitIdx + 1]) { + limit = parseInt(args[limitIdx + 1], 10); + } + + let dispensaryId: number | undefined; + const dispIdx = args.indexOf('--dispensary'); + if (dispIdx !== -1 && args[dispIdx + 1]) { + dispensaryId = parseInt(args[dispIdx + 1], 10); + } + + let cachePath: string | undefined; + const cacheIdx = args.indexOf('--cache-path'); + if (cacheIdx !== -1 && args[cacheIdx + 1]) { + cachePath = args[cacheIdx + 1]; + } + + const pool = new Pool({ + connectionString: process.env.DATABASE_URL, + }); + + try { + console.log('='.repeat(60)); + console.log('BACKFILL RUNNER'); + console.log('='.repeat(60)); + console.log(`Source: ${source}`); + console.log(`Dry run: ${dryRun}`); + if (limit) console.log(`Limit: ${limit}`); + if (dispensaryId) console.log(`Dispensary: ${dispensaryId}`); + if (cachePath) console.log(`Cache path: ${cachePath}`); + console.log(''); + + const results = await runBackfill(pool, { + dryRun, + source, + limit, + dispensaryId, + cachePath, + }); + + console.log('\nBackfill Results:'); + console.log('='.repeat(40)); + + for (const result of results) { + console.log(`\n${result.source}:`); + console.log(` Payloads created: ${result.payloadsCreated}`); + console.log(` Skipped: 
${result.skipped}`); + console.log(` Errors: ${result.errors.length}`); + console.log(` Duration: ${result.durationMs}ms`); + + if (result.errors.length > 0) { + console.log(' First 5 errors:'); + for (const err of result.errors.slice(0, 5)) { + console.log(` - ${err}`); + } + } + } + + const totalCreated = results.reduce((sum, r) => sum + r.payloadsCreated, 0); + console.log(`\nTotal payloads created: ${totalCreated}`); + } catch (error: any) { + console.error('Backfill error:', error.message); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/scripts/run-discovery.ts b/backend/src/scripts/run-discovery.ts new file mode 100644 index 00000000..f14e473f --- /dev/null +++ b/backend/src/scripts/run-discovery.ts @@ -0,0 +1,309 @@ +#!/usr/bin/env npx tsx +/** + * Dutchie Discovery CLI + * + * Command-line interface for running the Dutchie store discovery pipeline. + * + * Usage: + * npx tsx src/scripts/run-discovery.ts [options] + * + * Commands: + * discover:state - Discover all stores in a state (e.g., AZ) + * discover:city - Discover stores in a single city + * discover:full - Run full discovery pipeline + * seed:cities - Seed known cities for a state + * stats - Show discovery statistics + * list - List discovered locations + * + * Examples: + * npx tsx src/scripts/run-discovery.ts discover:state AZ + * npx tsx src/scripts/run-discovery.ts discover:city phoenix --state AZ + * npx tsx src/scripts/run-discovery.ts seed:cities AZ + * npx tsx src/scripts/run-discovery.ts stats + * npx tsx src/scripts/run-discovery.ts list --status discovered --state AZ + */ + +import { Pool } from 'pg'; +import { + runFullDiscovery, + discoverCity, + discoverState, + getDiscoveryStats, + seedKnownCities, + ARIZONA_CITIES, +} from '../discovery'; + +// Parse command line arguments +function parseArgs() { + const args = process.argv.slice(2); + const command = args[0] || 'help'; + const positional: string[] = []; + const flags: Record = {}; + + for (let i = 1; i < args.length; i++) { + const arg = args[i]; + if (arg.startsWith('--')) { + const [key, value] = arg.slice(2).split('='); + if (value !== undefined) { + flags[key] = value; + } else if (args[i + 1] && !args[i + 1].startsWith('--')) { + flags[key] = args[i + 1]; + i++; + } else { + flags[key] = true; + } + } else { + positional.push(arg); + } + } + + return { command, positional, flags }; +} + +// Create database pool +function createPool(): Pool { + const connectionString = process.env.DATABASE_URL; + if (!connectionString) { + console.error('ERROR: DATABASE_URL environment variable is required'); + process.exit(1); + } + return new Pool({ connectionString }); +} + +// Print help +function printHelp() { + console.log(` +Dutchie Discovery CLI + +Usage: + npx tsx src/scripts/run-discovery.ts [options] + +Commands: + discover:state Discover all stores in a state (e.g., AZ) + discover:city Discover stores in a single city + discover:full Run full discovery pipeline + seed:cities Seed known cities for a state + stats Show discovery statistics + list List discovered locations + +Options: + --state State code (e.g., AZ, CA, ON) + --country Country code (default: US) + --status Filter by status (discovered, verified, rejected, merged) + --limit Limit results (default: varies by command) + --dry-run Don't make any changes, just show what would happen + --verbose Show detailed output + +Examples: + npx tsx src/scripts/run-discovery.ts discover:state AZ + npx tsx src/scripts/run-discovery.ts discover:city phoenix 
--state AZ + npx tsx src/scripts/run-discovery.ts seed:cities AZ + npx tsx src/scripts/run-discovery.ts stats + npx tsx src/scripts/run-discovery.ts list --status discovered --state AZ --limit 20 +`); +} + +// Main +async function main() { + const { command, positional, flags } = parseArgs(); + + if (command === 'help' || flags.help) { + printHelp(); + process.exit(0); + } + + const pool = createPool(); + + try { + switch (command) { + case 'discover:state': { + const stateCode = positional[0] || (flags.state as string); + if (!stateCode) { + console.error('ERROR: State code is required'); + console.error('Usage: discover:state '); + process.exit(1); + } + + console.log(`\nDiscovering stores in ${stateCode}...\n`); + const result = await discoverState(pool, stateCode.toUpperCase(), { + dryRun: Boolean(flags['dry-run']), + verbose: Boolean(flags.verbose), + cityLimit: flags.limit ? parseInt(flags.limit as string, 10) : 100, + }); + + console.log('\n=== DISCOVERY RESULTS ==='); + console.log(`Cities crawled: ${result.locations.length}`); + console.log(`Locations found: ${result.totalLocationsFound}`); + console.log(`Locations upserted: ${result.totalLocationsUpserted}`); + console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`); + break; + } + + case 'discover:city': { + const citySlug = positional[0]; + if (!citySlug) { + console.error('ERROR: City slug is required'); + console.error('Usage: discover:city [--state AZ]'); + process.exit(1); + } + + console.log(`\nDiscovering stores in ${citySlug}...\n`); + const result = await discoverCity(pool, citySlug, { + stateCode: flags.state as string, + countryCode: (flags.country as string) || 'US', + dryRun: Boolean(flags['dry-run']), + verbose: Boolean(flags.verbose), + }); + + if (!result) { + console.error(`City not found: ${citySlug}`); + process.exit(1); + } + + console.log('\n=== DISCOVERY RESULTS ==='); + console.log(`City: ${result.citySlug}`); + console.log(`Locations found: ${result.locationsFound}`); + console.log(`Locations upserted: ${result.locationsUpserted}`); + console.log(`New: ${result.locationsNew}, Updated: ${result.locationsUpdated}`); + console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`); + if (result.errors.length > 0) { + console.log(`Errors: ${result.errors.length}`); + result.errors.forEach((e) => console.log(` - ${e}`)); + } + break; + } + + case 'discover:full': { + console.log('\nRunning full discovery pipeline...\n'); + const result = await runFullDiscovery(pool, { + stateCode: flags.state as string, + countryCode: (flags.country as string) || 'US', + cityLimit: flags.limit ? parseInt(flags.limit as string, 10) : 50, + skipCityDiscovery: Boolean(flags['skip-cities']), + onlyStale: !flags.all, + staleDays: flags['stale-days'] ? 
parseInt(flags['stale-days'] as string, 10) : 7, + dryRun: Boolean(flags['dry-run']), + verbose: Boolean(flags.verbose), + }); + + console.log('\n=== FULL DISCOVERY RESULTS ==='); + console.log(`Cities discovered: ${result.cities.citiesFound}`); + console.log(`Cities upserted: ${result.cities.citiesUpserted}`); + console.log(`Cities crawled: ${result.locations.length}`); + console.log(`Total locations found: ${result.totalLocationsFound}`); + console.log(`Total locations upserted: ${result.totalLocationsUpserted}`); + console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`); + break; + } + + case 'seed:cities': { + const stateCode = positional[0] || (flags.state as string); + if (!stateCode) { + console.error('ERROR: State code is required'); + console.error('Usage: seed:cities '); + process.exit(1); + } + + let cities: any[] = []; + if (stateCode.toUpperCase() === 'AZ') { + cities = ARIZONA_CITIES; + } else { + console.error(`No predefined cities for state: ${stateCode}`); + console.error('Add cities to city-discovery.ts ARIZONA_CITIES array (or add new state arrays)'); + process.exit(1); + } + + console.log(`\nSeeding ${cities.length} cities for ${stateCode}...\n`); + const result = await seedKnownCities(pool, cities); + console.log(`Created: ${result.created} new cities`); + console.log(`Updated: ${result.updated} existing cities`); + break; + } + + case 'stats': { + console.log('\nFetching discovery statistics...\n'); + const stats = await getDiscoveryStats(pool); + + console.log('=== CITIES ==='); + console.log(`Total: ${stats.cities.total}`); + console.log(`Crawled (24h): ${stats.cities.crawledLast24h}`); + console.log(`Never crawled: ${stats.cities.neverCrawled}`); + console.log(''); + console.log('=== LOCATIONS ==='); + console.log(`Total active: ${stats.locations.total}`); + console.log(`Discovered: ${stats.locations.discovered}`); + console.log(`Verified: ${stats.locations.verified}`); + console.log(`Merged: ${stats.locations.merged}`); + console.log(`Rejected: ${stats.locations.rejected}`); + console.log(''); + console.log('=== BY STATE ==='); + stats.locations.byState.forEach((s) => { + console.log(` ${s.stateCode}: ${s.count}`); + }); + break; + } + + case 'list': { + const status = flags.status as string; + const stateCode = flags.state as string; + const limit = flags.limit ? 
parseInt(flags.limit as string, 10) : 50; + + let whereClause = 'WHERE active = TRUE'; + const params: any[] = []; + let paramIndex = 1; + + if (status) { + whereClause += ` AND status = $${paramIndex}`; + params.push(status); + paramIndex++; + } + + if (stateCode) { + whereClause += ` AND state_code = $${paramIndex}`; + params.push(stateCode.toUpperCase()); + paramIndex++; + } + + params.push(limit); + + const { rows } = await pool.query( + ` + SELECT id, platform, name, city, state_code, status, platform_menu_url, first_seen_at + FROM dutchie_discovery_locations + ${whereClause} + ORDER BY first_seen_at DESC + LIMIT $${paramIndex} + `, + params + ); + + console.log(`\nFound ${rows.length} locations:\n`); + console.log('ID\tStatus\t\tState\tCity\t\tName'); + console.log('-'.repeat(80)); + rows.forEach((row: any) => { + const cityDisplay = (row.city || '').substring(0, 12).padEnd(12); + const nameDisplay = (row.name || '').substring(0, 30); + console.log( + `${row.id}\t${row.status.padEnd(12)}\t${row.state_code || 'N/A'}\t${cityDisplay}\t${nameDisplay}` + ); + }); + break; + } + + default: + console.error(`Unknown command: ${command}`); + printHelp(); + process.exit(1); + } + } catch (error: any) { + console.error('ERROR:', error.message); + if (flags.verbose) { + console.error(error.stack); + } + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/scripts/run-dutchie-scrape.ts b/backend/src/scripts/run-dutchie-scrape.ts index 6682f7b3..1272971f 100644 --- a/backend/src/scripts/run-dutchie-scrape.ts +++ b/backend/src/scripts/run-dutchie-scrape.ts @@ -1,5 +1,8 @@ /** - * Run Dutchie GraphQL Scrape + * LEGACY SCRIPT - Run Dutchie GraphQL Scrape + * + * DEPRECATED: This script creates its own database pool. + * Future implementations should use the CannaiQ API endpoints instead. * * This script demonstrates the full pipeline: * 1. Puppeteer navigates to Dutchie menu @@ -7,12 +10,21 @@ * 3. Products are normalized to our schema * 4. Products are upserted to database * 5. 
Derived views (brands, categories, specials) are automatically updated + * + * DO NOT: + * - Add this to package.json scripts + * - Run this in automated jobs + * - Use DATABASE_URL directly */ import { Pool } from 'pg'; import { scrapeDutchieMenu } from '../scrapers/dutchie-graphql'; -const DATABASE_URL = process.env.DATABASE_URL || 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'; +console.warn('\n⚠️ LEGACY SCRIPT: This script should be replaced with CannaiQ API calls.\n'); + +// Single database connection (cannaiq in cannaiq-postgres container) +const DATABASE_URL = process.env.CANNAIQ_DB_URL || + `postgresql://${process.env.CANNAIQ_DB_USER || 'dutchie'}:${process.env.CANNAIQ_DB_PASS || 'dutchie_local_pass'}@${process.env.CANNAIQ_DB_HOST || 'localhost'}:${process.env.CANNAIQ_DB_PORT || '54320'}/${process.env.CANNAIQ_DB_NAME || 'cannaiq'}`; async function main() { const pool = new Pool({ connectionString: DATABASE_URL }); diff --git a/backend/src/scripts/run-hydration.ts b/backend/src/scripts/run-hydration.ts new file mode 100644 index 00000000..5ebaa773 --- /dev/null +++ b/backend/src/scripts/run-hydration.ts @@ -0,0 +1,510 @@ +#!/usr/bin/env npx tsx +/** + * Unified Hydration CLI + * + * Central entrypoint for all hydration operations: + * + * MODES: + * payload - Process raw_payloads → canonical tables (existing behavior) + * backfill - Migrate dutchie_* → canonical tables (legacy backfill) + * sync - Sync recent crawls to canonical tables + * status - Show hydration progress + * + * Usage: + * npx tsx src/scripts/run-hydration.ts --mode= [options] + * + * Examples: + * # Payload-based hydration (default) + * npx tsx src/scripts/run-hydration.ts --mode=payload + * + * # Full legacy backfill + * npx tsx src/scripts/run-hydration.ts --mode=backfill + * + * # Backfill single dispensary + * npx tsx src/scripts/run-hydration.ts --mode=backfill --store=123 + * + * # Sync recent crawls + * npx tsx src/scripts/run-hydration.ts --mode=sync --since="2 hours" + * + * # Check status + * npx tsx src/scripts/run-hydration.ts --mode=status + */ + +import { Pool } from 'pg'; +import dotenv from 'dotenv'; +import { + HydrationWorker, + runHydrationBatch, + processPayloadById, + reprocessFailedPayloads, + getPayloadStats, +} from '../hydration'; +import { runLegacyBackfill } from '../hydration/legacy-backfill'; +import { syncRecentCrawls } from '../hydration/incremental-sync'; + +dotenv.config(); + +// ============================================================ +// ARGUMENT PARSING +// ============================================================ + +interface CliArgs { + mode: 'payload' | 'backfill' | 'sync' | 'status'; + store?: number; + since?: string; + dryRun: boolean; + verbose: boolean; + limit: number; + loop: boolean; + reprocess: boolean; + payloadId?: string; + startFrom?: number; +} + +function parseArgs(): CliArgs { + const args = process.argv.slice(2); + + // Defaults + const result: CliArgs = { + mode: 'payload', + dryRun: args.includes('--dry-run'), + verbose: args.includes('--verbose') || args.includes('-v'), + limit: 50, + loop: args.includes('--loop'), + reprocess: args.includes('--reprocess'), + }; + + // Parse --mode= + const modeArg = args.find(a => a.startsWith('--mode=')); + if (modeArg) { + const mode = modeArg.split('=')[1]; + if (['payload', 'backfill', 'sync', 'status'].includes(mode)) { + result.mode = mode as CliArgs['mode']; + } + } + + // Parse --store= + const storeArg = args.find(a => a.startsWith('--store=')); + if (storeArg) { + result.store = 
parseInt(storeArg.split('=')[1], 10); + } + + // Parse --since= + const sinceArg = args.find(a => a.startsWith('--since=')); + if (sinceArg) { + result.since = sinceArg.split('=')[1]; + } + + // Parse --limit= or --limit + const limitArg = args.find(a => a.startsWith('--limit=')); + if (limitArg) { + result.limit = parseInt(limitArg.split('=')[1], 10); + } else { + const limitIdx = args.indexOf('--limit'); + if (limitIdx !== -1 && args[limitIdx + 1]) { + result.limit = parseInt(args[limitIdx + 1], 10); + } + } + + // Parse --payload= or --payload + const payloadArg = args.find(a => a.startsWith('--payload=')); + if (payloadArg) { + result.payloadId = payloadArg.split('=')[1]; + } else { + const payloadIdx = args.indexOf('--payload'); + if (payloadIdx !== -1 && args[payloadIdx + 1]) { + result.payloadId = args[payloadIdx + 1]; + } + } + + // Parse --start-from= + const startArg = args.find(a => a.startsWith('--start-from=')); + if (startArg) { + result.startFrom = parseInt(startArg.split('=')[1], 10); + } + + return result; +} + +// ============================================================ +// DATABASE CONNECTION +// ============================================================ + +function getConnectionString(): string { + if (process.env.CANNAIQ_DB_URL) { + return process.env.CANNAIQ_DB_URL; + } + + const host = process.env.CANNAIQ_DB_HOST; + const port = process.env.CANNAIQ_DB_PORT; + const name = process.env.CANNAIQ_DB_NAME; + const user = process.env.CANNAIQ_DB_USER; + const pass = process.env.CANNAIQ_DB_PASS; + + if (host && port && name && user && pass) { + return `postgresql://${user}:${pass}@${host}:${port}/${name}`; + } + + // Fallback to DATABASE_URL for local development + if (process.env.DATABASE_URL) { + return process.env.DATABASE_URL; + } + + throw new Error('Missing database connection environment variables'); +} + +// ============================================================ +// MODE: PAYLOAD (existing behavior) +// ============================================================ + +async function runPayloadMode(pool: Pool, args: CliArgs): Promise { + console.log('='.repeat(60)); + console.log('HYDRATION - PAYLOAD MODE'); + console.log('='.repeat(60)); + console.log(`Dry run: ${args.dryRun}`); + console.log(`Batch size: ${args.limit}`); + console.log(''); + + // Show current stats + try { + const stats = await getPayloadStats(pool); + console.log('Current payload stats:'); + console.log(` Total: ${stats.total}`); + console.log(` Processed: ${stats.processed}`); + console.log(` Unprocessed: ${stats.unprocessed}`); + console.log(` Failed: ${stats.failed}`); + console.log(''); + } catch { + console.log('Note: raw_payloads table not found or empty'); + console.log(''); + } + + if (args.payloadId) { + // Process specific payload + console.log(`Processing payload: ${args.payloadId}`); + const result = await processPayloadById(pool, args.payloadId, { dryRun: args.dryRun }); + console.log('Result:', JSON.stringify(result, null, 2)); + } else if (args.reprocess) { + // Reprocess failed payloads + console.log('Reprocessing failed payloads...'); + const result = await reprocessFailedPayloads(pool, { dryRun: args.dryRun, batchSize: args.limit }); + console.log('Result:', JSON.stringify(result, null, 2)); + } else if (args.loop) { + // Run continuous loop + const worker = new HydrationWorker(pool, { dryRun: args.dryRun, batchSize: args.limit }); + + process.on('SIGINT', () => { + console.log('\nStopping hydration loop...'); + worker.stop(); + }); + + await worker.runLoop(30000); + } 
else { + // Run single batch + const result = await runHydrationBatch(pool, { dryRun: args.dryRun, batchSize: args.limit }); + console.log('Batch result:'); + console.log(` Payloads processed: ${result.payloadsProcessed}`); + console.log(` Payloads failed: ${result.payloadsFailed}`); + console.log(` Products upserted: ${result.totalProductsUpserted}`); + console.log(` Snapshots created: ${result.totalSnapshotsCreated}`); + console.log(` Brands created: ${result.totalBrandsCreated}`); + console.log(` Duration: ${result.durationMs}ms`); + + if (result.errors.length > 0) { + console.log('\nErrors:'); + for (const err of result.errors.slice(0, 10)) { + console.log(` ${err.payloadId}: ${err.error}`); + } + } + } +} + +// ============================================================ +// MODE: BACKFILL (legacy dutchie_* → canonical) +// ============================================================ + +async function runBackfillMode(pool: Pool, args: CliArgs): Promise { + console.log('='.repeat(60)); + console.log('HYDRATION - BACKFILL MODE'); + console.log('='.repeat(60)); + console.log(`Mode: ${args.dryRun ? 'DRY RUN' : 'LIVE'}`); + if (args.store) { + console.log(`Store: ${args.store}`); + } + if (args.startFrom) { + console.log(`Start from product ID: ${args.startFrom}`); + } + console.log(''); + + await runLegacyBackfill(pool, { + dryRun: args.dryRun, + verbose: args.verbose, + dispensaryId: args.store, + startFromProductId: args.startFrom, + }); +} + +// ============================================================ +// MODE: SYNC (recent crawls → canonical) +// ============================================================ + +async function runSyncMode(pool: Pool, args: CliArgs): Promise { + const since = args.since || '1 hour'; + + console.log('='.repeat(60)); + console.log('HYDRATION - SYNC MODE'); + console.log('='.repeat(60)); + console.log(`Mode: ${args.dryRun ? 'DRY RUN' : 'LIVE'}`); + console.log(`Since: ${since}`); + console.log(`Limit: ${args.limit}`); + if (args.store) { + console.log(`Store: ${args.store}`); + } + console.log(''); + + const result = await syncRecentCrawls(pool, { + dryRun: args.dryRun, + verbose: args.verbose, + since, + dispensaryId: args.store, + limit: args.limit, + }); + + console.log(''); + console.log('=== Sync Results ==='); + console.log(`Crawls synced: ${result.synced}`); + console.log(`Errors: ${result.errors.length}`); + + if (result.errors.length > 0) { + console.log(''); + console.log('Errors:'); + for (const error of result.errors.slice(0, 10)) { + console.log(` - ${error}`); + } + if (result.errors.length > 10) { + console.log(` ... 
and ${result.errors.length - 10} more`); + } + } +} + +// ============================================================ +// MODE: STATUS +// ============================================================ + +async function runStatusMode(pool: Pool): Promise { + console.log('='.repeat(60)); + console.log('HYDRATION STATUS'); + console.log('='.repeat(60)); + console.log(''); + + // Check if v_hydration_status view exists + const viewExists = await pool.query(` + SELECT EXISTS ( + SELECT 1 FROM pg_views WHERE viewname = 'v_hydration_status' + ) as exists + `); + + if (viewExists.rows[0].exists) { + const { rows } = await pool.query('SELECT * FROM v_hydration_status'); + console.log('Hydration Progress:'); + console.log('-'.repeat(70)); + console.log( + 'Table'.padEnd(30) + + 'Source'.padEnd(12) + + 'Hydrated'.padEnd(12) + + 'Progress' + ); + console.log('-'.repeat(70)); + + for (const row of rows) { + const progress = row.hydration_pct ? `${row.hydration_pct}%` : 'N/A'; + console.log( + row.source_table.padEnd(30) + + String(row.source_count).padEnd(12) + + String(row.hydrated_count).padEnd(12) + + progress + ); + } + console.log('-'.repeat(70)); + } else { + console.log('Note: v_hydration_status view not found. Run migration 052 first.'); + } + + // Get counts from canonical tables + console.log('\nCanonical Table Counts:'); + console.log('-'.repeat(40)); + + const tables = ['store_products', 'store_product_snapshots', 'crawl_runs']; + for (const table of tables) { + try { + const { rows } = await pool.query(`SELECT COUNT(*) as cnt FROM ${table}`); + console.log(`${table}: ${rows[0].cnt}`); + } catch { + console.log(`${table}: (table not found)`); + } + } + + // Get legacy table counts + console.log('\nLegacy Table Counts:'); + console.log('-'.repeat(40)); + + const legacyTables = ['dutchie_products', 'dutchie_product_snapshots', 'dispensary_crawl_jobs']; + for (const table of legacyTables) { + try { + const { rows } = await pool.query(`SELECT COUNT(*) as cnt FROM ${table}`); + console.log(`${table}: ${rows[0].cnt}`); + } catch { + console.log(`${table}: (table not found)`); + } + } + + // Show recent sync activity + console.log('\nRecent Crawl Runs (last 24h):'); + console.log('-'.repeat(40)); + + try { + const { rows } = await pool.query(` + SELECT status, COUNT(*) as count + FROM crawl_runs + WHERE started_at > NOW() - INTERVAL '24 hours' + GROUP BY status + ORDER BY count DESC + `); + + if (rows.length === 0) { + console.log('No crawl runs in last 24 hours'); + } else { + for (const row of rows) { + console.log(`${row.status}: ${row.count}`); + } + } + } catch { + console.log('(crawl_runs table not found)'); + } + + // Payload stats + console.log('\nPayload Hydration:'); + console.log('-'.repeat(40)); + + try { + const stats = await getPayloadStats(pool); + console.log(`Total payloads: ${stats.total}`); + console.log(`Processed: ${stats.processed}`); + console.log(`Unprocessed: ${stats.unprocessed}`); + console.log(`Failed: ${stats.failed}`); + } catch { + console.log('(raw_payloads table not found)'); + } +} + +// ============================================================ +// HELP +// ============================================================ + +function showHelp(): void { + console.log(` +Unified Hydration CLI + +Usage: + npx tsx src/scripts/run-hydration.ts --mode= [options] + +Modes: + payload Process raw_payloads → canonical tables (default) + backfill Migrate dutchie_* → canonical tables + sync Sync recent crawls to canonical tables + status Show hydration progress + +Common 
Options: + --dry-run Print changes without modifying database + --verbose, -v Show detailed progress + --store= Limit to a single dispensary + --limit= Batch size (default: 50) + +Payload Mode Options: + --loop Run continuous hydration loop + --reprocess Reprocess failed payloads + --payload= Process a specific payload by ID + +Backfill Mode Options: + --start-from= Resume from a specific product ID + +Sync Mode Options: + --since= Time window (default: "1 hour") + Examples: "30 minutes", "2 hours", "1 day" + +Examples: + # Full legacy backfill (dutchie_* → canonical) + npx tsx src/scripts/run-hydration.ts --mode=backfill + + # Backfill single dispensary (dry run) + npx tsx src/scripts/run-hydration.ts --mode=backfill --store=123 --dry-run + + # Sync recent crawls from last 4 hours + npx tsx src/scripts/run-hydration.ts --mode=sync --since="4 hours" + + # Sync single dispensary + npx tsx src/scripts/run-hydration.ts --mode=sync --store=123 + + # Run payload hydration loop + npx tsx src/scripts/run-hydration.ts --mode=payload --loop + + # Check hydration status + npx tsx src/scripts/run-hydration.ts --mode=status +`); +} + +// ============================================================ +// MAIN +// ============================================================ + +async function main(): Promise { + const rawArgs = process.argv.slice(2); + + if (rawArgs.includes('--help') || rawArgs.includes('-h')) { + showHelp(); + process.exit(0); + } + + const args = parseArgs(); + + const pool = new Pool({ + connectionString: getConnectionString(), + max: 5, + }); + + try { + // Verify connection + await pool.query('SELECT 1'); + console.log('Database connection: OK\n'); + + switch (args.mode) { + case 'payload': + await runPayloadMode(pool, args); + break; + + case 'backfill': + await runBackfillMode(pool, args); + break; + + case 'sync': + await runSyncMode(pool, args); + break; + + case 'status': + await runStatusMode(pool); + break; + + default: + console.error(`Unknown mode: ${args.mode}`); + showHelp(); + process.exit(1); + } + } catch (error: any) { + console.error('Error:', error.message); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/scripts/sandbox-crawl-101.ts b/backend/src/scripts/sandbox-crawl-101.ts new file mode 100644 index 00000000..3a9cf0ec --- /dev/null +++ b/backend/src/scripts/sandbox-crawl-101.ts @@ -0,0 +1,225 @@ +/** + * Sandbox Crawl Script for Dispensary 101 (Trulieve Scottsdale) + * + * Runs a full crawl and captures trace data for observability. + * NO automatic promotion or status changes. 
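+ *
+ * Usage (manual only; assumed invocation — not intended for package.json scripts or automated jobs):
+ *   npx tsx src/scripts/sandbox-crawl-101.ts
+ *
+ * Reads DATABASE_URL from the environment and records one row in
+ * crawl_orchestration_traces per run, on both success and failure.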
+ */ + +import { Pool } from 'pg'; +import { crawlDispensaryProducts } from '../dutchie-az/services/product-crawler'; +import { Dispensary } from '../dutchie-az/types'; + +const pool = new Pool({ connectionString: process.env.DATABASE_URL }); + +async function main() { + console.log('=== SANDBOX CRAWL: Dispensary 101 (Trulieve Scottsdale) ===\n'); + const startTime = Date.now(); + + // Load dispensary from database (only columns that exist in local schema) + const dispResult = await pool.query(` + SELECT id, name, city, state, menu_type, platform_dispensary_id, menu_url + FROM dispensaries + WHERE id = 101 + `); + + if (!dispResult.rows[0]) { + console.log('ERROR: Dispensary 101 not found'); + await pool.end(); + return; + } + + const row = dispResult.rows[0]; + + // Map to Dispensary interface (snake_case -> camelCase) + const dispensary: Dispensary = { + id: row.id, + platform: 'dutchie', + name: row.name, + slug: row.name.toLowerCase().replace(/\s+/g, '-'), + city: row.city, + state: row.state, + platformDispensaryId: row.platform_dispensary_id, + menuType: row.menu_type, + menuUrl: row.menu_url, + createdAt: new Date(), + updatedAt: new Date(), + }; + + console.log('=== DISPENSARY INFO ==='); + console.log(`Name: ${dispensary.name}`); + console.log(`Location: ${dispensary.city}, ${dispensary.state}`); + console.log(`Menu Type: ${dispensary.menuType}`); + console.log(`Platform ID: ${dispensary.platformDispensaryId}`); + console.log(`Menu URL: ${dispensary.menuUrl}`); + console.log(''); + + // Get profile info + const profileResult = await pool.query(` + SELECT id, profile_key, status, config FROM dispensary_crawler_profiles + WHERE dispensary_id = 101 + `); + + const profile = profileResult.rows[0]; + if (profile) { + console.log('=== PROFILE ==='); + console.log(`Profile Key: ${profile.profile_key}`); + console.log(`Profile Status: ${profile.status}`); + console.log(`Config: ${JSON.stringify(profile.config, null, 2)}`); + console.log(''); + } else { + console.log('=== PROFILE ==='); + console.log('No profile found - will use defaults'); + console.log(''); + } + + // Run the crawl + console.log('=== STARTING CRAWL ==='); + console.log('Options: useBothModes=true, downloadImages=false (sandbox)'); + console.log(''); + + try { + const result = await crawlDispensaryProducts(dispensary, 'rec', { + useBothModes: true, + downloadImages: false, // Skip images in sandbox mode for speed + }); + + console.log(''); + console.log('=== CRAWL RESULT ==='); + console.log(`Success: ${result.success}`); + console.log(`Products Found: ${result.productsFound}`); + console.log(`Products Fetched: ${result.productsFetched}`); + console.log(`Products Upserted: ${result.productsUpserted}`); + console.log(`Snapshots Created: ${result.snapshotsCreated}`); + if (result.errorMessage) { + console.log(`Error: ${result.errorMessage}`); + } + console.log(`Duration: ${result.durationMs}ms`); + console.log(''); + + // Show sample products from database + if (result.productsUpserted > 0) { + const sampleProducts = await pool.query(` + SELECT + id, name, brand_name, type, subcategory, strain_type, + price_rec, price_rec_original, stock_status, external_product_id + FROM dutchie_products + WHERE dispensary_id = 101 + ORDER BY updated_at DESC + LIMIT 10 + `); + + console.log('=== SAMPLE PRODUCTS (10) ==='); + sampleProducts.rows.forEach((p: any, i: number) => { + console.log(`${i + 1}. 
${p.name}`); + console.log(` Brand: ${p.brand_name || 'N/A'}`); + console.log(` Type: ${p.type} / ${p.subcategory || 'N/A'}`); + console.log(` Strain: ${p.strain_type || 'N/A'}`); + console.log(` Price: $${p.price_rec || 'N/A'} (orig: $${p.price_rec_original || 'N/A'})`); + console.log(` Stock: ${p.stock_status}`); + console.log(` External ID: ${p.external_product_id}`); + console.log(''); + }); + + // Show field coverage stats + const fieldStats = await pool.query(` + SELECT + COUNT(*) as total, + COUNT(brand_name) as with_brand, + COUNT(type) as with_type, + COUNT(strain_type) as with_strain, + COUNT(price_rec) as with_price, + COUNT(image_url) as with_image, + COUNT(description) as with_description, + COUNT(thc_content) as with_thc, + COUNT(cbd_content) as with_cbd + FROM dutchie_products + WHERE dispensary_id = 101 + `); + + const stats = fieldStats.rows[0]; + console.log('=== FIELD COVERAGE ==='); + console.log(`Total products: ${stats.total}`); + console.log(`With brand: ${stats.with_brand} (${Math.round(stats.with_brand / stats.total * 100)}%)`); + console.log(`With type: ${stats.with_type} (${Math.round(stats.with_type / stats.total * 100)}%)`); + console.log(`With strain_type: ${stats.with_strain} (${Math.round(stats.with_strain / stats.total * 100)}%)`); + console.log(`With price_rec: ${stats.with_price} (${Math.round(stats.with_price / stats.total * 100)}%)`); + console.log(`With image_url: ${stats.with_image} (${Math.round(stats.with_image / stats.total * 100)}%)`); + console.log(`With description: ${stats.with_description} (${Math.round(stats.with_description / stats.total * 100)}%)`); + console.log(`With THC: ${stats.with_thc} (${Math.round(stats.with_thc / stats.total * 100)}%)`); + console.log(`With CBD: ${stats.with_cbd} (${Math.round(stats.with_cbd / stats.total * 100)}%)`); + console.log(''); + } + + // Insert trace record for observability + const traceData = { + crawlResult: result, + dispensaryInfo: { + id: dispensary.id, + name: dispensary.name, + platformDispensaryId: dispensary.platformDispensaryId, + menuUrl: dispensary.menuUrl, + }, + profile: profile || null, + timestamp: new Date().toISOString(), + }; + + await pool.query(` + INSERT INTO crawl_orchestration_traces + (dispensary_id, profile_id, profile_key, crawler_module, mode, + state_at_start, state_at_end, trace, success, products_found, + duration_ms, started_at, completed_at) + VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, NOW()) + `, [ + 101, + profile?.id || null, + profile?.profile_key || null, + 'product-crawler', + 'sandbox', + profile?.status || 'no_profile', + profile?.status || 'no_profile', // No status change in sandbox + JSON.stringify(traceData), + result.success, + result.productsFound, + result.durationMs, + new Date(startTime), + ]); + + console.log('=== TRACE RECORDED ==='); + console.log('Trace saved to crawl_orchestration_traces table'); + + } catch (error: any) { + console.error('=== CRAWL ERROR ==='); + console.error('Error:', error.message); + console.error('Stack:', error.stack); + + // Record error trace + await pool.query(` + INSERT INTO crawl_orchestration_traces + (dispensary_id, profile_id, profile_key, crawler_module, mode, + state_at_start, state_at_end, trace, success, error_message, + duration_ms, started_at, completed_at) + VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, NOW()) + `, [ + 101, + profile?.id || null, + profile?.profile_key || null, + 'product-crawler', + 'sandbox', + profile?.status || 'no_profile', + profile?.status || 'no_profile', + 
JSON.stringify({ error: error.message, stack: error.stack }), + false, + error.message, + Date.now() - startTime, + new Date(startTime), + ]); + } + + await pool.end(); + console.log('=== SANDBOX CRAWL COMPLETE ==='); +} + +main().catch(e => { + console.error('Fatal error:', e.message); + process.exit(1); +}); diff --git a/backend/src/scripts/sandbox-test.ts b/backend/src/scripts/sandbox-test.ts new file mode 100644 index 00000000..edacdbf7 --- /dev/null +++ b/backend/src/scripts/sandbox-test.ts @@ -0,0 +1,181 @@ +/** + * LEGACY SCRIPT - Sandbox Crawl Test + * + * DEPRECATED: This script uses direct database connections. + * Future implementations should use the CannaiQ API endpoints instead. + * + * This script runs sandbox crawl for a dispensary and captures the full trace. + * It is kept for historical reference and manual testing only. + * + * DO NOT: + * - Add this to package.json scripts + * - Run this in automated jobs + * - Use DATABASE_URL directly + * + * Usage (manual only): + * STORAGE_DRIVER=local npx tsx src/scripts/sandbox-test.ts + * + * LOCAL MODE REQUIREMENTS: + * - STORAGE_DRIVER=local + * - STORAGE_BASE_PATH=./storage + * - Local cannaiq-postgres on port 54320 + * - NO MinIO, NO Kubernetes + */ + +import { query, getClient, closePool } from '../dutchie-az/db/connection'; +import { runDispensaryOrchestrator } from '../services/dispensary-orchestrator'; + +// Verify local mode +function verifyLocalMode(): void { + const storageDriver = process.env.STORAGE_DRIVER || 'local'; + const minioEndpoint = process.env.MINIO_ENDPOINT; + + console.log('=== LOCAL MODE VERIFICATION ==='); + console.log(`STORAGE_DRIVER: ${storageDriver}`); + console.log(`MINIO_ENDPOINT: ${minioEndpoint || 'NOT SET (good)'}`); + console.log(`STORAGE_BASE_PATH: ${process.env.STORAGE_BASE_PATH || './storage'}`); + console.log('DB Connection: Using canonical CannaiQ pool'); + + if (storageDriver !== 'local') { + console.error('ERROR: STORAGE_DRIVER must be "local"'); + process.exit(1); + } + + if (minioEndpoint) { + console.error('ERROR: MINIO_ENDPOINT should NOT be set in local mode'); + process.exit(1); + } + + console.log('✅ Local mode verified\n'); +} + +async function getDispensaryInfo(dispensaryId: number) { + const result = await query(` + SELECT d.id, d.name, d.city, d.menu_type, d.platform_dispensary_id, d.menu_url, + p.profile_key, p.status as profile_status, p.config + FROM dispensaries d + LEFT JOIN dispensary_crawler_profiles p ON p.dispensary_id = d.id + WHERE d.id = $1 + `, [dispensaryId]); + + return result.rows[0]; +} + +async function getLatestTrace(dispensaryId: number) { + const result = await query(` + SELECT * + FROM crawl_orchestration_traces + WHERE dispensary_id = $1 + ORDER BY created_at DESC + LIMIT 1 + `, [dispensaryId]); + + return result.rows[0]; +} + +async function main() { + console.warn('\n⚠️ LEGACY SCRIPT: This script should be replaced with CannaiQ API calls.\n'); + + const dispensaryId = parseInt(process.argv[2], 10); + + if (!dispensaryId || isNaN(dispensaryId)) { + console.error('Usage: npx tsx src/scripts/sandbox-test.ts '); + console.error('Example: npx tsx src/scripts/sandbox-test.ts 101'); + process.exit(1); + } + + // Verify local mode first + verifyLocalMode(); + + try { + // Get dispensary info + console.log(`=== DISPENSARY INFO (ID: ${dispensaryId}) ===`); + const dispensary = await getDispensaryInfo(dispensaryId); + + if (!dispensary) { + console.error(`Dispensary ${dispensaryId} not found`); + process.exit(1); + } + + console.log(`Name: ${dispensary.name}`); 
+ console.log(`City: ${dispensary.city}`); + console.log(`Menu Type: ${dispensary.menu_type}`); + console.log(`Platform Dispensary ID: ${dispensary.platform_dispensary_id || 'NULL'}`); + console.log(`Menu URL: ${dispensary.menu_url || 'NULL'}`); + console.log(`Profile Key: ${dispensary.profile_key || 'NONE'}`); + console.log(`Profile Status: ${dispensary.profile_status || 'N/A'}`); + console.log(`Profile Config: ${JSON.stringify(dispensary.config, null, 2)}`); + console.log(''); + + // Run sandbox crawl + console.log('=== RUNNING SANDBOX CRAWL ==='); + console.log(`Starting sandbox crawl for ${dispensary.name}...`); + const startTime = Date.now(); + + const result = await runDispensaryOrchestrator(dispensaryId); + + const duration = Date.now() - startTime; + + console.log('\n=== CRAWL RESULT ==='); + console.log(`Status: ${result.status}`); + console.log(`Summary: ${result.summary}`); + console.log(`Run ID: ${result.runId}`); + console.log(`Duration: ${duration}ms`); + console.log(`Detection Ran: ${result.detectionRan}`); + console.log(`Crawl Ran: ${result.crawlRan}`); + console.log(`Crawl Type: ${result.crawlType || 'N/A'}`); + console.log(`Products Found: ${result.productsFound || 0}`); + console.log(`Products New: ${result.productsNew || 0}`); + console.log(`Products Updated: ${result.productsUpdated || 0}`); + + if (result.error) { + console.log(`Error: ${result.error}`); + } + + // Get the trace + console.log('\n=== ORCHESTRATOR TRACE ==='); + const trace = await getLatestTrace(dispensaryId); + + if (trace) { + console.log(`Trace ID: ${trace.id}`); + console.log(`Profile Key: ${trace.profile_key || 'N/A'}`); + console.log(`Mode: ${trace.mode}`); + console.log(`Status: ${trace.status}`); + console.log(`Started At: ${trace.started_at}`); + console.log(`Completed At: ${trace.completed_at || 'In Progress'}`); + + if (trace.steps && Array.isArray(trace.steps)) { + console.log(`\nSteps (${trace.steps.length} total):`); + trace.steps.forEach((step: any, i: number) => { + const status = step.status === 'completed' ? '✅' : step.status === 'failed' ? '❌' : '⏳'; + console.log(` ${i + 1}. ${status} ${step.action}: ${step.description}`); + if (step.output && Object.keys(step.output).length > 0) { + console.log(` Output: ${JSON.stringify(step.output)}`); + } + if (step.error) { + console.log(` Error: ${step.error}`); + } + }); + } + + if (trace.result) { + console.log(`\nResult: ${JSON.stringify(trace.result, null, 2)}`); + } + + if (trace.error_message) { + console.log(`\nError Message: ${trace.error_message}`); + } + } else { + console.log('No trace found for this dispensary'); + } + + } catch (error: any) { + console.error('Error running sandbox test:', error.message); + console.error(error.stack); + process.exit(1); + } finally { + await closePool(); + } +} + +main(); diff --git a/backend/src/scripts/sandbox-validate-101.ts b/backend/src/scripts/sandbox-validate-101.ts new file mode 100644 index 00000000..848215bf --- /dev/null +++ b/backend/src/scripts/sandbox-validate-101.ts @@ -0,0 +1,88 @@ +/** + * Sandbox Validation Script for Dispensary 101 (Trulieve Scottsdale) + * + * This script runs a sandbox crawl and captures the trace for observability. + * NO automatic promotion or state changes. 
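+ *
+ * Note: despite the name, this script is read-only. It inspects the dispensary row,
+ * crawler profile, current product count, and the three most recent
+ * crawl_orchestration_traces entries; it does not trigger a crawl itself.
+ *
+ * Usage (manual only; assumed invocation): npx tsx src/scripts/sandbox-validate-101.ts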
+ */ + +import { Pool } from 'pg'; + +const pool = new Pool({ connectionString: process.env.DATABASE_URL }); + +async function main() { + console.log('=== SANDBOX VALIDATION: Dispensary 101 (Trulieve Scottsdale) ==='); + console.log(''); + + // Get dispensary info + const dispResult = await pool.query(` + SELECT d.id, d.name, d.city, d.state, d.menu_type, d.platform_dispensary_id, d.menu_url, + dcp.id as profile_id, dcp.profile_key, dcp.status as profile_status, dcp.config + FROM dispensaries d + LEFT JOIN dispensary_crawler_profiles dcp ON dcp.dispensary_id = d.id + WHERE d.id = 101 + `); + + if (!dispResult.rows[0]) { + console.log('ERROR: Dispensary 101 not found'); + await pool.end(); + return; + } + + const disp = dispResult.rows[0]; + console.log('=== DISPENSARY INFO ==='); + console.log('Name:', disp.name); + console.log('Location:', disp.city + ', ' + disp.state); + console.log('Menu Type:', disp.menu_type); + console.log('Platform ID:', disp.platform_dispensary_id); + console.log('Menu URL:', disp.menu_url); + console.log(''); + + console.log('=== PROFILE ==='); + console.log('Profile ID:', disp.profile_id); + console.log('Profile Key:', disp.profile_key); + console.log('Profile Status:', disp.profile_status); + console.log('Config:', JSON.stringify(disp.config, null, 2)); + console.log(''); + + // Get product count + const products = await pool.query('SELECT COUNT(*) FROM dutchie_products WHERE dispensary_id = 101'); + console.log('Current product count:', products.rows[0].count); + console.log(''); + + // Check for traces (local DB uses state_at_start/state_at_end column names) + const traces = await pool.query(` + SELECT id, run_id, state_at_start, state_at_end, + products_found, success, error_message, created_at, trace + FROM crawl_orchestration_traces + WHERE dispensary_id = 101 + ORDER BY created_at DESC + LIMIT 3 + `); + + console.log('=== RECENT TRACES ==='); + if (traces.rows.length === 0) { + console.log('No traces found'); + } else { + traces.rows.forEach((t: any, i: number) => { + console.log(`${i+1}. [id:${t.id}] ${t.state_at_start} -> ${t.state_at_end}`); + console.log(` Products: ${t.products_found} | Success: ${t.success}`); + if (t.error_message) console.log(` Error: ${t.error_message}`); + if (t.trace && Array.isArray(t.trace)) { + console.log(' Trace steps:'); + t.trace.slice(0, 5).forEach((s: any, j: number) => { + console.log(` ${j+1}. [${s.status || s.type}] ${s.step_name || s.message || JSON.stringify(s).slice(0, 60)}`); + }); + if (t.trace.length > 5) console.log(` ... and ${t.trace.length - 5} more steps`); + } + console.log(''); + }); + } + + await pool.end(); + console.log('=== DATABASE CHECK COMPLETE ==='); +} + +main().catch(e => { + console.error('Error:', e.message); + process.exit(1); +}); diff --git a/backend/src/scripts/scrape-all-active.ts b/backend/src/scripts/scrape-all-active.ts index 164d4bc2..9e9006de 100644 --- a/backend/src/scripts/scrape-all-active.ts +++ b/backend/src/scripts/scrape-all-active.ts @@ -1,6 +1,16 @@ /** - * Scrape ALL active products via direct GraphQL pagination - * This is more reliable than category navigation + * LEGACY SCRIPT - Scrape All Active Products + * + * DEPRECATED: This script creates its own database pool. + * Future implementations should use the CannaiQ API endpoints instead. + * + * Scrapes ALL active products via direct GraphQL pagination. + * This is more reliable than category navigation. 
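+ *
+ * Rough flow (summarized from the imports and constants below; details live in the
+ * dutchie-graphql scraper module): Puppeteer with the stealth plugin loads the menu URL,
+ * the persisted GraphQL query identified by GRAPHQL_HASH is paged through until all
+ * active products are collected, and each product passes through
+ * normalizeDutchieProduct() before being written to the database.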
+ * + * DO NOT: + * - Add this to package.json scripts + * - Run this in automated jobs + * - Use DATABASE_URL directly */ import puppeteer from 'puppeteer-extra'; @@ -10,8 +20,11 @@ import { normalizeDutchieProduct, DutchieProduct } from '../scrapers/dutchie-gra puppeteer.use(StealthPlugin()); -const DATABASE_URL = - process.env.DATABASE_URL || 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'; +console.warn('\n⚠️ LEGACY SCRIPT: This script should be replaced with CannaiQ API calls.\n'); + +// Single database connection (cannaiq in cannaiq-postgres container) +const DATABASE_URL = process.env.CANNAIQ_DB_URL || + `postgresql://${process.env.CANNAIQ_DB_USER || 'dutchie'}:${process.env.CANNAIQ_DB_PASS || 'dutchie_local_pass'}@${process.env.CANNAIQ_DB_HOST || 'localhost'}:${process.env.CANNAIQ_DB_PORT || '54320'}/${process.env.CANNAIQ_DB_NAME || 'cannaiq'}`; const GRAPHQL_HASH = 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0'; async function scrapeAllProducts(menuUrl: string, storeId: number) { diff --git a/backend/src/scripts/search-dispensaries.ts b/backend/src/scripts/search-dispensaries.ts new file mode 100644 index 00000000..c9522947 --- /dev/null +++ b/backend/src/scripts/search-dispensaries.ts @@ -0,0 +1,42 @@ +import pg from 'pg'; +const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL }); + +async function main() { + // Search broadly for flower power + const result = await pool.query(` + SELECT id, name, address, city, state, zip, menu_url, menu_type, platform_dispensary_id, website + FROM dispensaries + WHERE LOWER(name) LIKE $1 OR LOWER(name) LIKE $2 OR LOWER(address) LIKE $3 + ORDER BY name + `, ['%flower%', '%az %', '%union hills%']); + + console.log('=== SEARCHING FOR FLOWER/AZ/UNION HILLS ==='); + result.rows.forEach((r: any) => console.log(JSON.stringify(r))); + + // Also search for any existing Nirvana dispensaries + const nirvana = await pool.query(` + SELECT id, name, address, city, state, zip, menu_url, menu_type, platform_dispensary_id, website + FROM dispensaries + WHERE LOWER(name) LIKE $1 + ORDER BY name + `, ['%nirvana%']); + + console.log(''); + console.log('=== EXISTING NIRVANA DISPENSARIES ==='); + nirvana.rows.forEach((r: any) => console.log(JSON.stringify(r))); + + // Get all AZ dispensaries for comparison + const allAZ = await pool.query(` + SELECT id, name, address, city, state, zip + FROM dispensaries + WHERE state = 'AZ' + ORDER BY name + `); + + console.log(''); + console.log('=== ALL AZ DISPENSARIES (' + allAZ.rows.length + ' total) ==='); + allAZ.rows.forEach((r: any) => console.log(JSON.stringify({id: r.id, name: r.name, address: r.address, city: r.city}))); + + await pool.end(); +} +main().catch(e => { console.error(e.message); process.exit(1); }); diff --git a/backend/src/scripts/seed-dt-cities-bulk.ts b/backend/src/scripts/seed-dt-cities-bulk.ts new file mode 100644 index 00000000..d2b053d2 --- /dev/null +++ b/backend/src/scripts/seed-dt-cities-bulk.ts @@ -0,0 +1,307 @@ +#!/usr/bin/env npx tsx +/** + * Seed Dutchie Discovery Cities - Bulk + * + * Seeds dutchie_discovery_cities with a static list of major US metros. + * Uses UPSERT to avoid duplicates on re-runs. + * + * Usage: + * npm run seed:dt:cities:bulk + * DATABASE_URL="..." 
npx tsx src/scripts/seed-dt-cities-bulk.ts + */ + +import { Pool } from 'pg'; + +const DB_URL = process.env.DATABASE_URL || process.env.CANNAIQ_DB_URL || + 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'; + +// ============================================================================ +// Static list of major US metros +// Format: { city_slug, city_name, state_code, country_code } +// ============================================================================ + +interface CityEntry { + city_slug: string; + city_name: string; + state_code: string; + country_code: string; +} + +const CITIES: CityEntry[] = [ + // Arizona (priority state) + { city_slug: 'az-phoenix', city_name: 'Phoenix', state_code: 'AZ', country_code: 'US' }, + { city_slug: 'az-tucson', city_name: 'Tucson', state_code: 'AZ', country_code: 'US' }, + { city_slug: 'az-mesa', city_name: 'Mesa', state_code: 'AZ', country_code: 'US' }, + { city_slug: 'az-scottsdale', city_name: 'Scottsdale', state_code: 'AZ', country_code: 'US' }, + { city_slug: 'az-tempe', city_name: 'Tempe', state_code: 'AZ', country_code: 'US' }, + { city_slug: 'az-chandler', city_name: 'Chandler', state_code: 'AZ', country_code: 'US' }, + { city_slug: 'az-glendale', city_name: 'Glendale', state_code: 'AZ', country_code: 'US' }, + { city_slug: 'az-peoria', city_name: 'Peoria', state_code: 'AZ', country_code: 'US' }, + { city_slug: 'az-flagstaff', city_name: 'Flagstaff', state_code: 'AZ', country_code: 'US' }, + { city_slug: 'az-sedona', city_name: 'Sedona', state_code: 'AZ', country_code: 'US' }, + + // California + { city_slug: 'ca-los-angeles', city_name: 'Los Angeles', state_code: 'CA', country_code: 'US' }, + { city_slug: 'ca-san-francisco', city_name: 'San Francisco', state_code: 'CA', country_code: 'US' }, + { city_slug: 'ca-san-diego', city_name: 'San Diego', state_code: 'CA', country_code: 'US' }, + { city_slug: 'ca-san-jose', city_name: 'San Jose', state_code: 'CA', country_code: 'US' }, + { city_slug: 'ca-oakland', city_name: 'Oakland', state_code: 'CA', country_code: 'US' }, + { city_slug: 'ca-sacramento', city_name: 'Sacramento', state_code: 'CA', country_code: 'US' }, + { city_slug: 'ca-fresno', city_name: 'Fresno', state_code: 'CA', country_code: 'US' }, + { city_slug: 'ca-long-beach', city_name: 'Long Beach', state_code: 'CA', country_code: 'US' }, + { city_slug: 'ca-bakersfield', city_name: 'Bakersfield', state_code: 'CA', country_code: 'US' }, + { city_slug: 'ca-anaheim', city_name: 'Anaheim', state_code: 'CA', country_code: 'US' }, + { city_slug: 'ca-santa-ana', city_name: 'Santa Ana', state_code: 'CA', country_code: 'US' }, + { city_slug: 'ca-riverside', city_name: 'Riverside', state_code: 'CA', country_code: 'US' }, + { city_slug: 'ca-stockton', city_name: 'Stockton', state_code: 'CA', country_code: 'US' }, + { city_slug: 'ca-irvine', city_name: 'Irvine', state_code: 'CA', country_code: 'US' }, + { city_slug: 'ca-santa-barbara', city_name: 'Santa Barbara', state_code: 'CA', country_code: 'US' }, + + // Colorado + { city_slug: 'co-denver', city_name: 'Denver', state_code: 'CO', country_code: 'US' }, + { city_slug: 'co-colorado-springs', city_name: 'Colorado Springs', state_code: 'CO', country_code: 'US' }, + { city_slug: 'co-aurora', city_name: 'Aurora', state_code: 'CO', country_code: 'US' }, + { city_slug: 'co-boulder', city_name: 'Boulder', state_code: 'CO', country_code: 'US' }, + { city_slug: 'co-fort-collins', city_name: 'Fort Collins', state_code: 'CO', country_code: 'US' }, + { city_slug: 'co-pueblo', 
city_name: 'Pueblo', state_code: 'CO', country_code: 'US' }, + + // Florida + { city_slug: 'fl-miami', city_name: 'Miami', state_code: 'FL', country_code: 'US' }, + { city_slug: 'fl-orlando', city_name: 'Orlando', state_code: 'FL', country_code: 'US' }, + { city_slug: 'fl-tampa', city_name: 'Tampa', state_code: 'FL', country_code: 'US' }, + { city_slug: 'fl-jacksonville', city_name: 'Jacksonville', state_code: 'FL', country_code: 'US' }, + { city_slug: 'fl-fort-lauderdale', city_name: 'Fort Lauderdale', state_code: 'FL', country_code: 'US' }, + { city_slug: 'fl-west-palm-beach', city_name: 'West Palm Beach', state_code: 'FL', country_code: 'US' }, + { city_slug: 'fl-st-petersburg', city_name: 'St. Petersburg', state_code: 'FL', country_code: 'US' }, + + // Illinois + { city_slug: 'il-chicago', city_name: 'Chicago', state_code: 'IL', country_code: 'US' }, + { city_slug: 'il-springfield', city_name: 'Springfield', state_code: 'IL', country_code: 'US' }, + { city_slug: 'il-peoria', city_name: 'Peoria', state_code: 'IL', country_code: 'US' }, + { city_slug: 'il-rockford', city_name: 'Rockford', state_code: 'IL', country_code: 'US' }, + + // Massachusetts + { city_slug: 'ma-boston', city_name: 'Boston', state_code: 'MA', country_code: 'US' }, + { city_slug: 'ma-worcester', city_name: 'Worcester', state_code: 'MA', country_code: 'US' }, + { city_slug: 'ma-springfield', city_name: 'Springfield', state_code: 'MA', country_code: 'US' }, + { city_slug: 'ma-cambridge', city_name: 'Cambridge', state_code: 'MA', country_code: 'US' }, + + // Michigan + { city_slug: 'mi-detroit', city_name: 'Detroit', state_code: 'MI', country_code: 'US' }, + { city_slug: 'mi-grand-rapids', city_name: 'Grand Rapids', state_code: 'MI', country_code: 'US' }, + { city_slug: 'mi-ann-arbor', city_name: 'Ann Arbor', state_code: 'MI', country_code: 'US' }, + { city_slug: 'mi-lansing', city_name: 'Lansing', state_code: 'MI', country_code: 'US' }, + { city_slug: 'mi-flint', city_name: 'Flint', state_code: 'MI', country_code: 'US' }, + + // Nevada + { city_slug: 'nv-las-vegas', city_name: 'Las Vegas', state_code: 'NV', country_code: 'US' }, + { city_slug: 'nv-reno', city_name: 'Reno', state_code: 'NV', country_code: 'US' }, + { city_slug: 'nv-henderson', city_name: 'Henderson', state_code: 'NV', country_code: 'US' }, + { city_slug: 'nv-north-las-vegas', city_name: 'North Las Vegas', state_code: 'NV', country_code: 'US' }, + + // New Jersey + { city_slug: 'nj-newark', city_name: 'Newark', state_code: 'NJ', country_code: 'US' }, + { city_slug: 'nj-jersey-city', city_name: 'Jersey City', state_code: 'NJ', country_code: 'US' }, + { city_slug: 'nj-paterson', city_name: 'Paterson', state_code: 'NJ', country_code: 'US' }, + { city_slug: 'nj-trenton', city_name: 'Trenton', state_code: 'NJ', country_code: 'US' }, + + // New Mexico + { city_slug: 'nm-albuquerque', city_name: 'Albuquerque', state_code: 'NM', country_code: 'US' }, + { city_slug: 'nm-santa-fe', city_name: 'Santa Fe', state_code: 'NM', country_code: 'US' }, + { city_slug: 'nm-las-cruces', city_name: 'Las Cruces', state_code: 'NM', country_code: 'US' }, + + // New York + { city_slug: 'ny-new-york', city_name: 'New York', state_code: 'NY', country_code: 'US' }, + { city_slug: 'ny-buffalo', city_name: 'Buffalo', state_code: 'NY', country_code: 'US' }, + { city_slug: 'ny-rochester', city_name: 'Rochester', state_code: 'NY', country_code: 'US' }, + { city_slug: 'ny-albany', city_name: 'Albany', state_code: 'NY', country_code: 'US' }, + { city_slug: 'ny-syracuse', city_name: 
'Syracuse', state_code: 'NY', country_code: 'US' }, + + // Ohio + { city_slug: 'oh-columbus', city_name: 'Columbus', state_code: 'OH', country_code: 'US' }, + { city_slug: 'oh-cleveland', city_name: 'Cleveland', state_code: 'OH', country_code: 'US' }, + { city_slug: 'oh-cincinnati', city_name: 'Cincinnati', state_code: 'OH', country_code: 'US' }, + { city_slug: 'oh-toledo', city_name: 'Toledo', state_code: 'OH', country_code: 'US' }, + { city_slug: 'oh-akron', city_name: 'Akron', state_code: 'OH', country_code: 'US' }, + + // Oklahoma + { city_slug: 'ok-oklahoma-city', city_name: 'Oklahoma City', state_code: 'OK', country_code: 'US' }, + { city_slug: 'ok-tulsa', city_name: 'Tulsa', state_code: 'OK', country_code: 'US' }, + { city_slug: 'ok-norman', city_name: 'Norman', state_code: 'OK', country_code: 'US' }, + + // Oregon + { city_slug: 'or-portland', city_name: 'Portland', state_code: 'OR', country_code: 'US' }, + { city_slug: 'or-eugene', city_name: 'Eugene', state_code: 'OR', country_code: 'US' }, + { city_slug: 'or-salem', city_name: 'Salem', state_code: 'OR', country_code: 'US' }, + { city_slug: 'or-bend', city_name: 'Bend', state_code: 'OR', country_code: 'US' }, + { city_slug: 'or-medford', city_name: 'Medford', state_code: 'OR', country_code: 'US' }, + + // Pennsylvania + { city_slug: 'pa-philadelphia', city_name: 'Philadelphia', state_code: 'PA', country_code: 'US' }, + { city_slug: 'pa-pittsburgh', city_name: 'Pittsburgh', state_code: 'PA', country_code: 'US' }, + { city_slug: 'pa-allentown', city_name: 'Allentown', state_code: 'PA', country_code: 'US' }, + + // Texas (limited cannabis, but for completeness) + { city_slug: 'tx-houston', city_name: 'Houston', state_code: 'TX', country_code: 'US' }, + { city_slug: 'tx-san-antonio', city_name: 'San Antonio', state_code: 'TX', country_code: 'US' }, + { city_slug: 'tx-dallas', city_name: 'Dallas', state_code: 'TX', country_code: 'US' }, + { city_slug: 'tx-austin', city_name: 'Austin', state_code: 'TX', country_code: 'US' }, + { city_slug: 'tx-fort-worth', city_name: 'Fort Worth', state_code: 'TX', country_code: 'US' }, + { city_slug: 'tx-el-paso', city_name: 'El Paso', state_code: 'TX', country_code: 'US' }, + + // Virginia + { city_slug: 'va-virginia-beach', city_name: 'Virginia Beach', state_code: 'VA', country_code: 'US' }, + { city_slug: 'va-norfolk', city_name: 'Norfolk', state_code: 'VA', country_code: 'US' }, + { city_slug: 'va-richmond', city_name: 'Richmond', state_code: 'VA', country_code: 'US' }, + { city_slug: 'va-arlington', city_name: 'Arlington', state_code: 'VA', country_code: 'US' }, + + // Washington + { city_slug: 'wa-seattle', city_name: 'Seattle', state_code: 'WA', country_code: 'US' }, + { city_slug: 'wa-spokane', city_name: 'Spokane', state_code: 'WA', country_code: 'US' }, + { city_slug: 'wa-tacoma', city_name: 'Tacoma', state_code: 'WA', country_code: 'US' }, + { city_slug: 'wa-vancouver', city_name: 'Vancouver', state_code: 'WA', country_code: 'US' }, + { city_slug: 'wa-bellevue', city_name: 'Bellevue', state_code: 'WA', country_code: 'US' }, + + // Washington DC + { city_slug: 'dc-washington', city_name: 'Washington', state_code: 'DC', country_code: 'US' }, + + // Maryland + { city_slug: 'md-baltimore', city_name: 'Baltimore', state_code: 'MD', country_code: 'US' }, + { city_slug: 'md-rockville', city_name: 'Rockville', state_code: 'MD', country_code: 'US' }, + { city_slug: 'md-silver-spring', city_name: 'Silver Spring', state_code: 'MD', country_code: 'US' }, + + // Connecticut + { city_slug: 'ct-hartford', 
city_name: 'Hartford', state_code: 'CT', country_code: 'US' }, + { city_slug: 'ct-new-haven', city_name: 'New Haven', state_code: 'CT', country_code: 'US' }, + { city_slug: 'ct-stamford', city_name: 'Stamford', state_code: 'CT', country_code: 'US' }, + + // Maine + { city_slug: 'me-portland', city_name: 'Portland', state_code: 'ME', country_code: 'US' }, + { city_slug: 'me-bangor', city_name: 'Bangor', state_code: 'ME', country_code: 'US' }, + + // Missouri + { city_slug: 'mo-kansas-city', city_name: 'Kansas City', state_code: 'MO', country_code: 'US' }, + { city_slug: 'mo-st-louis', city_name: 'St. Louis', state_code: 'MO', country_code: 'US' }, + { city_slug: 'mo-springfield', city_name: 'Springfield', state_code: 'MO', country_code: 'US' }, + + // Minnesota + { city_slug: 'mn-minneapolis', city_name: 'Minneapolis', state_code: 'MN', country_code: 'US' }, + { city_slug: 'mn-st-paul', city_name: 'St. Paul', state_code: 'MN', country_code: 'US' }, + { city_slug: 'mn-duluth', city_name: 'Duluth', state_code: 'MN', country_code: 'US' }, + + // Alaska + { city_slug: 'ak-anchorage', city_name: 'Anchorage', state_code: 'AK', country_code: 'US' }, + { city_slug: 'ak-fairbanks', city_name: 'Fairbanks', state_code: 'AK', country_code: 'US' }, + { city_slug: 'ak-juneau', city_name: 'Juneau', state_code: 'AK', country_code: 'US' }, + + // Hawaii + { city_slug: 'hi-honolulu', city_name: 'Honolulu', state_code: 'HI', country_code: 'US' }, + { city_slug: 'hi-maui', city_name: 'Maui', state_code: 'HI', country_code: 'US' }, + + // Vermont + { city_slug: 'vt-burlington', city_name: 'Burlington', state_code: 'VT', country_code: 'US' }, + + // Rhode Island + { city_slug: 'ri-providence', city_name: 'Providence', state_code: 'RI', country_code: 'US' }, + + // Delaware + { city_slug: 'de-wilmington', city_name: 'Wilmington', state_code: 'DE', country_code: 'US' }, + + // Montana + { city_slug: 'mt-billings', city_name: 'Billings', state_code: 'MT', country_code: 'US' }, + { city_slug: 'mt-missoula', city_name: 'Missoula', state_code: 'MT', country_code: 'US' }, +]; + +// ============================================================================ +// Main +// ============================================================================ + +async function main() { + console.log('========================================================='); + console.log(' Seed Dutchie Discovery Cities - Bulk'); + console.log('========================================================='); + console.log(`\nDatabase: ${DB_URL.replace(/:[^:@]+@/, ':****@')}`); + console.log(`Cities to seed: ${CITIES.length}`); + + const pool = new Pool({ connectionString: DB_URL }); + + try { + // Test connection + const { rows } = await pool.query('SELECT NOW() as time'); + console.log(`Connected at: ${rows[0].time}\n`); + + let inserted = 0; + let updated = 0; + let errors = 0; + + for (const city of CITIES) { + try { + const result = await pool.query(` + INSERT INTO dutchie_discovery_cities ( + platform, + city_slug, + city_name, + state_code, + country_code, + crawl_enabled, + created_at, + updated_at + ) VALUES ( + 'dutchie', + $1, + $2, + $3, + $4, + TRUE, + NOW(), + NOW() + ) + ON CONFLICT (platform, country_code, state_code, city_slug) + DO UPDATE SET + city_name = EXCLUDED.city_name, + crawl_enabled = TRUE, + updated_at = NOW() + RETURNING (xmax = 0) AS inserted + `, [city.city_slug, city.city_name, city.state_code, city.country_code]); + + if (result.rows[0].inserted) { + inserted++; + } else { + updated++; + } + } catch (err: any) { + 
console.error(` Error seeding ${city.city_slug}: ${err.message}`); + errors++; + } + } + + // Get total count + const { rows: countRows } = await pool.query(` + SELECT COUNT(*) as total FROM dutchie_discovery_cities WHERE platform = 'dutchie' + `); + + console.log('========================================================='); + console.log(' SUMMARY'); + console.log('========================================================='); + console.log(` Cities in static list: ${CITIES.length}`); + console.log(` Inserted: ${inserted}`); + console.log(` Updated: ${updated}`); + console.log(` Errors: ${errors}`); + console.log(` Total in DB: ${countRows[0].total}`); + + if (errors > 0) { + console.log('\n Completed with errors'); + process.exit(1); + } + + console.log('\n Seed completed successfully'); + process.exit(0); + } catch (error: any) { + console.error('\n Seed failed:', error.message); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/scripts/seed-dt-city.ts b/backend/src/scripts/seed-dt-city.ts new file mode 100644 index 00000000..8fdd4c56 --- /dev/null +++ b/backend/src/scripts/seed-dt-city.ts @@ -0,0 +1,166 @@ +#!/usr/bin/env npx tsx +/** + * Seed Dutchie City for Discovery + * + * Manually seeds a city into dutchie_discovery_cities for location discovery. + * Use this when /cities scraping is blocked (403) and you need to manually add cities. + * + * Usage: + * npm run seed:platforms:dt:city -- --city-slug=ny-hudson --city-name=Hudson --state-code=NY + * npm run seed:platforms:dt:city -- --city-slug=ma-boston --city-name=Boston --state-code=MA --country-code=US + * + * Options: + * --city-slug Required. URL slug for the city (e.g., "ny-hudson") + * --city-name Required. Display name (e.g., "Hudson") + * --state-code Required. State/province code (e.g., "NY", "CA", "ON") + * --country-code Optional. 
Country code (default: "US") + * + * After seeding, run location discovery: + * npm run discovery:platforms:dt:locations + */ + +import { Pool } from 'pg'; + +const DB_URL = process.env.DATABASE_URL || process.env.CANNAIQ_DB_URL || + 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'; + +interface Args { + citySlug?: string; + cityName?: string; + stateCode?: string; + countryCode: string; +} + +function parseArgs(): Args { + const args: Args = { countryCode: 'US' }; + + for (const arg of process.argv.slice(2)) { + const citySlugMatch = arg.match(/--city-slug=(.+)/); + if (citySlugMatch) args.citySlug = citySlugMatch[1]; + + const cityNameMatch = arg.match(/--city-name=(.+)/); + if (cityNameMatch) args.cityName = cityNameMatch[1]; + + const stateCodeMatch = arg.match(/--state-code=(.+)/); + if (stateCodeMatch) args.stateCode = stateCodeMatch[1].toUpperCase(); + + const countryCodeMatch = arg.match(/--country-code=(.+)/); + if (countryCodeMatch) args.countryCode = countryCodeMatch[1].toUpperCase(); + } + + return args; +} + +function printUsage() { + console.log(` +Usage: + npm run seed:platforms:dt:city -- --city-slug= --city-name= --state-code= + +Required arguments: + --city-slug URL slug for the city (e.g., "ny-hudson", "ma-boston") + --city-name Display name (e.g., "Hudson", "Boston") + --state-code State/province code (e.g., "NY", "CA", "ON") + +Optional arguments: + --country-code Country code (default: "US") + +Examples: + npm run seed:platforms:dt:city -- --city-slug=ny-hudson --city-name=Hudson --state-code=NY + npm run seed:platforms:dt:city -- --city-slug=ca-los-angeles --city-name="Los Angeles" --state-code=CA + npm run seed:platforms:dt:city -- --city-slug=on-toronto --city-name=Toronto --state-code=ON --country-code=CA +`); +} + +async function main() { + const args = parseArgs(); + + console.log('╔══════════════════════════════════════════════════╗'); + console.log('║ Seed Dutchie City for Discovery ║'); + console.log('╚══════════════════════════════════════════════════╝'); + + // Validate required args + if (!args.citySlug || !args.cityName || !args.stateCode) { + console.error('\n❌ Error: Missing required arguments\n'); + printUsage(); + process.exit(1); + } + + console.log(`\nCity Slug: ${args.citySlug}`); + console.log(`City Name: ${args.cityName}`); + console.log(`State Code: ${args.stateCode}`); + console.log(`Country Code: ${args.countryCode}`); + console.log(`Database: ${DB_URL.replace(/:[^:@]+@/, ':****@')}`); + + const pool = new Pool({ connectionString: DB_URL }); + + try { + // Test DB connection + const { rows: connTest } = await pool.query('SELECT NOW() as time'); + console.log(`\nConnected at: ${connTest[0].time}`); + + // Upsert the city + const { rows, rowCount } = await pool.query(` + INSERT INTO dutchie_discovery_cities ( + platform, + city_slug, + city_name, + state_code, + country_code, + crawl_enabled, + created_at, + updated_at + ) VALUES ( + 'dutchie', + $1, + $2, + $3, + $4, + TRUE, + NOW(), + NOW() + ) + ON CONFLICT (platform, country_code, state_code, city_slug) + DO UPDATE SET + city_name = EXCLUDED.city_name, + crawl_enabled = TRUE, + updated_at = NOW() + RETURNING id, city_slug, city_name, state_code, country_code, crawl_enabled, + (xmax = 0) AS was_inserted + `, [args.citySlug, args.cityName, args.stateCode, args.countryCode]); + + if (rows.length > 0) { + const row = rows[0]; + const action = row.was_inserted ? 
'INSERTED' : 'UPDATED'; + console.log(`\n✅ City ${action}:`); + console.log(` ID: ${row.id}`); + console.log(` City Slug: ${row.city_slug}`); + console.log(` City Name: ${row.city_name}`); + console.log(` State Code: ${row.state_code}`); + console.log(` Country Code: ${row.country_code}`); + console.log(` Crawl Enabled: ${row.crawl_enabled}`); + } + + // Show current city count + const { rows: countRows } = await pool.query(` + SELECT + COUNT(*) as total, + COUNT(*) FILTER (WHERE crawl_enabled = TRUE) as enabled + FROM dutchie_discovery_cities + WHERE platform = 'dutchie' + `); + + console.log(`\nTotal Dutchie cities: ${countRows[0].total} (${countRows[0].enabled} enabled)`); + + console.log('\n📍 Next step: Run location discovery'); + console.log(' npm run discovery:platforms:dt:locations'); + + process.exit(0); + } catch (error: any) { + console.error('\n❌ Failed to seed city:', error.message); + process.exit(1); + } finally { + await pool.end(); + } +} + +main(); diff --git a/backend/src/scripts/system-smoke-test.ts b/backend/src/scripts/system-smoke-test.ts new file mode 100644 index 00000000..cb38ebea --- /dev/null +++ b/backend/src/scripts/system-smoke-test.ts @@ -0,0 +1,325 @@ +/** + * System Smoke Test + * + * Validates core CannaiQ system components: + * - Database connectivity + * - Required tables and row counts + * - Discovery data (via direct DB query) + * - Analytics V2 services (via direct service calls) + * - Orchestrator route (via HTTP) + * + * Usage: npm run system:smoke-test + * Exit codes: 0 = success, 1 = failure + */ + +import { Pool } from 'pg'; +import axios from 'axios'; + +// Configuration +const API_BASE = process.env.API_BASE_URL || 'http://localhost:3010'; +const DB_URL = process.env.DATABASE_URL || process.env.CANNAIQ_DB_URL || + 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'; + +// Test results tracking +interface TestResult { + name: string; + passed: boolean; + message: string; + details?: any; +} + +const results: TestResult[] = []; +let hasFailure = false; + +function pass(name: string, message: string, details?: any) { + results.push({ name, passed: true, message, details }); + console.log(` ✓ PASS: ${name} - ${message}`); +} + +function fail(name: string, message: string, details?: any) { + results.push({ name, passed: false, message, details }); + console.log(` ✗ FAIL: ${name} - ${message}`); + hasFailure = true; +} + +// ============================================================ +// DATABASE TESTS +// ============================================================ + +async function testDatabaseConnection(pool: Pool): Promise { + console.log('\n[1/4] DATABASE CONNECTION'); + console.log('─'.repeat(50)); + + try { + const result = await pool.query('SELECT NOW() as time, current_database() as db'); + const { time, db } = result.rows[0]; + pass('DB Connection', `Connected to ${db} at ${time}`); + return true; + } catch (error: any) { + fail('DB Connection', `Failed: ${error.message}`); + return false; + } +} + +async function testRequiredTables(pool: Pool): Promise { + console.log('\n[2/4] REQUIRED TABLES'); + console.log('─'.repeat(50)); + + const tables = [ + 'states', + 'dispensaries', + 'store_products', + 'store_product_snapshots', + 'crawl_runs', + 'dutchie_discovery_cities', + 'dutchie_discovery_locations', + ]; + + for (const table of tables) { + try { + const result = await pool.query(`SELECT COUNT(*) as count FROM ${table}`); + const count = parseInt(result.rows[0].count, 10); + pass(`Table: ${table}`, 
`${count.toLocaleString()} rows`); + } catch (error: any) { + if (error.code === '42P01') { + fail(`Table: ${table}`, 'Table does not exist'); + } else { + fail(`Table: ${table}`, `Query failed: ${error.message}`); + } + } + } +} + +// ============================================================ +// DISCOVERY DATA TESTS (Direct DB) +// ============================================================ + +async function testDiscoveryData(pool: Pool): Promise { + console.log('\n[3/4] DISCOVERY DATA (Direct DB Query)'); + console.log('─'.repeat(50)); + + // Test discovery summary via direct query + try { + const { rows: statusRows } = await pool.query(` + SELECT status, COUNT(*) as cnt + FROM dutchie_discovery_locations + WHERE platform = 'dutchie' AND active = TRUE + GROUP BY status + `); + + const statusCounts: Record = {}; + let totalLocations = 0; + for (const row of statusRows) { + statusCounts[row.status] = parseInt(row.cnt, 10); + totalLocations += parseInt(row.cnt, 10); + } + + pass('Discovery Summary', `${totalLocations} total locations`, { + discovered: statusCounts['discovered'] || 0, + verified: statusCounts['verified'] || 0, + merged: statusCounts['merged'] || 0, + rejected: statusCounts['rejected'] || 0, + }); + } catch (error: any) { + if (error.code === '42P01') { + fail('Discovery Summary', 'Table dutchie_discovery_locations does not exist'); + } else { + fail('Discovery Summary', `Query failed: ${error.message}`); + } + } + + // Test discovery locations query + try { + const { rows } = await pool.query(` + SELECT id, name, state_code, status + FROM dutchie_discovery_locations + WHERE platform = 'dutchie' AND active = TRUE + ORDER BY id DESC + LIMIT 1 + `); + + if (rows.length > 0) { + pass('Discovery Locations', `Found location: ${rows[0].name} (${rows[0].state_code})`); + } else { + pass('Discovery Locations', 'Query succeeded, 0 locations found'); + } + } catch (error: any) { + if (error.code === '42P01') { + fail('Discovery Locations', 'Table dutchie_discovery_locations does not exist'); + } else { + fail('Discovery Locations', `Query failed: ${error.message}`); + } + } +} + +// ============================================================ +// ANALYTICS V2 SERVICE TESTS (Direct Service Calls) +// ============================================================ + +async function testAnalyticsV2Services(pool: Pool): Promise { + console.log('\n[4/4] ANALYTICS V2 (Direct Service Calls)'); + console.log('─'.repeat(50)); + + // Test: State Legal Breakdown + try { + // Recreational states + const { rows: recRows } = await pool.query(` + SELECT code FROM states + WHERE recreational_legal = TRUE + ORDER BY code + `); + + // Medical-only states + const { rows: medRows } = await pool.query(` + SELECT code FROM states + WHERE medical_legal = TRUE + AND (recreational_legal = FALSE OR recreational_legal IS NULL) + ORDER BY code + `); + + // No-program states + const { rows: noProgramRows } = await pool.query(` + SELECT code FROM states + WHERE (recreational_legal = FALSE OR recreational_legal IS NULL) + AND (medical_legal = FALSE OR medical_legal IS NULL) + ORDER BY code + `); + + const breakdown = { + recreational: recRows.length, + medical_only: medRows.length, + no_program: noProgramRows.length, + }; + + pass('State Legal Breakdown', `rec=${breakdown.recreational}, med=${breakdown.medical_only}, none=${breakdown.no_program}`); + } catch (error: any) { + fail('State Legal Breakdown', `Query failed: ${error.message}`); + } + + // Test: Recreational States + try { + const { rows } = await 
pool.query(` + SELECT code FROM states + WHERE recreational_legal = TRUE + ORDER BY code + `); + const states = rows.map((r: any) => r.code); + pass('Recreational States', `${states.length} states: ${states.slice(0, 5).join(', ')}${states.length > 5 ? '...' : ''}`); + } catch (error: any) { + fail('Recreational States', `Query failed: ${error.message}`); + } + + // Test: Medical-Only States + try { + const { rows } = await pool.query(` + SELECT code FROM states + WHERE medical_legal = TRUE + AND (recreational_legal = FALSE OR recreational_legal IS NULL) + ORDER BY code + `); + const states = rows.map((r: any) => r.code); + pass('Medical-Only States', `${states.length} states: ${states.slice(0, 5).join(', ')}${states.length > 5 ? '...' : ''}`); + } catch (error: any) { + fail('Medical-Only States', `Query failed: ${error.message}`); + } + + // Test orchestrator route via HTTP (dry run) + console.log('\n[4b/4] ORCHESTRATOR ROUTE (HTTP)'); + console.log('─'.repeat(50)); + + try { + const response = await axios.post( + `${API_BASE}/api/orchestrator/platforms/dt/promote/0`, + {}, + { timeout: 10000 } + ); + // ID 0 should fail gracefully + if (response.status === 400 || response.status === 404) { + pass('Orchestrator Promote (dry)', `Route exists, returned ${response.status} for invalid ID`); + } else if (response.status === 200 && response.data.success === false) { + pass('Orchestrator Promote (dry)', 'Route exists, gracefully rejected ID 0'); + } else { + pass('Orchestrator Promote (dry)', `Route exists, status ${response.status}`); + } + } catch (error: any) { + if (error.response?.status === 400 || error.response?.status === 404) { + pass('Orchestrator Promote (dry)', `Route exists, returned ${error.response.status} for invalid ID`); + } else { + const msg = error.response?.status + ? `HTTP ${error.response.status}: ${error.response.data?.error || error.message}` + : error.message; + fail('Orchestrator Promote (dry)', msg); + } + } +} + +// ============================================================ +// MAIN +// ============================================================ + +async function main() { + console.log('╔══════════════════════════════════════════════════╗'); + console.log('║ CannaiQ System Smoke Test ║'); + console.log('╚══════════════════════════════════════════════════╝'); + console.log(`\nAPI Base: ${API_BASE}`); + console.log(`Database: ${DB_URL.replace(/:[^:@]+@/, ':****@')}`); + + const pool = new Pool({ connectionString: DB_URL }); + + try { + // 1. Database connection + const dbConnected = await testDatabaseConnection(pool); + + // 2. Required tables (only if DB connected) + if (dbConnected) { + await testRequiredTables(pool); + } else { + console.log('\n[2/4] REQUIRED TABLES - SKIPPED (no DB connection)'); + } + + // 3. Discovery data (direct DB - only if DB connected) + if (dbConnected) { + await testDiscoveryData(pool); + } else { + console.log('\n[3/4] DISCOVERY DATA - SKIPPED (no DB connection)'); + } + + // 4. 
Analytics V2 services (direct DB + orchestrator HTTP) + if (dbConnected) { + await testAnalyticsV2Services(pool); + } else { + console.log('\n[4/4] ANALYTICS V2 - SKIPPED (no DB connection)'); + } + + } finally { + await pool.end(); + } + + // Summary + console.log('\n' + '═'.repeat(50)); + console.log('SUMMARY'); + console.log('═'.repeat(50)); + + const passed = results.filter(r => r.passed).length; + const failed = results.filter(r => !r.passed).length; + const total = results.length; + + console.log(`\nTotal: ${total} | Passed: ${passed} | Failed: ${failed}`); + + if (hasFailure) { + console.log('\n❌ SMOKE TEST FAILED\n'); + console.log('Failed tests:'); + results.filter(r => !r.passed).forEach(r => { + console.log(` - ${r.name}: ${r.message}`); + }); + process.exit(1); + } else { + console.log('\n✅ SMOKE TEST PASSED\n'); + process.exit(0); + } +} + +main().catch((error) => { + console.error('\n❌ SMOKE TEST CRASHED:', error.message); + process.exit(1); +}); diff --git a/backend/src/services/DiscoveryGeoService.ts b/backend/src/services/DiscoveryGeoService.ts new file mode 100644 index 00000000..02ae286c --- /dev/null +++ b/backend/src/services/DiscoveryGeoService.ts @@ -0,0 +1,235 @@ +/** + * DiscoveryGeoService + * + * Service for geographic queries on discovery locations. + * Uses bounding box pre-filtering and haversine distance for accurate results. + * All calculations are done locally - no external API calls. + */ + +import { Pool } from 'pg'; +import { haversineDistance, boundingBox, isCoordinateValid } from '../utils/GeoUtils'; + +export interface NearbyLocation { + id: number; + name: string; + city: string | null; + state_code: string | null; + country_code: string | null; + latitude: number; + longitude: number; + distanceKm: number; + platform_slug: string | null; + platform_menu_url: string | null; + status: string; +} + +export interface FindNearbyOptions { + radiusKm?: number; + limit?: number; + platform?: string; + status?: string; +} + +export class DiscoveryGeoService { + constructor(private pool: Pool) {} + + /** + * Find nearby discovery locations within a given radius. + * Uses bounding box for efficient DB query, then haversine for accurate distance. 
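DiscoveryGeoService leans on three helpers imported from `src/utils/GeoUtils`, which is not included in this part of the diff. Below is a minimal hypothetical sketch of what those helpers are assumed to compute, with signatures inferred from the call sites; the real file may differ.

```typescript
// Hypothetical sketch of backend/src/utils/GeoUtils.ts (not part of this diff).
// Signatures inferred from DiscoveryGeoService; the actual implementation may differ.

const EARTH_RADIUS_KM = 6371;

/** Great-circle (haversine) distance between two points, in kilometers. */
export function haversineDistance(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

/** Axis-aligned box that contains a circle of radiusKm around (lat, lon). */
export function boundingBox(lat: number, lon: number, radiusKm: number) {
  const latDelta = (radiusKm / EARTH_RADIUS_KM) * (180 / Math.PI);
  // Longitude degrees shrink with latitude; guard the divisor near the poles.
  const lonDelta = latDelta / Math.max(Math.cos((lat * Math.PI) / 180), 1e-6);
  return {
    minLat: lat - latDelta,
    maxLat: lat + latDelta,
    minLon: lon - lonDelta,
    maxLon: lon + lonDelta,
  };
}

/** Basic sanity check that a latitude/longitude pair is a plausible coordinate. */
export function isCoordinateValid(lat: number, lon: number): boolean {
  return (
    Number.isFinite(lat) && Number.isFinite(lon) &&
    lat >= -90 && lat <= 90 && lon >= -180 && lon <= 180
  );
}
```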
+ * + * @param lat Center latitude + * @param lon Center longitude + * @param options Search options (radiusKm, limit, platform, status filters) + * @returns Array of nearby locations sorted by distance + */ + async findNearbyDiscoveryLocations( + lat: number, + lon: number, + options: FindNearbyOptions = {} + ): Promise { + const { radiusKm = 50, limit = 20, platform, status } = options; + + // Validate input coordinates + if (!isCoordinateValid(lat, lon)) { + throw new Error(`Invalid coordinates: lat=${lat}, lon=${lon}`); + } + + // Calculate bounding box for initial DB filter + const bbox = boundingBox(lat, lon, radiusKm); + + // Build query with bounding box filter + let query = ` + SELECT + id, + name, + city, + state_code, + country_code, + latitude, + longitude, + platform_slug, + platform_menu_url, + status + FROM dutchie_discovery_locations + WHERE latitude IS NOT NULL + AND longitude IS NOT NULL + AND active = TRUE + AND latitude >= $1 + AND latitude <= $2 + AND longitude >= $3 + AND longitude <= $4 + `; + + const params: any[] = [bbox.minLat, bbox.maxLat, bbox.minLon, bbox.maxLon]; + let paramIndex = 5; + + // Optional platform filter + if (platform) { + query += ` AND platform = $${paramIndex}`; + params.push(platform); + paramIndex++; + } + + // Optional status filter + if (status) { + query += ` AND status = $${paramIndex}`; + params.push(status); + paramIndex++; + } + + const { rows } = await this.pool.query(query, params); + + // Calculate actual haversine distances and filter by radius + const locationsWithDistance: NearbyLocation[] = rows + .map((row) => { + const distanceKm = haversineDistance(lat, lon, row.latitude, row.longitude); + return { + id: row.id, + name: row.name, + city: row.city, + state_code: row.state_code, + country_code: row.country_code, + latitude: parseFloat(row.latitude), + longitude: parseFloat(row.longitude), + distanceKm: Math.round(distanceKm * 100) / 100, // Round to 2 decimals + platform_slug: row.platform_slug, + platform_menu_url: row.platform_menu_url, + status: row.status, + }; + }) + .filter((loc) => loc.distanceKm <= radiusKm) + .sort((a, b) => a.distanceKm - b.distanceKm) + .slice(0, limit); + + return locationsWithDistance; + } + + /** + * Find discovery locations near another discovery location. + * + * @param locationId ID of the discovery location to search around + * @param options Search options (radiusKm, limit, excludeSelf) + * @returns Array of nearby locations sorted by distance + */ + async findNearbyFromLocation( + locationId: number, + options: FindNearbyOptions & { excludeSelf?: boolean } = {} + ): Promise { + const { excludeSelf = true, ...searchOptions } = options; + + // Get the source location's coordinates + const { rows } = await this.pool.query( + `SELECT latitude, longitude FROM dutchie_discovery_locations WHERE id = $1`, + [locationId] + ); + + if (rows.length === 0) { + throw new Error(`Discovery location ${locationId} not found`); + } + + const { latitude, longitude } = rows[0]; + + if (latitude === null || longitude === null) { + throw new Error(`Discovery location ${locationId} has no coordinates`); + } + + // Find nearby locations + let results = await this.findNearbyDiscoveryLocations(latitude, longitude, searchOptions); + + // Optionally exclude the source location + if (excludeSelf) { + results = results.filter((loc) => loc.id !== locationId); + } + + return results; + } + + /** + * Get statistics about coordinate coverage in discovery locations. 
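A usage sketch for the two nearby-location methods above (not part of the diff). It assumes the runtime pool helper and relative import paths; the coordinates are illustrative (downtown Phoenix).

```typescript
// Usage sketch for DiscoveryGeoService; paths and coordinates are assumptions.
import { pool } from '../db/pool';
import { DiscoveryGeoService } from '../services/DiscoveryGeoService';

async function demoNearby(): Promise<void> {
  const geo = new DiscoveryGeoService(pool);

  // Verified locations within 25 km of a point, closest first.
  const nearby = await geo.findNearbyDiscoveryLocations(33.4484, -112.074, {
    radiusKm: 25,
    limit: 10,
    status: 'verified',
  });
  for (const loc of nearby) {
    console.log(`${loc.name} (${loc.state_code ?? '??'}) - ${loc.distanceKm} km`);
  }

  // Neighbors of an existing discovery location, excluding the location itself.
  if (nearby.length > 0) {
    const neighbors = await geo.findNearbyFromLocation(nearby[0].id, { radiusKm: 10 });
    console.log(`Found ${neighbors.length} neighbors within 10 km`);
  }
}

demoNearby().catch(console.error);
```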
+ * + * @returns Coverage statistics + */ + async getCoordinateCoverageStats(): Promise<{ + total: number; + withCoordinates: number; + withoutCoordinates: number; + coveragePercent: number; + byStatus: Array<{ status: string; withCoords: number; withoutCoords: number }>; + byState: Array<{ state_code: string; withCoords: number; withoutCoords: number }>; + }> { + const [totalRes, byStatusRes, byStateRes] = await Promise.all([ + this.pool.query(` + SELECT + COUNT(*) as total, + COUNT(*) FILTER (WHERE latitude IS NOT NULL AND longitude IS NOT NULL) as with_coords, + COUNT(*) FILTER (WHERE latitude IS NULL OR longitude IS NULL) as without_coords + FROM dutchie_discovery_locations + WHERE active = TRUE + `), + this.pool.query(` + SELECT + status, + COUNT(*) FILTER (WHERE latitude IS NOT NULL AND longitude IS NOT NULL) as with_coords, + COUNT(*) FILTER (WHERE latitude IS NULL OR longitude IS NULL) as without_coords + FROM dutchie_discovery_locations + WHERE active = TRUE + GROUP BY status + ORDER BY with_coords DESC + `), + this.pool.query(` + SELECT + state_code, + COUNT(*) FILTER (WHERE latitude IS NOT NULL AND longitude IS NOT NULL) as with_coords, + COUNT(*) FILTER (WHERE latitude IS NULL OR longitude IS NULL) as without_coords + FROM dutchie_discovery_locations + WHERE active = TRUE AND state_code IS NOT NULL + GROUP BY state_code + ORDER BY with_coords DESC + LIMIT 20 + `), + ]); + + const total = parseInt(totalRes.rows[0]?.total || '0', 10); + const withCoordinates = parseInt(totalRes.rows[0]?.with_coords || '0', 10); + const withoutCoordinates = parseInt(totalRes.rows[0]?.without_coords || '0', 10); + + return { + total, + withCoordinates, + withoutCoordinates, + coveragePercent: total > 0 ? Math.round((withCoordinates / total) * 10000) / 100 : 0, + byStatus: byStatusRes.rows.map((r) => ({ + status: r.status, + withCoords: parseInt(r.with_coords, 10), + withoutCoords: parseInt(r.without_coords, 10), + })), + byState: byStateRes.rows.map((r) => ({ + state_code: r.state_code, + withCoords: parseInt(r.with_coords, 10), + withoutCoords: parseInt(r.without_coords, 10), + })), + }; + } +} + +export default DiscoveryGeoService; diff --git a/backend/src/services/GeoValidationService.ts b/backend/src/services/GeoValidationService.ts new file mode 100644 index 00000000..660143c1 --- /dev/null +++ b/backend/src/services/GeoValidationService.ts @@ -0,0 +1,207 @@ +/** + * GeoValidationService + * + * Service for validating geographic data in discovery locations. + * All validation is done locally - no external API calls. + */ + +import { isCoordinateValid, isWithinUS, isWithinCanada } from '../utils/GeoUtils'; + +export interface DiscoveryLocationGeoData { + latitude: number | null; + longitude: number | null; + state_code: string | null; + country_code: string | null; +} + +export interface GeoValidationResult { + ok: boolean; + reason?: string; + warnings?: string[]; +} + +/** + * Simple state-to-region mapping for rough validation. + * This is a heuristic - not precise polygon matching. 
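A usage sketch for the coverage report defined by `getCoordinateCoverageStats` above (not part of the diff). Field names follow its declared return type; the pool import path is an assumption.

```typescript
// Usage sketch: print coordinate coverage for discovery locations.
import { pool } from '../db/pool';
import { DiscoveryGeoService } from '../services/DiscoveryGeoService';

async function printCoverage(): Promise<void> {
  const geo = new DiscoveryGeoService(pool);
  const stats = await geo.getCoordinateCoverageStats();

  console.log(`Coverage: ${stats.withCoordinates}/${stats.total} (${stats.coveragePercent}%)`);
  for (const s of stats.byState) {
    console.log(`  ${s.state_code}: ${s.withCoords} with coords, ${s.withoutCoords} without`);
  }
}

printCoverage().catch(console.error);
```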
+ */ +const STATE_REGION_HINTS: Record = { + // West Coast + 'WA': { latRange: [45.5, 49.0], lonRange: [-125, -116.9] }, + 'OR': { latRange: [42.0, 46.3], lonRange: [-124.6, -116.5] }, + 'CA': { latRange: [32.5, 42.0], lonRange: [-124.5, -114.1] }, + + // Southwest + 'AZ': { latRange: [31.3, 37.0], lonRange: [-115, -109] }, + 'NV': { latRange: [35.0, 42.0], lonRange: [-120, -114] }, + 'NM': { latRange: [31.3, 37.0], lonRange: [-109, -103] }, + 'UT': { latRange: [37.0, 42.0], lonRange: [-114, -109] }, + + // Mountain + 'CO': { latRange: [37.0, 41.0], lonRange: [-109, -102] }, + 'WY': { latRange: [41.0, 45.0], lonRange: [-111, -104] }, + 'MT': { latRange: [45.0, 49.0], lonRange: [-116, -104] }, + 'ID': { latRange: [42.0, 49.0], lonRange: [-117, -111] }, + + // Midwest + 'MI': { latRange: [41.7, 48.3], lonRange: [-90.5, -82.4] }, + 'IL': { latRange: [37.0, 42.5], lonRange: [-91.5, -87.0] }, + 'OH': { latRange: [38.4, 42.0], lonRange: [-84.8, -80.5] }, + 'MO': { latRange: [36.0, 40.6], lonRange: [-95.8, -89.1] }, + + // Northeast + 'NY': { latRange: [40.5, 45.0], lonRange: [-79.8, -71.9] }, + 'MA': { latRange: [41.2, 42.9], lonRange: [-73.5, -69.9] }, + 'PA': { latRange: [39.7, 42.3], lonRange: [-80.5, -74.7] }, + 'NJ': { latRange: [38.9, 41.4], lonRange: [-75.6, -73.9] }, + + // Southeast + 'FL': { latRange: [24.5, 31.0], lonRange: [-87.6, -80.0] }, + 'GA': { latRange: [30.4, 35.0], lonRange: [-85.6, -80.8] }, + 'TX': { latRange: [25.8, 36.5], lonRange: [-106.6, -93.5] }, + 'NC': { latRange: [34.0, 36.6], lonRange: [-84.3, -75.5] }, + + // Alaska & Hawaii + 'AK': { latRange: [51.0, 72.0], lonRange: [-180, -130] }, + 'HI': { latRange: [18.5, 22.5], lonRange: [-161, -154] }, + + // Canadian provinces (rough) + 'ON': { latRange: [41.7, 57.0], lonRange: [-95.2, -74.3] }, + 'BC': { latRange: [48.3, 60.0], lonRange: [-139, -114.0] }, + 'AB': { latRange: [49.0, 60.0], lonRange: [-120, -110] }, + 'QC': { latRange: [45.0, 62.6], lonRange: [-79.8, -57.1] }, +}; + +export class GeoValidationService { + /** + * Validate a discovery location's geographic data. 
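A worked example of the state-hint heuristic, using the `AZ` and `CA` ranges copied from `STATE_REGION_HINTS` above and the 0.5 degree tolerance applied in `validateLocationState` below; the coordinate is an illustrative point in downtown Phoenix.

```typescript
// Worked example of the bounding-range heuristic (values copied from STATE_REGION_HINTS).
const az = { latRange: [31.3, 37.0], lonRange: [-115, -109] };
const ca = { latRange: [32.5, 42.0], lonRange: [-124.5, -114.1] };
const tolerance = 0.5;
const lat = 33.4484;
const lon = -112.074;

const withinAz =
  lat >= az.latRange[0] - tolerance && lat <= az.latRange[1] + tolerance &&
  lon >= az.lonRange[0] - tolerance && lon <= az.lonRange[1] + tolerance;
console.log(withinAz); // true -> no warning when state_code is 'AZ'

const withinCa =
  lat >= ca.latRange[0] - tolerance && lat <= ca.latRange[1] + tolerance &&
  lon >= ca.lonRange[0] - tolerance && lon <= ca.lonRange[1] + tolerance;
console.log(withinCa); // false (-112.074 > -114.1 + 0.5) -> mismatch warning for 'CA'
```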
+ * + * @param location Discovery location data with lat/lng and state/country codes + * @returns Validation result with ok status and optional reason/warnings + */ + validateLocationState(location: DiscoveryLocationGeoData): GeoValidationResult { + const warnings: string[] = []; + + // Check if coordinates exist + if (location.latitude === null || location.longitude === null) { + return { + ok: true, // Not a failure - just no coordinates to validate + reason: 'No coordinates available for validation', + }; + } + + // Check basic coordinate validity + if (!isCoordinateValid(location.latitude, location.longitude)) { + return { + ok: false, + reason: `Invalid coordinates: lat=${location.latitude}, lon=${location.longitude}`, + }; + } + + const lat = location.latitude; + const lon = location.longitude; + + // Check country code consistency + if (location.country_code === 'US') { + if (!isWithinUS(lat, lon)) { + return { + ok: false, + reason: `Coordinates (${lat}, ${lon}) are outside US bounds but country_code is US`, + }; + } + } else if (location.country_code === 'CA') { + if (!isWithinCanada(lat, lon)) { + return { + ok: false, + reason: `Coordinates (${lat}, ${lon}) are outside Canada bounds but country_code is CA`, + }; + } + } + + // Check state code consistency (if we have a hint for this state) + if (location.state_code) { + const hint = STATE_REGION_HINTS[location.state_code]; + if (hint) { + const [minLat, maxLat] = hint.latRange; + const [minLon, maxLon] = hint.lonRange; + + // Allow some tolerance (coordinates might be near borders) + const tolerance = 0.5; // degrees + + if ( + lat < minLat - tolerance || + lat > maxLat + tolerance || + lon < minLon - tolerance || + lon > maxLon + tolerance + ) { + warnings.push( + `Coordinates (${lat.toFixed(4)}, ${lon.toFixed(4)}) may not match state ${location.state_code} ` + + `(expected lat: ${minLat}-${maxLat}, lon: ${minLon}-${maxLon})` + ); + } + } + } + + return { + ok: true, + warnings: warnings.length > 0 ? warnings : undefined, + }; + } + + /** + * Batch validate multiple locations. + * + * @param locations Array of discovery location data + * @returns Map of validation results keyed by index + */ + validateLocations(locations: DiscoveryLocationGeoData[]): Map { + const results = new Map(); + + locations.forEach((location, index) => { + results.set(index, this.validateLocationState(location)); + }); + + return results; + } + + /** + * Get a summary of validation results. + * + * @param results Map of validation results + * @returns Summary with counts + */ + summarizeValidation(results: Map): { + total: number; + valid: number; + invalid: number; + noCoordinates: number; + withWarnings: number; + } { + let valid = 0; + let invalid = 0; + let noCoordinates = 0; + let withWarnings = 0; + + results.forEach((result) => { + if (!result.ok) { + invalid++; + } else if (result.reason?.includes('No coordinates')) { + noCoordinates++; + } else { + valid++; + if (result.warnings && result.warnings.length > 0) { + withWarnings++; + } + } + }); + + return { + total: results.size, + valid, + invalid, + noCoordinates, + withWarnings, + }; + } +} + +export default GeoValidationService; diff --git a/backend/src/services/LegalStateService.ts b/backend/src/services/LegalStateService.ts new file mode 100644 index 00000000..b18c5b35 --- /dev/null +++ b/backend/src/services/LegalStateService.ts @@ -0,0 +1,348 @@ +/** + * LegalStateService + * + * Service for querying cannabis legalization status of US states. 
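A usage sketch for `GeoValidationService` on an in-memory batch (not part of the diff). The sample coordinates (Phoenix and Calgary) are illustrative, and the summary shown in the comment assumes the GeoUtils country checks accept them.

```typescript
// Usage sketch: batch-validate discovery location geo data.
import { GeoValidationService } from '../services/GeoValidationService';

const validator = new GeoValidationService();

const locations = [
  { latitude: 33.4484, longitude: -112.074, state_code: 'AZ', country_code: 'US' },
  { latitude: 51.0447, longitude: -114.0719, state_code: 'AB', country_code: 'CA' },
  { latitude: null, longitude: null, state_code: 'NY', country_code: 'US' },
];

const results = validator.validateLocations(locations);
const summary = validator.summarizeValidation(results);

console.log(summary); // e.g. { total: 3, valid: 2, invalid: 0, noCoordinates: 1, withWarnings: 0 }
results.forEach((result, index) => {
  if (!result.ok) console.warn(`Location ${index} failed: ${result.reason}`);
  result.warnings?.forEach((w) => console.warn(`Location ${index} warning: ${w}`));
});
```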
+ * Helps identify: + * - Recreational states + * - Medical-only states + * - States with no cannabis programs + * - Legal states we haven't crawled yet (no dispensaries) + * + * Usage: + * import { LegalStateService } from './services/LegalStateService'; + * const service = new LegalStateService(pool); + * const recStates = await service.getRecreationalStates(); + */ + +import { Pool } from 'pg'; + +// ============================================================ +// TYPES +// ============================================================ + +export interface StateRecord { + id: number; + code: string; + name: string; + timezone: string | null; + recreational_legal: boolean | null; + rec_year: number | null; + medical_legal: boolean | null; + med_year: number | null; + created_at: Date; + updated_at: Date; +} + +export interface StateWithDispensaryCount extends StateRecord { + dispensary_count: number; +} + +export interface StateSummary { + code: string; + name: string; + recreational_legal: boolean; + rec_year: number | null; + medical_legal: boolean; + med_year: number | null; + dispensary_count: number; +} + +export interface TargetState { + code: string; + name: string; + legal_type: 'recreational' | 'medical_only'; + rec_year: number | null; + med_year: number | null; + dispensary_count: number; + priority_score: number; +} + +export interface LegalStatusSummary { + recreational_states: number; + medical_only_states: number; + no_program_states: number; + total_states: number; + states_with_dispensaries: number; + legal_states_without_dispensaries: number; +} + +// ============================================================ +// SERVICE CLASS +// ============================================================ + +export class LegalStateService { + constructor(private pool: Pool) {} + + /** + * Get all states with recreational cannabis legalized + */ + async getRecreationalStates(): Promise { + const { rows } = await this.pool.query(` + SELECT * + FROM states + WHERE recreational_legal = TRUE + ORDER BY rec_year ASC, name ASC + `); + return rows; + } + + /** + * Get medical-only states (medical legal, recreational not legal) + */ + async getMedicalOnlyStates(): Promise { + const { rows } = await this.pool.query(` + SELECT * + FROM states + WHERE medical_legal = TRUE + AND (recreational_legal = FALSE OR recreational_legal IS NULL) + ORDER BY med_year ASC, name ASC + `); + return rows; + } + + /** + * Get states with no cannabis programs (neither rec nor medical) + */ + async getIllegalStates(): Promise { + const { rows } = await this.pool.query(` + SELECT * + FROM states + WHERE (medical_legal = FALSE OR medical_legal IS NULL) + AND (recreational_legal = FALSE OR recreational_legal IS NULL) + ORDER BY name ASC + `); + return rows; + } + + /** + * Get all states with dispensary counts + */ + async getAllStatesWithDispensaryCounts(): Promise { + const { rows } = await this.pool.query(` + SELECT + s.*, + COALESCE(d.cnt, 0)::INTEGER AS dispensary_count + FROM states s + LEFT JOIN ( + SELECT state_id, COUNT(*) AS cnt + FROM dispensaries + WHERE state_id IS NOT NULL + GROUP BY state_id + ) d ON d.state_id = s.id + ORDER BY s.name ASC + `); + return rows; + } + + /** + * Get legal states (rec or medical) that have no dispensaries in our system + */ + async getLegalStatesWithNoDispensaries(): Promise { + const { rows } = await this.pool.query(` + SELECT + s.*, + 0 AS dispensary_count + FROM states s + LEFT JOIN dispensaries d ON d.state_id = s.id + WHERE (s.recreational_legal = TRUE OR 
s.medical_legal = TRUE) + AND d.id IS NULL + ORDER BY + s.recreational_legal DESC, + COALESCE(s.rec_year, s.med_year) ASC, + s.name ASC + `); + return rows; + } + + /** + * Get states we should prioritize for crawling. + * Priority is based on: + * - Recreational states with no dispensaries (highest priority) + * - Medical-only states with no dispensaries + * - States legalized longer ago (more established markets) + */ + async getTargetStates(): Promise { + const { rows } = await this.pool.query(` + WITH state_disp_counts AS ( + SELECT + state_id, + COUNT(*) AS dispensary_count + FROM dispensaries + WHERE state_id IS NOT NULL + GROUP BY state_id + ) + SELECT + s.code, + s.name, + CASE + WHEN s.recreational_legal = TRUE THEN 'recreational' + ELSE 'medical_only' + END AS legal_type, + s.rec_year, + s.med_year, + COALESCE(sdc.dispensary_count, 0)::INTEGER AS dispensary_count, + -- Priority score: higher = more important to crawl + -- Rec states score higher, older legalization scores higher, fewer dispensaries scores higher + ( + CASE WHEN s.recreational_legal = TRUE THEN 100 ELSE 50 END + + (2024 - COALESCE(s.rec_year, s.med_year, 2024)) * 2 + - LEAST(COALESCE(sdc.dispensary_count, 0), 50) + )::INTEGER AS priority_score + FROM states s + LEFT JOIN state_disp_counts sdc ON sdc.state_id = s.id + WHERE s.recreational_legal = TRUE OR s.medical_legal = TRUE + ORDER BY priority_score DESC, s.name ASC + `); + return rows; + } + + /** + * Get recreational states with no dispensaries yet + */ + async getRecreationalStatesWithNoDispensaries(): Promise { + const { rows } = await this.pool.query(` + SELECT + s.code, + s.name, + 'recreational'::TEXT AS legal_type, + s.rec_year, + s.med_year, + 0 AS dispensary_count, + (100 + (2024 - COALESCE(s.rec_year, 2024)) * 2)::INTEGER AS priority_score + FROM states s + LEFT JOIN dispensaries d ON d.state_id = s.id + WHERE s.recreational_legal = TRUE + AND d.id IS NULL + ORDER BY s.rec_year ASC, s.name ASC + `); + return rows; + } + + /** + * Get medical-only states with no dispensaries yet + */ + async getMedicalOnlyStatesWithNoDispensaries(): Promise { + const { rows } = await this.pool.query(` + SELECT + s.code, + s.name, + 'medical_only'::TEXT AS legal_type, + s.rec_year, + s.med_year, + 0 AS dispensary_count, + (50 + (2024 - COALESCE(s.med_year, 2024)) * 2)::INTEGER AS priority_score + FROM states s + LEFT JOIN dispensaries d ON d.state_id = s.id + WHERE s.medical_legal = TRUE + AND (s.recreational_legal = FALSE OR s.recreational_legal IS NULL) + AND d.id IS NULL + ORDER BY s.med_year ASC, s.name ASC + `); + return rows; + } + + /** + * Get a summary of legal status across all states + */ + async getLegalStatusSummary(): Promise { + const { rows } = await this.pool.query(` + WITH stats AS ( + SELECT + COUNT(*) FILTER (WHERE recreational_legal = TRUE) AS recreational_states, + COUNT(*) FILTER ( + WHERE medical_legal = TRUE + AND (recreational_legal = FALSE OR recreational_legal IS NULL) + ) AS medical_only_states, + COUNT(*) FILTER ( + WHERE (medical_legal = FALSE OR medical_legal IS NULL) + AND (recreational_legal = FALSE OR recreational_legal IS NULL) + ) AS no_program_states, + COUNT(*) AS total_states + FROM states + ), + disp_stats AS ( + SELECT + COUNT(DISTINCT s.id) AS states_with_dispensaries + FROM states s + INNER JOIN dispensaries d ON d.state_id = s.id + ), + legal_no_disp AS ( + SELECT COUNT(*) AS legal_states_without_dispensaries + FROM states s + LEFT JOIN dispensaries d ON d.state_id = s.id + WHERE (s.recreational_legal = TRUE OR 
s.medical_legal = TRUE) + AND d.id IS NULL + ) + SELECT + stats.recreational_states::INTEGER, + stats.medical_only_states::INTEGER, + stats.no_program_states::INTEGER, + stats.total_states::INTEGER, + disp_stats.states_with_dispensaries::INTEGER, + legal_no_disp.legal_states_without_dispensaries::INTEGER + FROM stats, disp_stats, legal_no_disp + `); + return rows[0]; + } + + /** + * Get state by code + */ + async getStateByCode(code: string): Promise { + const { rows } = await this.pool.query(` + SELECT + s.*, + COALESCE(d.cnt, 0)::INTEGER AS dispensary_count + FROM states s + LEFT JOIN ( + SELECT state_id, COUNT(*) AS cnt + FROM dispensaries + WHERE state_id IS NOT NULL + GROUP BY state_id + ) d ON d.state_id = s.id + WHERE s.code = $1 + `, [code.toUpperCase()]); + + return rows[0] || null; + } + + /** + * Get all states formatted as summary for API response + */ + async getStateSummaries(): Promise { + const { rows } = await this.pool.query(` + SELECT + s.code, + s.name, + COALESCE(s.recreational_legal, FALSE) AS recreational_legal, + s.rec_year, + COALESCE(s.medical_legal, FALSE) AS medical_legal, + s.med_year, + COALESCE(d.cnt, 0)::INTEGER AS dispensary_count + FROM states s + LEFT JOIN ( + SELECT state_id, COUNT(*) AS cnt + FROM dispensaries + WHERE state_id IS NOT NULL + GROUP BY state_id + ) d ON d.state_id = s.id + ORDER BY s.name ASC + `); + return rows; + } +} + +// ============================================================ +// SINGLETON FACTORY +// ============================================================ + +let serviceInstance: LegalStateService | null = null; + +export function getLegalStateService(pool: Pool): LegalStateService { + if (!serviceInstance) { + serviceInstance = new LegalStateService(pool); + } + return serviceInstance; +} + +export default LegalStateService; diff --git a/backend/src/services/analytics/BrandPenetrationService.ts b/backend/src/services/analytics/BrandPenetrationService.ts new file mode 100644 index 00000000..6a6bd90b --- /dev/null +++ b/backend/src/services/analytics/BrandPenetrationService.ts @@ -0,0 +1,406 @@ +/** + * BrandPenetrationService + * + * Analytics for brand market presence and penetration trends. 
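A usage sketch for `LegalStateService` above via its factory (not part of the diff; the pool import path is assumed). As a worked example of the priority score defined in `getTargetStates`, a recreational state legalized in 2016 with zero dispensaries scores 100 + (2024 - 2016) * 2 - 0 = 116, while a medical-only state legalized in 2020 with 30 dispensaries scores 50 + 8 - 30 = 28.

```typescript
// Usage sketch: list crawl targets by priority.
import { pool } from '../db/pool';
import { getLegalStateService } from '../services/LegalStateService';

async function listCrawlTargets(): Promise<void> {
  // Note: the factory caches the first pool it receives for the process lifetime.
  const legal = getLegalStateService(pool);

  const summary = await legal.getLegalStatusSummary();
  console.log(
    `rec=${summary.recreational_states} med-only=${summary.medical_only_states} ` +
    `no-program=${summary.no_program_states} uncrawled-legal=${summary.legal_states_without_dispensaries}`
  );

  const targets = await legal.getTargetStates();
  for (const t of targets.slice(0, 10)) {
    console.log(`${t.code} (${t.legal_type}) score=${t.priority_score} dispensaries=${t.dispensary_count}`);
  }
}

listCrawlTargets().catch(console.error);
```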
+ * + * Data Sources: + * - store_products: Current brand presence by dispensary + * - store_product_snapshots: Historical brand tracking + * - states: Rec/med segmentation + * + * Key Metrics: + * - Dispensary count carrying brand (by state) + * - SKU count per dispensary + * - Market share within category + * - Penetration trends over time + * - Rec vs Med footprint comparison + */ + +import { Pool } from 'pg'; +import { + TimeWindow, + DateRange, + getDateRangeFromWindow, + BrandPenetrationResult, + BrandStateBreakdown, + PenetrationDataPoint, + BrandMarketPosition, + BrandRecVsMedFootprint, +} from './types'; + +export class BrandPenetrationService { + constructor(private pool: Pool) {} + + /** + * Get brand penetration metrics + */ + async getBrandPenetration( + brandName: string, + options: { window?: TimeWindow; customRange?: DateRange } = {} + ): Promise { + const { window = '30d', customRange } = options; + const { start, end } = getDateRangeFromWindow(window, customRange); + + // Get current brand presence + const currentResult = await this.pool.query(` + SELECT + sp.brand_name, + COUNT(DISTINCT sp.dispensary_id) AS total_dispensaries, + COUNT(*) AS total_skus, + ROUND(COUNT(*)::NUMERIC / NULLIF(COUNT(DISTINCT sp.dispensary_id), 0), 2) AS avg_skus_per_dispensary, + ARRAY_AGG(DISTINCT s.code) FILTER (WHERE s.code IS NOT NULL) AS states_present + FROM store_products sp + LEFT JOIN states s ON s.id = sp.state_id + WHERE sp.brand_name = $1 + AND sp.is_in_stock = TRUE + GROUP BY sp.brand_name + `, [brandName]); + + if (currentResult.rows.length === 0) { + return null; + } + + const current = currentResult.rows[0]; + + // Get state breakdown + const stateBreakdown = await this.getBrandStateBreakdown(brandName); + + // Get penetration trend + const trendResult = await this.pool.query(` + WITH daily_presence AS ( + SELECT + DATE(sps.captured_at) AS date, + COUNT(DISTINCT sps.dispensary_id) AS dispensary_count + FROM store_product_snapshots sps + WHERE sps.brand_name = $1 + AND sps.captured_at >= $2 + AND sps.captured_at <= $3 + AND sps.is_in_stock = TRUE + GROUP BY DATE(sps.captured_at) + ORDER BY date + ) + SELECT + date, + dispensary_count, + dispensary_count - LAG(dispensary_count) OVER (ORDER BY date) AS new_dispensaries + FROM daily_presence + `, [brandName, start, end]); + + const penetrationTrend: PenetrationDataPoint[] = trendResult.rows.map((row: any) => ({ + date: row.date.toISOString().split('T')[0], + dispensary_count: parseInt(row.dispensary_count), + new_dispensaries: row.new_dispensaries ? parseInt(row.new_dispensaries) : 0, + dropped_dispensaries: row.new_dispensaries && row.new_dispensaries < 0 + ? 
Math.abs(parseInt(row.new_dispensaries)) + : 0, + })); + + return { + brand_name: brandName, + total_dispensaries: parseInt(current.total_dispensaries), + total_skus: parseInt(current.total_skus), + avg_skus_per_dispensary: parseFloat(current.avg_skus_per_dispensary) || 0, + states_present: current.states_present || [], + state_breakdown: stateBreakdown, + penetration_trend: penetrationTrend, + }; + } + + /** + * Get brand breakdown by state + */ + async getBrandStateBreakdown(brandName: string): Promise { + const result = await this.pool.query(` + WITH brand_state AS ( + SELECT + s.code AS state_code, + s.name AS state_name, + CASE + WHEN s.recreational_legal = TRUE THEN 'recreational' + WHEN s.medical_legal = TRUE THEN 'medical_only' + ELSE 'no_program' + END AS legal_type, + COUNT(DISTINCT sp.dispensary_id) AS dispensary_count, + COUNT(*) AS sku_count + FROM store_products sp + JOIN states s ON s.id = sp.state_id + WHERE sp.brand_name = $1 + AND sp.is_in_stock = TRUE + GROUP BY s.code, s.name, s.recreational_legal, s.medical_legal + ), + state_totals AS ( + SELECT + s.code AS state_code, + COUNT(DISTINCT sp.dispensary_id) AS total_dispensaries + FROM store_products sp + JOIN states s ON s.id = sp.state_id + WHERE sp.is_in_stock = TRUE + GROUP BY s.code + ) + SELECT + bs.*, + ROUND(bs.sku_count::NUMERIC / NULLIF(bs.dispensary_count, 0), 2) AS avg_skus_per_dispensary, + ROUND(bs.dispensary_count::NUMERIC * 100 / NULLIF(st.total_dispensaries, 0), 2) AS market_share_percent + FROM brand_state bs + LEFT JOIN state_totals st ON st.state_code = bs.state_code + ORDER BY bs.dispensary_count DESC + `, [brandName]); + + return result.rows.map((row: any) => ({ + state_code: row.state_code, + state_name: row.state_name, + legal_type: row.legal_type, + dispensary_count: parseInt(row.dispensary_count), + sku_count: parseInt(row.sku_count), + avg_skus_per_dispensary: parseFloat(row.avg_skus_per_dispensary) || 0, + market_share_percent: row.market_share_percent ? 
parseFloat(row.market_share_percent) : null, + })); + } + + /** + * Get brand market position within a category + */ + async getBrandMarketPosition( + brandName: string, + options: { category?: string; stateCode?: string } = {} + ): Promise { + const params: any[] = [brandName]; + let paramIdx = 2; + let filters = ''; + + if (options.category) { + filters += ` AND sp.category = $${paramIdx}`; + params.push(options.category); + paramIdx++; + } + + if (options.stateCode) { + filters += ` AND s.code = $${paramIdx}`; + params.push(options.stateCode); + paramIdx++; + } + + const result = await this.pool.query(` + WITH brand_metrics AS ( + SELECT + sp.brand_name, + sp.category, + s.code AS state_code, + COUNT(*) AS sku_count, + COUNT(DISTINCT sp.dispensary_id) AS dispensary_count, + AVG(sp.price_rec) AS avg_price + FROM store_products sp + JOIN states s ON s.id = sp.state_id + WHERE sp.brand_name = $1 + AND sp.is_in_stock = TRUE + AND sp.category IS NOT NULL + ${filters} + GROUP BY sp.brand_name, sp.category, s.code + ), + category_totals AS ( + SELECT + sp.category, + s.code AS state_code, + COUNT(*) AS total_skus, + AVG(sp.price_rec) AS category_avg_price + FROM store_products sp + JOIN states s ON s.id = sp.state_id + WHERE sp.is_in_stock = TRUE + AND sp.category IS NOT NULL + GROUP BY sp.category, s.code + ) + SELECT + bm.*, + ROUND(bm.sku_count::NUMERIC * 100 / NULLIF(ct.total_skus, 0), 2) AS category_share_percent, + ct.category_avg_price, + ROUND((bm.avg_price - ct.category_avg_price) / NULLIF(ct.category_avg_price, 0) * 100, 2) AS price_vs_category_avg + FROM brand_metrics bm + LEFT JOIN category_totals ct ON ct.category = bm.category AND ct.state_code = bm.state_code + ORDER BY bm.sku_count DESC + `, params); + + return result.rows.map((row: any) => ({ + brand_name: row.brand_name, + category: row.category, + state_code: row.state_code, + sku_count: parseInt(row.sku_count), + dispensary_count: parseInt(row.dispensary_count), + category_share_percent: row.category_share_percent ? parseFloat(row.category_share_percent) : 0, + avg_price: row.avg_price ? parseFloat(row.avg_price) : null, + price_vs_category_avg: row.price_vs_category_avg ? 
parseFloat(row.price_vs_category_avg) : null, + })); + } + + /** + * Get brand presence in rec vs med-only states + */ + async getBrandRecVsMedFootprint(brandName: string): Promise { + const result = await this.pool.query(` + WITH rec_presence AS ( + SELECT + COUNT(DISTINCT s.code) AS state_count, + ARRAY_AGG(DISTINCT s.code) AS states, + COUNT(DISTINCT sp.dispensary_id) AS dispensary_count, + ROUND(COUNT(*)::NUMERIC / NULLIF(COUNT(DISTINCT sp.dispensary_id), 0), 2) AS avg_skus + FROM store_products sp + JOIN states s ON s.id = sp.state_id + WHERE sp.brand_name = $1 + AND sp.is_in_stock = TRUE + AND s.recreational_legal = TRUE + ), + med_presence AS ( + SELECT + COUNT(DISTINCT s.code) AS state_count, + ARRAY_AGG(DISTINCT s.code) AS states, + COUNT(DISTINCT sp.dispensary_id) AS dispensary_count, + ROUND(COUNT(*)::NUMERIC / NULLIF(COUNT(DISTINCT sp.dispensary_id), 0), 2) AS avg_skus + FROM store_products sp + JOIN states s ON s.id = sp.state_id + WHERE sp.brand_name = $1 + AND sp.is_in_stock = TRUE + AND s.medical_legal = TRUE + AND (s.recreational_legal = FALSE OR s.recreational_legal IS NULL) + ) + SELECT + rp.state_count AS rec_states_count, + rp.states AS rec_states, + rp.dispensary_count AS rec_dispensary_count, + rp.avg_skus AS rec_avg_skus, + mp.state_count AS med_only_states_count, + mp.states AS med_only_states, + mp.dispensary_count AS med_only_dispensary_count, + mp.avg_skus AS med_only_avg_skus + FROM rec_presence rp, med_presence mp + `, [brandName]); + + const row = result.rows[0]; + + return { + brand_name: brandName, + rec_states_count: parseInt(row.rec_states_count) || 0, + rec_states: row.rec_states || [], + rec_dispensary_count: parseInt(row.rec_dispensary_count) || 0, + rec_avg_skus: parseFloat(row.rec_avg_skus) || 0, + med_only_states_count: parseInt(row.med_only_states_count) || 0, + med_only_states: row.med_only_states || [], + med_only_dispensary_count: parseInt(row.med_only_dispensary_count) || 0, + med_only_avg_skus: parseFloat(row.med_only_avg_skus) || 0, + }; + } + + /** + * Get top brands by penetration + */ + async getTopBrandsByPenetration( + options: { limit?: number; stateCode?: string; category?: string } = {} + ): Promise> { + const { limit = 25, stateCode, category } = options; + const params: any[] = [limit]; + let paramIdx = 2; + let filters = ''; + + if (stateCode) { + filters += ` AND s.code = $${paramIdx}`; + params.push(stateCode); + paramIdx++; + } + + if (category) { + filters += ` AND sp.category = $${paramIdx}`; + params.push(category); + paramIdx++; + } + + const result = await this.pool.query(` + SELECT + sp.brand_name, + COUNT(DISTINCT sp.dispensary_id) AS dispensary_count, + COUNT(*) AS sku_count, + COUNT(DISTINCT s.code) AS state_count + FROM store_products sp + LEFT JOIN states s ON s.id = sp.state_id + WHERE sp.brand_name IS NOT NULL + AND sp.is_in_stock = TRUE + ${filters} + GROUP BY sp.brand_name + ORDER BY dispensary_count DESC, sku_count DESC + LIMIT $1 + `, params); + + return result.rows.map((row: any) => ({ + brand_name: row.brand_name, + dispensary_count: parseInt(row.dispensary_count), + sku_count: parseInt(row.sku_count), + state_count: parseInt(row.state_count), + })); + } + + /** + * Get brands that have expanded/contracted in the window + */ + async getBrandExpansionContraction( + options: { window?: TimeWindow; customRange?: DateRange; limit?: number } = {} + ): Promise> { + const { window = '30d', customRange, limit = 25 } = options; + const { start, end } = getDateRangeFromWindow(window, customRange); + + const result = 
await this.pool.query(` + WITH start_counts AS ( + SELECT + brand_name, + COUNT(DISTINCT dispensary_id) AS dispensary_count + FROM store_product_snapshots + WHERE captured_at >= $1 AND captured_at < $1 + INTERVAL '1 day' + AND brand_name IS NOT NULL + AND is_in_stock = TRUE + GROUP BY brand_name + ), + end_counts AS ( + SELECT + brand_name, + COUNT(DISTINCT dispensary_id) AS dispensary_count + FROM store_product_snapshots + WHERE captured_at >= $2 - INTERVAL '1 day' AND captured_at <= $2 + AND brand_name IS NOT NULL + AND is_in_stock = TRUE + GROUP BY brand_name + ) + SELECT + COALESCE(sc.brand_name, ec.brand_name) AS brand_name, + COALESCE(sc.dispensary_count, 0) AS start_dispensaries, + COALESCE(ec.dispensary_count, 0) AS end_dispensaries, + COALESCE(ec.dispensary_count, 0) - COALESCE(sc.dispensary_count, 0) AS change, + ROUND( + (COALESCE(ec.dispensary_count, 0) - COALESCE(sc.dispensary_count, 0))::NUMERIC * 100 + / NULLIF(COALESCE(sc.dispensary_count, 0), 0), + 2 + ) AS change_percent + FROM start_counts sc + FULL OUTER JOIN end_counts ec ON ec.brand_name = sc.brand_name + WHERE COALESCE(ec.dispensary_count, 0) != COALESCE(sc.dispensary_count, 0) + ORDER BY ABS(COALESCE(ec.dispensary_count, 0) - COALESCE(sc.dispensary_count, 0)) DESC + LIMIT $3 + `, [start, end, limit]); + + return result.rows.map((row: any) => ({ + brand_name: row.brand_name, + start_dispensaries: parseInt(row.start_dispensaries), + end_dispensaries: parseInt(row.end_dispensaries), + change: parseInt(row.change), + change_percent: row.change_percent ? parseFloat(row.change_percent) : 0, + })); + } +} + +export default BrandPenetrationService; diff --git a/backend/src/services/analytics/CategoryAnalyticsService.ts b/backend/src/services/analytics/CategoryAnalyticsService.ts new file mode 100644 index 00000000..9132de87 --- /dev/null +++ b/backend/src/services/analytics/CategoryAnalyticsService.ts @@ -0,0 +1,433 @@ +/** + * CategoryAnalyticsService + * + * Analytics for category performance, growth trends, and comparisons. 
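The analytics services import `TimeWindow`, `DateRange`, and `getDateRangeFromWindow` from `./types`, which is not included in this part of the diff. Below is a minimal hypothetical sketch of those window helpers; the names come from the import lists above, the behavior is assumed, and the result interfaces (`BrandPenetrationResult`, `CategoryGrowthResult`, and so on) are omitted.

```typescript
// Hypothetical sketch of backend/src/services/analytics/types.ts (window helpers only).

export type TimeWindow = '7d' | '30d' | '90d';

export interface DateRange {
  start: Date;
  end: Date;
}

/** Resolve a named window (or an explicit custom range) to absolute start/end dates. */
export function getDateRangeFromWindow(window: TimeWindow, customRange?: DateRange): DateRange {
  if (customRange) {
    return customRange;
  }
  const days = window === '7d' ? 7 : window === '90d' ? 90 : 30;
  const end = new Date();
  const start = new Date(end.getTime() - days * 24 * 60 * 60 * 1000);
  return { start, end };
}
```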
+ * + * Data Sources: + * - store_products: Current category distribution + * - store_product_snapshots: Historical category tracking + * - states: Rec/med segmentation + * + * Key Metrics: + * - Category growth by state and legal status + * - Volume of SKUs per category + * - Average price per category + * - Dispensary coverage by category + * - 7d / 30d / 90d trends + */ + +import { Pool } from 'pg'; +import { + TimeWindow, + DateRange, + getDateRangeFromWindow, + CategoryGrowthResult, + CategoryGrowthDataPoint, + CategoryStateBreakdown, + CategoryRecVsMedComparison, +} from './types'; + +export class CategoryAnalyticsService { + constructor(private pool: Pool) {} + + /** + * Get category growth metrics + */ + async getCategoryGrowth( + category: string, + options: { window?: TimeWindow; customRange?: DateRange } = {} + ): Promise { + const { window = '30d', customRange } = options; + const { start, end } = getDateRangeFromWindow(window, customRange); + + // Get current category metrics + const currentResult = await this.pool.query(` + SELECT + sp.category, + COUNT(*) AS sku_count, + COUNT(DISTINCT sp.dispensary_id) AS dispensary_count, + AVG(sp.price_rec) AS avg_price + FROM store_products sp + WHERE sp.category = $1 + AND sp.is_in_stock = TRUE + GROUP BY sp.category + `, [category]); + + if (currentResult.rows.length === 0) { + return null; + } + + const current = currentResult.rows[0]; + + // Get state breakdown + const stateBreakdown = await this.getCategoryStateBreakdown(category); + + // Get growth trend + const trendResult = await this.pool.query(` + SELECT + DATE(sps.captured_at) AS date, + COUNT(*) AS sku_count, + COUNT(DISTINCT sps.dispensary_id) AS dispensary_count, + AVG(sps.price_rec) AS avg_price + FROM store_product_snapshots sps + WHERE sps.category = $1 + AND sps.captured_at >= $2 + AND sps.captured_at <= $3 + AND sps.is_in_stock = TRUE + GROUP BY DATE(sps.captured_at) + ORDER BY date ASC + `, [category, start, end]); + + const growthData: CategoryGrowthDataPoint[] = trendResult.rows.map((row: any) => ({ + date: row.date.toISOString().split('T')[0], + sku_count: parseInt(row.sku_count), + dispensary_count: parseInt(row.dispensary_count), + avg_price: row.avg_price ? parseFloat(row.avg_price) : null, + })); + + return { + category, + current_sku_count: parseInt(current.sku_count), + current_dispensary_count: parseInt(current.dispensary_count), + avg_price: current.avg_price ? parseFloat(current.avg_price) : null, + growth_data: growthData, + state_breakdown: stateBreakdown, + }; + } + + /** + * Get category breakdown by state + */ + async getCategoryStateBreakdown(category: string): Promise { + const result = await this.pool.query(` + SELECT + s.code AS state_code, + s.name AS state_name, + CASE + WHEN s.recreational_legal = TRUE THEN 'recreational' + ELSE 'medical_only' + END AS legal_type, + COUNT(*) AS sku_count, + COUNT(DISTINCT sp.dispensary_id) AS dispensary_count, + AVG(sp.price_rec) AS avg_price + FROM store_products sp + JOIN states s ON s.id = sp.state_id + WHERE sp.category = $1 + AND sp.is_in_stock = TRUE + GROUP BY s.code, s.name, s.recreational_legal + ORDER BY sku_count DESC + `, [category]); + + return result.rows.map((row: any) => ({ + state_code: row.state_code, + state_name: row.state_name, + legal_type: row.legal_type, + sku_count: parseInt(row.sku_count), + dispensary_count: parseInt(row.dispensary_count), + avg_price: row.avg_price ? 
parseFloat(row.avg_price) : null, + })); + } + + /** + * Get all categories with metrics + */ + async getAllCategories( + options: { stateCode?: string; limit?: number } = {} + ): Promise> { + const { stateCode, limit = 50 } = options; + const params: any[] = [limit]; + let paramIdx = 2; + let stateFilter = ''; + + if (stateCode) { + stateFilter = `AND s.code = $${paramIdx}`; + params.push(stateCode); + paramIdx++; + } + + const result = await this.pool.query(` + SELECT + sp.category, + COUNT(*) AS sku_count, + COUNT(DISTINCT sp.dispensary_id) AS dispensary_count, + COUNT(DISTINCT sp.brand_name) AS brand_count, + AVG(sp.price_rec) AS avg_price, + COUNT(DISTINCT s.code) AS state_count + FROM store_products sp + LEFT JOIN states s ON s.id = sp.state_id + WHERE sp.category IS NOT NULL + AND sp.is_in_stock = TRUE + ${stateFilter} + GROUP BY sp.category + ORDER BY sku_count DESC + LIMIT $1 + `, params); + + return result.rows.map((row: any) => ({ + category: row.category, + sku_count: parseInt(row.sku_count), + dispensary_count: parseInt(row.dispensary_count), + brand_count: parseInt(row.brand_count), + avg_price: row.avg_price ? parseFloat(row.avg_price) : null, + state_count: parseInt(row.state_count), + })); + } + + /** + * Get category comparison between rec and med-only states + */ + async getCategoryRecVsMedComparison(category?: string): Promise { + const params: any[] = []; + let categoryFilter = ''; + + if (category) { + categoryFilter = 'WHERE sp.category = $1'; + params.push(category); + } + + const result = await this.pool.query(` + WITH category_stats AS ( + SELECT + sp.category, + CASE WHEN s.recreational_legal = TRUE THEN 'recreational' ELSE 'medical_only' END AS legal_type, + COUNT(DISTINCT s.code) AS state_count, + COUNT(DISTINCT sp.dispensary_id) AS dispensary_count, + COUNT(*) AS sku_count, + AVG(sp.price_rec) AS avg_price, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_rec) AS median_price + FROM store_products sp + JOIN states s ON s.id = sp.state_id + ${categoryFilter} + ${category ? 'AND' : 'WHERE'} sp.category IS NOT NULL + AND sp.is_in_stock = TRUE + AND sp.price_rec IS NOT NULL + AND (s.recreational_legal = TRUE OR s.medical_legal = TRUE) + GROUP BY sp.category, CASE WHEN s.recreational_legal = TRUE THEN 'recreational' ELSE 'medical_only' END + ), + rec_stats AS ( + SELECT * FROM category_stats WHERE legal_type = 'recreational' + ), + med_stats AS ( + SELECT * FROM category_stats WHERE legal_type = 'medical_only' + ) + SELECT + COALESCE(r.category, m.category) AS category, + r.state_count AS rec_state_count, + r.dispensary_count AS rec_dispensary_count, + r.sku_count AS rec_sku_count, + r.avg_price AS rec_avg_price, + r.median_price AS rec_median_price, + m.state_count AS med_state_count, + m.dispensary_count AS med_dispensary_count, + m.sku_count AS med_sku_count, + m.avg_price AS med_avg_price, + m.median_price AS med_median_price, + CASE + WHEN r.avg_price IS NOT NULL AND m.avg_price IS NOT NULL THEN + ROUND(((r.avg_price - m.avg_price) / NULLIF(m.avg_price, 0) * 100)::NUMERIC, 2) + ELSE NULL + END AS price_diff_percent + FROM rec_stats r + FULL OUTER JOIN med_stats m ON r.category = m.category + ORDER BY COALESCE(r.sku_count, 0) + COALESCE(m.sku_count, 0) DESC + `, params); + + return result.rows.map((row: any) => ({ + category: row.category, + recreational: { + state_count: parseInt(row.rec_state_count) || 0, + dispensary_count: parseInt(row.rec_dispensary_count) || 0, + sku_count: parseInt(row.rec_sku_count) || 0, + avg_price: row.rec_avg_price ? 
parseFloat(row.rec_avg_price) : null, + median_price: row.rec_median_price ? parseFloat(row.rec_median_price) : null, + }, + medical_only: { + state_count: parseInt(row.med_state_count) || 0, + dispensary_count: parseInt(row.med_dispensary_count) || 0, + sku_count: parseInt(row.med_sku_count) || 0, + avg_price: row.med_avg_price ? parseFloat(row.med_avg_price) : null, + median_price: row.med_median_price ? parseFloat(row.med_median_price) : null, + }, + price_diff_percent: row.price_diff_percent ? parseFloat(row.price_diff_percent) : null, + })); + } + + /** + * Get category growth trends over time + */ + async getCategoryGrowthTrend( + category: string, + options: { window?: TimeWindow; customRange?: DateRange } = {} + ): Promise> { + const { window = '30d', customRange } = options; + const { start, end } = getDateRangeFromWindow(window, customRange); + + const result = await this.pool.query(` + WITH daily_metrics AS ( + SELECT + DATE(sps.captured_at) AS date, + COUNT(*) AS sku_count, + COUNT(DISTINCT sps.dispensary_id) AS dispensary_count + FROM store_product_snapshots sps + WHERE sps.category = $1 + AND sps.captured_at >= $2 + AND sps.captured_at <= $3 + AND sps.is_in_stock = TRUE + GROUP BY DATE(sps.captured_at) + ORDER BY date + ) + SELECT + date, + sku_count, + sku_count - LAG(sku_count) OVER (ORDER BY date) AS sku_change, + dispensary_count, + dispensary_count - LAG(dispensary_count) OVER (ORDER BY date) AS dispensary_change + FROM daily_metrics + `, [category, start, end]); + + return result.rows.map((row: any) => ({ + date: row.date.toISOString().split('T')[0], + sku_count: parseInt(row.sku_count), + sku_change: row.sku_change ? parseInt(row.sku_change) : 0, + dispensary_count: parseInt(row.dispensary_count), + dispensary_change: row.dispensary_change ? parseInt(row.dispensary_change) : 0, + })); + } + + /** + * Get top brands within a category + */ + async getTopBrandsInCategory( + category: string, + options: { limit?: number; stateCode?: string } = {} + ): Promise> { + const { limit = 25, stateCode } = options; + const params: any[] = [category, limit]; + let paramIdx = 3; + let stateFilter = ''; + + if (stateCode) { + stateFilter = `AND s.code = $${paramIdx}`; + params.push(stateCode); + paramIdx++; + } + + const result = await this.pool.query(` + WITH category_total AS ( + SELECT COUNT(*) AS total + FROM store_products sp + LEFT JOIN states s ON s.id = sp.state_id + WHERE sp.category = $1 + AND sp.is_in_stock = TRUE + AND sp.brand_name IS NOT NULL + ${stateFilter} + ) + SELECT + sp.brand_name, + COUNT(*) AS sku_count, + COUNT(DISTINCT sp.dispensary_id) AS dispensary_count, + AVG(sp.price_rec) AS avg_price, + ROUND(COUNT(*)::NUMERIC * 100 / NULLIF((SELECT total FROM category_total), 0), 2) AS category_share_percent + FROM store_products sp + LEFT JOIN states s ON s.id = sp.state_id + WHERE sp.category = $1 + AND sp.is_in_stock = TRUE + AND sp.brand_name IS NOT NULL + ${stateFilter} + GROUP BY sp.brand_name + ORDER BY sku_count DESC + LIMIT $2 + `, params); + + return result.rows.map((row: any) => ({ + brand_name: row.brand_name, + sku_count: parseInt(row.sku_count), + dispensary_count: parseInt(row.dispensary_count), + avg_price: row.avg_price ? parseFloat(row.avg_price) : null, + category_share_percent: row.category_share_percent ? 
parseFloat(row.category_share_percent) : 0, + })); + } + + /** + * Get fastest growing categories + */ + async getFastestGrowingCategories( + options: { window?: TimeWindow; customRange?: DateRange; limit?: number } = {} + ): Promise> { + const { window = '30d', customRange, limit = 25 } = options; + const { start, end } = getDateRangeFromWindow(window, customRange); + + const result = await this.pool.query(` + WITH start_counts AS ( + SELECT + category, + COUNT(*) AS sku_count + FROM store_product_snapshots + WHERE captured_at >= $1 AND captured_at < $1 + INTERVAL '1 day' + AND category IS NOT NULL + AND is_in_stock = TRUE + GROUP BY category + ), + end_counts AS ( + SELECT + category, + COUNT(*) AS sku_count + FROM store_product_snapshots + WHERE captured_at >= $2 - INTERVAL '1 day' AND captured_at <= $2 + AND category IS NOT NULL + AND is_in_stock = TRUE + GROUP BY category + ) + SELECT + COALESCE(sc.category, ec.category) AS category, + COALESCE(sc.sku_count, 0) AS start_sku_count, + COALESCE(ec.sku_count, 0) AS end_sku_count, + COALESCE(ec.sku_count, 0) - COALESCE(sc.sku_count, 0) AS growth, + ROUND( + (COALESCE(ec.sku_count, 0) - COALESCE(sc.sku_count, 0))::NUMERIC * 100 + / NULLIF(COALESCE(sc.sku_count, 0), 0), + 2 + ) AS growth_percent + FROM start_counts sc + FULL OUTER JOIN end_counts ec ON ec.category = sc.category + WHERE COALESCE(ec.sku_count, 0) != COALESCE(sc.sku_count, 0) + ORDER BY growth DESC + LIMIT $3 + `, [start, end, limit]); + + return result.rows.map((row: any) => ({ + category: row.category, + start_sku_count: parseInt(row.start_sku_count), + end_sku_count: parseInt(row.end_sku_count), + growth: parseInt(row.growth), + growth_percent: row.growth_percent ? parseFloat(row.growth_percent) : 0, + })); + } +} + +export default CategoryAnalyticsService; diff --git a/backend/src/services/analytics/PriceAnalyticsService.ts b/backend/src/services/analytics/PriceAnalyticsService.ts new file mode 100644 index 00000000..93be0ace --- /dev/null +++ b/backend/src/services/analytics/PriceAnalyticsService.ts @@ -0,0 +1,392 @@ +/** + * PriceAnalyticsService + * + * Analytics for price trends, volatility, and comparisons. 
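A usage sketch for `CategoryAnalyticsService` above (not part of the diff). The pool import path is assumed, and `'Edibles'` is an illustrative category value, not necessarily one present in the data.

```typescript
// Usage sketch: category momentum and rec vs med pricing comparison.
import { pool } from '../db/pool';
import { CategoryAnalyticsService } from '../services/analytics/CategoryAnalyticsService';

async function categoryReport(): Promise<void> {
  const categories = new CategoryAnalyticsService(pool);

  const fastest = await categories.getFastestGrowingCategories({ window: '30d', limit: 5 });
  for (const c of fastest) {
    console.log(`${c.category}: ${c.start_sku_count} -> ${c.end_sku_count} SKUs (${c.growth_percent}%)`);
  }

  const comparison = await categories.getCategoryRecVsMedComparison('Edibles');
  for (const row of comparison) {
    console.log(`${row.category}: rec avg vs med-only avg = ${row.price_diff_percent ?? 'n/a'}%`);
  }
}

categoryReport().catch(console.error);
```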
+ * + * Data Sources: + * - store_products: Current prices and price change timestamps + * - store_product_snapshots: Historical price data points + * - states: Rec/med legal status for segmentation + * + * Key Metrics: + * - Price trends over time per product + * - Price by category and state + * - Price volatility (frequency and magnitude of changes) + * - Rec vs Med pricing comparisons + */ + +import { Pool } from 'pg'; +import { + TimeWindow, + DateRange, + getDateRangeFromWindow, + PriceTrendResult, + PriceDataPoint, + CategoryPriceStats, + PriceVolatilityResult, +} from './types'; + +export class PriceAnalyticsService { + constructor(private pool: Pool) {} + + /** + * Get price trends for a specific store product over time + */ + async getPriceTrendsForStoreProduct( + storeProductId: number, + options: { window?: TimeWindow; customRange?: DateRange } = {} + ): Promise { + const { window = '30d', customRange } = options; + const { start, end } = getDateRangeFromWindow(window, customRange); + + // Get product info + const productResult = await this.pool.query(` + SELECT + sp.id, + sp.name, + sp.brand_name, + sp.category, + sp.dispensary_id, + sp.price_rec, + sp.price_med, + d.name AS dispensary_name, + s.code AS state_code + FROM store_products sp + JOIN dispensaries d ON d.id = sp.dispensary_id + LEFT JOIN states s ON s.id = sp.state_id + WHERE sp.id = $1 + `, [storeProductId]); + + if (productResult.rows.length === 0) { + return null; + } + + const product = productResult.rows[0]; + + // Get historical snapshots + const snapshotsResult = await this.pool.query(` + SELECT + DATE(captured_at) AS date, + AVG(price_rec) AS price_rec, + AVG(price_med) AS price_med, + AVG(price_rec_special) AS price_rec_special, + AVG(price_med_special) AS price_med_special, + BOOL_OR(is_on_special) AS is_on_special + FROM store_product_snapshots + WHERE store_product_id = $1 + AND captured_at >= $2 + AND captured_at <= $3 + GROUP BY DATE(captured_at) + ORDER BY date ASC + `, [storeProductId, start, end]); + + const dataPoints: PriceDataPoint[] = snapshotsResult.rows.map((row: any) => ({ + date: row.date.toISOString().split('T')[0], + price_rec: row.price_rec ? parseFloat(row.price_rec) : null, + price_med: row.price_med ? parseFloat(row.price_med) : null, + price_rec_special: row.price_rec_special ? parseFloat(row.price_rec_special) : null, + price_med_special: row.price_med_special ? parseFloat(row.price_med_special) : null, + is_on_special: row.is_on_special || false, + })); + + // Calculate summary statistics + const prices = dataPoints + .map(dp => dp.price_rec) + .filter((p): p is number => p !== null); + + const summary = { + current_price: product.price_rec ? parseFloat(product.price_rec) : null, + min_price: prices.length > 0 ? Math.min(...prices) : null, + max_price: prices.length > 0 ? Math.max(...prices) : null, + avg_price: prices.length > 0 ? 
prices.reduce((a, b) => a + b, 0) / prices.length : null, + price_change_count: this.countPriceChanges(prices), + volatility_percent: this.calculateVolatility(prices), + }; + + return { + store_product_id: storeProductId, + product_name: product.name, + brand_name: product.brand_name, + category: product.category, + dispensary_id: product.dispensary_id, + dispensary_name: product.dispensary_name, + state_code: product.state_code || 'XX', + data_points: dataPoints, + summary, + }; + } + + /** + * Get price statistics by category and state + */ + async getCategoryPriceByState( + category: string, + options: { stateCode?: string } = {} + ): Promise { + const params: any[] = [category]; + let stateFilter = ''; + + if (options.stateCode) { + stateFilter = 'AND s.code = $2'; + params.push(options.stateCode); + } + + const result = await this.pool.query(` + SELECT + sp.category, + s.code AS state_code, + s.name AS state_name, + CASE + WHEN s.recreational_legal = TRUE THEN 'recreational' + ELSE 'medical_only' + END AS legal_type, + AVG(sp.price_rec) AS avg_price, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_rec) AS median_price, + MIN(sp.price_rec) AS min_price, + MAX(sp.price_rec) AS max_price, + COUNT(*) AS product_count, + COUNT(DISTINCT sp.dispensary_id) AS dispensary_count + FROM store_products sp + JOIN dispensaries d ON d.id = sp.dispensary_id + JOIN states s ON s.id = sp.state_id + WHERE sp.category = $1 + AND sp.price_rec IS NOT NULL + AND sp.is_in_stock = TRUE + AND (s.recreational_legal = TRUE OR s.medical_legal = TRUE) + ${stateFilter} + GROUP BY sp.category, s.code, s.name, s.recreational_legal + ORDER BY state_code + `, params); + + return result.rows.map((row: any) => ({ + category: row.category, + state_code: row.state_code, + state_name: row.state_name, + legal_type: row.legal_type, + avg_price: parseFloat(row.avg_price), + median_price: parseFloat(row.median_price), + min_price: parseFloat(row.min_price), + max_price: parseFloat(row.max_price), + product_count: parseInt(row.product_count), + dispensary_count: parseInt(row.dispensary_count), + })); + } + + /** + * Get price statistics by brand and state + */ + async getBrandPriceByState( + brandName: string, + options: { stateCode?: string } = {} + ): Promise { + const params: any[] = [brandName]; + let stateFilter = ''; + + if (options.stateCode) { + stateFilter = 'AND s.code = $2'; + params.push(options.stateCode); + } + + const result = await this.pool.query(` + SELECT + sp.brand_name AS category, + s.code AS state_code, + s.name AS state_name, + CASE + WHEN s.recreational_legal = TRUE THEN 'recreational' + ELSE 'medical_only' + END AS legal_type, + AVG(sp.price_rec) AS avg_price, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_rec) AS median_price, + MIN(sp.price_rec) AS min_price, + MAX(sp.price_rec) AS max_price, + COUNT(*) AS product_count, + COUNT(DISTINCT sp.dispensary_id) AS dispensary_count + FROM store_products sp + JOIN dispensaries d ON d.id = sp.dispensary_id + JOIN states s ON s.id = sp.state_id + WHERE sp.brand_name = $1 + AND sp.price_rec IS NOT NULL + AND sp.is_in_stock = TRUE + AND (s.recreational_legal = TRUE OR s.medical_legal = TRUE) + ${stateFilter} + GROUP BY sp.brand_name, s.code, s.name, s.recreational_legal + ORDER BY state_code + `, params); + + return result.rows.map((row: any) => ({ + category: row.category, + state_code: row.state_code, + state_name: row.state_name, + legal_type: row.legal_type, + avg_price: parseFloat(row.avg_price), + median_price: parseFloat(row.median_price), + 
min_price: parseFloat(row.min_price), + max_price: parseFloat(row.max_price), + product_count: parseInt(row.product_count), + dispensary_count: parseInt(row.dispensary_count), + })); + } + + /** + * Get most volatile products (frequent price changes) + */ + async getMostVolatileProducts( + options: { + window?: TimeWindow; + customRange?: DateRange; + limit?: number; + stateCode?: string; + category?: string; + } = {} + ): Promise { + const { window = '30d', customRange, limit = 50, stateCode, category } = options; + const { start, end } = getDateRangeFromWindow(window, customRange); + + const params: any[] = [start, end, limit]; + let paramIdx = 4; + let filters = ''; + + if (stateCode) { + filters += ` AND s.code = $${paramIdx}`; + params.push(stateCode); + paramIdx++; + } + + if (category) { + filters += ` AND sp.category = $${paramIdx}`; + params.push(category); + paramIdx++; + } + + const result = await this.pool.query(` + WITH price_changes AS ( + SELECT + sps.store_product_id, + sps.price_rec, + LAG(sps.price_rec) OVER ( + PARTITION BY sps.store_product_id ORDER BY sps.captured_at + ) AS prev_price, + sps.captured_at + FROM store_product_snapshots sps + WHERE sps.captured_at >= $1 + AND sps.captured_at <= $2 + AND sps.price_rec IS NOT NULL + ), + volatility AS ( + SELECT + store_product_id, + COUNT(*) FILTER (WHERE price_rec != prev_price) AS change_count, + AVG(ABS((price_rec - prev_price) / NULLIF(prev_price, 0) * 100)) + FILTER (WHERE prev_price IS NOT NULL AND prev_price != 0) AS avg_change_pct, + MAX(ABS((price_rec - prev_price) / NULLIF(prev_price, 0) * 100)) + FILTER (WHERE prev_price IS NOT NULL AND prev_price != 0) AS max_change_pct, + MAX(captured_at) FILTER (WHERE price_rec != prev_price) AS last_change_at + FROM price_changes + GROUP BY store_product_id + HAVING COUNT(*) FILTER (WHERE price_rec != prev_price) > 0 + ) + SELECT + v.store_product_id, + sp.name AS product_name, + sp.brand_name, + v.change_count, + v.avg_change_pct, + v.max_change_pct, + v.last_change_at + FROM volatility v + JOIN store_products sp ON sp.id = v.store_product_id + LEFT JOIN states s ON s.id = sp.state_id + WHERE 1=1 ${filters} + ORDER BY v.change_count DESC, v.avg_change_pct DESC + LIMIT $3 + `, params); + + return result.rows.map((row: any) => ({ + store_product_id: row.store_product_id, + product_name: row.product_name, + brand_name: row.brand_name, + change_count: parseInt(row.change_count), + avg_change_percent: row.avg_change_pct ? parseFloat(row.avg_change_pct) : 0, + max_change_percent: row.max_change_pct ? parseFloat(row.max_change_pct) : 0, + last_change_at: row.last_change_at ? 
row.last_change_at.toISOString() : null, + })); + } + + /** + * Get average prices by category (rec vs med states) + */ + async getCategoryRecVsMedPrices(category?: string): Promise<{ + category: string; + rec_avg: number | null; + rec_median: number | null; + med_avg: number | null; + med_median: number | null; + }[]> { + const params: any[] = []; + let categoryFilter = ''; + + if (category) { + categoryFilter = 'WHERE sp.category = $1'; + params.push(category); + } + + const result = await this.pool.query(` + SELECT + sp.category, + AVG(sp.price_rec) FILTER (WHERE s.recreational_legal = TRUE) AS rec_avg, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_rec) + FILTER (WHERE s.recreational_legal = TRUE) AS rec_median, + AVG(sp.price_rec) FILTER ( + WHERE s.medical_legal = TRUE AND (s.recreational_legal = FALSE OR s.recreational_legal IS NULL) + ) AS med_avg, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_rec) + FILTER (WHERE s.medical_legal = TRUE AND (s.recreational_legal = FALSE OR s.recreational_legal IS NULL)) AS med_median + FROM store_products sp + JOIN states s ON s.id = sp.state_id + ${categoryFilter} + ${category ? 'AND' : 'WHERE'} sp.price_rec IS NOT NULL + AND sp.is_in_stock = TRUE + AND sp.category IS NOT NULL + GROUP BY sp.category + ORDER BY sp.category + `, params); + + return result.rows.map((row: any) => ({ + category: row.category, + rec_avg: row.rec_avg ? parseFloat(row.rec_avg) : null, + rec_median: row.rec_median ? parseFloat(row.rec_median) : null, + med_avg: row.med_avg ? parseFloat(row.med_avg) : null, + med_median: row.med_median ? parseFloat(row.med_median) : null, + })); + } + + // ============================================================ + // HELPER METHODS + // ============================================================ + + private countPriceChanges(prices: number[]): number { + let changes = 0; + for (let i = 1; i < prices.length; i++) { + if (prices[i] !== prices[i - 1]) { + changes++; + } + } + return changes; + } + + private calculateVolatility(prices: number[]): number | null { + if (prices.length < 2) return null; + + const mean = prices.reduce((a, b) => a + b, 0) / prices.length; + if (mean === 0) return null; + + const variance = prices.reduce((sum, p) => sum + Math.pow(p - mean, 2), 0) / prices.length; + const stdDev = Math.sqrt(variance); + + // Coefficient of variation as percentage + return (stdDev / mean) * 100; + } +} + +export default PriceAnalyticsService; diff --git a/backend/src/services/analytics/StateAnalyticsService.ts b/backend/src/services/analytics/StateAnalyticsService.ts new file mode 100644 index 00000000..155002df --- /dev/null +++ b/backend/src/services/analytics/StateAnalyticsService.ts @@ -0,0 +1,532 @@ +/** + * StateAnalyticsService + * + * Analytics for state-level market data and comparisons. 
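+ *
+ * Usage sketch (illustrative, not part of this change — `pool` is the caller's
+ * pg Pool and 'AZ' is a placeholder state code):
+ *
+ *   const states = new StateAnalyticsService(pool);
+ *   const summary = await states.getStateMarketSummary('AZ');   // one state's rollup
+ *   const breakdown = await states.getLegalStateBreakdown();    // rec / med-only / no-program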
+ * + * Data Sources: + * - states: Legal status, year of legalization + * - dispensaries: Store counts by state + * - store_products: Product/brand coverage by state + * - store_product_snapshots: Historical data depth + * + * Key Metrics: + * - Legal state breakdown (rec, med-only, illegal) + * - Coverage by state (dispensaries, products, brands) + * - Rec vs Med price comparisons + * - Data freshness per state + */ + +import { Pool } from 'pg'; +import { + StateMarketSummary, + LegalStateBreakdown, + RecVsMedPriceComparison, + LegalType, + getLegalTypeFilter, +} from './types'; + +export class StateAnalyticsService { + constructor(private pool: Pool) {} + + // ============================================================ + // HELPER METHODS FOR LEGAL TYPE FILTERING + // ============================================================ + + /** + * Get recreational-only state codes + */ + async getRecreationalStates(): Promise { + const result = await this.pool.query(` + SELECT code FROM states WHERE recreational_legal = TRUE ORDER BY code + `); + return result.rows.map((r: any) => r.code); + } + + /** + * Get medical-only state codes (not recreational) + */ + async getMedicalOnlyStates(): Promise { + const result = await this.pool.query(` + SELECT code FROM states + WHERE medical_legal = TRUE + AND (recreational_legal = FALSE OR recreational_legal IS NULL) + ORDER BY code + `); + return result.rows.map((r: any) => r.code); + } + + /** + * Get no-program state codes + */ + async getNoProgramStates(): Promise { + const result = await this.pool.query(` + SELECT code FROM states + WHERE (recreational_legal = FALSE OR recreational_legal IS NULL) + AND (medical_legal = FALSE OR medical_legal IS NULL) + ORDER BY code + `); + return result.rows.map((r: any) => r.code); + } + + /** + * Get state IDs by legal type for use in subqueries + */ + async getStateIdsByLegalType(legalType: LegalType): Promise { + const filter = getLegalTypeFilter(legalType); + const result = await this.pool.query(` + SELECT s.id FROM states s WHERE ${filter} ORDER BY s.id + `); + return result.rows.map((r: any) => r.id); + } + + /** + * Get market summary for a specific state + */ + async getStateMarketSummary(stateCode: string): Promise { + // Get state info + const stateResult = await this.pool.query(` + SELECT + s.id, + s.code, + s.name, + s.recreational_legal, + s.rec_year, + s.medical_legal, + s.med_year + FROM states s + WHERE s.code = $1 + `, [stateCode]); + + if (stateResult.rows.length === 0) { + return null; + } + + const state = stateResult.rows[0]; + + // Get coverage metrics + const coverageResult = await this.pool.query(` + SELECT + COUNT(DISTINCT d.id) AS dispensary_count, + COUNT(DISTINCT sp.id) AS product_count, + COUNT(DISTINCT sp.brand_name) FILTER (WHERE sp.brand_name IS NOT NULL) AS brand_count, + COUNT(DISTINCT sp.category) FILTER (WHERE sp.category IS NOT NULL) AS category_count, + COUNT(sps.id) AS snapshot_count, + MAX(sps.captured_at) AS last_crawl_at + FROM states s + LEFT JOIN dispensaries d ON d.state_id = s.id + LEFT JOIN store_products sp ON sp.state_id = s.id AND sp.is_in_stock = TRUE + LEFT JOIN store_product_snapshots sps ON sps.state_id = s.id + WHERE s.code = $1 + `, [stateCode]); + + const coverage = coverageResult.rows[0]; + + // Get pricing metrics + const pricingResult = await this.pool.query(` + SELECT + AVG(price_rec) AS avg_price, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY price_rec) AS median_price, + MIN(price_rec) AS min_price, + MAX(price_rec) AS max_price + FROM store_products 
sp + JOIN states s ON s.id = sp.state_id + WHERE s.code = $1 + AND sp.price_rec IS NOT NULL + AND sp.is_in_stock = TRUE + `, [stateCode]); + + const pricing = pricingResult.rows[0]; + + // Get top categories + const topCategoriesResult = await this.pool.query(` + SELECT + sp.category, + COUNT(*) AS count + FROM store_products sp + JOIN states s ON s.id = sp.state_id + WHERE s.code = $1 + AND sp.category IS NOT NULL + AND sp.is_in_stock = TRUE + GROUP BY sp.category + ORDER BY count DESC + LIMIT 10 + `, [stateCode]); + + // Get top brands + const topBrandsResult = await this.pool.query(` + SELECT + sp.brand_name AS brand, + COUNT(*) AS count + FROM store_products sp + JOIN states s ON s.id = sp.state_id + WHERE s.code = $1 + AND sp.brand_name IS NOT NULL + AND sp.is_in_stock = TRUE + GROUP BY sp.brand_name + ORDER BY count DESC + LIMIT 10 + `, [stateCode]); + + return { + state_code: state.code, + state_name: state.name, + legal_status: { + recreational_legal: state.recreational_legal || false, + rec_year: state.rec_year, + medical_legal: state.medical_legal || false, + med_year: state.med_year, + }, + coverage: { + dispensary_count: parseInt(coverage.dispensary_count) || 0, + product_count: parseInt(coverage.product_count) || 0, + brand_count: parseInt(coverage.brand_count) || 0, + category_count: parseInt(coverage.category_count) || 0, + snapshot_count: parseInt(coverage.snapshot_count) || 0, + last_crawl_at: coverage.last_crawl_at ? coverage.last_crawl_at.toISOString() : null, + }, + pricing: { + avg_price: pricing.avg_price ? parseFloat(pricing.avg_price) : null, + median_price: pricing.median_price ? parseFloat(pricing.median_price) : null, + min_price: pricing.min_price ? parseFloat(pricing.min_price) : null, + max_price: pricing.max_price ? parseFloat(pricing.max_price) : null, + }, + top_categories: topCategoriesResult.rows.map((row: any) => ({ + category: row.category, + count: parseInt(row.count), + })), + top_brands: topBrandsResult.rows.map((row: any) => ({ + brand: row.brand, + count: parseInt(row.count), + })), + }; + } + + /** + * Get breakdown by legal status (rec, med-only, no program) + */ + async getLegalStateBreakdown(): Promise { + // Get recreational states + const recResult = await this.pool.query(` + SELECT + s.code, + s.name, + COUNT(DISTINCT d.id) AS dispensary_count, + COUNT(DISTINCT sp.id) AS product_count, + COUNT(sps.id) AS snapshot_count + FROM states s + LEFT JOIN dispensaries d ON d.state_id = s.id + LEFT JOIN store_products sp ON sp.state_id = s.id AND sp.is_in_stock = TRUE + LEFT JOIN store_product_snapshots sps ON sps.state_id = s.id + WHERE s.recreational_legal = TRUE + GROUP BY s.code, s.name + ORDER BY dispensary_count DESC + `); + + // Get medical-only states + const medResult = await this.pool.query(` + SELECT + s.code, + s.name, + COUNT(DISTINCT d.id) AS dispensary_count, + COUNT(DISTINCT sp.id) AS product_count, + COUNT(sps.id) AS snapshot_count + FROM states s + LEFT JOIN dispensaries d ON d.state_id = s.id + LEFT JOIN store_products sp ON sp.state_id = s.id AND sp.is_in_stock = TRUE + LEFT JOIN store_product_snapshots sps ON sps.state_id = s.id + WHERE s.medical_legal = TRUE + AND (s.recreational_legal = FALSE OR s.recreational_legal IS NULL) + GROUP BY s.code, s.name + ORDER BY dispensary_count DESC + `); + + // Get no-program states + const noProgResult = await this.pool.query(` + SELECT s.code, s.name + FROM states s + WHERE (s.recreational_legal = FALSE OR s.recreational_legal IS NULL) + AND (s.medical_legal = FALSE OR s.medical_legal IS 
NULL) + ORDER BY s.name + `); + + const recStates = recResult.rows; + const medStates = medResult.rows; + const noProgStates = noProgResult.rows; + + return { + recreational_states: { + count: recStates.length, + dispensary_count: recStates.reduce((sum, s) => sum + parseInt(s.dispensary_count), 0), + product_count: recStates.reduce((sum, s) => sum + parseInt(s.product_count), 0), + snapshot_count: recStates.reduce((sum, s) => sum + parseInt(s.snapshot_count), 0), + states: recStates.map((row: any) => ({ + code: row.code, + name: row.name, + dispensary_count: parseInt(row.dispensary_count), + })), + }, + medical_only_states: { + count: medStates.length, + dispensary_count: medStates.reduce((sum, s) => sum + parseInt(s.dispensary_count), 0), + product_count: medStates.reduce((sum, s) => sum + parseInt(s.product_count), 0), + snapshot_count: medStates.reduce((sum, s) => sum + parseInt(s.snapshot_count), 0), + states: medStates.map((row: any) => ({ + code: row.code, + name: row.name, + dispensary_count: parseInt(row.dispensary_count), + })), + }, + no_program_states: { + count: noProgStates.length, + states: noProgStates.map((row: any) => ({ + code: row.code, + name: row.name, + })), + }, + }; + } + + /** + * Get rec vs med price comparison (overall or by category) + */ + async getRecVsMedPriceComparison(category?: string): Promise { + const params: any[] = []; + let categoryFilter = ''; + let groupBy = 'NULL'; + + if (category) { + categoryFilter = 'AND sp.category = $1'; + params.push(category); + groupBy = 'sp.category'; + } else { + groupBy = 'sp.category'; + } + + const result = await this.pool.query(` + WITH rec_prices AS ( + SELECT + ${category ? 'sp.category' : 'sp.category'}, + COUNT(DISTINCT s.code) AS state_count, + COUNT(*) AS product_count, + AVG(sp.price_rec) AS avg_price, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_rec) AS median_price + FROM store_products sp + JOIN states s ON s.id = sp.state_id + WHERE s.recreational_legal = TRUE + AND sp.price_rec IS NOT NULL + AND sp.is_in_stock = TRUE + AND sp.category IS NOT NULL + ${categoryFilter} + GROUP BY sp.category + ), + med_prices AS ( + SELECT + ${category ? 
'sp.category' : 'sp.category'}, + COUNT(DISTINCT s.code) AS state_count, + COUNT(*) AS product_count, + AVG(sp.price_rec) AS avg_price, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_rec) AS median_price + FROM store_products sp + JOIN states s ON s.id = sp.state_id + WHERE s.medical_legal = TRUE + AND (s.recreational_legal = FALSE OR s.recreational_legal IS NULL) + AND sp.price_rec IS NOT NULL + AND sp.is_in_stock = TRUE + AND sp.category IS NOT NULL + ${categoryFilter} + GROUP BY sp.category + ) + SELECT + COALESCE(r.category, m.category) AS category, + r.state_count AS rec_state_count, + r.product_count AS rec_product_count, + r.avg_price AS rec_avg_price, + r.median_price AS rec_median_price, + m.state_count AS med_state_count, + m.product_count AS med_product_count, + m.avg_price AS med_avg_price, + m.median_price AS med_median_price, + CASE + WHEN r.avg_price IS NOT NULL AND m.avg_price IS NOT NULL THEN + ROUND(((r.avg_price - m.avg_price) / NULLIF(m.avg_price, 0) * 100)::NUMERIC, 2) + ELSE NULL + END AS price_diff_percent + FROM rec_prices r + FULL OUTER JOIN med_prices m ON r.category = m.category + ORDER BY COALESCE(r.product_count, 0) + COALESCE(m.product_count, 0) DESC + `, params); + + return result.rows.map((row: any) => ({ + category: row.category, + recreational: { + state_count: parseInt(row.rec_state_count) || 0, + product_count: parseInt(row.rec_product_count) || 0, + avg_price: row.rec_avg_price ? parseFloat(row.rec_avg_price) : null, + median_price: row.rec_median_price ? parseFloat(row.rec_median_price) : null, + }, + medical_only: { + state_count: parseInt(row.med_state_count) || 0, + product_count: parseInt(row.med_product_count) || 0, + avg_price: row.med_avg_price ? parseFloat(row.med_avg_price) : null, + median_price: row.med_median_price ? parseFloat(row.med_median_price) : null, + }, + price_diff_percent: row.price_diff_percent ? parseFloat(row.price_diff_percent) : null, + })); + } + + /** + * Get all states with coverage metrics + */ + async getAllStatesWithCoverage(): Promise> { + const result = await this.pool.query(` + SELECT + s.code AS state_code, + s.name AS state_name, + COALESCE(s.recreational_legal, FALSE) AS recreational_legal, + COALESCE(s.medical_legal, FALSE) AS medical_legal, + COUNT(DISTINCT d.id) AS dispensary_count, + COUNT(DISTINCT sp.id) AS product_count, + COUNT(DISTINCT sp.brand_name) FILTER (WHERE sp.brand_name IS NOT NULL) AS brand_count, + MAX(sps.captured_at) AS last_crawl_at + FROM states s + LEFT JOIN dispensaries d ON d.state_id = s.id + LEFT JOIN store_products sp ON sp.state_id = s.id AND sp.is_in_stock = TRUE + LEFT JOIN store_product_snapshots sps ON sps.state_id = s.id + GROUP BY s.code, s.name, s.recreational_legal, s.medical_legal + ORDER BY dispensary_count DESC, s.name + `); + + return result.rows.map((row: any) => ({ + state_code: row.state_code, + state_name: row.state_name, + recreational_legal: row.recreational_legal, + medical_legal: row.medical_legal, + dispensary_count: parseInt(row.dispensary_count) || 0, + product_count: parseInt(row.product_count) || 0, + brand_count: parseInt(row.brand_count) || 0, + last_crawl_at: row.last_crawl_at ? 
row.last_crawl_at.toISOString() : null, + })); + } + + /** + * Get state coverage gaps (legal states with low/no coverage) + */ + async getStateCoverageGaps(): Promise> { + const result = await this.pool.query(` + SELECT + s.code AS state_code, + s.name AS state_name, + CASE + WHEN s.recreational_legal = TRUE THEN 'recreational' + ELSE 'medical_only' + END AS legal_type, + COUNT(DISTINCT d.id) AS dispensary_count, + CASE + WHEN COUNT(DISTINCT d.id) = 0 THEN TRUE + WHEN COUNT(DISTINCT sp.id) = 0 THEN TRUE + WHEN MAX(sps.captured_at) < NOW() - INTERVAL '7 days' THEN TRUE + ELSE FALSE + END AS has_gap, + CASE + WHEN COUNT(DISTINCT d.id) = 0 THEN 'No dispensaries' + WHEN COUNT(DISTINCT sp.id) = 0 THEN 'No products' + WHEN MAX(sps.captured_at) < NOW() - INTERVAL '7 days' THEN 'Stale data (>7 days)' + ELSE 'Good coverage' + END AS gap_reason + FROM states s + LEFT JOIN dispensaries d ON d.state_id = s.id + LEFT JOIN store_products sp ON sp.state_id = s.id AND sp.is_in_stock = TRUE + LEFT JOIN store_product_snapshots sps ON sps.state_id = s.id + WHERE s.recreational_legal = TRUE OR s.medical_legal = TRUE + GROUP BY s.code, s.name, s.recreational_legal, s.medical_legal + HAVING COUNT(DISTINCT d.id) = 0 + OR COUNT(DISTINCT sp.id) = 0 + OR MAX(sps.captured_at) IS NULL + OR MAX(sps.captured_at) < NOW() - INTERVAL '7 days' + ORDER BY + CASE WHEN s.recreational_legal = TRUE THEN 0 ELSE 1 END, + dispensary_count DESC + `); + + return result.rows.map((row: any) => ({ + state_code: row.state_code, + state_name: row.state_name, + legal_type: row.legal_type, + dispensary_count: parseInt(row.dispensary_count) || 0, + has_gap: row.has_gap, + gap_reason: row.gap_reason, + })); + } + + /** + * Get pricing comparison across all states + */ + async getStatePricingComparison(): Promise> { + const result = await this.pool.query(` + WITH state_prices AS ( + SELECT + s.code AS state_code, + s.name AS state_name, + CASE + WHEN s.recreational_legal = TRUE THEN 'recreational' + ELSE 'medical_only' + END AS legal_type, + AVG(sp.price_rec) AS avg_price, + PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_rec) AS median_price, + COUNT(*) AS product_count + FROM states s + JOIN store_products sp ON sp.state_id = s.id + WHERE sp.price_rec IS NOT NULL + AND sp.is_in_stock = TRUE + AND (s.recreational_legal = TRUE OR s.medical_legal = TRUE) + GROUP BY s.code, s.name, s.recreational_legal + ), + national_avg AS ( + SELECT AVG(price_rec) AS avg + FROM store_products + WHERE price_rec IS NOT NULL AND is_in_stock = TRUE + ) + SELECT + sp.*, + ROUND(((sp.avg_price - na.avg) / NULLIF(na.avg, 0) * 100)::NUMERIC, 2) AS vs_national_avg_percent + FROM state_prices sp, national_avg na + ORDER BY sp.avg_price DESC NULLS LAST + `); + + return result.rows.map((row: any) => ({ + state_code: row.state_code, + state_name: row.state_name, + legal_type: row.legal_type, + avg_price: row.avg_price ? parseFloat(row.avg_price) : null, + median_price: row.median_price ? parseFloat(row.median_price) : null, + product_count: parseInt(row.product_count) || 0, + vs_national_avg_percent: row.vs_national_avg_percent ? 
parseFloat(row.vs_national_avg_percent) : null, + })); + } +} + +export default StateAnalyticsService; diff --git a/backend/src/services/analytics/StoreAnalyticsService.ts b/backend/src/services/analytics/StoreAnalyticsService.ts new file mode 100644 index 00000000..d7b5feb1 --- /dev/null +++ b/backend/src/services/analytics/StoreAnalyticsService.ts @@ -0,0 +1,515 @@ +/** + * StoreAnalyticsService + * + * Analytics for individual store/dispensary performance and changes. + * + * Data Sources: + * - store_products: Current product catalog per dispensary + * - store_product_snapshots: Historical product data + * - dispensaries: Store metadata + * - states: Rec/med segmentation + * + * Key Metrics: + * - Products added/dropped over time window + * - Brands added/dropped + * - Price changes count and magnitude + * - Stock in/out events + * - Store inventory composition + */ + +import { Pool } from 'pg'; +import { + TimeWindow, + DateRange, + getDateRangeFromWindow, + StoreChangeSummary, + ProductChangeEvent, +} from './types'; + +export class StoreAnalyticsService { + constructor(private pool: Pool) {} + + /** + * Get change summary for a dispensary over a time window + */ + async getStoreChangeSummary( + dispensaryId: number, + options: { window?: TimeWindow; customRange?: DateRange } = {} + ): Promise { + const { window = '30d', customRange } = options; + const { start, end } = getDateRangeFromWindow(window, customRange); + + // Get dispensary info + const dispResult = await this.pool.query(` + SELECT + d.id, + d.name, + s.code AS state_code + FROM dispensaries d + LEFT JOIN states s ON s.id = d.state_id + WHERE d.id = $1 + `, [dispensaryId]); + + if (dispResult.rows.length === 0) { + return null; + } + + const dispensary = dispResult.rows[0]; + + // Get current counts + const currentResult = await this.pool.query(` + SELECT + COUNT(*) AS product_count, + COUNT(*) FILTER (WHERE is_in_stock = TRUE) AS in_stock_count + FROM store_products + WHERE dispensary_id = $1 + `, [dispensaryId]); + + const current = currentResult.rows[0]; + + // Get products added (first_seen_at in window) + const addedResult = await this.pool.query(` + SELECT COUNT(*) AS count + FROM store_products + WHERE dispensary_id = $1 + AND first_seen_at >= $2 + AND first_seen_at <= $3 + `, [dispensaryId, start, end]); + + // Get products dropped (last_seen_at in window but not in current inventory) + const droppedResult = await this.pool.query(` + SELECT COUNT(*) AS count + FROM store_products + WHERE dispensary_id = $1 + AND last_seen_at >= $2 + AND last_seen_at <= $3 + AND is_in_stock = FALSE + `, [dispensaryId, start, end]); + + // Get brands added/dropped + const brandsResult = await this.pool.query(` + WITH start_brands AS ( + SELECT DISTINCT brand_name + FROM store_product_snapshots + WHERE dispensary_id = $1 + AND captured_at >= $2 AND captured_at < $2 + INTERVAL '1 day' + AND brand_name IS NOT NULL + ), + end_brands AS ( + SELECT DISTINCT brand_name + FROM store_product_snapshots + WHERE dispensary_id = $1 + AND captured_at >= $3 - INTERVAL '1 day' AND captured_at <= $3 + AND brand_name IS NOT NULL + ) + SELECT + ARRAY(SELECT brand_name FROM end_brands EXCEPT SELECT brand_name FROM start_brands) AS added, + ARRAY(SELECT brand_name FROM start_brands EXCEPT SELECT brand_name FROM end_brands) AS dropped + `, [dispensaryId, start, end]); + + const brands = brandsResult.rows[0] || { added: [], dropped: [] }; + + // Get price changes + const priceChangeResult = await this.pool.query(` + WITH price_changes AS ( + SELECT + 
store_product_id, + price_rec, + LAG(price_rec) OVER (PARTITION BY store_product_id ORDER BY captured_at) AS prev_price + FROM store_product_snapshots + WHERE dispensary_id = $1 + AND captured_at >= $2 + AND captured_at <= $3 + AND price_rec IS NOT NULL + ) + SELECT + COUNT(*) FILTER (WHERE price_rec != prev_price AND prev_price IS NOT NULL) AS change_count, + AVG(ABS((price_rec - prev_price) / NULLIF(prev_price, 0) * 100)) + FILTER (WHERE price_rec != prev_price AND prev_price IS NOT NULL AND prev_price != 0) AS avg_change_pct + FROM price_changes + `, [dispensaryId, start, end]); + + const priceChanges = priceChangeResult.rows[0]; + + // Get stock events + const stockEventsResult = await this.pool.query(` + WITH stock_changes AS ( + SELECT + store_product_id, + is_in_stock, + LAG(is_in_stock) OVER (PARTITION BY store_product_id ORDER BY captured_at) AS prev_stock + FROM store_product_snapshots + WHERE dispensary_id = $1 + AND captured_at >= $2 + AND captured_at <= $3 + ) + SELECT + COUNT(*) FILTER (WHERE is_in_stock = TRUE AND prev_stock = FALSE) AS stock_in, + COUNT(*) FILTER (WHERE is_in_stock = FALSE AND prev_stock = TRUE) AS stock_out + FROM stock_changes + `, [dispensaryId, start, end]); + + const stockEvents = stockEventsResult.rows[0]; + + return { + dispensary_id: dispensaryId, + dispensary_name: dispensary.name, + state_code: dispensary.state_code || 'XX', + window: window, + products_added: parseInt(addedResult.rows[0]?.count) || 0, + products_dropped: parseInt(droppedResult.rows[0]?.count) || 0, + brands_added: brands.added || [], + brands_dropped: brands.dropped || [], + price_changes: parseInt(priceChanges?.change_count) || 0, + avg_price_change_percent: priceChanges?.avg_change_pct ? parseFloat(priceChanges.avg_change_pct) : null, + stock_in_events: parseInt(stockEvents?.stock_in) || 0, + stock_out_events: parseInt(stockEvents?.stock_out) || 0, + current_product_count: parseInt(current.product_count) || 0, + current_in_stock_count: parseInt(current.in_stock_count) || 0, + }; + } + + /** + * Get recent product change events for a dispensary + */ + async getProductChangeEvents( + dispensaryId: number, + options: { window?: TimeWindow; customRange?: DateRange; limit?: number } = {} + ): Promise { + const { window = '7d', customRange, limit = 100 } = options; + const { start, end } = getDateRangeFromWindow(window, customRange); + + const result = await this.pool.query(` + WITH changes AS ( + -- Products added + SELECT + sp.id AS store_product_id, + sp.name AS product_name, + sp.brand_name, + sp.category, + 'added' AS event_type, + sp.first_seen_at AS event_date, + NULL::TEXT AS old_value, + NULL::TEXT AS new_value + FROM store_products sp + WHERE sp.dispensary_id = $1 + AND sp.first_seen_at >= $2 + AND sp.first_seen_at <= $3 + + UNION ALL + + -- Stock in/out from snapshots + SELECT + sps.store_product_id, + sp.name AS product_name, + sp.brand_name, + sp.category, + CASE + WHEN sps.is_in_stock = TRUE AND LAG(sps.is_in_stock) OVER w = FALSE THEN 'stock_in' + WHEN sps.is_in_stock = FALSE AND LAG(sps.is_in_stock) OVER w = TRUE THEN 'stock_out' + ELSE NULL + END AS event_type, + sps.captured_at AS event_date, + LAG(sps.is_in_stock::TEXT) OVER w AS old_value, + sps.is_in_stock::TEXT AS new_value + FROM store_product_snapshots sps + JOIN store_products sp ON sp.id = sps.store_product_id + WHERE sps.dispensary_id = $1 + AND sps.captured_at >= $2 + AND sps.captured_at <= $3 + WINDOW w AS (PARTITION BY sps.store_product_id ORDER BY sps.captured_at) + + UNION ALL + + -- Price changes from 
snapshots + SELECT + sps.store_product_id, + sp.name AS product_name, + sp.brand_name, + sp.category, + 'price_change' AS event_type, + sps.captured_at AS event_date, + LAG(sps.price_rec::TEXT) OVER w AS old_value, + sps.price_rec::TEXT AS new_value + FROM store_product_snapshots sps + JOIN store_products sp ON sp.id = sps.store_product_id + WHERE sps.dispensary_id = $1 + AND sps.captured_at >= $2 + AND sps.captured_at <= $3 + AND sps.price_rec IS NOT NULL + AND sps.price_rec != LAG(sps.price_rec) OVER w + WINDOW w AS (PARTITION BY sps.store_product_id ORDER BY sps.captured_at) + ) + SELECT * + FROM changes + WHERE event_type IS NOT NULL + ORDER BY event_date DESC + LIMIT $4 + `, [dispensaryId, start, end, limit]); + + return result.rows.map((row: any) => ({ + store_product_id: row.store_product_id, + product_name: row.product_name, + brand_name: row.brand_name, + category: row.category, + event_type: row.event_type, + event_date: row.event_date ? row.event_date.toISOString() : null, + old_value: row.old_value, + new_value: row.new_value, + })); + } + + /** + * Get store inventory composition (categories and brands breakdown) + */ + async getStoreInventoryComposition(dispensaryId: number): Promise<{ + total_products: number; + in_stock_count: number; + out_of_stock_count: number; + categories: Array<{ category: string; count: number; percent: number }>; + top_brands: Array<{ brand: string; count: number; percent: number }>; + }> { + // Get totals + const totalsResult = await this.pool.query(` + SELECT + COUNT(*) AS total, + COUNT(*) FILTER (WHERE is_in_stock = TRUE) AS in_stock, + COUNT(*) FILTER (WHERE is_in_stock = FALSE) AS out_of_stock + FROM store_products + WHERE dispensary_id = $1 + `, [dispensaryId]); + + const totals = totalsResult.rows[0]; + const totalProducts = parseInt(totals.total) || 0; + + // Get category breakdown + const categoriesResult = await this.pool.query(` + SELECT + category, + COUNT(*) AS count, + ROUND(COUNT(*)::NUMERIC * 100 / NULLIF($2, 0), 2) AS percent + FROM store_products + WHERE dispensary_id = $1 + AND category IS NOT NULL + AND is_in_stock = TRUE + GROUP BY category + ORDER BY count DESC + `, [dispensaryId, totalProducts]); + + // Get top brands + const brandsResult = await this.pool.query(` + SELECT + brand_name AS brand, + COUNT(*) AS count, + ROUND(COUNT(*)::NUMERIC * 100 / NULLIF($2, 0), 2) AS percent + FROM store_products + WHERE dispensary_id = $1 + AND brand_name IS NOT NULL + AND is_in_stock = TRUE + GROUP BY brand_name + ORDER BY count DESC + LIMIT 20 + `, [dispensaryId, totalProducts]); + + return { + total_products: totalProducts, + in_stock_count: parseInt(totals.in_stock) || 0, + out_of_stock_count: parseInt(totals.out_of_stock) || 0, + categories: categoriesResult.rows.map((row: any) => ({ + category: row.category, + count: parseInt(row.count), + percent: parseFloat(row.percent) || 0, + })), + top_brands: brandsResult.rows.map((row: any) => ({ + brand: row.brand, + count: parseInt(row.count), + percent: parseFloat(row.percent) || 0, + })), + }; + } + + /** + * Get stores with most changes (high-activity stores) + */ + async getMostActiveStores( + options: { window?: TimeWindow; customRange?: DateRange; limit?: number; stateCode?: string } = {} + ): Promise> { + const { window = '7d', customRange, limit = 25, stateCode } = options; + const { start, end } = getDateRangeFromWindow(window, customRange); + + const params: any[] = [start, end, limit]; + let paramIdx = 4; + let stateFilter = ''; + + if (stateCode) { + stateFilter = `AND s.code = 
$${paramIdx}`; + params.push(stateCode); + paramIdx++; + } + + const result = await this.pool.query(` + WITH store_activity AS ( + SELECT + sps.dispensary_id, + -- Price changes + COUNT(*) FILTER ( + WHERE sps.price_rec IS NOT NULL + AND sps.price_rec != LAG(sps.price_rec) OVER (PARTITION BY sps.store_product_id ORDER BY sps.captured_at) + ) AS price_changes, + -- Stock changes + COUNT(*) FILTER ( + WHERE sps.is_in_stock != LAG(sps.is_in_stock) OVER (PARTITION BY sps.store_product_id ORDER BY sps.captured_at) + ) AS stock_changes + FROM store_product_snapshots sps + WHERE sps.captured_at >= $1 + AND sps.captured_at <= $2 + GROUP BY sps.dispensary_id + ), + products_added AS ( + SELECT + dispensary_id, + COUNT(*) AS count + FROM store_products + WHERE first_seen_at >= $1 + AND first_seen_at <= $2 + GROUP BY dispensary_id + ) + SELECT + d.id AS dispensary_id, + d.name AS dispensary_name, + s.code AS state_code, + COALESCE(sa.price_changes, 0) + COALESCE(sa.stock_changes, 0) + COALESCE(pa.count, 0) AS total_changes, + COALESCE(sa.price_changes, 0) AS price_changes, + COALESCE(sa.stock_changes, 0) AS stock_changes, + COALESCE(pa.count, 0) AS products_added + FROM dispensaries d + LEFT JOIN states s ON s.id = d.state_id + LEFT JOIN store_activity sa ON sa.dispensary_id = d.id + LEFT JOIN products_added pa ON pa.dispensary_id = d.id + WHERE (sa.price_changes > 0 OR sa.stock_changes > 0 OR pa.count > 0) + ${stateFilter} + ORDER BY total_changes DESC + LIMIT $3 + `, params); + + return result.rows.map((row: any) => ({ + dispensary_id: row.dispensary_id, + dispensary_name: row.dispensary_name, + state_code: row.state_code || 'XX', + total_changes: parseInt(row.total_changes) || 0, + price_changes: parseInt(row.price_changes) || 0, + stock_changes: parseInt(row.stock_changes) || 0, + products_added: parseInt(row.products_added) || 0, + })); + } + + /** + * Get store price positioning vs market + */ + async getStorePricePositioning(dispensaryId: number): Promise<{ + dispensary_id: number; + dispensary_name: string; + categories: Array<{ + category: string; + store_avg_price: number; + market_avg_price: number; + price_vs_market_percent: number; + product_count: number; + }>; + overall_price_vs_market_percent: number | null; + }> { + // Get dispensary info + const dispResult = await this.pool.query(` + SELECT id, name, state_id FROM dispensaries WHERE id = $1 + `, [dispensaryId]); + + if (dispResult.rows.length === 0) { + return { + dispensary_id: dispensaryId, + dispensary_name: 'Unknown', + categories: [], + overall_price_vs_market_percent: null, + }; + } + + const dispensary = dispResult.rows[0]; + + // Get category price comparison + const result = await this.pool.query(` + WITH store_prices AS ( + SELECT + category, + AVG(price_rec) AS store_avg, + COUNT(*) AS product_count + FROM store_products + WHERE dispensary_id = $1 + AND price_rec IS NOT NULL + AND is_in_stock = TRUE + AND category IS NOT NULL + GROUP BY category + ), + market_prices AS ( + SELECT + sp.category, + AVG(sp.price_rec) AS market_avg + FROM store_products sp + WHERE sp.state_id = $2 + AND sp.price_rec IS NOT NULL + AND sp.is_in_stock = TRUE + AND sp.category IS NOT NULL + GROUP BY sp.category + ) + SELECT + sp.category, + sp.store_avg AS store_avg_price, + mp.market_avg AS market_avg_price, + ROUND(((sp.store_avg - mp.market_avg) / NULLIF(mp.market_avg, 0) * 100)::NUMERIC, 2) AS price_vs_market_percent, + sp.product_count + FROM store_prices sp + LEFT JOIN market_prices mp ON mp.category = sp.category + ORDER BY 
sp.product_count DESC + `, [dispensaryId, dispensary.state_id]); + + // Calculate overall + const overallResult = await this.pool.query(` + WITH store_avg AS ( + SELECT AVG(price_rec) AS avg + FROM store_products + WHERE dispensary_id = $1 AND price_rec IS NOT NULL AND is_in_stock = TRUE + ), + market_avg AS ( + SELECT AVG(price_rec) AS avg + FROM store_products + WHERE state_id = $2 AND price_rec IS NOT NULL AND is_in_stock = TRUE + ) + SELECT + ROUND(((sa.avg - ma.avg) / NULLIF(ma.avg, 0) * 100)::NUMERIC, 2) AS price_vs_market + FROM store_avg sa, market_avg ma + `, [dispensaryId, dispensary.state_id]); + + return { + dispensary_id: dispensaryId, + dispensary_name: dispensary.name, + categories: result.rows.map((row: any) => ({ + category: row.category, + store_avg_price: parseFloat(row.store_avg_price), + market_avg_price: row.market_avg_price ? parseFloat(row.market_avg_price) : 0, + price_vs_market_percent: row.price_vs_market_percent ? parseFloat(row.price_vs_market_percent) : 0, + product_count: parseInt(row.product_count), + })), + overall_price_vs_market_percent: overallResult.rows[0]?.price_vs_market + ? parseFloat(overallResult.rows[0].price_vs_market) + : null, + }; + } +} + +export default StoreAnalyticsService; diff --git a/backend/src/services/analytics/index.ts b/backend/src/services/analytics/index.ts new file mode 100644 index 00000000..029fb625 --- /dev/null +++ b/backend/src/services/analytics/index.ts @@ -0,0 +1,13 @@ +/** + * Analytics Engine - Service Exports + * + * Central export point for all analytics services. + */ + +export * from './types'; + +export { PriceAnalyticsService } from './PriceAnalyticsService'; +export { BrandPenetrationService } from './BrandPenetrationService'; +export { CategoryAnalyticsService } from './CategoryAnalyticsService'; +export { StoreAnalyticsService } from './StoreAnalyticsService'; +export { StateAnalyticsService } from './StateAnalyticsService'; diff --git a/backend/src/services/analytics/types.ts b/backend/src/services/analytics/types.ts new file mode 100644 index 00000000..4575709f --- /dev/null +++ b/backend/src/services/analytics/types.ts @@ -0,0 +1,324 @@ +/** + * Analytics Engine Types + * + * Shared types for all analytics services. + */ + +// ============================================================ +// LEGAL STATUS TYPES +// ============================================================ + +export type LegalType = 'recreational' | 'medical_only' | 'no_program' | 'all'; + +/** + * SQL WHERE clause fragments for legal type filtering. + * Use these in queries that join on states table. 
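+ *
+ * Sketch of the intended interpolation (assumes the query aliases the states
+ * table as `s`, matching the fragments below; `pool` is the caller's pg Pool):
+ *
+ *   const where = LEGAL_TYPE_FILTERS.medical_only;   // or getLegalTypeFilter(legalType)
+ *   await pool.query(`SELECT s.code FROM states s WHERE ${where}`);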
+ */ +export const LEGAL_TYPE_FILTERS = { + recreational: 's.recreational_legal = TRUE', + medical_only: 's.medical_legal = TRUE AND (s.recreational_legal = FALSE OR s.recreational_legal IS NULL)', + no_program: '(s.recreational_legal = FALSE OR s.recreational_legal IS NULL) AND (s.medical_legal = FALSE OR s.medical_legal IS NULL)', + all: '1=1', // No filter +} as const; + +/** + * Get SQL WHERE clause for legal type filtering + */ +export function getLegalTypeFilter(legalType: LegalType): string { + return LEGAL_TYPE_FILTERS[legalType] || LEGAL_TYPE_FILTERS.all; +} + +// ============================================================ +// TIME WINDOWS +// ============================================================ + +export type TimeWindow = '7d' | '30d' | '90d' | 'custom'; + +export interface DateRange { + start: Date; + end: Date; +} + +export function getDateRangeFromWindow(window: TimeWindow, customRange?: DateRange): DateRange { + const end = new Date(); + let start: Date; + + switch (window) { + case '7d': + start = new Date(end.getTime() - 7 * 24 * 60 * 60 * 1000); + break; + case '30d': + start = new Date(end.getTime() - 30 * 24 * 60 * 60 * 1000); + break; + case '90d': + start = new Date(end.getTime() - 90 * 24 * 60 * 60 * 1000); + break; + case 'custom': + if (!customRange) { + throw new Error('Custom window requires start and end dates'); + } + return customRange; + default: + start = new Date(end.getTime() - 30 * 24 * 60 * 60 * 1000); + } + + return { start, end }; +} + +// ============================================================ +// PRICE ANALYTICS TYPES +// ============================================================ + +export interface PriceDataPoint { + date: string; // ISO date string + price_rec: number | null; + price_med: number | null; + price_rec_special: number | null; + price_med_special: number | null; + is_on_special: boolean; +} + +export interface PriceTrendResult { + store_product_id: number; + product_name: string; + brand_name: string | null; + category: string | null; + dispensary_id: number; + dispensary_name: string; + state_code: string; + data_points: PriceDataPoint[]; + summary: { + current_price: number | null; + min_price: number | null; + max_price: number | null; + avg_price: number | null; + price_change_count: number; + volatility_percent: number | null; + }; +} + +export interface CategoryPriceStats { + category: string; + state_code: string; + state_name: string; + legal_type: 'recreational' | 'medical_only'; + avg_price: number; + median_price: number; + min_price: number; + max_price: number; + product_count: number; + dispensary_count: number; +} + +export interface PriceVolatilityResult { + store_product_id: number; + product_name: string; + brand_name: string | null; + change_count: number; + avg_change_percent: number; + max_change_percent: number; + last_change_at: string | null; +} + +// ============================================================ +// BRAND PENETRATION TYPES +// ============================================================ + +export interface BrandPenetrationResult { + brand_name: string; + total_dispensaries: number; + total_skus: number; + avg_skus_per_dispensary: number; + states_present: string[]; + state_breakdown: BrandStateBreakdown[]; + penetration_trend: PenetrationDataPoint[]; +} + +export interface BrandStateBreakdown { + state_code: string; + state_name: string; + legal_type: 'recreational' | 'medical_only' | 'no_program'; + dispensary_count: number; + sku_count: number; + avg_skus_per_dispensary: number; + 
market_share_percent: number | null; +} + +export interface PenetrationDataPoint { + date: string; + dispensary_count: number; + new_dispensaries: number; + dropped_dispensaries: number; +} + +export interface BrandMarketPosition { + brand_name: string; + category: string; + state_code: string; + sku_count: number; + dispensary_count: number; + category_share_percent: number; + avg_price: number | null; + price_vs_category_avg: number | null; +} + +export interface BrandRecVsMedFootprint { + brand_name: string; + rec_states_count: number; + rec_states: string[]; + rec_dispensary_count: number; + rec_avg_skus: number; + med_only_states_count: number; + med_only_states: string[]; + med_only_dispensary_count: number; + med_only_avg_skus: number; +} + +// ============================================================ +// CATEGORY ANALYTICS TYPES +// ============================================================ + +export interface CategoryGrowthResult { + category: string; + current_sku_count: number; + current_dispensary_count: number; + avg_price: number | null; + growth_data: CategoryGrowthDataPoint[]; + state_breakdown: CategoryStateBreakdown[]; +} + +export interface CategoryGrowthDataPoint { + date: string; + sku_count: number; + dispensary_count: number; + avg_price: number | null; +} + +export interface CategoryStateBreakdown { + state_code: string; + state_name: string; + legal_type: 'recreational' | 'medical_only'; + sku_count: number; + dispensary_count: number; + avg_price: number | null; +} + +export interface CategoryRecVsMedComparison { + category: string; + recreational: { + state_count: number; + dispensary_count: number; + sku_count: number; + avg_price: number | null; + median_price: number | null; + }; + medical_only: { + state_count: number; + dispensary_count: number; + sku_count: number; + avg_price: number | null; + median_price: number | null; + }; + price_diff_percent: number | null; +} + +// ============================================================ +// STORE ANALYTICS TYPES +// ============================================================ + +export interface StoreChangeSummary { + dispensary_id: number; + dispensary_name: string; + state_code: string; + window: TimeWindow; + products_added: number; + products_dropped: number; + brands_added: string[]; + brands_dropped: string[]; + price_changes: number; + avg_price_change_percent: number | null; + stock_in_events: number; + stock_out_events: number; + current_product_count: number; + current_in_stock_count: number; +} + +export interface ProductChangeEvent { + store_product_id: number; + product_name: string; + brand_name: string | null; + category: string | null; + event_type: 'added' | 'dropped' | 'price_change' | 'stock_in' | 'stock_out'; + event_date: string; + old_value?: string | number | null; + new_value?: string | number | null; +} + +// ============================================================ +// STATE ANALYTICS TYPES +// ============================================================ + +export interface StateMarketSummary { + state_code: string; + state_name: string; + legal_status: { + recreational_legal: boolean; + rec_year: number | null; + medical_legal: boolean; + med_year: number | null; + }; + coverage: { + dispensary_count: number; + product_count: number; + brand_count: number; + category_count: number; + snapshot_count: number; + last_crawl_at: string | null; + }; + pricing: { + avg_price: number | null; + median_price: number | null; + min_price: number | null; + max_price: number | null; + }; + 
top_categories: Array<{ category: string; count: number }>; + top_brands: Array<{ brand: string; count: number }>; +} + +export interface LegalStateBreakdown { + recreational_states: { + count: number; + dispensary_count: number; + product_count: number; + snapshot_count: number; + states: Array<{ code: string; name: string; dispensary_count: number }>; + }; + medical_only_states: { + count: number; + dispensary_count: number; + product_count: number; + snapshot_count: number; + states: Array<{ code: string; name: string; dispensary_count: number }>; + }; + no_program_states: { + count: number; + states: Array<{ code: string; name: string }>; + }; +} + +export interface RecVsMedPriceComparison { + category: string | null; + recreational: { + state_count: number; + product_count: number; + avg_price: number | null; + median_price: number | null; + }; + medical_only: { + state_count: number; + product_count: number; + avg_price: number | null; + median_price: number | null; + }; + price_diff_percent: number | null; +} diff --git a/backend/src/services/category-crawler-jobs.ts b/backend/src/services/category-crawler-jobs.ts index 534d485d..025c90e6 100644 --- a/backend/src/services/category-crawler-jobs.ts +++ b/backend/src/services/category-crawler-jobs.ts @@ -12,7 +12,7 @@ * - SandboxMetadataJob - Sandbox metadata crawling */ -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { crawlerLogger } from './crawler-logger'; import { IntelligenceCategory, diff --git a/backend/src/services/category-discovery.ts b/backend/src/services/category-discovery.ts index d68b0a7c..34c185ba 100644 --- a/backend/src/services/category-discovery.ts +++ b/backend/src/services/category-discovery.ts @@ -1,7 +1,7 @@ import puppeteer from 'puppeteer-extra'; import StealthPlugin from 'puppeteer-extra-plugin-stealth'; import { Browser, Page } from 'puppeteer'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { logger } from './logger'; import { bypassAgeGate, detectStateFromUrl, setAgeGateCookies } from '../utils/age-gate'; import { dutchieTemplate } from '../scrapers/templates/dutchie'; diff --git a/backend/src/services/crawl-scheduler.ts b/backend/src/services/crawl-scheduler.ts index 2842d735..eebf1f00 100644 --- a/backend/src/services/crawl-scheduler.ts +++ b/backend/src/services/crawl-scheduler.ts @@ -12,7 +12,7 @@ */ import cron from 'node-cron'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { scrapeStore } from '../scraper-v2'; import { runStoreCrawlOrchestrator, diff --git a/backend/src/services/crawler-jobs.ts b/backend/src/services/crawler-jobs.ts index 383c724f..fcfe253f 100644 --- a/backend/src/services/crawler-jobs.ts +++ b/backend/src/services/crawler-jobs.ts @@ -7,7 +7,7 @@ * 3. SandboxCrawlJob - Learning/testing crawl for unknown providers */ -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { logger } from './logger'; import { detectMenuProvider, detectProviderChange, MenuProvider } from './menu-provider-detector'; import { scrapeStore } from '../scraper-v2'; diff --git a/backend/src/services/crawler-profiles.ts b/backend/src/services/crawler-profiles.ts new file mode 100644 index 00000000..8bd93d62 --- /dev/null +++ b/backend/src/services/crawler-profiles.ts @@ -0,0 +1,363 @@ +/** + * Crawler Profiles Service + * + * Manages per-store crawler configuration profiles. 
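+ *
+ * Typical call pattern (sketch only; the dispensary id below is a placeholder):
+ *
+ *   const profile = await getActiveCrawlerProfileForDispensary(42, 'dutchie');
+ *   const options = profile ? profileToOptions(profile) : getDefaultCrawlerOptions();
+ *   // options.timeoutMs / downloadImages / trackStock / config then feed the crawl run
+ *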
+ * This service handles CRUD operations for dispensary_crawler_profiles + * and provides helper functions for loading active profiles. + * + * Phase 1: Basic profile loading for Dutchie production crawls only. + */ + +import { pool } from '../db/pool'; +import { + DispensaryCrawlerProfile, + DispensaryCrawlerProfileCreate, + DispensaryCrawlerProfileUpdate, + CrawlerProfileOptions, +} from '../dutchie-az/types'; + +// ============================================================ +// Database Row Mapping +// ============================================================ + +/** + * Map database row (snake_case) to TypeScript interface (camelCase) + */ +function mapDbRowToProfile(row: any): DispensaryCrawlerProfile { + return { + id: row.id, + dispensaryId: row.dispensary_id, + profileName: row.profile_name, + crawlerType: row.crawler_type, + profileKey: row.profile_key, + config: row.config || {}, + timeoutMs: row.timeout_ms, + downloadImages: row.download_images, + trackStock: row.track_stock, + version: row.version, + enabled: row.enabled, + createdAt: row.created_at, + updatedAt: row.updated_at, + }; +} + +// ============================================================ +// Profile Retrieval +// ============================================================ + +/** + * Get the active crawler profile for a dispensary. + * + * Resolution order: + * 1. If dispensaries.active_crawler_profile_id is set, load that profile (if enabled) + * 2. Otherwise, find the most recently created enabled profile matching the dispensary's + * menu_type (for Dutchie, crawler_type = 'dutchie') + * 3. Returns null if no matching profile exists + * + * @param dispensaryId - The dispensary ID to look up + * @param crawlerType - Optional: filter by crawler type (defaults to checking menu_type) + */ +export async function getActiveCrawlerProfileForDispensary( + dispensaryId: number, + crawlerType?: string +): Promise { + // First, check if there's an explicit active_crawler_profile_id set + const activeProfileResult = await pool.query( + `SELECT dcp.* + FROM dispensary_crawler_profiles dcp + INNER JOIN dispensaries d ON d.active_crawler_profile_id = dcp.id + WHERE d.id = $1 AND dcp.enabled = true`, + [dispensaryId] + ); + + if (activeProfileResult.rows.length > 0) { + return mapDbRowToProfile(activeProfileResult.rows[0]); + } + + // No explicit active profile - fall back to most recent enabled profile + // If crawlerType not specified, try to match dispensary's menu_type + let effectiveCrawlerType = crawlerType; + if (!effectiveCrawlerType) { + const dispensaryResult = await pool.query( + `SELECT menu_type FROM dispensaries WHERE id = $1`, + [dispensaryId] + ); + if (dispensaryResult.rows.length > 0 && dispensaryResult.rows[0].menu_type) { + effectiveCrawlerType = dispensaryResult.rows[0].menu_type; + } + } + + // If we still don't have a crawler type, default to 'dutchie' for Phase 1 + if (!effectiveCrawlerType) { + effectiveCrawlerType = 'dutchie'; + } + + const fallbackResult = await pool.query( + `SELECT * FROM dispensary_crawler_profiles + WHERE dispensary_id = $1 + AND crawler_type = $2 + AND enabled = true + ORDER BY created_at DESC + LIMIT 1`, + [dispensaryId, effectiveCrawlerType] + ); + + if (fallbackResult.rows.length > 0) { + return mapDbRowToProfile(fallbackResult.rows[0]); + } + + return null; +} + +/** + * Get all profiles for a dispensary + */ +export async function getProfilesForDispensary( + dispensaryId: number +): Promise { + const result = await pool.query( + `SELECT * FROM dispensary_crawler_profiles 
+ WHERE dispensary_id = $1 + ORDER BY created_at DESC`, + [dispensaryId] + ); + + return result.rows.map(mapDbRowToProfile); +} + +/** + * Get a profile by ID + */ +export async function getProfileById( + profileId: number +): Promise { + const result = await pool.query( + `SELECT * FROM dispensary_crawler_profiles WHERE id = $1`, + [profileId] + ); + + if (result.rows.length === 0) { + return null; + } + + return mapDbRowToProfile(result.rows[0]); +} + +// ============================================================ +// Profile Creation & Update +// ============================================================ + +/** + * Create a new crawler profile + */ +export async function createCrawlerProfile( + profile: DispensaryCrawlerProfileCreate +): Promise { + const result = await pool.query( + `INSERT INTO dispensary_crawler_profiles ( + dispensary_id, profile_name, crawler_type, profile_key, + config, timeout_ms, download_images, track_stock, version, enabled + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) + RETURNING *`, + [ + profile.dispensaryId, + profile.profileName, + profile.crawlerType, + profile.profileKey ?? null, + JSON.stringify(profile.config ?? {}), + profile.timeoutMs ?? 30000, + profile.downloadImages ?? true, + profile.trackStock ?? true, + profile.version ?? 1, + profile.enabled ?? true, + ] + ); + + return mapDbRowToProfile(result.rows[0]); +} + +/** + * Update an existing profile + */ +export async function updateCrawlerProfile( + profileId: number, + updates: DispensaryCrawlerProfileUpdate +): Promise { + // Build dynamic update query + const setClauses: string[] = []; + const values: any[] = []; + let paramIndex = 1; + + if (updates.profileName !== undefined) { + setClauses.push(`profile_name = $${paramIndex++}`); + values.push(updates.profileName); + } + if (updates.crawlerType !== undefined) { + setClauses.push(`crawler_type = $${paramIndex++}`); + values.push(updates.crawlerType); + } + if (updates.profileKey !== undefined) { + setClauses.push(`profile_key = $${paramIndex++}`); + values.push(updates.profileKey); + } + if (updates.config !== undefined) { + setClauses.push(`config = $${paramIndex++}`); + values.push(JSON.stringify(updates.config)); + } + if (updates.timeoutMs !== undefined) { + setClauses.push(`timeout_ms = $${paramIndex++}`); + values.push(updates.timeoutMs); + } + if (updates.downloadImages !== undefined) { + setClauses.push(`download_images = $${paramIndex++}`); + values.push(updates.downloadImages); + } + if (updates.trackStock !== undefined) { + setClauses.push(`track_stock = $${paramIndex++}`); + values.push(updates.trackStock); + } + if (updates.version !== undefined) { + setClauses.push(`version = $${paramIndex++}`); + values.push(updates.version); + } + if (updates.enabled !== undefined) { + setClauses.push(`enabled = $${paramIndex++}`); + values.push(updates.enabled); + } + + if (setClauses.length === 0) { + // Nothing to update + return getProfileById(profileId); + } + + values.push(profileId); + + const result = await pool.query( + `UPDATE dispensary_crawler_profiles + SET ${setClauses.join(', ')} + WHERE id = $${paramIndex} + RETURNING *`, + values + ); + + if (result.rows.length === 0) { + return null; + } + + return mapDbRowToProfile(result.rows[0]); +} + +/** + * Delete a profile (hard delete - use updateCrawlerProfile with enabled=false for soft delete) + */ +export async function deleteCrawlerProfile(profileId: number): Promise { + // First clear any active_crawler_profile_id references + await pool.query( + `UPDATE dispensaries 
SET active_crawler_profile_id = NULL + WHERE active_crawler_profile_id = $1`, + [profileId] + ); + + const result = await pool.query( + `DELETE FROM dispensary_crawler_profiles WHERE id = $1`, + [profileId] + ); + + return (result.rowCount ?? 0) > 0; +} + +// ============================================================ +// Active Profile Management +// ============================================================ + +/** + * Set the active crawler profile for a dispensary + */ +export async function setActiveCrawlerProfile( + dispensaryId: number, + profileId: number +): Promise { + // Verify the profile belongs to this dispensary and is enabled + const profile = await getProfileById(profileId); + if (!profile) { + throw new Error(`Profile ${profileId} not found`); + } + if (profile.dispensaryId !== dispensaryId) { + throw new Error(`Profile ${profileId} does not belong to dispensary ${dispensaryId}`); + } + if (!profile.enabled) { + throw new Error(`Profile ${profileId} is not enabled`); + } + + await pool.query( + `UPDATE dispensaries SET active_crawler_profile_id = $1 WHERE id = $2`, + [profileId, dispensaryId] + ); +} + +/** + * Clear the active crawler profile for a dispensary + */ +export async function clearActiveCrawlerProfile(dispensaryId: number): Promise { + await pool.query( + `UPDATE dispensaries SET active_crawler_profile_id = NULL WHERE id = $1`, + [dispensaryId] + ); +} + +// ============================================================ +// Helper Functions +// ============================================================ + +/** + * Convert a profile to runtime options for the crawler + */ +export function profileToOptions(profile: DispensaryCrawlerProfile): CrawlerProfileOptions { + return { + timeoutMs: profile.timeoutMs ?? 30000, + downloadImages: profile.downloadImages, + trackStock: profile.trackStock, + config: profile.config, + }; +} + +/** + * Get default options when no profile is configured + */ +export function getDefaultCrawlerOptions(): CrawlerProfileOptions { + return { + timeoutMs: 30000, + downloadImages: true, + trackStock: true, + config: {}, + }; +} + +/** + * Check if a dispensary has any profiles + */ +export async function dispensaryHasProfiles(dispensaryId: number): Promise { + const result = await pool.query( + `SELECT EXISTS(SELECT 1 FROM dispensary_crawler_profiles WHERE dispensary_id = $1) as has_profiles`, + [dispensaryId] + ); + return result.rows[0]?.has_profiles ?? 
false; +} + +/** + * Get profile counts by crawler type + */ +export async function getProfileStats(): Promise<{ crawlerType: string; count: number }[]> { + const result = await pool.query( + `SELECT crawler_type, COUNT(*) as count + FROM dispensary_crawler_profiles + WHERE enabled = true + GROUP BY crawler_type + ORDER BY count DESC` + ); + + return result.rows.map(row => ({ + crawlerType: row.crawler_type, + count: parseInt(row.count, 10), + })); +} diff --git a/backend/src/services/dispensary-orchestrator.ts b/backend/src/services/dispensary-orchestrator.ts index 831e5f5d..d4e7b9e8 100644 --- a/backend/src/services/dispensary-orchestrator.ts +++ b/backend/src/services/dispensary-orchestrator.ts @@ -12,7 +12,7 @@ */ import { v4 as uuidv4 } from 'uuid'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { crawlerLogger } from './crawler-logger'; import { detectMultiCategoryProviders, @@ -20,6 +20,25 @@ import { MultiCategoryDetectionResult, } from './intelligence-detector'; import { runCrawlProductsJob, runSandboxProductsJob } from './category-crawler-jobs'; +import { + getActiveCrawlerProfileForDispensary, + profileToOptions, + getDefaultCrawlerOptions, +} from './crawler-profiles'; +import { + runSandboxDiscovery, + // runSandboxCrawlWithValidation, // DISABLED - observability phase + CrawlerStatus, + getSandboxStatus, + getProfileKey, +} from './sandbox-discovery'; +// AUTOMATION DISABLED - observability phase +// import { +// hasPassedSandboxValidation, +// demoteToSandbox, +// } from './sandbox-validator'; +import { OrchestratorTrace } from './orchestrator-trace'; +import { DispensaryCrawlerProfile, CrawlerProfileOptions } from '../dutchie-az/types'; // ======================================== // Types @@ -80,6 +99,9 @@ export async function runDispensaryOrchestrator( const startTime = Date.now(); const runId = uuidv4(); + // Initialize trace for this run + const trace = new OrchestratorTrace(dispensaryId, runId); + let result: DispensaryOrchestratorResult = { status: 'pending', summary: '', @@ -92,30 +114,92 @@ export async function runDispensaryOrchestrator( }; try { + // TRACE: Initialize orchestrator + trace.step( + 'init_orchestrator', + 'Initializing dispensary crawl orchestrator', + { dispensaryId, scheduleId, runId }, + 'dispensary-orchestrator.ts', + 'Starting a new crawl orchestration run', + 'UUID generation and state initialization' + ); + trace.completeStep({ initialized: true }); + // Mark schedule as running await updateScheduleStatus(dispensaryId, 'running', 'Starting orchestrator...', null, runId); + // TRACE: Load dispensary info + trace.step( + 'load_dispensary', + 'Loading dispensary information from database', + { dispensaryId }, + 'dispensary-orchestrator.ts', + 'Need dispensary details to determine crawl strategy', + 'Database query to dispensaries table' + ); + // 1. 
Load dispensary info const dispensary = await getDispensaryInfo(dispensaryId); if (!dispensary) { + trace.failStep('Dispensary not found in database'); throw new Error(`Dispensary ${dispensaryId} not found`); } + trace.completeStep({ + found: true, + name: dispensary.name, + city: dispensary.city, + menuType: dispensary.menu_type, + platformDispensaryId: dispensary.platform_dispensary_id, + }); + result.dispensaryName = dispensary.name; + // TRACE: Check detection needed + trace.step( + 'determine_mode', + 'Checking if provider detection is needed', + { + menuType: dispensary.menu_type, + productProvider: dispensary.product_provider, + platformDispensaryId: dispensary.platform_dispensary_id, + }, + 'dispensary-orchestrator.ts', + 'Determine if we need to detect the menu provider type', + 'Checking menu_type, platform_id, and last scan time' + ); + // 2. Check if provider detection is needed const needsDetection = await checkNeedsDetection(dispensary); + trace.completeStep({ needsDetection, reason: needsDetection ? 'Provider unknown or stale' : 'Provider already known' }); if (needsDetection) { - // Run provider detection + // TRACE: Provider detection const websiteUrl = dispensary.menu_url || dispensary.website; + + trace.step( + 'fetch_html', + 'Running provider detection on website', + { websiteUrl }, + 'intelligence-detector.ts', + 'Need to detect what menu platform the dispensary uses', + 'Fetching website HTML and analyzing for provider signatures' + ); + if (!websiteUrl) { + trace.failStep('No website URL available'); result.status = 'error'; result.summary = 'No website URL available for detection'; result.error = 'Dispensary has no menu_url or website configured'; await updateScheduleStatus(dispensaryId, 'error', result.summary, result.error, runId); result.durationMs = Date.now() - startTime; await createJobRecord(dispensaryId, scheduleId, result); + + // Save trace before returning + trace.setStateAtEnd('error'); + trace.markFailed(result.error); + await trace.save(); + return result; } @@ -125,6 +209,12 @@ export async function runDispensaryOrchestrator( result.detectionRan = true; result.detectionResult = detectionResult; + trace.completeStep({ + provider: detectionResult.product.provider, + confidence: detectionResult.product.confidence, + mode: detectionResult.product.mode, + }); + // Save detection results to dispensary await updateAllCategoryProviders(dispensaryId, detectionResult); @@ -156,58 +246,42 @@ export async function runDispensaryOrchestrator( const isDutchieProduction = (provider === 'dutchie' && mode === 'production') || (dispensary.menu_type === 'dutchie' && dispensary.platform_dispensary_id); + // TRACE: Determine crawl type + trace.step( + 'determine_mode', + 'Determining crawl type based on provider', + { provider, mode, isDutchieProduction }, + 'dispensary-orchestrator.ts', + 'Select appropriate crawler based on detected provider', + 'Checking provider type and mode flags' + ); + if (isDutchieProduction) { - // Production Dutchie crawl - await updateScheduleStatus(dispensaryId, 'running', 'Running Dutchie production crawl...', null, runId); + trace.completeStep({ crawlType: 'dutchie_production', profileBased: true }); - try { - // Run the category-specific crawl job - const crawlResult = await runCrawlProductsJob(dispensaryId); - - result.crawlRan = true; - result.crawlType = 'production'; - - if (crawlResult.success) { - result.productsFound = crawlResult.data?.productsFound || 0; - - const detectionPart = result.detectionRan ? 
'Detection + ' : ''; - result.summary = `${detectionPart}Dutchie products crawl completed`; - result.status = 'success'; - - crawlerLogger.jobCompleted({ - job_id: 0, - store_id: 0, - store_name: dispensary.name, - duration_ms: Date.now() - startTime, - products_found: result.productsFound || 0, - products_new: 0, - products_updated: 0, - provider: 'dutchie', - }); - } else { - result.status = 'error'; - result.error = crawlResult.message; - result.summary = `Dutchie crawl failed: ${crawlResult.message.slice(0, 100)}`; - } - - } catch (crawlError: any) { - result.status = 'error'; - result.error = crawlError.message; - result.summary = `Dutchie crawl failed: ${crawlError.message.slice(0, 100)}`; - result.crawlRan = true; - result.crawlType = 'production'; - - crawlerLogger.jobFailed({ - job_id: 0, - store_id: 0, - store_name: dispensary.name, - duration_ms: Date.now() - startTime, - error_message: crawlError.message, - provider: 'dutchie', - }); - } + // Production Dutchie crawl - now with profile support + await runDutchieProductionWithProfile( + dispensary, + result, + startTime, + runId, + trace // Pass trace to helper + ); + // Result is mutated in-place by the helper function } else if (provider && provider !== 'unknown') { + trace.completeStep({ crawlType: 'sandbox', provider }); + + // TRACE: Run non-Dutchie sandbox crawl + trace.step( + 'run_sandbox_validation', + `Running ${provider} sandbox crawl`, + { provider, dispensaryId }, + 'category-crawler-jobs.ts:runSandboxProductsJob', + 'Provider is not Dutchie - use sandbox crawler', + 'Run generic sandbox product extraction' + ); + // Sandbox crawl for non-Dutchie or sandbox mode await updateScheduleStatus(dispensaryId, 'running', `Running ${provider} sandbox crawl...`, null, runId); @@ -222,10 +296,21 @@ export async function runDispensaryOrchestrator( if (sandboxResult.success) { result.summary = `${detectionPart}${provider} sandbox crawl (${result.productsFound} items, quality ${sandboxResult.data?.qualityScore || 0}%)`; result.status = 'sandbox_only'; + trace.completeStep({ + success: true, + productsFound: result.productsFound, + qualityScore: sandboxResult.data?.qualityScore, + }); + trace.setStateAtEnd('sandbox'); + trace.setProductsFound(result.productsFound || 0); + trace.markSuccess(); } else { result.summary = `${detectionPart}${provider} sandbox failed: ${sandboxResult.message}`; result.status = 'error'; result.error = sandboxResult.message; + trace.failStep(sandboxResult.message || 'Sandbox crawl failed'); + trace.setStateAtEnd('sandbox'); + trace.markFailed(sandboxResult.message || 'Sandbox crawl failed'); } } catch (sandboxError: any) { @@ -234,18 +319,48 @@ export async function runDispensaryOrchestrator( result.summary = `Sandbox crawl failed: ${sandboxError.message.slice(0, 100)}`; result.crawlRan = true; result.crawlType = 'sandbox'; + trace.failStep(sandboxError.message); + trace.setStateAtEnd('sandbox'); + trace.markFailed(sandboxError.message); } + // Save trace for non-Dutchie sandbox path + await trace.save(); + } else { // No provider detected - detection only + trace.completeStep({ crawlType: 'none', reason: 'No valid provider' }); + + trace.step( + 'finalize_run', + 'Finalizing - detection only, no crawl', + { provider: dispensary.product_provider, detectionRan: result.detectionRan }, + 'dispensary-orchestrator.ts', + 'No valid provider detected - cannot proceed with crawl', + 'Set detection-only status' + ); + if (result.detectionRan) { result.summary = `Detection complete: 
provider=${dispensary.product_provider || 'unknown'}, confidence=${dispensary.product_confidence || 0}%`; result.status = 'detection_only'; + trace.completeStep({ + detectionOnly: true, + provider: dispensary.product_provider, + confidence: dispensary.product_confidence, + }); + trace.setStateAtEnd('unknown'); + trace.markSuccess(); } else { result.summary = 'No provider detected and no crawl possible'; result.status = 'error'; result.error = 'Could not determine menu provider'; + trace.failStep('Could not determine menu provider'); + trace.setStateAtEnd('unknown'); + trace.markFailed('Could not determine menu provider'); } + + // Save trace for detection-only path + await trace.save(); } } catch (error: any) { @@ -253,6 +368,20 @@ export async function runDispensaryOrchestrator( result.error = error.message; result.summary = `Orchestrator error: ${error.message.slice(0, 100)}`; + // TRACE: Top-level error handler + trace.step( + 'error_handler', + 'Handling orchestrator error', + { error: error.message }, + 'dispensary-orchestrator.ts:runDispensaryOrchestrator', + 'Top-level error caught in orchestrator', + 'Error propagation to result' + ); + trace.failStep(error.message); + trace.setStateAtEnd('error'); + trace.markFailed(error.message); + await trace.save(); + crawlerLogger.queueFailure({ queue_type: 'dispensary_orchestrator', error_message: error.message, @@ -519,3 +648,468 @@ export async function processDispensaryScheduler(): Promise { dispensarySchedulerRunning = false; } } + +// ======================================== +// Profile-Aware Dutchie Production Crawl +// ======================================== + +/** + * Run Dutchie production crawl with profile support and state machine. + * + * State Machine: + * - 'production': Use per-store crawler if available, otherwise legacy + * - 'sandbox': Run sandbox discovery to learn structure + * - 'needs_manual': Skip crawl, requires manual intervention + * - 'disabled': Skip crawl entirely + * - No profile: Fall back to legacy shared Dutchie logic (backward compatible) + * + * Resolution order for 'production' state: + * 1. If profile exists with profile_key → try to load per-store .ts file + * 2. If per-store file loads → call mod.crawlProducts(dispensary, options) + * 3. 
If import fails or no profile → fall back to legacy shared Dutchie logic + * + * @param dispensary - The dispensary info + * @param result - The orchestrator result object (mutated in place) + * @param startTime - When the orchestrator started + * @param runId - Unique run identifier + */ +async function runDutchieProductionWithProfile( + dispensary: DispensaryInfo, + result: DispensaryOrchestratorResult, + startTime: number, + runId: string, + trace: OrchestratorTrace +): Promise { + const dispensaryId = dispensary.id; + + // TRACE: Load profile + trace.step( + 'load_profile', + 'Loading crawler profile for dispensary', + { dispensaryId }, + 'dispensary-orchestrator.ts:runDutchieProductionWithProfile', + 'Need to check if dispensary has a dedicated per-store crawler', + 'Database query to dispensary_crawler_profiles' + ); + + // Try to load the active profile for this dispensary + let profile: DispensaryCrawlerProfile | null = null; + let options: CrawlerProfileOptions; + let profileStatus: CrawlerStatus = 'production'; // Default for legacy stores + + try { + profile = await getActiveCrawlerProfileForDispensary(dispensaryId, 'dutchie'); + } catch (profileError: any) { + console.warn(`Failed to load profile for dispensary ${dispensaryId}: ${profileError.message}`); + // Continue with legacy logic + } + + // Get profile status from the profile config or status column + if (profile) { + profileStatus = (profile.config?.status as CrawlerStatus) || 'production'; + console.log(`Dispensary ${dispensaryId}: Profile "${profile.profileName}" has status "${profileStatus}"`); + trace.setProfile(profile.id, profile.profileKey || null); + } + + trace.completeStep({ + profileFound: !!profile, + profileKey: profile?.profileKey || null, + profileStatus, + version: profile?.version || null, + }); + trace.setStateAtStart(profileStatus); + + // TRACE: Check state machine + trace.step( + 'determine_mode', + 'Evaluating state machine for crawl mode', + { profileStatus }, + 'dispensary-orchestrator.ts:runDutchieProductionWithProfile', + 'Determine which mode to run based on profile status', + 'State machine switch on profile status' + ); + + // State machine: Handle different statuses + switch (profileStatus) { + case 'disabled': + trace.skipStep('Crawler disabled for this dispensary'); + trace.setStateAtEnd('disabled'); + result.status = 'sandbox_only'; // Not an error, just skipped + result.summary = 'Crawler disabled for this dispensary'; + result.crawlRan = false; + result.crawlType = 'none'; + await updateScheduleStatus(dispensaryId, 'sandbox_only', 'Crawler disabled', null, runId); + await trace.save(); + return; + + case 'needs_manual': + trace.failStep('Max sandbox retries exceeded - needs manual intervention'); + trace.setStateAtEnd('needs_manual'); + result.status = 'error'; + result.summary = 'Crawler needs manual intervention (max sandbox retries exceeded)'; + result.error = 'Max sandbox discovery attempts exceeded. 
Manual configuration required.'; + result.crawlRan = false; + result.crawlType = 'none'; + await updateScheduleStatus(dispensaryId, 'error', result.summary, result.error, runId); + trace.markFailed(result.error); + await trace.save(); + return; + + case 'sandbox': + // Run sandbox discovery instead of production crawl + trace.completeStep({ action: 'run_sandbox_discovery' }); + + trace.step( + 'run_sandbox_validation', + 'Running sandbox discovery to learn store structure', + { dispensaryId }, + 'sandbox-discovery.ts:runSandboxDiscovery', + 'Profile is in sandbox mode - need to discover and learn store configuration', + 'Fetch store menu and analyze structure' + ); + + console.log(`Dispensary ${dispensaryId}: Running sandbox discovery...`); + await updateScheduleStatus(dispensaryId, 'running', 'Running sandbox discovery...', null, runId); + + try { + // Build dispensary object for sandbox discovery + const dispensaryForSandbox = { + id: dispensary.id, + name: dispensary.name, + slug: dispensary.name.toLowerCase().replace(/[^a-z0-9]+/g, '-'), + city: dispensary.city, + state: 'AZ', + platform: 'dutchie' as const, + platformDispensaryId: dispensary.platform_dispensary_id || undefined, + menuUrl: dispensary.menu_url || undefined, + menuType: dispensary.menu_type || undefined, + website: dispensary.website || undefined, + createdAt: new Date(), + updatedAt: new Date(), + }; + + const sandboxResult = await runSandboxDiscovery(dispensaryForSandbox); + + result.crawlRan = true; + result.crawlType = 'sandbox'; + + if (sandboxResult.configWritten) { + trace.completeStep({ + success: true, + configWritten: true, + menuType: sandboxResult.finalResult?.menuType, + }); + trace.setStateAtEnd('sandbox'); + result.status = 'sandbox_only'; + result.summary = `Sandbox discovery completed - learned config for ${sandboxResult.finalResult?.menuType || 'unknown'} menu`; + trace.markSuccess(); + } else if (sandboxResult.shouldRetry) { + trace.completeStep({ + success: false, + shouldRetry: true, + nextRetryAt: sandboxResult.nextRetryAt?.toISOString(), + }); + trace.setStateAtEnd('sandbox'); + result.status = 'sandbox_only'; + result.summary = `Sandbox discovery failed, will retry at ${sandboxResult.nextRetryAt?.toISOString()}`; + } else { + const errorMsg = sandboxResult.attempts[sandboxResult.attempts.length - 1]?.errorMessage; + trace.failStep(errorMsg || 'Sandbox discovery failed'); + trace.setStateAtEnd('needs_manual'); + result.status = 'error'; + result.summary = 'Sandbox discovery failed - needs manual intervention'; + result.error = errorMsg; + trace.markFailed(errorMsg || 'Sandbox discovery failed'); + } + } catch (sandboxError: any) { + trace.failStep(sandboxError.message); + trace.setStateAtEnd('sandbox'); + result.status = 'error'; + result.error = sandboxError.message; + result.summary = `Sandbox discovery error: ${sandboxError.message.slice(0, 100)}`; + result.crawlRan = true; + result.crawlType = 'sandbox'; + trace.markFailed(sandboxError.message); + } + await trace.save(); + return; + + case 'production': + default: + // Continue with production crawl logic below + trace.completeStep({ action: 'run_production_crawl' }); + break; + } + + // TRACE: Resolve crawler module + trace.step( + 'resolve_crawler_module', + 'Determining which crawler module to use', + { hasProfile: !!profile, profileKey: profile?.profileKey || null }, + 'dispensary-orchestrator.ts:runDutchieProductionWithProfile', + 'Need to select between per-store crawler or legacy crawler', + 'Check profile configuration and resolve 
module path' + ); + + // Production mode: set up options + if (profile) { + // Profile found - use its configuration + options = profileToOptions(profile); + console.log(`Dispensary ${dispensaryId}: Using profile "${profile.profileName}" (v${profile.version})`); + await updateScheduleStatus( + dispensaryId, + 'running', + `Running Dutchie crawl with profile "${profile.profileName}"...`, + null, + runId + ); + } else { + // No profile - use defaults (legacy behavior) + options = getDefaultCrawlerOptions(); + console.log(`Dispensary ${dispensaryId}: No profile configured, using legacy Dutchie crawl`); + await updateScheduleStatus(dispensaryId, 'running', 'Running Dutchie production crawl...', null, runId); + } + + try { + let crawlResult: { success: boolean; data?: any; message?: string }; + let usedPerStoreCrawler = false; + + // Build dispensary object for the crawler (used in multiple places) + const dispensaryForCrawler = { + id: dispensary.id, + name: dispensary.name, + slug: dispensary.name.toLowerCase().replace(/[^a-z0-9]+/g, '-'), + city: dispensary.city, + state: 'AZ', + platform: 'dutchie' as const, + platformDispensaryId: dispensary.platform_dispensary_id || undefined, + menuUrl: dispensary.menu_url || undefined, + menuType: dispensary.menu_type || undefined, + website: dispensary.website || undefined, + createdAt: new Date(), + updatedAt: new Date(), + }; + + // If profile has a profile_key, check if it passed sandbox validation + if (profile?.profileKey) { + const modulePath = `../crawlers/dutchie/stores/${profile.profileKey}`; + trace.setCrawlerModule(modulePath); + + // TRACE: Check validation status (OBSERVABILITY ONLY - no state changes) + trace.completeStep({ modulePath, type: 'per-store' }); + + // ================================================================ + // AUTOMATION DISABLED - OBSERVABILITY ONLY + // All auto-promotion/demotion logic has been disabled. + // The orchestrator now ONLY runs crawls and records traces. + // NO profile status changes will occur automatically. 
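+ //
+ // For reference, the per-store module loaded below is expected to export a
+ // crawlProducts(dispensary, options) function. The shape sketched here is only
+ // inferred from how it is called in this file, not a definitive contract:
+ //
+ //   export async function crawlProducts(
+ //     dispensary: Dispensary,
+ //     options: {
+ //       pricingType: 'rec';
+ //       useBothModes: boolean;
+ //       downloadImages?: boolean;
+ //       trackStock?: boolean;
+ //       timeoutMs?: number;
+ //       config?: Record<string, any>;
+ //     }
+ //   ): Promise<{
+ //     success: boolean;
+ //     productsFound: number;
+ //     productsUpserted?: number;
+ //     errorMessage?: string;
+ //   }> {
+ //     // fetch the store menu, extract/upsert products, and report counts
+ //   }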
+ // ================================================================ + + trace.step( + 'load_module', + 'Loading per-store crawler module', + { modulePath, profileStatus }, + 'dispensary-orchestrator.ts', + 'Attempting to load per-store crawler module', + 'Dynamic import of per-store .ts file' + ); + + try { + console.log(`Dispensary ${dispensaryId}: Loading per-store crawler "${profile.profileKey}" (status: ${profileStatus})`); + + const mod = await import(`../crawlers/dutchie/stores/${profile.profileKey}`); + + if (typeof mod.crawlProducts === 'function') { + trace.completeStep({ loaded: true, hasFunction: true }); + + trace.step( + 'run_production_crawl', + 'Executing per-store crawl', + { profileKey: profile.profileKey, pricingType: 'rec', useBothModes: true, profileStatus }, + `crawlers/dutchie/stores/${profile.profileKey}.ts`, + 'Running per-store crawler (no auto-promotion/demotion)', + 'Call mod.crawlProducts with dispensary and options' + ); + + const perStoreResult = await mod.crawlProducts(dispensaryForCrawler, { + pricingType: 'rec', + useBothModes: true, + downloadImages: options.downloadImages, + trackStock: options.trackStock, + timeoutMs: options.timeoutMs, + config: options.config, + }); + + crawlResult = { + success: perStoreResult.success, + data: { productsFound: perStoreResult.productsFound }, + message: perStoreResult.errorMessage, + }; + usedPerStoreCrawler = true; + + if (perStoreResult.success) { + trace.completeStep({ + success: true, + productsFound: perStoreResult.productsFound, + productsUpserted: perStoreResult.productsUpserted, + }); + } else { + trace.failStep(perStoreResult.errorMessage || 'Crawl failed'); + // NOTE: Auto-demotion DISABLED - just log the failure + console.log(`Dispensary ${dispensaryId}: Crawl FAILED (auto-demotion disabled)`); + trace.quickStep( + 'auto_demote_disabled', + 'Auto-demotion is DISABLED - no state change', + { reason: perStoreResult.errorMessage, wouldHaveDemoted: true }, + { demoted: false, note: 'Automation disabled for observability phase' }, + 'dispensary-orchestrator.ts' + ); + } + + } else { + trace.failStep('Module missing crawlProducts function'); + + trace.step( + 'fallback_logic', + 'Falling back to legacy crawler (missing function)', + { reason: 'Per-store module missing crawlProducts function' }, + 'category-crawler-jobs.ts:runCrawlProductsJob', + 'Per-store module is invalid - fallback to shared legacy crawler', + 'Call runCrawlProductsJob' + ); + + console.warn(`Dispensary ${dispensaryId}: Per-store module missing crawlProducts function, falling back to legacy`); + crawlResult = await runCrawlProductsJob(dispensaryId); + trace.completeStep({ fallback: 'legacy' }); + } + + } catch (importError: any) { + // Import failed - fall back to legacy (NO auto-demotion) + trace.failStep(importError.message); + + trace.quickStep( + 'auto_demote_disabled', + 'Auto-demotion is DISABLED - no state change on import failure', + { error: importError.message, wouldHaveDemoted: true }, + { demoted: false, note: 'Automation disabled for observability phase' }, + 'dispensary-orchestrator.ts' + ); + + trace.step( + 'fallback_logic', + 'Executing legacy fallback crawl', + { reason: 'Per-store import failed' }, + 'category-crawler-jobs.ts:runCrawlProductsJob', + 'Per-store module failed to load - use legacy shared crawler', + 'Call runCrawlProductsJob' + ); + + console.warn(`Dispensary ${dispensaryId}: Failed to load per-store crawler "${profile.profileKey}": ${importError.message}`); + console.log(`Dispensary ${dispensaryId}: Falling 
back to legacy (auto-demotion disabled)`); + crawlResult = await runCrawlProductsJob(dispensaryId); + trace.completeStep({ fallback: 'legacy' }); + } + } else { + // No profile_key - use legacy shared Dutchie logic + trace.completeStep({ modulePath: null, type: 'legacy' }); + + trace.step( + 'legacy_crawl', + 'Running legacy shared Dutchie crawl (no per-store profile)', + { dispensaryId }, + 'category-crawler-jobs.ts:runCrawlProductsJob', + 'No per-store profile configured - use shared legacy crawler', + 'Call runCrawlProductsJob with dispensary ID' + ); + + crawlResult = await runCrawlProductsJob(dispensaryId); + trace.completeStep({ type: 'legacy' }); + } + + result.crawlRan = true; + result.crawlType = 'production'; + + // TRACE: Finalize run + trace.step( + 'finalize_run', + 'Finalizing crawl run and recording results', + { crawlSuccess: crawlResult.success }, + 'dispensary-orchestrator.ts:runDutchieProductionWithProfile', + 'Crawl complete - record final status and metrics', + 'Set final result status and save trace' + ); + + if (crawlResult.success) { + result.productsFound = crawlResult.data?.productsFound || 0; + + const detectionPart = result.detectionRan ? 'Detection + ' : ''; + const profilePart = profile ? ` [profile: ${profile.profileName}]` : ''; + const crawlerPart = usedPerStoreCrawler ? ' (per-store)' : ''; + result.summary = `${detectionPart}Dutchie products crawl completed${profilePart}${crawlerPart}`; + result.status = 'success'; + + trace.completeStep({ + success: true, + productsFound: result.productsFound, + summary: result.summary, + }); + trace.setStateAtEnd('production'); + trace.setProductsFound(result.productsFound || 0); + trace.markSuccess(); + + crawlerLogger.jobCompleted({ + job_id: 0, + store_id: 0, + store_name: dispensary.name, + duration_ms: Date.now() - startTime, + products_found: result.productsFound || 0, + products_new: 0, + products_updated: 0, + provider: 'dutchie', + }); + } else { + result.status = 'error'; + result.error = crawlResult.message; + result.summary = `Dutchie crawl failed: ${(crawlResult.message || 'Unknown error').slice(0, 100)}`; + + trace.failStep(crawlResult.message || 'Unknown error'); + trace.setStateAtEnd('production'); + trace.markFailed(crawlResult.message || 'Unknown error'); + } + + // Save the trace + await trace.save(); + + } catch (crawlError: any) { + // TRACE: Error handler + trace.step( + 'error_handler', + 'Handling unexpected error during crawl', + { error: crawlError.message }, + 'dispensary-orchestrator.ts:runDutchieProductionWithProfile', + 'An unexpected error occurred during crawl execution', + 'Catch block error handling' + ); + trace.failStep(crawlError.message); + trace.setStateAtEnd('production'); + trace.markFailed(crawlError.message); + + result.status = 'error'; + result.error = crawlError.message; + result.summary = `Dutchie crawl failed: ${crawlError.message.slice(0, 100)}`; + result.crawlRan = true; + result.crawlType = 'production'; + + crawlerLogger.jobFailed({ + job_id: 0, + store_id: 0, + store_name: dispensary.name, + duration_ms: Date.now() - startTime, + error_message: crawlError.message, + provider: 'dutchie', + }); + + // Save the trace even on error + await trace.save(); + } +} diff --git a/backend/src/services/geolocation.ts b/backend/src/services/geolocation.ts index eee272e7..6ca3046c 100644 --- a/backend/src/services/geolocation.ts +++ b/backend/src/services/geolocation.ts @@ -1,5 +1,5 @@ import axios from 'axios'; -import { pool } from '../db/migrate'; +import { pool } from 
'../db/pool'; interface GeoLocation { city: string; diff --git a/backend/src/services/intelligence-detector.ts b/backend/src/services/intelligence-detector.ts index a8b8b578..e863e8ba 100644 --- a/backend/src/services/intelligence-detector.ts +++ b/backend/src/services/intelligence-detector.ts @@ -8,7 +8,7 @@ * - Metadata: Which provider serves taxonomy/category data */ -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { logger } from './logger'; import puppeteer, { Browser, Page } from 'puppeteer'; diff --git a/backend/src/services/orchestrator-trace.ts b/backend/src/services/orchestrator-trace.ts new file mode 100644 index 00000000..8f11c2da --- /dev/null +++ b/backend/src/services/orchestrator-trace.ts @@ -0,0 +1,487 @@ +/** + * Orchestrator Trace Service + * + * Captures detailed step-by-step traces for every crawl orchestration run. + * Each step records WHAT, WHY, WHERE, HOW, and WHEN. + * + * Usage: + * const trace = new OrchestratorTrace(dispensaryId, runId); + * trace.step('load_profile', 'Loading crawler profile', { dispensaryId }, 'profiles'); + * // ... do work ... + * trace.completeStep({ profileFound: true, profileKey: 'trulieve-scottsdale' }); + * // ... more steps ... + * await trace.save(); + */ + +import { pool } from '../db/pool'; + +// ============================================================ +// TYPES +// ============================================================ + +export interface TraceStep { + step: number; + action: string; + description: string; + timestamp: number; + duration_ms?: number; + input: Record; + output: Record | null; + what: string; + why: string; + where: string; + how: string; + when: string; + status: 'running' | 'completed' | 'failed' | 'skipped'; + error?: string; +} + +export interface TraceSummary { + id: number; + dispensaryId: number; + runId: string; + profileId: number | null; + profileKey: string | null; + crawlerModule: string | null; + stateAtStart: string; + stateAtEnd: string; + totalSteps: number; + durationMs: number; + success: boolean; + errorMessage: string | null; + productsFound: number; + startedAt: Date; + completedAt: Date | null; + trace: TraceStep[]; +} + +// Standard step actions +export type StepAction = + | 'init_orchestrator' + | 'load_dispensary' + | 'load_profile' + | 'determine_mode' + | 'check_validation_status' + | 'resolve_crawler_module' + | 'load_module' + | 'run_sandbox_validation' + | 'run_production_crawl' + | 'fetch_graphql' + | 'fetch_html' + | 'selector_evaluation' + | 'extract_products' + | 'extract_prices' + | 'extract_images' + | 'extract_stock' + | 'upsert_products' + | 'create_snapshots' + | 'completeness_check' + | 'validation_check' + | 'sandbox_retry_logic' + | 'promotion_check' + | 'auto_promote' + | 'auto_demote' + | 'fallback_logic' + | 'legacy_crawl' + | 'finalize_run' + | 'error_handler'; + +// ============================================================ +// ORCHESTRATOR TRACE CLASS +// ============================================================ + +export class OrchestratorTrace { + private dispensaryId: number; + private runId: string; + private profileId: number | null = null; + private profileKey: string | null = null; + private crawlerModule: string | null = null; + private stateAtStart: string = 'unknown'; + private stateAtEnd: string = 'unknown'; + private steps: TraceStep[] = []; + private stepCounter: number = 0; + private startTime: number; + private currentStep: TraceStep | null = null; + private success: boolean = false; + private errorMessage: 
string | null = null; + private productsFound: number = 0; + + constructor(dispensaryId: number, runId: string) { + this.dispensaryId = dispensaryId; + this.runId = runId; + this.startTime = Date.now(); + } + + /** + * Set profile information + */ + setProfile(profileId: number | null, profileKey: string | null): void { + this.profileId = profileId; + this.profileKey = profileKey; + } + + /** + * Set crawler module path + */ + setCrawlerModule(modulePath: string): void { + this.crawlerModule = modulePath; + } + + /** + * Set initial state + */ + setStateAtStart(state: string): void { + this.stateAtStart = state; + } + + /** + * Set final state + */ + setStateAtEnd(state: string): void { + this.stateAtEnd = state; + } + + /** + * Set products found count + */ + setProductsFound(count: number): void { + this.productsFound = count; + } + + /** + * Start a new trace step + * + * @param action - The step action type + * @param description - Human-readable description of what is happening + * @param input - Input data for this step + * @param where - Code location (module/function name) + * @param why - Reason this step is being taken + * @param how - Method or approach being used + */ + step( + action: StepAction | string, + description: string, + input: Record = {}, + where: string = 'orchestrator', + why: string = '', + how: string = '' + ): void { + // Complete previous step if still running + if (this.currentStep && this.currentStep.status === 'running') { + this.currentStep.status = 'completed'; + this.currentStep.duration_ms = Date.now() - this.currentStep.timestamp; + } + + this.stepCounter++; + const now = Date.now(); + + this.currentStep = { + step: this.stepCounter, + action, + description, + timestamp: now, + input, + output: null, + what: description, + why: why || `Step ${this.stepCounter} of orchestration`, + where, + how: how || action, + when: new Date(now).toISOString(), + status: 'running', + }; + + this.steps.push(this.currentStep); + } + + /** + * Complete the current step with output data + */ + completeStep(output: Record = {}, status: 'completed' | 'failed' | 'skipped' = 'completed'): void { + if (this.currentStep) { + this.currentStep.output = output; + this.currentStep.status = status; + this.currentStep.duration_ms = Date.now() - this.currentStep.timestamp; + } + } + + /** + * Mark current step as failed with error + */ + failStep(error: string, output: Record = {}): void { + if (this.currentStep) { + this.currentStep.output = output; + this.currentStep.status = 'failed'; + this.currentStep.error = error; + this.currentStep.duration_ms = Date.now() - this.currentStep.timestamp; + } + } + + /** + * Skip current step with reason + */ + skipStep(reason: string): void { + if (this.currentStep) { + this.currentStep.output = { skipped: true, reason }; + this.currentStep.status = 'skipped'; + this.currentStep.duration_ms = Date.now() - this.currentStep.timestamp; + } + } + + /** + * Add a quick step that completes immediately + */ + quickStep( + action: StepAction | string, + description: string, + input: Record, + output: Record, + where: string = 'orchestrator' + ): void { + this.step(action, description, input, where); + this.completeStep(output); + } + + /** + * Mark the overall trace as successful + */ + markSuccess(): void { + this.success = true; + } + + /** + * Mark the overall trace as failed + */ + markFailed(error: string): void { + this.success = false; + this.errorMessage = error; + } + + /** + * Get the trace data without saving + */ + getData(): { + 
dispensaryId: number; + runId: string; + profileId: number | null; + profileKey: string | null; + crawlerModule: string | null; + stateAtStart: string; + stateAtEnd: string; + steps: TraceStep[]; + totalSteps: number; + durationMs: number; + success: boolean; + errorMessage: string | null; + productsFound: number; + } { + return { + dispensaryId: this.dispensaryId, + runId: this.runId, + profileId: this.profileId, + profileKey: this.profileKey, + crawlerModule: this.crawlerModule, + stateAtStart: this.stateAtStart, + stateAtEnd: this.stateAtEnd, + steps: this.steps, + totalSteps: this.steps.length, + durationMs: Date.now() - this.startTime, + success: this.success, + errorMessage: this.errorMessage, + productsFound: this.productsFound, + }; + } + + /** + * Save the trace to database + */ + async save(): Promise { + // Complete any running step + if (this.currentStep && this.currentStep.status === 'running') { + this.currentStep.status = 'completed'; + this.currentStep.duration_ms = Date.now() - this.currentStep.timestamp; + } + + const durationMs = Date.now() - this.startTime; + + try { + const result = await pool.query( + `INSERT INTO crawl_orchestration_traces ( + dispensary_id, run_id, profile_id, profile_key, crawler_module, + state_at_start, state_at_end, trace, total_steps, duration_ms, + success, error_message, products_found, started_at, completed_at + ) VALUES ( + $1, $2, $3, $4, $5, + $6, $7, $8, $9, $10, + $11, $12, $13, $14, NOW() + ) RETURNING id`, + [ + this.dispensaryId, + this.runId, + this.profileId, + this.profileKey, + this.crawlerModule, + this.stateAtStart, + this.stateAtEnd, + JSON.stringify(this.steps), + this.steps.length, + durationMs, + this.success, + this.errorMessage, + this.productsFound, + new Date(this.startTime), + ] + ); + + console.log(`[OrchestratorTrace] Saved trace ${result.rows[0].id} for dispensary ${this.dispensaryId}`); + return result.rows[0].id; + } catch (error: any) { + console.error(`[OrchestratorTrace] Failed to save trace:`, error.message); + throw error; + } + } +} + +// ============================================================ +// TRACE RETRIEVAL FUNCTIONS +// ============================================================ + +/** + * Get the latest trace for a dispensary + */ +export async function getLatestTrace(dispensaryId: number): Promise { + try { + const result = await pool.query( + `SELECT + id, dispensary_id, run_id, profile_id, profile_key, crawler_module, + state_at_start, state_at_end, trace, total_steps, duration_ms, + success, error_message, products_found, started_at, completed_at, created_at + FROM crawl_orchestration_traces + WHERE dispensary_id = $1 + ORDER BY created_at DESC + LIMIT 1`, + [dispensaryId] + ); + + if (result.rows.length === 0) { + return null; + } + + return mapRowToSummary(result.rows[0]); + } catch (error: any) { + console.error(`[OrchestratorTrace] Failed to get latest trace:`, error.message); + return null; + } +} + +/** + * Get a specific trace by ID + */ +export async function getTraceById(traceId: number): Promise { + try { + const result = await pool.query( + `SELECT + id, dispensary_id, run_id, profile_id, profile_key, crawler_module, + state_at_start, state_at_end, trace, total_steps, duration_ms, + success, error_message, products_found, started_at, completed_at, created_at + FROM crawl_orchestration_traces + WHERE id = $1`, + [traceId] + ); + + if (result.rows.length === 0) { + return null; + } + + return mapRowToSummary(result.rows[0]); + } catch (error: any) { + 
console.error(`[OrchestratorTrace] Failed to get trace by ID:`, error.message); + return null; + } +} + +/** + * Get all traces for a dispensary (paginated) + */ +export async function getTracesForDispensary( + dispensaryId: number, + limit: number = 20, + offset: number = 0 +): Promise<{ traces: TraceSummary[]; total: number }> { + try { + const [tracesResult, countResult] = await Promise.all([ + pool.query( + `SELECT + id, dispensary_id, run_id, profile_id, profile_key, crawler_module, + state_at_start, state_at_end, trace, total_steps, duration_ms, + success, error_message, products_found, started_at, completed_at, created_at + FROM crawl_orchestration_traces + WHERE dispensary_id = $1 + ORDER BY created_at DESC + LIMIT $2 OFFSET $3`, + [dispensaryId, limit, offset] + ), + pool.query( + `SELECT COUNT(*) as total FROM crawl_orchestration_traces WHERE dispensary_id = $1`, + [dispensaryId] + ), + ]); + + return { + traces: tracesResult.rows.map(mapRowToSummary), + total: parseInt(countResult.rows[0]?.total || '0', 10), + }; + } catch (error: any) { + console.error(`[OrchestratorTrace] Failed to get traces:`, error.message); + return { traces: [], total: 0 }; + } +} + +/** + * Get trace by run ID + */ +export async function getTraceByRunId(runId: string): Promise { + try { + const result = await pool.query( + `SELECT + id, dispensary_id, run_id, profile_id, profile_key, crawler_module, + state_at_start, state_at_end, trace, total_steps, duration_ms, + success, error_message, products_found, started_at, completed_at, created_at + FROM crawl_orchestration_traces + WHERE run_id = $1`, + [runId] + ); + + if (result.rows.length === 0) { + return null; + } + + return mapRowToSummary(result.rows[0]); + } catch (error: any) { + console.error(`[OrchestratorTrace] Failed to get trace by run ID:`, error.message); + return null; + } +} + +/** + * Map database row to TraceSummary + */ +function mapRowToSummary(row: any): TraceSummary { + return { + id: row.id, + dispensaryId: row.dispensary_id, + runId: row.run_id, + profileId: row.profile_id, + profileKey: row.profile_key, + crawlerModule: row.crawler_module, + stateAtStart: row.state_at_start, + stateAtEnd: row.state_at_end, + totalSteps: row.total_steps, + durationMs: row.duration_ms, + success: row.success, + errorMessage: row.error_message, + productsFound: row.products_found, + startedAt: row.started_at, + completedAt: row.completed_at, + trace: row.trace || [], + }; +} diff --git a/backend/src/services/proxy.ts b/backend/src/services/proxy.ts index bc3aeca7..15cb6b34 100755 --- a/backend/src/services/proxy.ts +++ b/backend/src/services/proxy.ts @@ -1,7 +1,7 @@ import axios from 'axios'; import { SocksProxyAgent } from 'socks-proxy-agent'; import { HttpsProxyAgent } from 'https-proxy-agent'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; interface ProxyTestResult { success: boolean; diff --git a/backend/src/services/proxyTestQueue.ts b/backend/src/services/proxyTestQueue.ts index 7b69ce7f..42b24128 100644 --- a/backend/src/services/proxyTestQueue.ts +++ b/backend/src/services/proxyTestQueue.ts @@ -1,4 +1,4 @@ -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { testProxy, saveProxyTestResult } from './proxy'; interface ProxyTestJob { diff --git a/backend/src/services/sandbox-discovery.ts b/backend/src/services/sandbox-discovery.ts new file mode 100644 index 00000000..e6be274e --- /dev/null +++ b/backend/src/services/sandbox-discovery.ts @@ -0,0 +1,660 @@ +/** + * Sandbox Discovery Service 
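+ *
+ * Typical entry point is runSandboxDiscovery(dispensary), invoked by the orchestrator
+ * when a profile is in the 'sandbox' state. A hedged usage sketch (field names taken
+ * from the SandboxResult type defined below):
+ *
+ *   const result = await runSandboxDiscovery(dispensary);
+ *   if (result.configWritten) {
+ *     // learned selectors/pagination were persisted to the profile's config
+ *   } else if (result.shouldRetry) {
+ *     // retry after result.nextRetryAt (bounded exponential backoff)
+ *   } else if (result.status === 'needs_manual') {
+ *     // max attempts exhausted - manual configuration is required
+ *   }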
+ * + * Handles structure detection and sandbox mode crawling for dispensaries + * that don't have a per-store crawler file yet. + * + * Features: + * - Detects menu structure for new stores + * - Uses bounded retries (max 3 attempts with exponential backoff) + * - Writes learned configuration to profiles + * - Generates per-store crawler files when structure is validated + */ + +import { pool } from '../db/pool'; +import { Dispensary } from '../dutchie-az/types'; +import { + StructureDetectionResult, + CrawlResult, + detectStructure as detectDutchieStructure, +} from '../crawlers/base/base-dutchie'; +import { + detectStructure as detectTreezStructure, +} from '../crawlers/base/base-treez'; +import { + detectStructure as detectJaneStructure, +} from '../crawlers/base/base-jane'; +import { + validateSandboxResult, + getPreviousProductCount, + calculateFieldCompleteness, + recordValidationResult, + promoteAfterValidation, + SandboxRunResult, + ValidationResult, +} from './sandbox-validator'; + +// ============================================================ +// TYPES +// ============================================================ + +export type CrawlerStatus = 'production' | 'sandbox' | 'needs_manual' | 'disabled'; + +export interface SandboxRetryConfig { + maxAttempts: number; + backoffMs: number[]; // [30min, 2h, 6h] in milliseconds +} + +export interface SandboxAttempt { + attemptNumber: number; + timestamp: Date; + success: boolean; + errorMessage?: string; + detectionResult?: StructureDetectionResult; +} + +export interface SandboxResult { + dispensaryId: number; + status: CrawlerStatus; + attempts: SandboxAttempt[]; + finalResult?: StructureDetectionResult; + shouldRetry: boolean; + nextRetryAt?: Date; + configWritten: boolean; +} + +export interface ProfileUpdateData { + status: CrawlerStatus; + config: Record; + sandboxAttempts: SandboxAttempt[]; + lastSandboxAt: Date; + nextRetryAt?: Date; +} + +// ============================================================ +// CONSTANTS +// ============================================================ + +export const DEFAULT_RETRY_CONFIG: SandboxRetryConfig = { + maxAttempts: 3, + backoffMs: [ + 30 * 60 * 1000, // 30 minutes + 2 * 60 * 60 * 1000, // 2 hours + 6 * 60 * 60 * 1000, // 6 hours + ], +}; + +// ============================================================ +// SANDBOX DISCOVERY SERVICE +// ============================================================ + +/** + * Run sandbox discovery for a dispensary + * Detects menu structure and determines crawler configuration + */ +export async function runSandboxDiscovery( + dispensary: Dispensary, + page?: any, // Optional Puppeteer page for DOM-based detection + retryConfig: SandboxRetryConfig = DEFAULT_RETRY_CONFIG +): Promise { + const result: SandboxResult = { + dispensaryId: dispensary.id || 0, + status: 'sandbox', + attempts: [], + shouldRetry: false, + configWritten: false, + }; + + // Load existing sandbox attempts from profile + const existingAttempts = await loadSandboxAttempts(dispensary.id || 0); + const attemptNumber = existingAttempts.length + 1; + + // Check if we've exceeded max attempts + if (attemptNumber > retryConfig.maxAttempts) { + console.log(`[SandboxDiscovery] Dispensary ${dispensary.id} has exceeded max attempts (${retryConfig.maxAttempts})`); + result.status = 'needs_manual'; + result.attempts = existingAttempts; + result.shouldRetry = false; + await updateProfileStatus(dispensary.id || 0, 'needs_manual', existingAttempts); + return result; + } + + 
console.log(`[SandboxDiscovery] Running discovery for ${dispensary.name} (attempt ${attemptNumber}/${retryConfig.maxAttempts})`); + + // Create attempt record + const attempt: SandboxAttempt = { + attemptNumber, + timestamp: new Date(), + success: false, + }; + + try { + // Detect structure based on menu type + const detectionResult = await detectMenuStructure(dispensary, page); + attempt.detectionResult = detectionResult; + + if (detectionResult.success) { + attempt.success = true; + result.status = 'sandbox'; // Still sandbox until validated + result.finalResult = detectionResult; + + console.log(`[SandboxDiscovery] Successfully detected ${detectionResult.menuType} structure for ${dispensary.name}`); + + // Write learned config to profile + await writeLearnedConfig(dispensary.id || 0, detectionResult); + result.configWritten = true; + + // After 1 successful detection, we can try production mode + // But we keep it in sandbox until manually promoted + } else { + attempt.success = false; + attempt.errorMessage = detectionResult.errors.join('; ') || 'Detection failed'; + + // Calculate next retry time + if (attemptNumber < retryConfig.maxAttempts) { + const backoffIndex = Math.min(attemptNumber - 1, retryConfig.backoffMs.length - 1); + const backoffMs = retryConfig.backoffMs[backoffIndex]; + result.nextRetryAt = new Date(Date.now() + backoffMs); + result.shouldRetry = true; + console.log(`[SandboxDiscovery] Detection failed for ${dispensary.name}, will retry at ${result.nextRetryAt.toISOString()}`); + } else { + result.status = 'needs_manual'; + result.shouldRetry = false; + console.log(`[SandboxDiscovery] Detection failed for ${dispensary.name}, max attempts reached - needs manual intervention`); + } + } + } catch (error: any) { + attempt.success = false; + attempt.errorMessage = error.message; + + // Calculate next retry time + if (attemptNumber < retryConfig.maxAttempts) { + const backoffIndex = Math.min(attemptNumber - 1, retryConfig.backoffMs.length - 1); + const backoffMs = retryConfig.backoffMs[backoffIndex]; + result.nextRetryAt = new Date(Date.now() + backoffMs); + result.shouldRetry = true; + } else { + result.status = 'needs_manual'; + result.shouldRetry = false; + } + + console.error(`[SandboxDiscovery] Error during discovery for ${dispensary.name}:`, error.message); + } + + // Save attempt and update profile + result.attempts = [...existingAttempts, attempt]; + await saveSandboxAttempt(dispensary.id || 0, attempt); + await updateProfileStatus(dispensary.id || 0, result.status, result.attempts, result.nextRetryAt); + + return result; +} + +/** + * Detect menu structure based on menu type + */ +async function detectMenuStructure( + dispensary: Dispensary, + page?: any +): Promise { + const menuType = dispensary.menuType || 'unknown'; + + switch (menuType) { + case 'dutchie': + return detectDutchieStructure(page, dispensary); + + case 'treez': + return detectTreezStructure(page, dispensary); + + case 'jane': + return detectJaneStructure(page, dispensary); + + default: + // Try all detectors + const dutchieResult = await detectDutchieStructure(page, dispensary); + if (dutchieResult.success) return dutchieResult; + + const treezResult = await detectTreezStructure(page, dispensary); + if (treezResult.success) return treezResult; + + const janeResult = await detectJaneStructure(page, dispensary); + if (janeResult.success) return janeResult; + + return { + success: false, + menuType: 'unknown', + selectors: {}, + pagination: { type: 'none' }, + errors: ['Could not detect menu type 
from any known provider'], + metadata: {}, + }; + } +} + +/** + * Load existing sandbox attempts from profile + */ +async function loadSandboxAttempts(dispensaryId: number): Promise { + try { + const result = await pool.query( + `SELECT config->'sandboxAttempts' as attempts + FROM dispensary_crawler_profiles + WHERE dispensary_id = $1 AND enabled = true + ORDER BY updated_at DESC + LIMIT 1`, + [dispensaryId] + ); + + if (result.rows.length > 0 && result.rows[0].attempts) { + return result.rows[0].attempts; + } + return []; + } catch (error: any) { + console.error(`[SandboxDiscovery] Error loading sandbox attempts:`, error.message); + return []; + } +} + +/** + * Save a sandbox attempt to the profile + */ +async function saveSandboxAttempt(dispensaryId: number, attempt: SandboxAttempt): Promise { + try { + // Get existing profile or create one + const existingResult = await pool.query( + `SELECT id, config FROM dispensary_crawler_profiles + WHERE dispensary_id = $1 AND enabled = true + ORDER BY updated_at DESC + LIMIT 1`, + [dispensaryId] + ); + + if (existingResult.rows.length > 0) { + // Update existing profile + const profile = existingResult.rows[0]; + const config = profile.config || {}; + const attempts = config.sandboxAttempts || []; + attempts.push(attempt); + + await pool.query( + `UPDATE dispensary_crawler_profiles + SET config = config || $1::jsonb, + updated_at = NOW() + WHERE id = $2`, + [JSON.stringify({ sandboxAttempts: attempts }), profile.id] + ); + } else { + // Create new profile in sandbox mode + await pool.query( + `INSERT INTO dispensary_crawler_profiles + (dispensary_id, profile_name, crawler_type, config, enabled) + VALUES ($1, $2, $3, $4, true)`, + [ + dispensaryId, + `sandbox-${dispensaryId}`, + 'dutchie', // Default, will be updated when structure is detected + JSON.stringify({ + status: 'sandbox', + sandboxAttempts: [attempt], + }), + ] + ); + } + } catch (error: any) { + console.error(`[SandboxDiscovery] Error saving sandbox attempt:`, error.message); + } +} + +/** + * Update profile status + */ +async function updateProfileStatus( + dispensaryId: number, + status: CrawlerStatus, + attempts: SandboxAttempt[], + nextRetryAt?: Date +): Promise { + try { + const updateData: any = { + status, + sandboxAttempts: attempts, + lastSandboxAt: new Date().toISOString(), + }; + + if (nextRetryAt) { + updateData.nextRetryAt = nextRetryAt.toISOString(); + } + + await pool.query( + `UPDATE dispensary_crawler_profiles + SET config = config || $1::jsonb, + updated_at = NOW() + WHERE dispensary_id = $2 AND enabled = true`, + [JSON.stringify(updateData), dispensaryId] + ); + } catch (error: any) { + console.error(`[SandboxDiscovery] Error updating profile status:`, error.message); + } +} + +/** + * Write learned configuration to profile + */ +async function writeLearnedConfig( + dispensaryId: number, + detectionResult: StructureDetectionResult +): Promise { + try { + const learnedConfig = { + menuType: detectionResult.menuType, + selectors: detectionResult.selectors, + pagination: detectionResult.pagination, + iframeUrl: detectionResult.iframeUrl, + graphqlEndpoint: detectionResult.graphqlEndpoint, + dispensaryIdFromDetection: detectionResult.dispensaryId, + detectedAt: new Date().toISOString(), + }; + + await pool.query( + `UPDATE dispensary_crawler_profiles + SET crawler_type = $1, + config = config || $2::jsonb, + updated_at = NOW() + WHERE dispensary_id = $3 AND enabled = true`, + [ + detectionResult.menuType, + JSON.stringify({ learnedConfig }), + dispensaryId, + ] + ); + + 
console.log(`[SandboxDiscovery] Wrote learned config for dispensary ${dispensaryId}`); + } catch (error: any) { + console.error(`[SandboxDiscovery] Error writing learned config:`, error.message); + } +} + +/** + * Get dispensaries that need sandbox retry + * Returns dispensaries whose nextRetryAt has passed + */ +export async function getDispensariesNeedingRetry(): Promise { + try { + const result = await pool.query( + `SELECT DISTINCT dispensary_id + FROM dispensary_crawler_profiles + WHERE enabled = true + AND (config->>'status')::text = 'sandbox' + AND (config->>'nextRetryAt')::timestamptz <= NOW() + ORDER BY dispensary_id` + ); + + return result.rows.map((row: { dispensary_id: number }) => row.dispensary_id); + } catch (error: any) { + console.error(`[SandboxDiscovery] Error getting dispensaries needing retry:`, error.message); + return []; + } +} + +/** + * Get dispensaries in sandbox mode + */ +export async function getSandboxDispensaries(): Promise { + try { + const result = await pool.query( + `SELECT DISTINCT dispensary_id + FROM dispensary_crawler_profiles + WHERE enabled = true + AND (config->>'status')::text = 'sandbox' + ORDER BY dispensary_id` + ); + + return result.rows.map((row: { dispensary_id: number }) => row.dispensary_id); + } catch (error: any) { + console.error(`[SandboxDiscovery] Error getting sandbox dispensaries:`, error.message); + return []; + } +} + +/** + * Get dispensaries needing manual intervention + */ +export async function getDispensariesNeedingManual(): Promise { + try { + const result = await pool.query( + `SELECT DISTINCT dispensary_id + FROM dispensary_crawler_profiles + WHERE enabled = true + AND (config->>'status')::text = 'needs_manual' + ORDER BY dispensary_id` + ); + + return result.rows.map((row: { dispensary_id: number }) => row.dispensary_id); + } catch (error: any) { + console.error(`[SandboxDiscovery] Error getting dispensaries needing manual:`, error.message); + return []; + } +} + +/** + * Promote a dispensary from sandbox to production + * This should only be called after manual validation + */ +export async function promoteToProduction(dispensaryId: number): Promise { + try { + await pool.query( + `UPDATE dispensary_crawler_profiles + SET config = config || '{"status": "production"}'::jsonb, + updated_at = NOW() + WHERE dispensary_id = $1 AND enabled = true`, + [dispensaryId] + ); + + console.log(`[SandboxDiscovery] Promoted dispensary ${dispensaryId} to production`); + return true; + } catch (error: any) { + console.error(`[SandboxDiscovery] Error promoting to production:`, error.message); + return false; + } +} + +/** + * Reset sandbox attempts for a dispensary + * Allows retrying discovery from scratch + */ +export async function resetSandboxAttempts(dispensaryId: number): Promise { + try { + await pool.query( + `UPDATE dispensary_crawler_profiles + SET config = config - 'sandboxAttempts' - 'nextRetryAt' || '{"status": "sandbox"}'::jsonb, + updated_at = NOW() + WHERE dispensary_id = $1 AND enabled = true`, + [dispensaryId] + ); + + console.log(`[SandboxDiscovery] Reset sandbox attempts for dispensary ${dispensaryId}`); + return true; + } catch (error: any) { + console.error(`[SandboxDiscovery] Error resetting sandbox attempts:`, error.message); + return false; + } +} + +/** + * Get sandbox status for a dispensary + */ +export async function getSandboxStatus(dispensaryId: number): Promise<{ + status: CrawlerStatus; + attemptCount: number; + nextRetryAt?: Date; + learnedConfig?: any; +} | null> { + try { + const result = await 
pool.query( + `SELECT config FROM dispensary_crawler_profiles + WHERE dispensary_id = $1 AND enabled = true + ORDER BY updated_at DESC + LIMIT 1`, + [dispensaryId] + ); + + if (result.rows.length === 0) { + return null; + } + + const config = result.rows[0].config || {}; + return { + status: config.status || 'sandbox', + attemptCount: (config.sandboxAttempts || []).length, + nextRetryAt: config.nextRetryAt ? new Date(config.nextRetryAt) : undefined, + learnedConfig: config.learnedConfig, + }; + } catch (error: any) { + console.error(`[SandboxDiscovery] Error getting sandbox status:`, error.message); + return null; + } +} + +// ============================================================ +// SANDBOX CRAWL EXECUTION WITH VALIDATION +// ============================================================ + +/** + * Result of a sandbox crawl with validation + */ +export interface SandboxCrawlResult { + success: boolean; + crawlResult?: CrawlResult; + validationResult?: ValidationResult; + promoted: boolean; + error?: string; +} + +/** + * Run a per-store crawler in sandbox mode and validate the result. + * + * AUTOMATION DISABLED - OBSERVABILITY ONLY + * This function will run the crawl and return validation results, + * but will NOT automatically promote or change any profile status. + */ +export async function runSandboxCrawlWithValidation( + dispensary: Dispensary, + profileKey: string +): Promise { + const startTime = Date.now(); + const dispensaryId = dispensary.id || 0; + + console.log(`[SandboxCrawl] Starting sandbox crawl for dispensary ${dispensaryId} with profile "${profileKey}" (OBSERVABILITY ONLY - no state changes)`); + + try { + // Get previous product count for delta check + const previousProductCount = await getPreviousProductCount(dispensaryId); + + // Attempt to load and run the per-store crawler + let crawlResult: CrawlResult; + + try { + const mod = await import(`../crawlers/dutchie/stores/${profileKey}`); + + if (typeof mod.crawlProducts !== 'function') { + return { + success: false, + promoted: false, + error: `Per-store module "${profileKey}" missing crawlProducts function`, + }; + } + + console.log(`[SandboxCrawl] Executing per-store crawler "${profileKey}"...`); + + crawlResult = await mod.crawlProducts(dispensary, { + pricingType: 'rec', + useBothModes: true, + downloadImages: true, + trackStock: true, + }); + + } catch (importError: any) { + return { + success: false, + promoted: false, + error: `Failed to load per-store crawler "${profileKey}": ${importError.message}`, + }; + } + + const duration = Date.now() - startTime; + + // Calculate field completeness + const fieldCompleteness = await calculateFieldCompleteness(dispensaryId, crawlResult); + + // Build run result for validation + const runResult: SandboxRunResult = { + dispensaryId, + profileKey, + crawlResult, + previousProductCount, + fieldCompleteness, + errors: crawlResult.errorMessage ? [crawlResult.errorMessage] : [], + duration, + }; + + // Validate the result (observability only - no state changes) + const validationResult = validateSandboxResult(runResult); + + console.log(`[SandboxCrawl] Validation result: ${validationResult.summary}`); + + // ================================================================ + // AUTOMATION DISABLED - OBSERVABILITY ONLY + // The following auto-promotion/status-change logic has been disabled. + // Validation result is returned but NO state changes are made. 
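+  // Manual path (sketch): an operator can review the returned validation and,
+  // if satisfied, promote explicitly, e.g.:
+  //   if (validationResult.canPromote) {
+  //     await promoteToProduction(dispensaryId);
+  //   }
+  // promoteToProduction() lives in this module; it only flips the profile's
+  // config status to "production" and does not re-run any crawl.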
+ // ================================================================ + + // Record validation result (this is just logging, no status change) + await recordValidationResult(dispensaryId, validationResult, runResult); + + // AUTO-PROMOTION DISABLED + // if (validationResult.canPromote) { + // promoted = await promoteAfterValidation(dispensaryId, validationResult); + // } + + // STATUS CHANGE DISABLED + // if (validationResult.recommendedAction === 'needs_manual') { + // await updateProfileStatus(dispensaryId, 'needs_manual', [], undefined); + // } + + console.log(`[SandboxCrawl] Validation complete (auto-promotion DISABLED): canPromote=${validationResult.canPromote}`); + + return { + success: crawlResult.success, + crawlResult, + validationResult, + promoted: false, // Always false - auto-promotion disabled + }; + + } catch (error: any) { + console.error(`[SandboxCrawl] Error during sandbox crawl:`, error.message); + return { + success: false, + promoted: false, + error: error.message, + }; + } +} + +/** + * Get the profile key for a dispensary + */ +export async function getProfileKey(dispensaryId: number): Promise { + try { + const result = await pool.query( + `SELECT profile_key FROM dispensary_crawler_profiles + WHERE dispensary_id = $1 AND enabled = true + ORDER BY updated_at DESC + LIMIT 1`, + [dispensaryId] + ); + + return result.rows[0]?.profile_key || null; + } catch (error: any) { + console.error(`[SandboxDiscovery] Error getting profile key:`, error.message); + return null; + } +} diff --git a/backend/src/services/sandbox-validator.ts b/backend/src/services/sandbox-validator.ts new file mode 100644 index 00000000..00508cc1 --- /dev/null +++ b/backend/src/services/sandbox-validator.ts @@ -0,0 +1,393 @@ +/** + * Sandbox Validator Service + * + * Validates per-store crawler results before promoting to production. + * + * Rules: + * 1. Sandbox crawl runs using the per-store crawler module + * 2. After sandbox completes, perform data completeness check + * 3. If sandbox fails or data incomplete → stay in sandbox + * 4. If sandbox succeeds AND data complete → promote to production + * 5. Production NEVER runs unvalidated per-store crawlers + * 6. 
If production fails → auto-demote to sandbox for re-learning + */ + +import { pool } from '../db/pool'; +import { CrawlResult } from '../crawlers/base/base-dutchie'; + +// ============================================================ +// TYPES +// ============================================================ + +export interface ValidationResult { + isValid: boolean; + canPromote: boolean; + checks: ValidationCheck[]; + summary: string; + recommendedAction: 'promote' | 'retry' | 'needs_manual'; +} + +export interface ValidationCheck { + name: string; + passed: boolean; + expected: string; + actual: string; + severity: 'critical' | 'warning' | 'info'; +} + +export interface ValidationConfig { + minProducts: number; + maxProductDeltaPercent: number; // How much deviation from previous run is acceptable + requiredFields: string[]; + maxEmptyFieldPercent: number; // Max % of products with empty required fields + maxErrorCount: number; +} + +export interface SandboxRunResult { + dispensaryId: number; + profileKey: string; + crawlResult: CrawlResult; + previousProductCount?: number; + fieldCompleteness: Record; // field -> % populated + errors: string[]; + duration: number; +} + +// ============================================================ +// CONSTANTS +// ============================================================ + +export const DEFAULT_VALIDATION_CONFIG: ValidationConfig = { + minProducts: 1, + maxProductDeltaPercent: 50, // Allow up to 50% change from previous run + requiredFields: ['name', 'price'], + maxEmptyFieldPercent: 20, // Max 20% of products can have empty required fields + maxErrorCount: 0, // No critical errors allowed +}; + +// ============================================================ +// VALIDATION FUNCTIONS +// ============================================================ + +/** + * Validate a sandbox crawl result + * Returns whether the result is valid and can be promoted to production + */ +export function validateSandboxResult( + runResult: SandboxRunResult, + config: ValidationConfig = DEFAULT_VALIDATION_CONFIG +): ValidationResult { + const checks: ValidationCheck[] = []; + let canPromote = true; + + // Check 1: Products found > 0 + const productCheck: ValidationCheck = { + name: 'Products Found', + passed: runResult.crawlResult.productsFound >= config.minProducts, + expected: `>= ${config.minProducts}`, + actual: `${runResult.crawlResult.productsFound}`, + severity: 'critical', + }; + checks.push(productCheck); + if (!productCheck.passed) canPromote = false; + + // Check 2: Crawl success + const successCheck: ValidationCheck = { + name: 'Crawl Success', + passed: runResult.crawlResult.success, + expected: 'true', + actual: `${runResult.crawlResult.success}`, + severity: 'critical', + }; + checks.push(successCheck); + if (!successCheck.passed) canPromote = false; + + // Check 3: Product count delta (if previous run exists) + if (runResult.previousProductCount !== undefined && runResult.previousProductCount > 0) { + const delta = Math.abs(runResult.crawlResult.productsFound - runResult.previousProductCount); + const deltaPercent = (delta / runResult.previousProductCount) * 100; + const deltaCheck: ValidationCheck = { + name: 'Product Count Stability', + passed: deltaPercent <= config.maxProductDeltaPercent, + expected: `<= ${config.maxProductDeltaPercent}% change`, + actual: `${deltaPercent.toFixed(1)}% change (${runResult.previousProductCount} → ${runResult.crawlResult.productsFound})`, + severity: 'warning', + }; + checks.push(deltaCheck); + // Warning only - don't 
block promotion for delta issues + } + + // Check 4: Required fields populated + for (const field of config.requiredFields) { + const completeness = runResult.fieldCompleteness[field] ?? 0; + const minCompleteness = 100 - config.maxEmptyFieldPercent; + const fieldCheck: ValidationCheck = { + name: `Field: ${field}`, + passed: completeness >= minCompleteness, + expected: `>= ${minCompleteness}% populated`, + actual: `${completeness.toFixed(1)}% populated`, + severity: field === 'name' ? 'critical' : 'warning', + }; + checks.push(fieldCheck); + if (!fieldCheck.passed && fieldCheck.severity === 'critical') { + canPromote = false; + } + } + + // Check 5: No critical errors + const errorCheck: ValidationCheck = { + name: 'Critical Errors', + passed: runResult.errors.length <= config.maxErrorCount, + expected: `<= ${config.maxErrorCount}`, + actual: `${runResult.errors.length}`, + severity: 'critical', + }; + checks.push(errorCheck); + if (!errorCheck.passed) canPromote = false; + + // Check 6: Upsert success rate + const upsertRate = runResult.crawlResult.productsFound > 0 + ? (runResult.crawlResult.productsUpserted / runResult.crawlResult.productsFound) * 100 + : 0; + const upsertCheck: ValidationCheck = { + name: 'Upsert Success Rate', + passed: upsertRate >= 80, // At least 80% should be upserted + expected: '>= 80%', + actual: `${upsertRate.toFixed(1)}%`, + severity: 'warning', + }; + checks.push(upsertCheck); + + // Determine recommended action + let recommendedAction: 'promote' | 'retry' | 'needs_manual' = 'promote'; + if (!canPromote) { + const criticalFailures = checks.filter(c => !c.passed && c.severity === 'critical').length; + if (criticalFailures > 2) { + recommendedAction = 'needs_manual'; + } else { + recommendedAction = 'retry'; + } + } + + // Build summary + const passedCount = checks.filter(c => c.passed).length; + const failedCritical = checks.filter(c => !c.passed && c.severity === 'critical').length; + const summary = canPromote + ? `Validation passed: ${passedCount}/${checks.length} checks OK. Ready for production.` + : `Validation failed: ${failedCritical} critical failures. ${recommendedAction === 'retry' ? 'Will retry.' : 'Needs manual intervention.'}`; + + return { + isValid: checks.every(c => c.passed || c.severity !== 'critical'), + canPromote, + checks, + summary, + recommendedAction, + }; +} + +/** + * Get previous product count for a dispensary + */ +export async function getPreviousProductCount(dispensaryId: number): Promise { + try { + const result = await pool.query( + `SELECT COUNT(*) as count FROM dutchie_products WHERE dispensary_id = $1`, + [dispensaryId] + ); + const count = parseInt(result.rows[0]?.count || '0', 10); + return count > 0 ? 
count : undefined; + } catch (error: any) { + console.error(`[SandboxValidator] Error getting previous count:`, error.message); + return undefined; + } +} + +/** + * Calculate field completeness from crawl result + * Returns percentage of products with each field populated + */ +export async function calculateFieldCompleteness( + dispensaryId: number, + crawlResult: CrawlResult +): Promise> { + // If no products, return empty + if (crawlResult.productsFound === 0) { + return { name: 0, price: 0, image: 0, stock: 0 }; + } + + try { + // Query the actual stored products to check field completeness + const result = await pool.query( + `SELECT + COUNT(*) as total, + COUNT(NULLIF(name, '')) as has_name, + COUNT(NULLIF(COALESCE(price_rec::text, price_med::text, ''), '')) as has_price, + COUNT(NULLIF(image_url, '')) as has_image, + COUNT(NULLIF(stock_status, '')) as has_stock + FROM dutchie_products + WHERE dispensary_id = $1`, + [dispensaryId] + ); + + const row = result.rows[0]; + const total = parseInt(row?.total || '0', 10); + + if (total === 0) { + return { name: 0, price: 0, image: 0, stock: 0 }; + } + + return { + name: (parseInt(row?.has_name || '0', 10) / total) * 100, + price: (parseInt(row?.has_price || '0', 10) / total) * 100, + image: (parseInt(row?.has_image || '0', 10) / total) * 100, + stock: (parseInt(row?.has_stock || '0', 10) / total) * 100, + }; + } catch (error: any) { + console.error(`[SandboxValidator] Error calculating field completeness:`, error.message); + return { name: 0, price: 0, image: 0, stock: 0 }; + } +} + +/** + * Record validation result to profile + */ +export async function recordValidationResult( + dispensaryId: number, + validationResult: ValidationResult, + runResult: SandboxRunResult +): Promise { + try { + const validationRecord = { + validatedAt: new Date().toISOString(), + isValid: validationResult.isValid, + canPromote: validationResult.canPromote, + summary: validationResult.summary, + checks: validationResult.checks, + productsFound: runResult.crawlResult.productsFound, + duration: runResult.duration, + }; + + await pool.query( + `UPDATE dispensary_crawler_profiles + SET config = config || $1::jsonb, + updated_at = NOW() + WHERE dispensary_id = $2 AND enabled = true`, + [JSON.stringify({ lastValidation: validationRecord }), dispensaryId] + ); + } catch (error: any) { + console.error(`[SandboxValidator] Error recording validation:`, error.message); + } +} + +/** + * Check if a profile has passed sandbox validation + */ +export async function hasPassedSandboxValidation(dispensaryId: number): Promise { + try { + const result = await pool.query( + `SELECT config FROM dispensary_crawler_profiles + WHERE dispensary_id = $1 AND enabled = true + ORDER BY updated_at DESC + LIMIT 1`, + [dispensaryId] + ); + + if (result.rows.length === 0) { + return false; + } + + const config = result.rows[0].config || {}; + const lastValidation = config.lastValidation; + + // Must have passed validation with canPromote = true + return lastValidation?.canPromote === true; + } catch (error: any) { + console.error(`[SandboxValidator] Error checking validation status:`, error.message); + return false; + } +} + +/** + * Promote profile to production after successful validation + */ +export async function promoteAfterValidation( + dispensaryId: number, + validationResult: ValidationResult +): Promise { + if (!validationResult.canPromote) { + console.log(`[SandboxValidator] Cannot promote dispensary ${dispensaryId}: validation failed`); + return false; + } + + try { + await 
pool.query( + `UPDATE dispensary_crawler_profiles + SET config = config || '{"status": "production"}'::jsonb, + status = 'production', + updated_at = NOW() + WHERE dispensary_id = $1 AND enabled = true`, + [dispensaryId] + ); + + console.log(`[SandboxValidator] Promoted dispensary ${dispensaryId} to production after validation`); + return true; + } catch (error: any) { + console.error(`[SandboxValidator] Error promoting to production:`, error.message); + return false; + } +} + +/** + * Demote profile back to sandbox after production failure + */ +export async function demoteToSandbox( + dispensaryId: number, + reason: string +): Promise { + try { + const demotionRecord = { + demotedAt: new Date().toISOString(), + reason, + previousStatus: 'production', + }; + + await pool.query( + `UPDATE dispensary_crawler_profiles + SET config = config || $1::jsonb, + status = 'sandbox', + updated_at = NOW() + WHERE dispensary_id = $2 AND enabled = true`, + [JSON.stringify({ status: 'sandbox', lastDemotion: demotionRecord }), dispensaryId] + ); + + console.log(`[SandboxValidator] Demoted dispensary ${dispensaryId} to sandbox: ${reason}`); + return true; + } catch (error: any) { + console.error(`[SandboxValidator] Error demoting to sandbox:`, error.message); + return false; + } +} + +/** + * Check if profile has ever been validated (first-time vs re-validation) + */ +export async function isFirstTimeValidation(dispensaryId: number): Promise { + try { + const result = await pool.query( + `SELECT config FROM dispensary_crawler_profiles + WHERE dispensary_id = $1 AND enabled = true + ORDER BY updated_at DESC + LIMIT 1`, + [dispensaryId] + ); + + if (result.rows.length === 0) { + return true; + } + + const config = result.rows[0].config || {}; + return !config.lastValidation; + } catch (error: any) { + return true; + } +} diff --git a/backend/src/services/scheduler.ts b/backend/src/services/scheduler.ts index 3a88b31a..f0d4f87d 100755 --- a/backend/src/services/scheduler.ts +++ b/backend/src/services/scheduler.ts @@ -1,5 +1,5 @@ import cron from 'node-cron'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { scrapeStore, scrapeCategory } from '../scraper-v2'; let scheduledJobs: cron.ScheduledTask[] = []; diff --git a/backend/src/services/scraper-playwright.ts b/backend/src/services/scraper-playwright.ts index a2c2aef3..b1d1e5af 100644 --- a/backend/src/services/scraper-playwright.ts +++ b/backend/src/services/scraper-playwright.ts @@ -1,7 +1,7 @@ import { chromium, Browser, BrowserContext, Page } from 'playwright'; import { bypassAgeGatePlaywright, detectStateFromUrlPlaywright } from '../utils/age-gate-playwright'; import { logger } from './logger'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { createStealthBrowser, createStealthContext, diff --git a/backend/src/services/scraper.ts b/backend/src/services/scraper.ts index a9c967f6..b7f93327 100755 --- a/backend/src/services/scraper.ts +++ b/backend/src/services/scraper.ts @@ -2,7 +2,7 @@ import puppeteer from 'puppeteer-extra'; import StealthPlugin from 'puppeteer-extra-plugin-stealth'; import { Browser, Page } from 'puppeteer'; import { SocksProxyAgent } from 'socks-proxy-agent'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { uploadImageFromUrl, getImageUrl } from '../utils/minio'; import { logger } from './logger'; import { registerScraper, updateScraperStats, completeScraper } from '../routes/scraper-monitor'; diff --git 
a/backend/src/services/store-crawl-orchestrator.ts b/backend/src/services/store-crawl-orchestrator.ts index 5e7bdd28..7b13c1f4 100644 --- a/backend/src/services/store-crawl-orchestrator.ts +++ b/backend/src/services/store-crawl-orchestrator.ts @@ -12,7 +12,7 @@ */ import { v4 as uuidv4 } from 'uuid'; -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { crawlerLogger } from './crawler-logger'; import { detectMultiCategoryProviders, diff --git a/backend/src/system/routes/index.ts b/backend/src/system/routes/index.ts new file mode 100644 index 00000000..c920c4ad --- /dev/null +++ b/backend/src/system/routes/index.ts @@ -0,0 +1,584 @@ +/** + * System API Routes + * + * Provides REST API endpoints for system monitoring and control: + * - /api/system/sync/* - Sync orchestrator + * - /api/system/dlq/* - Dead-letter queue + * - /api/system/integrity/* - Integrity checks + * - /api/system/fix/* - Auto-fix routines + * - /api/system/alerts/* - System alerts + * - /metrics - Prometheus metrics + * + * Phase 5: Full Production Sync + Monitoring + */ + +import { Router, Request, Response } from 'express'; +import { Pool } from 'pg'; +import { + SyncOrchestrator, + MetricsService, + DLQService, + AlertService, + IntegrityService, + AutoFixService, +} from '../services'; + +export function createSystemRouter(pool: Pool): Router { + const router = Router(); + + // Initialize services + const metrics = new MetricsService(pool); + const dlq = new DLQService(pool); + const alerts = new AlertService(pool); + const integrity = new IntegrityService(pool, alerts); + const autoFix = new AutoFixService(pool, alerts); + const orchestrator = new SyncOrchestrator(pool, metrics, dlq, alerts); + + // ============================================================ + // SYNC ORCHESTRATOR ENDPOINTS + // ============================================================ + + /** + * GET /api/system/sync/status + * Get current sync status + */ + router.get('/sync/status', async (_req: Request, res: Response) => { + try { + const status = await orchestrator.getStatus(); + res.json(status); + } catch (error) { + console.error('[System] Sync status error:', error); + res.status(500).json({ error: 'Failed to get sync status' }); + } + }); + + /** + * POST /api/system/sync/run + * Trigger a sync run + */ + router.post('/sync/run', async (req: Request, res: Response) => { + try { + const triggeredBy = req.body.triggeredBy || 'api'; + const result = await orchestrator.runSync(); + res.json({ + success: true, + triggeredBy, + metrics: result, + }); + } catch (error) { + console.error('[System] Sync run error:', error); + res.status(500).json({ + success: false, + error: error instanceof Error ? error.message : 'Sync run failed', + }); + } + }); + + /** + * GET /api/system/sync/queue-depth + * Get queue depth information + */ + router.get('/sync/queue-depth', async (_req: Request, res: Response) => { + try { + const depth = await orchestrator.getQueueDepth(); + res.json(depth); + } catch (error) { + console.error('[System] Queue depth error:', error); + res.status(500).json({ error: 'Failed to get queue depth' }); + } + }); + + /** + * GET /api/system/sync/health + * Get sync health status + */ + router.get('/sync/health', async (_req: Request, res: Response) => { + try { + const health = await orchestrator.getHealth(); + res.status(health.healthy ? 
200 : 503).json(health); + } catch (error) { + console.error('[System] Health check error:', error); + res.status(500).json({ healthy: false, error: 'Health check failed' }); + } + }); + + /** + * POST /api/system/sync/pause + * Pause the orchestrator + */ + router.post('/sync/pause', async (req: Request, res: Response) => { + try { + const reason = req.body.reason || 'Manual pause'; + await orchestrator.pause(reason); + res.json({ success: true, message: 'Orchestrator paused' }); + } catch (error) { + console.error('[System] Pause error:', error); + res.status(500).json({ error: 'Failed to pause orchestrator' }); + } + }); + + /** + * POST /api/system/sync/resume + * Resume the orchestrator + */ + router.post('/sync/resume', async (_req: Request, res: Response) => { + try { + await orchestrator.resume(); + res.json({ success: true, message: 'Orchestrator resumed' }); + } catch (error) { + console.error('[System] Resume error:', error); + res.status(500).json({ error: 'Failed to resume orchestrator' }); + } + }); + + // ============================================================ + // DLQ ENDPOINTS + // ============================================================ + + /** + * GET /api/system/dlq + * List DLQ payloads + */ + router.get('/dlq', async (req: Request, res: Response) => { + try { + const options = { + status: req.query.status as string, + errorType: req.query.errorType as string, + dispensaryId: req.query.dispensaryId ? parseInt(req.query.dispensaryId as string) : undefined, + limit: req.query.limit ? parseInt(req.query.limit as string) : 50, + offset: req.query.offset ? parseInt(req.query.offset as string) : 0, + }; + + const result = await dlq.listPayloads(options); + res.json(result); + } catch (error) { + console.error('[System] DLQ list error:', error); + res.status(500).json({ error: 'Failed to list DLQ payloads' }); + } + }); + + /** + * GET /api/system/dlq/stats + * Get DLQ statistics + */ + router.get('/dlq/stats', async (_req: Request, res: Response) => { + try { + const stats = await dlq.getStats(); + res.json(stats); + } catch (error) { + console.error('[System] DLQ stats error:', error); + res.status(500).json({ error: 'Failed to get DLQ stats' }); + } + }); + + /** + * GET /api/system/dlq/summary + * Get DLQ summary by error type + */ + router.get('/dlq/summary', async (_req: Request, res: Response) => { + try { + const summary = await dlq.getSummary(); + res.json(summary); + } catch (error) { + console.error('[System] DLQ summary error:', error); + res.status(500).json({ error: 'Failed to get DLQ summary' }); + } + }); + + /** + * GET /api/system/dlq/:id + * Get a specific DLQ payload + */ + router.get('/dlq/:id', async (req: Request, res: Response) => { + try { + const payload = await dlq.getPayload(req.params.id); + if (!payload) { + return res.status(404).json({ error: 'Payload not found' }); + } + res.json(payload); + } catch (error) { + console.error('[System] DLQ get error:', error); + res.status(500).json({ error: 'Failed to get DLQ payload' }); + } + }); + + /** + * POST /api/system/dlq/:id/retry + * Retry a DLQ payload + */ + router.post('/dlq/:id/retry', async (req: Request, res: Response) => { + try { + const result = await dlq.retryPayload(req.params.id); + if (result.success) { + res.json(result); + } else { + res.status(400).json(result); + } + } catch (error) { + console.error('[System] DLQ retry error:', error); + res.status(500).json({ error: 'Failed to retry payload' }); + } + }); + + /** + * POST /api/system/dlq/:id/abandon + * Abandon a DLQ 
payload + */ + router.post('/dlq/:id/abandon', async (req: Request, res: Response) => { + try { + const reason = req.body.reason || 'Manually abandoned'; + const abandonedBy = req.body.abandonedBy || 'api'; + const success = await dlq.abandonPayload(req.params.id, reason, abandonedBy); + res.json({ success }); + } catch (error) { + console.error('[System] DLQ abandon error:', error); + res.status(500).json({ error: 'Failed to abandon payload' }); + } + }); + + /** + * POST /api/system/dlq/bulk-retry + * Bulk retry payloads by error type + */ + router.post('/dlq/bulk-retry', async (req: Request, res: Response) => { + try { + const { errorType } = req.body; + if (!errorType) { + return res.status(400).json({ error: 'errorType is required' }); + } + const result = await dlq.bulkRetryByErrorType(errorType); + res.json(result); + } catch (error) { + console.error('[System] DLQ bulk retry error:', error); + res.status(500).json({ error: 'Failed to bulk retry' }); + } + }); + + // ============================================================ + // INTEGRITY CHECK ENDPOINTS + // ============================================================ + + /** + * POST /api/system/integrity/run + * Run all integrity checks + */ + router.post('/integrity/run', async (req: Request, res: Response) => { + try { + const triggeredBy = req.body.triggeredBy || 'api'; + const result = await integrity.runAllChecks(triggeredBy); + res.json(result); + } catch (error) { + console.error('[System] Integrity run error:', error); + res.status(500).json({ error: 'Failed to run integrity checks' }); + } + }); + + /** + * GET /api/system/integrity/runs + * Get recent integrity check runs + */ + router.get('/integrity/runs', async (req: Request, res: Response) => { + try { + const limit = req.query.limit ? 
parseInt(req.query.limit as string) : 10; + const runs = await integrity.getRecentRuns(limit); + res.json(runs); + } catch (error) { + console.error('[System] Integrity runs error:', error); + res.status(500).json({ error: 'Failed to get integrity runs' }); + } + }); + + /** + * GET /api/system/integrity/runs/:runId + * Get results for a specific integrity run + */ + router.get('/integrity/runs/:runId', async (req: Request, res: Response) => { + try { + const results = await integrity.getRunResults(req.params.runId); + res.json(results); + } catch (error) { + console.error('[System] Integrity run results error:', error); + res.status(500).json({ error: 'Failed to get run results' }); + } + }); + + // ============================================================ + // AUTO-FIX ENDPOINTS + // ============================================================ + + /** + * GET /api/system/fix/routines + * Get available fix routines + */ + router.get('/fix/routines', (_req: Request, res: Response) => { + try { + const routines = autoFix.getAvailableRoutines(); + res.json(routines); + } catch (error) { + console.error('[System] Get routines error:', error); + res.status(500).json({ error: 'Failed to get routines' }); + } + }); + + /** + * POST /api/system/fix/:routine + * Run a fix routine + */ + router.post('/fix/:routine', async (req: Request, res: Response) => { + try { + const routineName = req.params.routine; + const dryRun = req.body.dryRun === true; + const triggeredBy = req.body.triggeredBy || 'api'; + + const result = await autoFix.runRoutine(routineName as any, triggeredBy, { dryRun }); + res.json(result); + } catch (error) { + console.error('[System] Fix routine error:', error); + res.status(500).json({ error: 'Failed to run fix routine' }); + } + }); + + /** + * GET /api/system/fix/runs + * Get recent fix runs + */ + router.get('/fix/runs', async (req: Request, res: Response) => { + try { + const limit = req.query.limit ? parseInt(req.query.limit as string) : 20; + const runs = await autoFix.getRecentRuns(limit); + res.json(runs); + } catch (error) { + console.error('[System] Fix runs error:', error); + res.status(500).json({ error: 'Failed to get fix runs' }); + } + }); + + // ============================================================ + // ALERTS ENDPOINTS + // ============================================================ + + /** + * GET /api/system/alerts + * List alerts + */ + router.get('/alerts', async (req: Request, res: Response) => { + try { + const options = { + status: req.query.status as any, + severity: req.query.severity as any, + type: req.query.type as string, + limit: req.query.limit ? parseInt(req.query.limit as string) : 50, + offset: req.query.offset ? 
parseInt(req.query.offset as string) : 0, + }; + + const result = await alerts.listAlerts(options); + res.json(result); + } catch (error) { + console.error('[System] Alerts list error:', error); + res.status(500).json({ error: 'Failed to list alerts' }); + } + }); + + /** + * GET /api/system/alerts/active + * Get active alerts + */ + router.get('/alerts/active', async (_req: Request, res: Response) => { + try { + const activeAlerts = await alerts.getActiveAlerts(); + res.json(activeAlerts); + } catch (error) { + console.error('[System] Active alerts error:', error); + res.status(500).json({ error: 'Failed to get active alerts' }); + } + }); + + /** + * GET /api/system/alerts/summary + * Get alert summary + */ + router.get('/alerts/summary', async (_req: Request, res: Response) => { + try { + const summary = await alerts.getSummary(); + res.json(summary); + } catch (error) { + console.error('[System] Alerts summary error:', error); + res.status(500).json({ error: 'Failed to get alerts summary' }); + } + }); + + /** + * POST /api/system/alerts/:id/acknowledge + * Acknowledge an alert + */ + router.post('/alerts/:id/acknowledge', async (req: Request, res: Response) => { + try { + const alertId = parseInt(req.params.id); + const acknowledgedBy = req.body.acknowledgedBy || 'api'; + const success = await alerts.acknowledgeAlert(alertId, acknowledgedBy); + res.json({ success }); + } catch (error) { + console.error('[System] Acknowledge alert error:', error); + res.status(500).json({ error: 'Failed to acknowledge alert' }); + } + }); + + /** + * POST /api/system/alerts/:id/resolve + * Resolve an alert + */ + router.post('/alerts/:id/resolve', async (req: Request, res: Response) => { + try { + const alertId = parseInt(req.params.id); + const resolvedBy = req.body.resolvedBy || 'api'; + const success = await alerts.resolveAlert(alertId, resolvedBy); + res.json({ success }); + } catch (error) { + console.error('[System] Resolve alert error:', error); + res.status(500).json({ error: 'Failed to resolve alert' }); + } + }); + + /** + * POST /api/system/alerts/bulk-acknowledge + * Bulk acknowledge alerts + */ + router.post('/alerts/bulk-acknowledge', async (req: Request, res: Response) => { + try { + const { ids, acknowledgedBy } = req.body; + if (!ids || !Array.isArray(ids)) { + return res.status(400).json({ error: 'ids array is required' }); + } + const count = await alerts.bulkAcknowledge(ids, acknowledgedBy || 'api'); + res.json({ acknowledged: count }); + } catch (error) { + console.error('[System] Bulk acknowledge error:', error); + res.status(500).json({ error: 'Failed to bulk acknowledge' }); + } + }); + + // ============================================================ + // METRICS ENDPOINTS + // ============================================================ + + /** + * GET /api/system/metrics + * Get all current metrics + */ + router.get('/metrics', async (_req: Request, res: Response) => { + try { + const allMetrics = await metrics.getAllMetrics(); + res.json(allMetrics); + } catch (error) { + console.error('[System] Metrics error:', error); + res.status(500).json({ error: 'Failed to get metrics' }); + } + }); + + /** + * GET /api/system/metrics/:name + * Get a specific metric + */ + router.get('/metrics/:name', async (req: Request, res: Response) => { + try { + const metric = await metrics.getMetric(req.params.name); + if (!metric) { + return res.status(404).json({ error: 'Metric not found' }); + } + res.json(metric); + } catch (error) { + console.error('[System] Metric error:', error); + 
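+      // Unknown metric names are handled above with a 404; anything that throws
+      // (e.g. a database error) falls through to this generic 500.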
res.status(500).json({ error: 'Failed to get metric' }); + } + }); + + /** + * GET /api/system/metrics/:name/history + * Get metric time series + */ + router.get('/metrics/:name/history', async (req: Request, res: Response) => { + try { + const hours = req.query.hours ? parseInt(req.query.hours as string) : 24; + const history = await metrics.getMetricHistory(req.params.name, hours); + res.json(history); + } catch (error) { + console.error('[System] Metric history error:', error); + res.status(500).json({ error: 'Failed to get metric history' }); + } + }); + + /** + * GET /api/system/errors + * Get error summary + */ + router.get('/errors', async (_req: Request, res: Response) => { + try { + const summary = await metrics.getErrorSummary(); + res.json(summary); + } catch (error) { + console.error('[System] Error summary error:', error); + res.status(500).json({ error: 'Failed to get error summary' }); + } + }); + + /** + * GET /api/system/errors/recent + * Get recent errors + */ + router.get('/errors/recent', async (req: Request, res: Response) => { + try { + const limit = req.query.limit ? parseInt(req.query.limit as string) : 50; + const errorType = req.query.type as string; + const errors = await metrics.getRecentErrors(limit, errorType); + res.json(errors); + } catch (error) { + console.error('[System] Recent errors error:', error); + res.status(500).json({ error: 'Failed to get recent errors' }); + } + }); + + /** + * POST /api/system/errors/acknowledge + * Acknowledge errors + */ + router.post('/errors/acknowledge', async (req: Request, res: Response) => { + try { + const { ids, acknowledgedBy } = req.body; + if (!ids || !Array.isArray(ids)) { + return res.status(400).json({ error: 'ids array is required' }); + } + const count = await metrics.acknowledgeErrors(ids, acknowledgedBy || 'api'); + res.json({ acknowledged: count }); + } catch (error) { + console.error('[System] Acknowledge errors error:', error); + res.status(500).json({ error: 'Failed to acknowledge errors' }); + } + }); + + return router; +} + +/** + * Create Prometheus metrics endpoint (standalone) + */ +export function createPrometheusRouter(pool: Pool): Router { + const router = Router(); + const metrics = new MetricsService(pool); + + /** + * GET /metrics + * Prometheus-compatible metrics endpoint + */ + router.get('/', async (_req: Request, res: Response) => { + try { + const prometheusOutput = await metrics.getPrometheusMetrics(); + res.set('Content-Type', 'text/plain; version=0.0.4'); + res.send(prometheusOutput); + } catch (error) { + console.error('[Prometheus] Metrics error:', error); + res.status(500).send('# Error generating metrics'); + } + }); + + return router; +} diff --git a/backend/src/system/services/alerts.ts b/backend/src/system/services/alerts.ts new file mode 100644 index 00000000..14cb2b52 --- /dev/null +++ b/backend/src/system/services/alerts.ts @@ -0,0 +1,343 @@ +/** + * Alerts Service + * + * System alerting with: + * - Alert creation and deduplication + * - Severity levels + * - Acknowledgement tracking + * - Resolution workflow + * + * Phase 5: Full Production Sync + Monitoring + */ + +import { Pool } from 'pg'; + +export type AlertSeverity = 'info' | 'warning' | 'error' | 'critical'; +export type AlertStatus = 'active' | 'acknowledged' | 'resolved' | 'muted'; + +export interface SystemAlert { + id: number; + alertType: string; + severity: AlertSeverity; + title: string; + message: string | null; + source: string | null; + context: Record; + status: AlertStatus; + acknowledgedAt: Date | null; 
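+  // acknowledgedBy / resolvedAt / resolvedBy stay null until an operator
+  // acknowledges or resolves the alert.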
+ acknowledgedBy: string | null; + resolvedAt: Date | null; + resolvedBy: string | null; + fingerprint: string; + occurrenceCount: number; + firstOccurredAt: Date; + lastOccurredAt: Date; + createdAt: Date; +} + +export interface AlertSummary { + total: number; + active: number; + acknowledged: number; + resolved: number; + bySeverity: Record; + byType: Record; +} + +export class AlertService { + private pool: Pool; + + constructor(pool: Pool) { + this.pool = pool; + } + + /** + * Create or update an alert (with deduplication) + */ + async createAlert( + type: string, + severity: AlertSeverity, + title: string, + message?: string, + source?: string, + context: Record = {} + ): Promise { + const result = await this.pool.query( + `SELECT upsert_alert($1, $2, $3, $4, $5, $6) as id`, + [type, severity, title, message || null, source || null, JSON.stringify(context)] + ); + + return result.rows[0].id; + } + + /** + * Get alert summary + */ + async getSummary(): Promise { + const [countResult, severityResult, typeResult] = await Promise.all([ + this.pool.query(` + SELECT + COUNT(*) as total, + COUNT(*) FILTER (WHERE status = 'active') as active, + COUNT(*) FILTER (WHERE status = 'acknowledged') as acknowledged, + COUNT(*) FILTER (WHERE status = 'resolved') as resolved + FROM system_alerts + WHERE created_at >= NOW() - INTERVAL '7 days' + `), + this.pool.query(` + SELECT severity, COUNT(*) as count + FROM system_alerts + WHERE status = 'active' + GROUP BY severity + `), + this.pool.query(` + SELECT alert_type, COUNT(*) as count + FROM system_alerts + WHERE status = 'active' + GROUP BY alert_type + `), + ]); + + const row = countResult.rows[0]; + const bySeverity: Record = { + info: 0, + warning: 0, + error: 0, + critical: 0, + }; + severityResult.rows.forEach(r => { + bySeverity[r.severity as AlertSeverity] = parseInt(r.count); + }); + + const byType: Record = {}; + typeResult.rows.forEach(r => { + byType[r.alert_type] = parseInt(r.count); + }); + + return { + total: parseInt(row.total) || 0, + active: parseInt(row.active) || 0, + acknowledged: parseInt(row.acknowledged) || 0, + resolved: parseInt(row.resolved) || 0, + bySeverity, + byType, + }; + } + + /** + * List alerts + */ + async listAlerts(options: { + status?: AlertStatus; + severity?: AlertSeverity; + type?: string; + limit?: number; + offset?: number; + } = {}): Promise<{ alerts: SystemAlert[]; total: number }> { + const { status, severity, type, limit = 50, offset = 0 } = options; + + const conditions: string[] = []; + const params: (string | number)[] = []; + let paramIndex = 1; + + if (status) { + conditions.push(`status = $${paramIndex++}`); + params.push(status); + } + if (severity) { + conditions.push(`severity = $${paramIndex++}`); + params.push(severity); + } + if (type) { + conditions.push(`alert_type = $${paramIndex++}`); + params.push(type); + } + + const whereClause = conditions.length > 0 ? 
'WHERE ' + conditions.join(' AND ') : ''; + + // Get total count + const countResult = await this.pool.query(` + SELECT COUNT(*) as total FROM system_alerts ${whereClause} + `, params); + + // Get alerts + params.push(limit, offset); + const result = await this.pool.query(` + SELECT * + FROM system_alerts + ${whereClause} + ORDER BY + CASE status WHEN 'active' THEN 0 WHEN 'acknowledged' THEN 1 ELSE 2 END, + CASE severity WHEN 'critical' THEN 0 WHEN 'error' THEN 1 WHEN 'warning' THEN 2 ELSE 3 END, + last_occurred_at DESC + LIMIT $${paramIndex++} OFFSET $${paramIndex++} + `, params); + + const alerts: SystemAlert[] = result.rows.map(row => ({ + id: row.id, + alertType: row.alert_type, + severity: row.severity, + title: row.title, + message: row.message, + source: row.source, + context: row.context || {}, + status: row.status, + acknowledgedAt: row.acknowledged_at, + acknowledgedBy: row.acknowledged_by, + resolvedAt: row.resolved_at, + resolvedBy: row.resolved_by, + fingerprint: row.fingerprint, + occurrenceCount: row.occurrence_count, + firstOccurredAt: row.first_occurred_at, + lastOccurredAt: row.last_occurred_at, + createdAt: row.created_at, + })); + + return { + alerts, + total: parseInt(countResult.rows[0].total) || 0, + }; + } + + /** + * Get active alerts + */ + async getActiveAlerts(): Promise { + const result = await this.listAlerts({ status: 'active', limit: 100 }); + return result.alerts; + } + + /** + * Acknowledge an alert + */ + async acknowledgeAlert(id: number, acknowledgedBy: string): Promise { + const result = await this.pool.query(` + UPDATE system_alerts + SET status = 'acknowledged', + acknowledged_at = NOW(), + acknowledged_by = $2 + WHERE id = $1 AND status = 'active' + RETURNING id + `, [id, acknowledgedBy]); + + return (result.rowCount || 0) > 0; + } + + /** + * Resolve an alert + */ + async resolveAlert(typeOrId: string | number, resolvedBy?: string): Promise { + if (typeof typeOrId === 'number') { + const result = await this.pool.query(` + UPDATE system_alerts + SET status = 'resolved', + resolved_at = NOW(), + resolved_by = $2 + WHERE id = $1 AND status IN ('active', 'acknowledged') + RETURNING id + `, [typeOrId, resolvedBy || 'system']); + + return (result.rowCount || 0) > 0; + } else { + // Resolve by alert type + const result = await this.pool.query(` + UPDATE system_alerts + SET status = 'resolved', + resolved_at = NOW(), + resolved_by = $2 + WHERE alert_type = $1 AND status IN ('active', 'acknowledged') + RETURNING id + `, [typeOrId, resolvedBy || 'system']); + + return (result.rowCount || 0) > 0; + } + } + + /** + * Mute an alert type + */ + async muteAlertType(type: string, mutedBy: string): Promise { + const result = await this.pool.query(` + UPDATE system_alerts + SET status = 'muted' + WHERE alert_type = $1 AND status = 'active' + `, [type]); + + return result.rowCount || 0; + } + + /** + * Bulk acknowledge alerts + */ + async bulkAcknowledge(ids: number[], acknowledgedBy: string): Promise { + const result = await this.pool.query(` + UPDATE system_alerts + SET status = 'acknowledged', + acknowledged_at = NOW(), + acknowledged_by = $2 + WHERE id = ANY($1) AND status = 'active' + `, [ids, acknowledgedBy]); + + return result.rowCount || 0; + } + + /** + * Bulk resolve alerts + */ + async bulkResolve(ids: number[], resolvedBy: string): Promise { + const result = await this.pool.query(` + UPDATE system_alerts + SET status = 'resolved', + resolved_at = NOW(), + resolved_by = $2 + WHERE id = ANY($1) AND status IN ('active', 'acknowledged') + `, [ids, 
resolvedBy]); + + return result.rowCount || 0; + } + + /** + * Cleanup old resolved alerts + */ + async cleanupResolved(daysOld: number = 30): Promise { + const result = await this.pool.query(` + DELETE FROM system_alerts + WHERE status IN ('resolved', 'muted') + AND resolved_at < NOW() - ($1 || ' days')::INTERVAL + `, [daysOld]); + + return result.rowCount || 0; + } + + /** + * Get alert by ID + */ + async getAlert(id: number): Promise { + const result = await this.pool.query(` + SELECT * FROM system_alerts WHERE id = $1 + `, [id]); + + if (result.rows.length === 0) return null; + + const row = result.rows[0]; + return { + id: row.id, + alertType: row.alert_type, + severity: row.severity, + title: row.title, + message: row.message, + source: row.source, + context: row.context || {}, + status: row.status, + acknowledgedAt: row.acknowledged_at, + acknowledgedBy: row.acknowledged_by, + resolvedAt: row.resolved_at, + resolvedBy: row.resolved_by, + fingerprint: row.fingerprint, + occurrenceCount: row.occurrence_count, + firstOccurredAt: row.first_occurred_at, + lastOccurredAt: row.last_occurred_at, + createdAt: row.created_at, + }; + } +} diff --git a/backend/src/system/services/auto-fix.ts b/backend/src/system/services/auto-fix.ts new file mode 100644 index 00000000..6465dbfe --- /dev/null +++ b/backend/src/system/services/auto-fix.ts @@ -0,0 +1,485 @@ +/** + * Auto-Fix Routines Service + * + * Automated fix routines (only run when triggered): + * - rehydrate_missing_snapshots + * - recalc_brand_penetration + * - reconcile_category_mismatches + * - purge_orphaned_rows (soft-delete only) + * - repair_cross_state_sku_conflicts + * + * Phase 5: Full Production Sync + Monitoring + */ + +import { Pool } from 'pg'; +import { AlertService } from './alerts'; + +export interface FixRunResult { + runId: string; + routineName: string; + triggeredBy: string; + triggerType: 'manual' | 'auto' | 'scheduled'; + startedAt: Date; + finishedAt: Date | null; + status: 'running' | 'completed' | 'failed' | 'rolled_back'; + rowsAffected: number; + changes: any[]; + isDryRun: boolean; + dryRunPreview: any | null; + errorMessage: string | null; +} + +export type FixRoutineName = + | 'rehydrate_missing_snapshots' + | 'recalc_brand_penetration' + | 'reconcile_category_mismatches' + | 'purge_orphaned_rows' + | 'repair_cross_state_sku_conflicts'; + +export class AutoFixService { + private pool: Pool; + private alerts: AlertService; + + constructor(pool: Pool, alerts: AlertService) { + this.pool = pool; + this.alerts = alerts; + } + + /** + * Run a fix routine + */ + async runRoutine( + routineName: FixRoutineName, + triggeredBy: string, + options: { dryRun?: boolean; triggerType?: 'manual' | 'auto' | 'scheduled' } = {} + ): Promise { + const { dryRun = false, triggerType = 'manual' } = options; + const runId = crypto.randomUUID(); + const startedAt = new Date(); + + // Create run record + await this.pool.query(` + INSERT INTO auto_fix_runs (run_id, routine_name, triggered_by, trigger_type, is_dry_run) + VALUES ($1, $2, $3, $4, $5) + `, [runId, routineName, triggeredBy, triggerType, dryRun]); + + try { + let result: { rowsAffected: number; changes: any[]; preview?: any }; + + switch (routineName) { + case 'rehydrate_missing_snapshots': + result = await this.rehydrateMissingSnapshots(dryRun); + break; + case 'recalc_brand_penetration': + result = await this.recalcBrandPenetration(dryRun); + break; + case 'reconcile_category_mismatches': + result = await this.reconcileCategoryMismatches(dryRun); + break; + case 
'purge_orphaned_rows': + result = await this.purgeOrphanedRows(dryRun); + break; + case 'repair_cross_state_sku_conflicts': + result = await this.repairCrossStateSkuConflicts(dryRun); + break; + default: + throw new Error(`Unknown routine: ${routineName}`); + } + + // Update run record + await this.pool.query(` + UPDATE auto_fix_runs + SET status = 'completed', + finished_at = NOW(), + rows_affected = $2, + changes = $3, + dry_run_preview = $4 + WHERE run_id = $1 + `, [runId, result.rowsAffected, JSON.stringify(result.changes), result.preview ? JSON.stringify(result.preview) : null]); + + return { + runId, + routineName, + triggeredBy, + triggerType, + startedAt, + finishedAt: new Date(), + status: 'completed', + rowsAffected: result.rowsAffected, + changes: result.changes, + isDryRun: dryRun, + dryRunPreview: result.preview || null, + errorMessage: null, + }; + } catch (error) { + const errorMessage = error instanceof Error ? error.message : String(error); + + await this.pool.query(` + UPDATE auto_fix_runs + SET status = 'failed', + finished_at = NOW(), + error_message = $2 + WHERE run_id = $1 + `, [runId, errorMessage]); + + await this.alerts.createAlert( + 'FIX_ROUTINE_FAILED', + 'error', + `Fix routine failed: ${routineName}`, + errorMessage, + 'auto-fix-service' + ); + + return { + runId, + routineName, + triggeredBy, + triggerType, + startedAt, + finishedAt: new Date(), + status: 'failed', + rowsAffected: 0, + changes: [], + isDryRun: dryRun, + dryRunPreview: null, + errorMessage, + }; + } + } + + /** + * Rehydrate missing snapshots + */ + private async rehydrateMissingSnapshots( + dryRun: boolean + ): Promise<{ rowsAffected: number; changes: any[]; preview?: any }> { + // Find products without recent snapshots + const missingResult = await this.pool.query(` + SELECT + dp.id as product_id, + dp.dispensary_id, + dp.name, + MAX(dps.crawled_at) as last_snapshot + FROM dutchie_products dp + LEFT JOIN dutchie_product_snapshots dps ON dp.id = dps.dutchie_product_id + GROUP BY dp.id, dp.dispensary_id, dp.name + HAVING MAX(dps.crawled_at) IS NULL + OR MAX(dps.crawled_at) < NOW() - INTERVAL '24 hours' + LIMIT 1000 + `); + + const preview = { + productsToHydrate: missingResult.rows.length, + sample: missingResult.rows.slice(0, 5), + }; + + if (dryRun) { + return { rowsAffected: 0, changes: [], preview }; + } + + // Create snapshots for products without recent ones + let created = 0; + const changes: any[] = []; + + for (const row of missingResult.rows) { + try { + await this.pool.query(` + INSERT INTO dutchie_product_snapshots ( + dutchie_product_id, dispensary_id, crawled_at, stock_status + ) + SELECT + id, dispensary_id, NOW(), 'unknown' + FROM dutchie_products + WHERE id = $1 + `, [row.product_id]); + + created++; + changes.push({ productId: row.product_id, action: 'snapshot_created' }); + } catch (error) { + // Skip individual failures + } + } + + return { rowsAffected: created, changes }; + } + + /** + * Recalculate brand penetration + */ + private async recalcBrandPenetration( + dryRun: boolean + ): Promise<{ rowsAffected: number; changes: any[]; preview?: any }> { + // Get current brand metrics + const brandsResult = await this.pool.query(` + SELECT + brand_name, + COUNT(DISTINCT dispensary_id) as store_count, + COUNT(*) as sku_count + FROM dutchie_products + WHERE brand_name IS NOT NULL + GROUP BY brand_name + ORDER BY sku_count DESC + LIMIT 100 + `); + + const preview = { + brandsToUpdate: brandsResult.rows.length, + topBrands: brandsResult.rows.slice(0, 5), + }; + + if (dryRun) { + 
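+      // Dry run: return the preview (brands that would be refreshed) without
+      // capturing snapshots or refreshing the materialized view.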
return { rowsAffected: 0, changes: [], preview }; + } + + // Capture fresh brand snapshots + try { + await this.pool.query(`SELECT capture_brand_snapshots()`); + } catch (error) { + // Function might not exist yet + } + + // Also update v_brand_summary if it's a materialized view + try { + await this.pool.query(`REFRESH MATERIALIZED VIEW CONCURRENTLY v_brand_summary`); + } catch (error) { + // View might not exist or not be materialized + } + + return { + rowsAffected: brandsResult.rows.length, + changes: [{ action: 'brand_snapshots_captured', brandCount: brandsResult.rows.length }], + }; + } + + /** + * Reconcile category mismatches + */ + private async reconcileCategoryMismatches( + dryRun: boolean + ): Promise<{ rowsAffected: number; changes: any[]; preview?: any }> { + // Find products with null/unknown categories + const mismatchResult = await this.pool.query(` + SELECT + id, + name, + type as current_type, + brand_name + FROM dutchie_products + WHERE type IS NULL OR type = '' OR type = 'Unknown' + LIMIT 500 + `); + + const preview = { + productsToFix: mismatchResult.rows.length, + sample: mismatchResult.rows.slice(0, 10), + }; + + if (dryRun) { + return { rowsAffected: 0, changes: [], preview }; + } + + // Try to infer category from product name + let fixed = 0; + const changes: any[] = []; + + for (const row of mismatchResult.rows) { + const name = (row.name || '').toLowerCase(); + let inferredType: string | null = null; + + // Simple inference rules + if (name.includes('flower') || name.includes('bud') || name.includes('eighth') || name.includes('ounce')) { + inferredType = 'Flower'; + } else if (name.includes('cart') || name.includes('vape') || name.includes('pod')) { + inferredType = 'Vaporizers'; + } else if (name.includes('pre-roll') || name.includes('preroll') || name.includes('joint')) { + inferredType = 'Pre-Rolls'; + } else if (name.includes('edible') || name.includes('gummy') || name.includes('chocolate')) { + inferredType = 'Edible'; + } else if (name.includes('concentrate') || name.includes('wax') || name.includes('shatter')) { + inferredType = 'Concentrate'; + } else if (name.includes('topical') || name.includes('cream') || name.includes('balm')) { + inferredType = 'Topicals'; + } else if (name.includes('tincture')) { + inferredType = 'Tincture'; + } + + if (inferredType) { + await this.pool.query(` + UPDATE dutchie_products SET type = $2 WHERE id = $1 + `, [row.id, inferredType]); + + fixed++; + changes.push({ + productId: row.id, + name: row.name, + previousType: row.current_type, + newType: inferredType, + }); + } + } + + return { rowsAffected: fixed, changes }; + } + + /** + * Purge orphaned rows (soft-delete only) + */ + private async purgeOrphanedRows( + dryRun: boolean + ): Promise<{ rowsAffected: number; changes: any[]; preview?: any }> { + // Find orphaned snapshots + const orphanedResult = await this.pool.query(` + SELECT dps.id, dps.dutchie_product_id + FROM dutchie_product_snapshots dps + LEFT JOIN dutchie_products dp ON dps.dutchie_product_id = dp.id + WHERE dp.id IS NULL + LIMIT 1000 + `); + + const preview = { + orphanedSnapshots: orphanedResult.rows.length, + sampleIds: orphanedResult.rows.slice(0, 10).map(r => r.id), + }; + + if (dryRun) { + return { rowsAffected: 0, changes: [], preview }; + } + + // Delete orphaned snapshots + const deleteResult = await this.pool.query(` + DELETE FROM dutchie_product_snapshots + WHERE id IN ( + SELECT dps.id + FROM dutchie_product_snapshots dps + LEFT JOIN dutchie_products dp ON dps.dutchie_product_id = dp.id + WHERE 
dp.id IS NULL + LIMIT 1000 + ) + `); + + return { + rowsAffected: deleteResult.rowCount || 0, + changes: [{ action: 'deleted_orphaned_snapshots', count: deleteResult.rowCount }], + }; + } + + /** + * Repair cross-state SKU conflicts + */ + private async repairCrossStateSkuConflicts( + dryRun: boolean + ): Promise<{ rowsAffected: number; changes: any[]; preview?: any }> { + // Find conflicting SKUs + const conflictResult = await this.pool.query(` + SELECT + dp.external_product_id, + ARRAY_AGG(DISTINCT d.state) as states, + ARRAY_AGG(dp.id) as product_ids, + COUNT(*) as count + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE d.state IS NOT NULL + GROUP BY dp.external_product_id + HAVING COUNT(DISTINCT d.state) > 1 + LIMIT 100 + `); + + const preview = { + conflictingSKUs: conflictResult.rows.length, + sample: conflictResult.rows.slice(0, 5), + }; + + if (dryRun) { + return { rowsAffected: 0, changes: [], preview }; + } + + // For now, just log conflicts - actual repair would need business rules + // This could involve prefixing external_product_id with state code + const changes: any[] = []; + + for (const row of conflictResult.rows) { + changes.push({ + externalProductId: row.external_product_id, + states: row.states, + action: 'identified_for_review', + }); + } + + // Create alert for manual review + if (changes.length > 0) { + await this.alerts.createAlert( + 'CROSS_STATE_SKU_CONFLICTS', + 'warning', + `${changes.length} cross-state SKU conflicts need review`, + 'These SKUs appear in multiple states and may need unique identifiers', + 'auto-fix-service' + ); + } + + return { rowsAffected: 0, changes }; + } + + /** + * Get available routines + */ + getAvailableRoutines(): Array<{ + name: FixRoutineName; + description: string; + canAutoRun: boolean; + }> { + return [ + { + name: 'rehydrate_missing_snapshots', + description: 'Create snapshots for products missing recent snapshot data', + canAutoRun: true, + }, + { + name: 'recalc_brand_penetration', + description: 'Recalculate brand penetration metrics and capture snapshots', + canAutoRun: true, + }, + { + name: 'reconcile_category_mismatches', + description: 'Infer categories for products with missing/unknown category', + canAutoRun: true, + }, + { + name: 'purge_orphaned_rows', + description: 'Delete orphaned snapshots with no parent product', + canAutoRun: false, + }, + { + name: 'repair_cross_state_sku_conflicts', + description: 'Identify and flag cross-state SKU collisions for review', + canAutoRun: false, + }, + ]; + } + + /** + * Get recent fix runs + */ + async getRecentRuns(limit: number = 20): Promise { + const result = await this.pool.query(` + SELECT * + FROM auto_fix_runs + ORDER BY started_at DESC + LIMIT $1 + `, [limit]); + + return result.rows.map(row => ({ + runId: row.run_id, + routineName: row.routine_name, + triggeredBy: row.triggered_by, + triggerType: row.trigger_type, + startedAt: row.started_at, + finishedAt: row.finished_at, + status: row.status, + rowsAffected: row.rows_affected, + changes: row.changes || [], + isDryRun: row.is_dry_run, + dryRunPreview: row.dry_run_preview, + errorMessage: row.error_message, + })); + } +} diff --git a/backend/src/system/services/dlq.ts b/backend/src/system/services/dlq.ts new file mode 100644 index 00000000..35576acc --- /dev/null +++ b/backend/src/system/services/dlq.ts @@ -0,0 +1,389 @@ +/** + * Dead-Letter Queue (DLQ) Service + * + * Handles failed payloads that exceed retry limits: + * - Move payloads to DLQ after 3+ failures + * - Store error 
history
+ * - Enable manual retry
+ * - Zero data loss guarantee
+ *
+ * Phase 5: Full Production Sync + Monitoring
+ */
+
+import { Pool } from 'pg';
+
+export interface DLQPayload {
+  id: string;
+  originalPayloadId: string;
+  dispensaryId: number | null;
+  dispensaryName: string | null;
+  stateCode: string | null;
+  platform: string;
+  productCount: number | null;
+  pricingType: string | null;
+  crawlMode: string | null;
+  movedToDlqAt: Date;
+  failureCount: number;
+  errorHistory: Array<{
+    type: string;
+    message: string;
+    at: Date;
+  }>;
+  lastErrorType: string | null;
+  lastErrorMessage: string | null;
+  lastErrorAt: Date | null;
+  retryCount: number;
+  lastRetryAt: Date | null;
+  status: 'pending' | 'retrying' | 'resolved' | 'abandoned';
+  resolvedAt: Date | null;
+  resolvedBy: string | null;
+  resolutionNotes: string | null;
+}
+
+export interface DLQStats {
+  total: number;
+  pending: number;
+  retrying: number;
+  resolved: number;
+  abandoned: number;
+  byErrorType: Record<string, number>;
+  oldestPending: Date | null;
+  newestPending: Date | null;
+}
+
+export class DLQService {
+  private pool: Pool;
+
+  constructor(pool: Pool) {
+    this.pool = pool;
+  }
+
+  /**
+   * Move a payload to the DLQ
+   */
+  async movePayloadToDlq(
+    payloadId: string,
+    errorType: string,
+    errorMessage: string
+  ): Promise<string> {
+    const result = await this.pool.query(
+      `SELECT move_to_dlq($1, $2, $3) as dlq_id`,
+      [payloadId, errorType, errorMessage]
+    );
+
+    return result.rows[0].dlq_id;
+  }
+
+  /**
+   * Get DLQ statistics
+   */
+  async getStats(): Promise<DLQStats> {
+    const [countResult, byTypeResult, dateResult] = await Promise.all([
+      this.pool.query(`
+        SELECT
+          COUNT(*) as total,
+          COUNT(*) FILTER (WHERE status = 'pending') as pending,
+          COUNT(*) FILTER (WHERE status = 'retrying') as retrying,
+          COUNT(*) FILTER (WHERE status = 'resolved') as resolved,
+          COUNT(*) FILTER (WHERE status = 'abandoned') as abandoned
+        FROM raw_payloads_dlq
+      `),
+      this.pool.query(`
+        SELECT last_error_type, COUNT(*) as count
+        FROM raw_payloads_dlq
+        WHERE status = 'pending'
+        GROUP BY last_error_type
+      `),
+      this.pool.query(`
+        SELECT
+          MIN(moved_to_dlq_at) as oldest,
+          MAX(moved_to_dlq_at) as newest
+        FROM raw_payloads_dlq
+        WHERE status = 'pending'
+      `),
+    ]);
+
+    const row = countResult.rows[0];
+    const byErrorType: Record<string, number> = {};
+    byTypeResult.rows.forEach(r => {
+      byErrorType[r.last_error_type] = parseInt(r.count);
+    });
+
+    return {
+      total: parseInt(row.total) || 0,
+      pending: parseInt(row.pending) || 0,
+      retrying: parseInt(row.retrying) || 0,
+      resolved: parseInt(row.resolved) || 0,
+      abandoned: parseInt(row.abandoned) || 0,
+      byErrorType,
+      oldestPending: dateResult.rows[0]?.oldest || null,
+      newestPending: dateResult.rows[0]?.newest || null,
+    };
+  }
+
+  /**
+   * List DLQ payloads
+   */
+  async listPayloads(options: {
+    status?: string;
+    errorType?: string;
+    dispensaryId?: number;
+    limit?: number;
+    offset?: number;
+  } = {}): Promise<{ payloads: DLQPayload[]; total: number }> {
+    const { status, errorType, dispensaryId, limit = 50, offset = 0 } = options;
+
+    const conditions: string[] = [];
+    const params: (string | number)[] = [];
+    let paramIndex = 1;
+
+    if (status) {
+      conditions.push(`dlq.status = $${paramIndex++}`);
+      params.push(status);
+    }
+    if (errorType) {
+      conditions.push(`dlq.last_error_type = $${paramIndex++}`);
+      params.push(errorType);
+    }
+    if (dispensaryId) {
+      conditions.push(`dlq.dispensary_id = $${paramIndex++}`);
+      params.push(dispensaryId);
+    }
+
+    const whereClause = conditions.length > 0 ?
'WHERE ' + conditions.join(' AND ') : ''; + + // Get total count + const countResult = await this.pool.query(` + SELECT COUNT(*) as total + FROM raw_payloads_dlq dlq + ${whereClause} + `, params); + + // Get payloads + params.push(limit, offset); + const result = await this.pool.query(` + SELECT + dlq.id, dlq.original_payload_id, dlq.dispensary_id, + d.name as dispensary_name, + dlq.state_code, dlq.platform, dlq.product_count, dlq.pricing_type, + dlq.crawl_mode, dlq.moved_to_dlq_at, dlq.failure_count, + dlq.error_history, dlq.last_error_type, dlq.last_error_message, + dlq.last_error_at, dlq.retry_count, dlq.last_retry_at, + dlq.status, dlq.resolved_at, dlq.resolved_by, dlq.resolution_notes + FROM raw_payloads_dlq dlq + LEFT JOIN dispensaries d ON dlq.dispensary_id = d.id + ${whereClause} + ORDER BY dlq.moved_to_dlq_at DESC + LIMIT $${paramIndex++} OFFSET $${paramIndex++} + `, params); + + const payloads: DLQPayload[] = result.rows.map(row => ({ + id: row.id, + originalPayloadId: row.original_payload_id, + dispensaryId: row.dispensary_id, + dispensaryName: row.dispensary_name, + stateCode: row.state_code, + platform: row.platform, + productCount: row.product_count, + pricingType: row.pricing_type, + crawlMode: row.crawl_mode, + movedToDlqAt: row.moved_to_dlq_at, + failureCount: row.failure_count, + errorHistory: row.error_history || [], + lastErrorType: row.last_error_type, + lastErrorMessage: row.last_error_message, + lastErrorAt: row.last_error_at, + retryCount: row.retry_count, + lastRetryAt: row.last_retry_at, + status: row.status, + resolvedAt: row.resolved_at, + resolvedBy: row.resolved_by, + resolutionNotes: row.resolution_notes, + })); + + return { + payloads, + total: parseInt(countResult.rows[0].total) || 0, + }; + } + + /** + * Get a single DLQ payload with raw JSON + */ + async getPayload(id: string): Promise<(DLQPayload & { rawJson: any }) | null> { + const result = await this.pool.query(` + SELECT + dlq.*, d.name as dispensary_name + FROM raw_payloads_dlq dlq + LEFT JOIN dispensaries d ON dlq.dispensary_id = d.id + WHERE dlq.id = $1 + `, [id]); + + if (result.rows.length === 0) return null; + + const row = result.rows[0]; + return { + id: row.id, + originalPayloadId: row.original_payload_id, + dispensaryId: row.dispensary_id, + dispensaryName: row.dispensary_name, + stateCode: row.state_code, + platform: row.platform, + productCount: row.product_count, + pricingType: row.pricing_type, + crawlMode: row.crawl_mode, + movedToDlqAt: row.moved_to_dlq_at, + failureCount: row.failure_count, + errorHistory: row.error_history || [], + lastErrorType: row.last_error_type, + lastErrorMessage: row.last_error_message, + lastErrorAt: row.last_error_at, + retryCount: row.retry_count, + lastRetryAt: row.last_retry_at, + status: row.status, + resolvedAt: row.resolved_at, + resolvedBy: row.resolved_by, + resolutionNotes: row.resolution_notes, + rawJson: row.raw_json, + }; + } + + /** + * Retry a DLQ payload + */ + async retryPayload(id: string): Promise<{ success: boolean; newPayloadId?: string; error?: string }> { + const payload = await this.getPayload(id); + + if (!payload) { + return { success: false, error: 'Payload not found' }; + } + + if (payload.status !== 'pending') { + return { success: false, error: `Cannot retry payload with status: ${payload.status}` }; + } + + try { + // Update DLQ status + await this.pool.query(` + UPDATE raw_payloads_dlq + SET status = 'retrying', + retry_count = retry_count + 1, + last_retry_at = NOW() + WHERE id = $1 + `, [id]); + + // Re-insert into 
raw_payloads + const insertResult = await this.pool.query(` + INSERT INTO raw_payloads ( + dispensary_id, platform, raw_json, product_count, + pricing_type, crawl_mode, fetched_at, processed, + hydration_attempts + ) + VALUES ($1, $2, $3, $4, $5, $6, NOW(), FALSE, 0) + RETURNING id + `, [ + payload.dispensaryId, + payload.platform, + payload.rawJson, + payload.productCount, + payload.pricingType, + payload.crawlMode, + ]); + + const newPayloadId = insertResult.rows[0].id; + + // Mark DLQ entry as resolved + await this.pool.query(` + UPDATE raw_payloads_dlq + SET status = 'resolved', + resolved_at = NOW(), + resolution_notes = 'Retried as payload ' || $2 + WHERE id = $1 + `, [id, newPayloadId]); + + return { success: true, newPayloadId }; + } catch (error) { + // Revert DLQ status + await this.pool.query(` + UPDATE raw_payloads_dlq + SET status = 'pending', + error_history = error_history || $2 + WHERE id = $1 + `, [id, JSON.stringify({ type: 'RETRY_FAILED', message: String(error), at: new Date() })]); + + return { success: false, error: String(error) }; + } + } + + /** + * Abandon a DLQ payload (give up on retrying) + */ + async abandonPayload(id: string, reason: string, abandonedBy: string): Promise { + const result = await this.pool.query(` + UPDATE raw_payloads_dlq + SET status = 'abandoned', + resolved_at = NOW(), + resolved_by = $2, + resolution_notes = $3 + WHERE id = $1 AND status = 'pending' + RETURNING id + `, [id, abandonedBy, reason]); + + return (result.rowCount || 0) > 0; + } + + /** + * Bulk retry payloads by error type + */ + async bulkRetryByErrorType(errorType: string): Promise<{ retried: number; failed: number }> { + const payloads = await this.listPayloads({ status: 'pending', errorType, limit: 100 }); + + let retried = 0; + let failed = 0; + + for (const payload of payloads.payloads) { + const result = await this.retryPayload(payload.id); + if (result.success) { + retried++; + } else { + failed++; + } + } + + return { retried, failed }; + } + + /** + * Get DLQ summary by error type + */ + async getSummary(): Promise> { + const result = await this.pool.query(`SELECT * FROM v_dlq_summary`); + + return result.rows.map(row => ({ + status: row.status, + lastErrorType: row.last_error_type, + count: parseInt(row.count), + oldest: row.oldest, + newest: row.newest, + })); + } + + /** + * Cleanup old resolved DLQ entries + */ + async cleanupResolved(daysOld: number = 30): Promise { + const result = await this.pool.query(` + DELETE FROM raw_payloads_dlq + WHERE status IN ('resolved', 'abandoned') + AND resolved_at < NOW() - ($1 || ' days')::INTERVAL + `, [daysOld]); + + return result.rowCount || 0; + } +} diff --git a/backend/src/system/services/index.ts b/backend/src/system/services/index.ts new file mode 100644 index 00000000..e6c95d0f --- /dev/null +++ b/backend/src/system/services/index.ts @@ -0,0 +1,12 @@ +/** + * System Services Index + * + * Phase 5: Full Production Sync + Monitoring + */ + +export { SyncOrchestrator, type SyncStatus, type QueueDepth, type SyncRunMetrics, type OrchestratorStatus } from './sync-orchestrator'; +export { MetricsService, ERROR_TYPES, type Metric, type MetricTimeSeries, type ErrorBucket, type ErrorType } from './metrics'; +export { DLQService, type DLQPayload, type DLQStats } from './dlq'; +export { AlertService, type SystemAlert, type AlertSummary, type AlertSeverity, type AlertStatus } from './alerts'; +export { IntegrityService, type IntegrityCheckResult, type IntegrityRunSummary, type CheckStatus } from './integrity'; +export { 
AutoFixService, type FixRunResult, type FixRoutineName } from './auto-fix'; diff --git a/backend/src/system/services/integrity.ts b/backend/src/system/services/integrity.ts new file mode 100644 index 00000000..b3785741 --- /dev/null +++ b/backend/src/system/services/integrity.ts @@ -0,0 +1,548 @@ +/** + * Integrity Verification Service + * + * Performs data integrity checks: + * - Store-level product count drift + * - Brand-level SKU drift + * - Category mapping inconsistencies + * - Snapshot continuity + * - Price/potency missing anomalies + * - Cross-state SKU collisions + * - Orphaned records + * + * Phase 5: Full Production Sync + Monitoring + */ + +import { Pool } from 'pg'; +import { AlertService } from './alerts'; + +export type CheckStatus = 'passed' | 'failed' | 'warning' | 'skipped'; + +export interface IntegrityCheckResult { + checkName: string; + checkCategory: string; + status: CheckStatus; + expectedValue: string | null; + actualValue: string | null; + difference: string | null; + affectedCount: number; + details: Record; + affectedIds: (string | number)[]; + canAutoFix: boolean; + fixRoutine: string | null; +} + +export interface IntegrityRunSummary { + runId: string; + checkType: string; + triggeredBy: string; + startedAt: Date; + finishedAt: Date | null; + status: string; + totalChecks: number; + passedChecks: number; + failedChecks: number; + warningChecks: number; + results: IntegrityCheckResult[]; +} + +export class IntegrityService { + private pool: Pool; + private alerts: AlertService; + + constructor(pool: Pool, alerts: AlertService) { + this.pool = pool; + this.alerts = alerts; + } + + /** + * Run all integrity checks + */ + async runAllChecks(triggeredBy: string = 'system'): Promise { + const runId = crypto.randomUUID(); + const startedAt = new Date(); + + // Create run record + await this.pool.query(` + INSERT INTO integrity_check_runs (run_id, check_type, triggered_by, started_at, status) + VALUES ($1, 'full', $2, $3, 'running') + `, [runId, triggeredBy, startedAt]); + + const results: IntegrityCheckResult[] = []; + + try { + // Run all checks + results.push(await this.checkStoreProductCountDrift()); + results.push(await this.checkBrandSkuDrift()); + results.push(await this.checkCategoryMappingInconsistencies()); + results.push(await this.checkSnapshotContinuity()); + results.push(await this.checkPriceMissingAnomalies()); + results.push(await this.checkPotencyMissingAnomalies()); + results.push(await this.checkCrossStateSkuCollisions()); + results.push(await this.checkOrphanedSnapshots()); + + // Save results + for (const result of results) { + await this.saveCheckResult(runId, result); + } + + const passed = results.filter(r => r.status === 'passed').length; + const failed = results.filter(r => r.status === 'failed').length; + const warnings = results.filter(r => r.status === 'warning').length; + + // Update run record + await this.pool.query(` + UPDATE integrity_check_runs + SET status = 'completed', + finished_at = NOW(), + total_checks = $2, + passed_checks = $3, + failed_checks = $4, + warning_checks = $5 + WHERE run_id = $1 + `, [runId, results.length, passed, failed, warnings]); + + // Create alerts for failures + if (failed > 0) { + await this.alerts.createAlert( + 'INTEGRITY_CHECK_FAILED', + 'error', + `Integrity check failed: ${failed} check(s) failed`, + `Run ID: ${runId}`, + 'integrity-service' + ); + } + + return { + runId, + checkType: 'full', + triggeredBy, + startedAt, + finishedAt: new Date(), + status: 'completed', + totalChecks: results.length, 
+ passedChecks: passed, + failedChecks: failed, + warningChecks: warnings, + results, + }; + } catch (error) { + await this.pool.query(` + UPDATE integrity_check_runs + SET status = 'failed', + finished_at = NOW() + WHERE run_id = $1 + `, [runId]); + + throw error; + } + } + + /** + * Save a check result + */ + private async saveCheckResult(runId: string, result: IntegrityCheckResult): Promise { + await this.pool.query(` + INSERT INTO integrity_check_results ( + run_id, check_name, check_category, status, + expected_value, actual_value, difference, affected_count, + details, affected_ids, can_auto_fix, fix_routine + ) + VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12) + `, [ + runId, + result.checkName, + result.checkCategory, + result.status, + result.expectedValue, + result.actualValue, + result.difference, + result.affectedCount, + JSON.stringify(result.details), + JSON.stringify(result.affectedIds), + result.canAutoFix, + result.fixRoutine, + ]); + } + + /** + * Check: Store-level product count drift + */ + async checkStoreProductCountDrift(): Promise { + const result = await this.pool.query(` + WITH current_counts AS ( + SELECT dispensary_id, COUNT(*) as product_count + FROM dutchie_products + GROUP BY dispensary_id + ), + crawl_counts AS ( + SELECT + dispensary_id, + products_found as last_crawl_count + FROM dispensary_crawl_jobs + WHERE status = 'completed' + AND job_type = 'dutchie_product_crawl' + AND completed_at >= NOW() - INTERVAL '24 hours' + ORDER BY completed_at DESC + ), + comparison AS ( + SELECT + cc.dispensary_id, + cc.product_count as current, + crj.last_crawl_count as expected, + ABS(cc.product_count - crj.last_crawl_count) as diff, + ABS(cc.product_count - crj.last_crawl_count)::float / NULLIF(crj.last_crawl_count, 0) * 100 as drift_pct + FROM current_counts cc + JOIN ( + SELECT DISTINCT ON (dispensary_id) * + FROM crawl_counts + ) crj ON cc.dispensary_id = crj.dispensary_id + WHERE crj.last_crawl_count IS NOT NULL + ) + SELECT * FROM comparison + WHERE drift_pct > 10 OR diff > 50 + ORDER BY drift_pct DESC + `); + + const affectedIds = result.rows.map(r => r.dispensary_id); + const maxDrift = result.rows.length > 0 ? Math.max(...result.rows.map(r => parseFloat(r.drift_pct) || 0)) : 0; + + return { + checkName: 'store_product_count_drift', + checkCategory: 'data_consistency', + status: result.rows.length === 0 ? 'passed' : maxDrift > 20 ? 'failed' : 'warning', + expectedValue: 'No drift > 10%', + actualValue: result.rows.length > 0 ? `${result.rows.length} stores with drift > 10%` : 'All stores OK', + difference: maxDrift > 0 ? 
`Max drift: ${maxDrift.toFixed(1)}%` : null, + affectedCount: result.rows.length, + details: { stores: result.rows.slice(0, 10) }, + affectedIds, + canAutoFix: false, + fixRoutine: null, + }; + } + + /** + * Check: Brand-level SKU drift + */ + async checkBrandSkuDrift(): Promise { + const result = await this.pool.query(` + WITH current_brands AS ( + SELECT brand_name, COUNT(*) as sku_count + FROM dutchie_products + WHERE brand_name IS NOT NULL + GROUP BY brand_name + ), + snapshot_brands AS ( + SELECT + brand_name, + total_skus as snapshot_count + FROM brand_snapshots + WHERE snapshot_date = (SELECT MAX(snapshot_date) FROM brand_snapshots) + ), + comparison AS ( + SELECT + cb.brand_name, + cb.sku_count as current, + sb.snapshot_count as expected, + ABS(cb.sku_count - sb.snapshot_count) as diff, + ABS(cb.sku_count - sb.snapshot_count)::float / NULLIF(sb.snapshot_count, 0) * 100 as drift_pct + FROM current_brands cb + LEFT JOIN snapshot_brands sb ON cb.brand_name = sb.brand_name + WHERE sb.snapshot_count IS NOT NULL + ) + SELECT * FROM comparison + WHERE drift_pct > 20 OR diff > 20 + ORDER BY drift_pct DESC + LIMIT 20 + `); + + const affectedBrands = result.rows.map(r => r.brand_name); + + return { + checkName: 'brand_sku_drift', + checkCategory: 'data_consistency', + status: result.rows.length === 0 ? 'passed' : 'warning', + expectedValue: 'No brand SKU drift > 20%', + actualValue: `${result.rows.length} brands with drift`, + difference: null, + affectedCount: result.rows.length, + details: { brands: result.rows }, + affectedIds: affectedBrands, + canAutoFix: false, + fixRoutine: null, + }; + } + + /** + * Check: Category mapping inconsistencies + */ + async checkCategoryMappingInconsistencies(): Promise { + const result = await this.pool.query(` + SELECT type as category, COUNT(*) as count + FROM dutchie_products + WHERE type IS NULL OR type = '' OR type = 'Unknown' + GROUP BY type + `); + + const unmapped = result.rows.reduce((sum, r) => sum + parseInt(r.count), 0); + + return { + checkName: 'category_mapping_inconsistencies', + checkCategory: 'data_quality', + status: unmapped === 0 ? 'passed' : unmapped > 100 ? 'warning' : 'passed', + expectedValue: '0 unmapped categories', + actualValue: `${unmapped} products with missing/unknown category`, + difference: unmapped > 0 ? `${unmapped} unmapped` : null, + affectedCount: unmapped, + details: { categories: result.rows }, + affectedIds: [], + canAutoFix: true, + fixRoutine: 'reconcile_category_mismatches', + }; + } + + /** + * Check: Snapshot continuity (no missing dates) + */ + async checkSnapshotContinuity(): Promise { + const result = await this.pool.query(` + WITH date_series AS ( + SELECT generate_series( + (SELECT MIN(DATE(crawled_at)) FROM dutchie_product_snapshots), + CURRENT_DATE, + '1 day'::INTERVAL + )::DATE as expected_date + ), + actual_dates AS ( + SELECT DISTINCT DATE(crawled_at) as snapshot_date + FROM dutchie_product_snapshots + ), + missing AS ( + SELECT ds.expected_date + FROM date_series ds + LEFT JOIN actual_dates ad ON ds.expected_date = ad.snapshot_date + WHERE ad.snapshot_date IS NULL + AND ds.expected_date < CURRENT_DATE + AND ds.expected_date >= CURRENT_DATE - INTERVAL '30 days' + ) + SELECT * FROM missing ORDER BY expected_date DESC + `); + + const missingDates = result.rows.map(r => r.expected_date); + + return { + checkName: 'snapshot_continuity', + checkCategory: 'data_completeness', + status: missingDates.length === 0 ? 'passed' : missingDates.length > 3 ? 
'failed' : 'warning', + expectedValue: 'No missing snapshot dates', + actualValue: `${missingDates.length} missing dates`, + difference: missingDates.length > 0 ? `Missing: ${missingDates.slice(0, 5).join(', ')}` : null, + affectedCount: missingDates.length, + details: { missingDates: missingDates.slice(0, 10) }, + affectedIds: [], + canAutoFix: true, + fixRoutine: 'rehydrate_missing_snapshots', + }; + } + + /** + * Check: Price missing anomalies + */ + async checkPriceMissingAnomalies(): Promise { + const result = await this.pool.query(` + SELECT + COUNT(*) as total_products, + COUNT(*) FILTER (WHERE extract_min_price(latest_raw_payload) IS NULL) as missing_price, + COUNT(*) FILTER (WHERE extract_min_price(latest_raw_payload) IS NULL)::float / + NULLIF(COUNT(*), 0) * 100 as missing_pct + FROM dutchie_products + `); + + const row = result.rows[0]; + const missingPct = parseFloat(row.missing_pct) || 0; + const missingCount = parseInt(row.missing_price) || 0; + + return { + checkName: 'price_missing_anomaly', + checkCategory: 'data_quality', + status: missingPct < 5 ? 'passed' : missingPct < 15 ? 'warning' : 'failed', + expectedValue: '< 5% products missing price', + actualValue: `${missingPct.toFixed(1)}% (${missingCount} products)`, + difference: missingPct > 5 ? `${(missingPct - 5).toFixed(1)}% above threshold` : null, + affectedCount: missingCount, + details: { totalProducts: parseInt(row.total_products), missingPct }, + affectedIds: [], + canAutoFix: false, + fixRoutine: null, + }; + } + + /** + * Check: Potency missing anomalies + */ + async checkPotencyMissingAnomalies(): Promise { + const result = await this.pool.query(` + SELECT + type, + COUNT(*) as total, + COUNT(*) FILTER ( + WHERE (latest_raw_payload->>'potencyTHC') IS NULL + AND (latest_raw_payload->'potencyTHC'->>'formatted') IS NULL + ) as missing_thc, + COUNT(*) FILTER ( + WHERE (latest_raw_payload->>'potencyCBD') IS NULL + AND (latest_raw_payload->'potencyCBD'->>'formatted') IS NULL + ) as missing_cbd + FROM dutchie_products + WHERE type IN ('Flower', 'Concentrate', 'Vaporizers', 'Pre-Rolls') + GROUP BY type + `); + + let totalMissing = 0; + const details: Record = {}; + + for (const row of result.rows) { + const missingThc = parseInt(row.missing_thc) || 0; + totalMissing += missingThc; + details[row.type] = { + total: parseInt(row.total), + missingThc, + missingCbd: parseInt(row.missing_cbd), + }; + } + + return { + checkName: 'potency_missing_anomaly', + checkCategory: 'data_quality', + status: totalMissing === 0 ? 'passed' : totalMissing > 500 ? 'warning' : 'passed', + expectedValue: 'THC potency on cannabis products', + actualValue: `${totalMissing} cannabis products missing THC`, + difference: null, + affectedCount: totalMissing, + details, + affectedIds: [], + canAutoFix: false, + fixRoutine: null, + }; + } + + /** + * Check: Cross-state SKU collisions + */ + async checkCrossStateSkuCollisions(): Promise { + const result = await this.pool.query(` + SELECT + dp.external_product_id, + ARRAY_AGG(DISTINCT d.state) as states, + COUNT(DISTINCT d.state) as state_count + FROM dutchie_products dp + JOIN dispensaries d ON dp.dispensary_id = d.id + WHERE d.state IS NOT NULL + GROUP BY dp.external_product_id + HAVING COUNT(DISTINCT d.state) > 1 + LIMIT 100 + `); + + const collisions = result.rows.length; + const affectedIds = result.rows.map(r => r.external_product_id); + + return { + checkName: 'cross_state_sku_collision', + checkCategory: 'data_integrity', + status: collisions === 0 ? 'passed' : collisions > 10 ? 
'warning' : 'passed', + expectedValue: 'No cross-state SKU collisions', + actualValue: `${collisions} SKUs appear in multiple states`, + difference: null, + affectedCount: collisions, + details: { collisions: result.rows.slice(0, 10) }, + affectedIds, + canAutoFix: true, + fixRoutine: 'repair_cross_state_sku_conflicts', + }; + } + + /** + * Check: Orphaned snapshots + */ + async checkOrphanedSnapshots(): Promise { + const result = await this.pool.query(` + SELECT COUNT(*) as orphaned + FROM dutchie_product_snapshots dps + LEFT JOIN dutchie_products dp ON dps.dutchie_product_id = dp.id + WHERE dp.id IS NULL + `); + + const orphaned = parseInt(result.rows[0]?.orphaned) || 0; + + return { + checkName: 'orphaned_snapshots', + checkCategory: 'data_integrity', + status: orphaned === 0 ? 'passed' : orphaned > 100 ? 'warning' : 'passed', + expectedValue: '0 orphaned snapshots', + actualValue: `${orphaned} orphaned snapshots`, + difference: null, + affectedCount: orphaned, + details: {}, + affectedIds: [], + canAutoFix: true, + fixRoutine: 'purge_orphaned_rows', + }; + } + + /** + * Get recent integrity check runs + */ + async getRecentRuns(limit: number = 10): Promise { + const result = await this.pool.query(` + SELECT + run_id, check_type, triggered_by, started_at, finished_at, + status, total_checks, passed_checks, failed_checks, warning_checks + FROM integrity_check_runs + ORDER BY started_at DESC + LIMIT $1 + `, [limit]); + + return result.rows.map(row => ({ + runId: row.run_id, + checkType: row.check_type, + triggeredBy: row.triggered_by, + startedAt: row.started_at, + finishedAt: row.finished_at, + status: row.status, + totalChecks: row.total_checks, + passedChecks: row.passed_checks, + failedChecks: row.failed_checks, + warningChecks: row.warning_checks, + results: [], + })); + } + + /** + * Get results for a specific run + */ + async getRunResults(runId: string): Promise { + const result = await this.pool.query(` + SELECT * + FROM integrity_check_results + WHERE run_id = $1 + ORDER BY + CASE status WHEN 'failed' THEN 0 WHEN 'warning' THEN 1 ELSE 2 END, + check_name + `, [runId]); + + return result.rows.map(row => ({ + checkName: row.check_name, + checkCategory: row.check_category, + status: row.status, + expectedValue: row.expected_value, + actualValue: row.actual_value, + difference: row.difference, + affectedCount: row.affected_count, + details: row.details || {}, + affectedIds: row.affected_ids || [], + canAutoFix: row.can_auto_fix, + fixRoutine: row.fix_routine, + })); + } +} diff --git a/backend/src/system/services/metrics.ts b/backend/src/system/services/metrics.ts new file mode 100644 index 00000000..14763573 --- /dev/null +++ b/backend/src/system/services/metrics.ts @@ -0,0 +1,397 @@ +/** + * Metrics Service + * + * Provides Prometheus-style metrics tracking: + * - Time-series metrics storage + * - Current metrics snapshot + * - Error classification and counting + * - Prometheus-compatible /metrics endpoint format + * + * Phase 5: Full Production Sync + Monitoring + */ + +import { Pool } from 'pg'; + +export interface Metric { + name: string; + value: number; + labels: Record; + updatedAt: Date; +} + +export interface MetricTimeSeries { + name: string; + points: Array<{ + value: number; + labels: Record; + recordedAt: Date; + }>; +} + +export interface ErrorBucket { + id: number; + errorType: string; + errorMessage: string; + sourceTable: string | null; + sourceId: string | null; + dispensaryId: number | null; + stateCode: string | null; + context: Record; + occurredAt: Date; + 
acknowledged: boolean;
+}
+
+// Standard error types
+export const ERROR_TYPES = {
+  INGESTION_PARSE_ERROR: 'INGESTION_PARSE_ERROR',
+  NORMALIZATION_ERROR: 'NORMALIZATION_ERROR',
+  HYDRATION_UPSERT_ERROR: 'HYDRATION_UPSERT_ERROR',
+  MISSING_BRAND_MAP: 'MISSING_BRAND_MAP',
+  MISSING_CATEGORY_MAP: 'MISSING_CATEGORY_MAP',
+  STATE_MISMATCH: 'STATE_MISMATCH',
+  DUPLICATE_EXTERNAL_ID: 'DUPLICATE_EXTERNAL_ID',
+  DEAD_LETTER_QUEUE: 'DEAD_LETTER_QUEUE',
+} as const;
+
+export type ErrorType = keyof typeof ERROR_TYPES;
+
+export class MetricsService {
+  private pool: Pool;
+
+  constructor(pool: Pool) {
+    this.pool = pool;
+  }
+
+  /**
+   * Record a metric value
+   */
+  async recordMetric(
+    name: string,
+    value: number,
+    labels: Record<string, string> = {}
+  ): Promise<void> {
+    await this.pool.query(`SELECT record_metric($1, $2, $3)`, [
+      name,
+      value,
+      JSON.stringify(labels),
+    ]);
+  }
+
+  /**
+   * Get current metric value
+   */
+  async getMetric(name: string): Promise<Metric | null> {
+    const result = await this.pool.query(`
+      SELECT metric_name, metric_value, labels, updated_at
+      FROM system_metrics_current
+      WHERE metric_name = $1
+    `, [name]);
+
+    if (result.rows.length === 0) return null;
+
+    const row = result.rows[0];
+    return {
+      name: row.metric_name,
+      value: parseFloat(row.metric_value),
+      labels: row.labels || {},
+      updatedAt: row.updated_at,
+    };
+  }
+
+  /**
+   * Get all current metrics
+   */
+  async getAllMetrics(): Promise<Metric[]> {
+    const result = await this.pool.query(`
+      SELECT metric_name, metric_value, labels, updated_at
+      FROM system_metrics_current
+      ORDER BY metric_name
+    `);
+
+    return result.rows.map(row => ({
+      name: row.metric_name,
+      value: parseFloat(row.metric_value),
+      labels: row.labels || {},
+      updatedAt: row.updated_at,
+    }));
+  }
+
+  /**
+   * Get metric time series
+   */
+  async getMetricHistory(
+    name: string,
+    hours: number = 24
+  ): Promise<MetricTimeSeries> {
+    const result = await this.pool.query(`
+      SELECT metric_value, labels, recorded_at
+      FROM system_metrics
+      WHERE metric_name = $1
+        AND recorded_at >= NOW() - ($2 || ' hours')::INTERVAL
+      ORDER BY recorded_at ASC
+    `, [name, hours]);
+
+    return {
+      name,
+      points: result.rows.map(row => ({
+        value: parseFloat(row.metric_value),
+        labels: row.labels || {},
+        recordedAt: row.recorded_at,
+      })),
+    };
+  }
+
+  /**
+   * Record an error
+   */
+  async recordError(
+    type: string,
+    message: string,
+    sourceTable?: string,
+    sourceId?: string,
+    dispensaryId?: number,
+    context: Record<string, any> = {}
+  ): Promise<number> {
+    const result = await this.pool.query(`
+      SELECT record_error($1, $2, $3, $4, $5, $6) as id
+    `, [type, message, sourceTable, sourceId, dispensaryId, JSON.stringify(context)]);
+
+    return result.rows[0].id;
+  }
+
+  /**
+   * Get error summary
+   */
+  async getErrorSummary(): Promise<Array<{
+    errorType: string;
+    count: number;
+    unacknowledged: number;
+    firstOccurred: Date;
+    lastOccurred: Date;
+  }>> {
+    const result = await this.pool.query(`SELECT * FROM v_error_summary`);
+
+    return result.rows.map(row => ({
+      errorType: row.error_type,
+      count: parseInt(row.count),
+      unacknowledged: parseInt(row.unacknowledged),
+      firstOccurred: row.first_occurred,
+      lastOccurred: row.last_occurred,
+    }));
+  }
+
+  /**
+   * Get recent errors
+   */
+  async getRecentErrors(
+    limit: number = 50,
+    errorType?: string
+  ): Promise<ErrorBucket[]> {
+    const params: (string | number)[] = [limit];
+    let typeCondition = '';
+
+    if (errorType) {
+      typeCondition = 'AND error_type = $2';
+      params.push(errorType);
+    }
+
+    const result = await this.pool.query(`
+      SELECT
+        id, error_type, error_message, source_table, source_id,
+        dispensary_id, state_code, context, occurred_at, acknowledged
+      FROM error_buckets
+      WHERE occurred_at >= NOW() -
INTERVAL '24 hours' + ${typeCondition} + ORDER BY occurred_at DESC + LIMIT $1 + `, params); + + return result.rows.map(row => ({ + id: row.id, + errorType: row.error_type, + errorMessage: row.error_message, + sourceTable: row.source_table, + sourceId: row.source_id, + dispensaryId: row.dispensary_id, + stateCode: row.state_code, + context: row.context || {}, + occurredAt: row.occurred_at, + acknowledged: row.acknowledged, + })); + } + + /** + * Acknowledge errors + */ + async acknowledgeErrors(ids: number[], acknowledgedBy: string): Promise { + const result = await this.pool.query(` + UPDATE error_buckets + SET acknowledged = TRUE, + acknowledged_at = NOW(), + acknowledged_by = $2 + WHERE id = ANY($1) AND acknowledged = FALSE + `, [ids, acknowledgedBy]); + + return result.rowCount || 0; + } + + /** + * Get Prometheus-compatible metrics output + */ + async getPrometheusMetrics(): Promise { + const metrics = await this.getAllMetrics(); + const lines: string[] = []; + + for (const metric of metrics) { + const name = metric.name.replace(/-/g, '_'); + const labels = Object.entries(metric.labels) + .map(([k, v]) => `${k}="${v}"`) + .join(','); + + const labelStr = labels ? `{${labels}}` : ''; + lines.push(`# HELP ${name} CannaiQ metric`); + lines.push(`# TYPE ${name} gauge`); + lines.push(`${name}${labelStr} ${metric.value}`); + } + + // Add computed metrics + const errorSummary = await this.getErrorSummary(); + lines.push('# HELP cannaiq_errors_total Total errors by type'); + lines.push('# TYPE cannaiq_errors_total counter'); + for (const error of errorSummary) { + lines.push(`cannaiq_errors_total{type="${error.errorType}"} ${error.count}`); + } + + // Add queue metrics from database + const queueResult = await this.pool.query(` + SELECT + COUNT(*) FILTER (WHERE processed = FALSE) as unprocessed, + COUNT(*) FILTER (WHERE processed = TRUE AND normalized_at >= NOW() - INTERVAL '24 hours') as processed_today + FROM raw_payloads + `); + + if (queueResult.rows.length > 0) { + lines.push('# HELP cannaiq_payloads_unprocessed Unprocessed payloads in queue'); + lines.push('# TYPE cannaiq_payloads_unprocessed gauge'); + lines.push(`cannaiq_payloads_unprocessed ${queueResult.rows[0].unprocessed}`); + + lines.push('# HELP cannaiq_payloads_processed_today Payloads processed in last 24h'); + lines.push('# TYPE cannaiq_payloads_processed_today counter'); + lines.push(`cannaiq_payloads_processed_today ${queueResult.rows[0].processed_today}`); + } + + // Add DLQ metrics + const dlqResult = await this.pool.query(` + SELECT COUNT(*) as pending FROM raw_payloads_dlq WHERE status = 'pending' + `); + lines.push('# HELP cannaiq_dlq_pending Payloads pending in DLQ'); + lines.push('# TYPE cannaiq_dlq_pending gauge'); + lines.push(`cannaiq_dlq_pending ${dlqResult.rows[0].pending}`); + + // Add sync metrics + const syncResult = await this.pool.query(` + SELECT + status, + consecutive_failures, + last_run_duration_ms, + last_run_payloads_processed + FROM sync_orchestrator_state + WHERE id = 1 + `); + + if (syncResult.rows.length > 0) { + const sync = syncResult.rows[0]; + lines.push('# HELP cannaiq_orchestrator_running Is orchestrator running'); + lines.push('# TYPE cannaiq_orchestrator_running gauge'); + lines.push(`cannaiq_orchestrator_running ${sync.status === 'RUNNING' ? 
1 : 0}`); + + lines.push('# HELP cannaiq_orchestrator_failures Consecutive failures'); + lines.push('# TYPE cannaiq_orchestrator_failures gauge'); + lines.push(`cannaiq_orchestrator_failures ${sync.consecutive_failures}`); + + lines.push('# HELP cannaiq_last_sync_duration_ms Last sync duration'); + lines.push('# TYPE cannaiq_last_sync_duration_ms gauge'); + lines.push(`cannaiq_last_sync_duration_ms ${sync.last_run_duration_ms || 0}`); + } + + // Add data volume metrics + const volumeResult = await this.pool.query(` + SELECT + (SELECT COUNT(*) FROM dutchie_products) as products, + (SELECT COUNT(*) FROM dutchie_product_snapshots) as snapshots, + (SELECT COUNT(*) FROM dispensaries WHERE menu_type = 'dutchie') as stores + `); + + if (volumeResult.rows.length > 0) { + const vol = volumeResult.rows[0]; + lines.push('# HELP cannaiq_products_total Total products'); + lines.push('# TYPE cannaiq_products_total gauge'); + lines.push(`cannaiq_products_total ${vol.products}`); + + lines.push('# HELP cannaiq_snapshots_total Total snapshots'); + lines.push('# TYPE cannaiq_snapshots_total gauge'); + lines.push(`cannaiq_snapshots_total ${vol.snapshots}`); + + lines.push('# HELP cannaiq_stores_total Total active stores'); + lines.push('# TYPE cannaiq_stores_total gauge'); + lines.push(`cannaiq_stores_total ${vol.stores}`); + } + + return lines.join('\n'); + } + + /** + * Calculate and update throughput metrics + */ + async updateThroughputMetrics(): Promise { + // Calculate payloads per minute over last hour + const throughputResult = await this.pool.query(` + SELECT + COUNT(*) as total, + COUNT(*)::float / 60 as per_minute + FROM raw_payloads + WHERE normalized_at >= NOW() - INTERVAL '1 hour' + AND processed = TRUE + `); + + if (throughputResult.rows.length > 0) { + await this.recordMetric('throughput_payloads_per_minute', throughputResult.rows[0].per_minute); + } + + // Calculate average hydration time + const latencyResult = await this.pool.query(` + SELECT + AVG(EXTRACT(EPOCH FROM (normalized_at - fetched_at)) * 1000) as avg_latency_ms + FROM raw_payloads + WHERE normalized_at >= NOW() - INTERVAL '1 hour' + AND processed = TRUE + AND normalized_at IS NOT NULL + `); + + if (latencyResult.rows.length > 0 && latencyResult.rows[0].avg_latency_ms) { + await this.recordMetric('ingestion_latency_avg_ms', latencyResult.rows[0].avg_latency_ms); + } + + // Calculate success rate + const successResult = await this.pool.query(` + SELECT + COUNT(*) FILTER (WHERE processed = TRUE AND hydration_error IS NULL) as success, + COUNT(*) FILTER (WHERE processed = TRUE) as total + FROM raw_payloads + WHERE fetched_at >= NOW() - INTERVAL '1 hour' + `); + + if (successResult.rows.length > 0 && successResult.rows[0].total > 0) { + const rate = (successResult.rows[0].success / successResult.rows[0].total) * 100; + await this.recordMetric('hydration_success_rate', rate); + } + } + + /** + * Cleanup old metrics + */ + async cleanup(): Promise { + const result = await this.pool.query(`SELECT cleanup_old_metrics() as deleted`); + return result.rows[0].deleted; + } +} diff --git a/backend/src/system/services/sync-orchestrator.ts b/backend/src/system/services/sync-orchestrator.ts new file mode 100644 index 00000000..4af427e8 --- /dev/null +++ b/backend/src/system/services/sync-orchestrator.ts @@ -0,0 +1,910 @@ +/** + * Production Sync Orchestrator + * + * Central controller responsible for: + * - Detecting new raw payloads + * - Running hydration jobs + * - Verifying upserts + * - Calculating diffs (before/after snapshot change 
detection) + * - Triggering analytics pre-compute updates + * - Scheduling catch-up runs + * - Ensuring no double hydration runs (distributed lock) + * + * Phase 5: Full Production Sync + Monitoring + */ + +import { Pool } from 'pg'; +import { MetricsService } from './metrics'; +import { DLQService } from './dlq'; +import { AlertService } from './alerts'; + +export type OrchestratorStatus = 'RUNNING' | 'SLEEPING' | 'LOCKED' | 'PAUSED' | 'ERROR'; + +export interface OrchestratorConfig { + batchSize: number; + pollIntervalMs: number; + maxRetries: number; + lockTimeoutMs: number; + enableAnalyticsPrecompute: boolean; + enableIntegrityChecks: boolean; +} + +export interface SyncRunMetrics { + payloadsQueued: number; + payloadsProcessed: number; + payloadsSkipped: number; + payloadsFailed: number; + payloadsDlq: number; + productsUpserted: number; + productsInserted: number; + productsUpdated: number; + productsDiscontinued: number; + snapshotsCreated: number; +} + +export interface SyncStatus { + orchestratorStatus: OrchestratorStatus; + currentWorkerId: string | null; + lastHeartbeatAt: Date | null; + isPaused: boolean; + pauseReason: string | null; + consecutiveFailures: number; + lastRunStartedAt: Date | null; + lastRunCompletedAt: Date | null; + lastRunDurationMs: number | null; + lastRunPayloadsProcessed: number; + lastRunErrors: number; + config: OrchestratorConfig; + unprocessedPayloads: number; + dlqPending: number; + activeAlerts: number; + runs24h: { + total: number; + completed: number; + failed: number; + }; +} + +export interface QueueDepth { + unprocessed: number; + byState: Record; + byPlatform: Record; + oldestPayloadAge: number | null; // milliseconds + estimatedProcessingTime: number | null; // milliseconds +} + +const DEFAULT_CONFIG: OrchestratorConfig = { + batchSize: 50, + pollIntervalMs: 5000, + maxRetries: 3, + lockTimeoutMs: 300000, // 5 minutes + enableAnalyticsPrecompute: true, + enableIntegrityChecks: true, +}; + +export class SyncOrchestrator { + private pool: Pool; + private metrics: MetricsService; + private dlq: DLQService; + private alerts: AlertService; + private workerId: string; + private isRunning: boolean = false; + private pollInterval: NodeJS.Timeout | null = null; + + constructor( + pool: Pool, + metrics: MetricsService, + dlq: DLQService, + alerts: AlertService, + workerId?: string + ) { + this.pool = pool; + this.metrics = metrics; + this.dlq = dlq; + this.alerts = alerts; + this.workerId = workerId || `orchestrator-${process.env.HOSTNAME || process.pid}`; + } + + /** + * Get current sync status + */ + async getStatus(): Promise { + const result = await this.pool.query(`SELECT * FROM v_sync_status`); + + if (result.rows.length === 0) { + return { + orchestratorStatus: 'SLEEPING', + currentWorkerId: null, + lastHeartbeatAt: null, + isPaused: false, + pauseReason: null, + consecutiveFailures: 0, + lastRunStartedAt: null, + lastRunCompletedAt: null, + lastRunDurationMs: null, + lastRunPayloadsProcessed: 0, + lastRunErrors: 0, + config: DEFAULT_CONFIG, + unprocessedPayloads: 0, + dlqPending: 0, + activeAlerts: 0, + runs24h: { total: 0, completed: 0, failed: 0 }, + }; + } + + const row = result.rows[0]; + return { + orchestratorStatus: row.orchestrator_status as OrchestratorStatus, + currentWorkerId: row.current_worker_id, + lastHeartbeatAt: row.last_heartbeat_at, + isPaused: row.is_paused, + pauseReason: row.pause_reason, + consecutiveFailures: row.consecutive_failures, + lastRunStartedAt: row.last_run_started_at, + lastRunCompletedAt: 
row.last_run_completed_at, + lastRunDurationMs: row.last_run_duration_ms, + lastRunPayloadsProcessed: row.last_run_payloads_processed, + lastRunErrors: row.last_run_errors, + config: row.config || DEFAULT_CONFIG, + unprocessedPayloads: parseInt(row.unprocessed_payloads) || 0, + dlqPending: parseInt(row.dlq_pending) || 0, + activeAlerts: parseInt(row.active_alerts) || 0, + runs24h: row.runs_24h || { total: 0, completed: 0, failed: 0 }, + }; + } + + /** + * Get queue depth information + */ + async getQueueDepth(): Promise { + const [countResult, byStateResult, byPlatformResult, oldestResult] = await Promise.all([ + this.pool.query(` + SELECT COUNT(*) as count FROM raw_payloads WHERE processed = FALSE + `), + this.pool.query(` + SELECT + COALESCE(d.state, 'unknown') as state, + COUNT(*) as count + FROM raw_payloads rp + LEFT JOIN dispensaries d ON rp.dispensary_id = d.id + WHERE rp.processed = FALSE + GROUP BY d.state + `), + this.pool.query(` + SELECT platform, COUNT(*) as count + FROM raw_payloads + WHERE processed = FALSE + GROUP BY platform + `), + this.pool.query(` + SELECT fetched_at FROM raw_payloads + WHERE processed = FALSE + ORDER BY fetched_at ASC + LIMIT 1 + `), + ]); + + const unprocessed = parseInt(countResult.rows[0]?.count) || 0; + const byState: Record = {}; + byStateResult.rows.forEach(r => { + byState[r.state] = parseInt(r.count); + }); + const byPlatform: Record = {}; + byPlatformResult.rows.forEach(r => { + byPlatform[r.platform] = parseInt(r.count); + }); + + const oldestPayloadAge = oldestResult.rows.length > 0 + ? Date.now() - new Date(oldestResult.rows[0].fetched_at).getTime() + : null; + + // Estimate processing time based on recent throughput + const throughputResult = await this.pool.query(` + SELECT + COALESCE(AVG(payloads_processed::float / NULLIF(duration_ms, 0) * 1000), 10) as payloads_per_sec + FROM sync_runs + WHERE status = 'completed' + AND started_at >= NOW() - INTERVAL '1 hour' + AND duration_ms > 0 + `); + const payloadsPerSec = parseFloat(throughputResult.rows[0]?.payloads_per_sec) || 10; + const estimatedProcessingTime = unprocessed > 0 + ? 
Math.round((unprocessed / payloadsPerSec) * 1000)
+      : null;
+
+    return {
+      unprocessed,
+      byState,
+      byPlatform,
+      oldestPayloadAge,
+      estimatedProcessingTime,
+    };
+  }
+
+  /**
+   * Acquire distributed lock
+   */
+  private async acquireLock(): Promise<boolean> {
+    const lockName = 'sync_orchestrator';
+    const lockTimeout = DEFAULT_CONFIG.lockTimeoutMs;
+
+    const result = await this.pool.query(`
+      INSERT INTO hydration_locks (lock_name, worker_id, acquired_at, expires_at, heartbeat_at)
+      VALUES ($1, $2, NOW(), NOW() + ($3 || ' milliseconds')::INTERVAL, NOW())
+      ON CONFLICT (lock_name) DO UPDATE SET
+        worker_id = EXCLUDED.worker_id,
+        acquired_at = EXCLUDED.acquired_at,
+        expires_at = EXCLUDED.expires_at,
+        heartbeat_at = EXCLUDED.heartbeat_at
+      WHERE hydration_locks.expires_at < NOW()
+         OR hydration_locks.worker_id = $2
+      RETURNING id
+    `, [lockName, this.workerId, lockTimeout]);
+
+    return result.rows.length > 0;
+  }
+
+  /**
+   * Release distributed lock
+   */
+  private async releaseLock(): Promise<void> {
+    await this.pool.query(`
+      DELETE FROM hydration_locks
+      WHERE lock_name = 'sync_orchestrator' AND worker_id = $1
+    `, [this.workerId]);
+  }
+
+  /**
+   * Update lock heartbeat
+   */
+  private async refreshLock(): Promise<boolean> {
+    const result = await this.pool.query(`
+      UPDATE hydration_locks
+      SET heartbeat_at = NOW(),
+          expires_at = NOW() + ($2 || ' milliseconds')::INTERVAL
+      WHERE lock_name = 'sync_orchestrator' AND worker_id = $1
+      RETURNING id
+    `, [this.workerId, DEFAULT_CONFIG.lockTimeoutMs]);
+
+    return result.rows.length > 0;
+  }
+
+  /**
+   * Update orchestrator state
+   */
+  private async updateState(status: OrchestratorStatus, metrics?: Partial<SyncRunMetrics>): Promise<void> {
+    // COALESCE keeps the stored counters when no metrics are supplied, so the
+    // parameter positions stay fixed no matter which optional fields are set.
+    await this.pool.query(`
+      UPDATE sync_orchestrator_state
+      SET status = $1,
+          current_worker_id = $2,
+          last_heartbeat_at = NOW(),
+          updated_at = NOW(),
+          last_run_payloads_processed = COALESCE($3, last_run_payloads_processed),
+          last_run_errors = COALESCE($4, last_run_errors)
+      WHERE id = 1
+    `, [
+      status,
+      this.workerId,
+      metrics?.payloadsProcessed ?? null,
+      metrics?.payloadsFailed ?? null,
+    ]);
+  }
+
+  /**
+   * Run a single sync cycle
+   */
+  async runSync(): Promise<SyncRunMetrics> {
+    const startTime = Date.now();
+    const runId = crypto.randomUUID();
+
+    // Check if paused
+    const status = await this.getStatus();
+    if (status.isPaused) {
+      throw new Error(`Orchestrator is paused: ${status.pauseReason}`);
+    }
+
+    // Try to acquire lock
+    const hasLock = await this.acquireLock();
+    if (!hasLock) {
+      throw new Error('Could not acquire orchestrator lock - another instance is running');
+    }
+
+    const metrics: SyncRunMetrics = {
+      payloadsQueued: 0,
+      payloadsProcessed: 0,
+      payloadsSkipped: 0,
+      payloadsFailed: 0,
+      payloadsDlq: 0,
+      productsUpserted: 0,
+      productsInserted: 0,
+      productsUpdated: 0,
+      productsDiscontinued: 0,
+      snapshotsCreated: 0,
+    };
+
+    try {
+      await this.updateState('RUNNING');
+
+      // Create sync run record
+      await this.pool.query(`
+        INSERT INTO sync_runs (run_id, worker_id, status)
+        VALUES ($1, $2, 'running')
+      `, [runId, this.workerId]);
+
+      // Get unprocessed payloads
+      const queueDepth = await this.getQueueDepth();
+      metrics.payloadsQueued = queueDepth.unprocessed;
+
+      // Process in batches
+      const config = status.config;
+      let hasMore = true;
+      let batchCount = 0;
+
+      while (hasMore && batchCount < 100) { // Safety limit
+        batchCount++;
+
+        // Refresh lock
+        await this.refreshLock();
+
+        // Get batch of payloads
+        const payloadsResult = await this.pool.query(`
+          SELECT
+            rp.id, rp.dispensary_id, rp.raw_json, rp.platform,
+            rp.product_count, rp.pricing_type, rp.crawl_mode,
+            rp.hydration_attempts, rp.fetched_at,
+            d.state, d.name as dispensary_name
+          FROM raw_payloads rp
+          LEFT JOIN dispensaries d ON rp.dispensary_id = d.id
+          WHERE rp.processed = FALSE
+          ORDER BY rp.fetched_at ASC
+          LIMIT $1
+          FOR UPDATE SKIP LOCKED
+        `, [config.batchSize]);
+
+        if (payloadsResult.rows.length === 0) {
+          hasMore = false;
+          break;
+        }
+
+        // Process each payload
+        for (const payload of payloadsResult.rows) {
+          try {
+            const result = await this.processPayload(payload, config);
+            metrics.payloadsProcessed++;
+            metrics.productsUpserted += result.productsUpserted;
+            metrics.productsInserted += result.productsInserted;
+            metrics.productsUpdated += result.productsUpdated;
+            metrics.snapshotsCreated += result.snapshotsCreated;
+          } catch (error) {
+            const errorMessage = error instanceof Error ?
error.message : String(error); + + // Check if should move to DLQ + if (payload.hydration_attempts >= config.maxRetries - 1) { + await this.dlq.movePayloadToDlq( + payload.id, + this.classifyError(error), + errorMessage + ); + metrics.payloadsDlq++; + } else { + // Increment attempts and record error + await this.pool.query(` + UPDATE raw_payloads + SET hydration_attempts = hydration_attempts + 1, + hydration_error = $2 + WHERE id = $1 + `, [payload.id, errorMessage]); + } + + metrics.payloadsFailed++; + + await this.metrics.recordError( + this.classifyError(error), + errorMessage, + 'raw_payloads', + payload.id, + payload.dispensary_id + ); + } + } + + // Update metrics after each batch + await this.metrics.recordMetric('payloads_processed_today', metrics.payloadsProcessed); + } + + // Update metrics + await this.metrics.recordMetric('payloads_unprocessed', metrics.payloadsQueued - metrics.payloadsProcessed); + await this.metrics.recordMetric('canonical_rows_inserted', metrics.productsInserted); + await this.metrics.recordMetric('canonical_rows_updated', metrics.productsUpdated); + await this.metrics.recordMetric('snapshot_volume', metrics.snapshotsCreated); + + // Calculate success rate + const successRate = metrics.payloadsProcessed > 0 + ? ((metrics.payloadsProcessed - metrics.payloadsFailed) / metrics.payloadsProcessed) * 100 + : 100; + await this.metrics.recordMetric('hydration_success_rate', successRate); + + // Trigger analytics precompute if enabled + if (config.enableAnalyticsPrecompute && metrics.payloadsProcessed > 0) { + await this.triggerAnalyticsUpdate(); + } + + // Complete sync run + const duration = Date.now() - startTime; + await this.pool.query(` + UPDATE sync_runs + SET status = 'completed', + finished_at = NOW(), + duration_ms = $2, + payloads_queued = $3, + payloads_processed = $4, + payloads_failed = $5, + payloads_dlq = $6, + products_upserted = $7, + products_inserted = $8, + products_updated = $9, + snapshots_created = $10 + WHERE run_id = $1 + `, [ + runId, duration, + metrics.payloadsQueued, metrics.payloadsProcessed, + metrics.payloadsFailed, metrics.payloadsDlq, + metrics.productsUpserted, metrics.productsInserted, + metrics.productsUpdated, metrics.snapshotsCreated, + ]); + + // Update orchestrator state + await this.pool.query(` + UPDATE sync_orchestrator_state + SET status = 'SLEEPING', + last_run_started_at = $1, + last_run_completed_at = NOW(), + last_run_duration_ms = $2, + last_run_payloads_processed = $3, + last_run_errors = $4, + consecutive_failures = 0, + updated_at = NOW() + WHERE id = 1 + `, [new Date(startTime), duration, metrics.payloadsProcessed, metrics.payloadsFailed]); + + return metrics; + } catch (error) { + // Record failure + const errorMessage = error instanceof Error ? 
error.message : String(error); + + await this.pool.query(` + UPDATE sync_runs + SET status = 'failed', + finished_at = NOW(), + error_summary = $2 + WHERE run_id = $1 + `, [runId, errorMessage]); + + await this.pool.query(` + UPDATE sync_orchestrator_state + SET status = 'ERROR', + consecutive_failures = consecutive_failures + 1, + updated_at = NOW() + WHERE id = 1 + `); + + await this.alerts.createAlert( + 'SYNC_FAILURE', + 'error', + 'Sync run failed', + errorMessage, + 'sync-orchestrator' + ); + + throw error; + } finally { + await this.releaseLock(); + } + } + + /** + * Process a single payload + */ + private async processPayload( + payload: any, + _config: OrchestratorConfig + ): Promise<{ + productsUpserted: number; + productsInserted: number; + productsUpdated: number; + snapshotsCreated: number; + }> { + const startTime = Date.now(); + + // Parse products from raw JSON + const rawData = payload.raw_json; + const products = this.extractProducts(rawData); + + if (!products || products.length === 0) { + // Mark as processed with warning + await this.pool.query(` + UPDATE raw_payloads + SET processed = TRUE, + normalized_at = NOW(), + hydration_error = 'No products found in payload' + WHERE id = $1 + `, [payload.id]); + + return { productsUpserted: 0, productsInserted: 0, productsUpdated: 0, snapshotsCreated: 0 }; + } + + // Upsert products to canonical table + const result = await this.upsertProducts(payload.dispensary_id, products); + + // Create snapshots + const snapshotsCreated = await this.createSnapshots(payload.dispensary_id, products, payload.id); + + // Calculate latency + const latencyMs = Date.now() - new Date(payload.fetched_at).getTime(); + await this.metrics.recordMetric('ingestion_latency_avg_ms', latencyMs); + + // Mark payload as processed + await this.pool.query(` + UPDATE raw_payloads + SET processed = TRUE, + normalized_at = NOW() + WHERE id = $1 + `, [payload.id]); + + return { + productsUpserted: result.upserted, + productsInserted: result.inserted, + productsUpdated: result.updated, + snapshotsCreated, + }; + } + + /** + * Extract products from raw payload + */ + private extractProducts(rawData: any): any[] { + // Handle different payload formats + if (Array.isArray(rawData)) { + return rawData; + } + + // Dutchie format + if (rawData.products) { + return rawData.products; + } + + // Nested data format + if (rawData.data?.products) { + return rawData.data.products; + } + + if (rawData.data?.filteredProducts?.products) { + return rawData.data.filteredProducts.products; + } + + return []; + } + + /** + * Upsert products to canonical table + */ + private async upsertProducts( + dispensaryId: number, + products: any[] + ): Promise<{ upserted: number; inserted: number; updated: number }> { + let inserted = 0; + let updated = 0; + + // Process in chunks + const chunkSize = 100; + for (let i = 0; i < products.length; i += chunkSize) { + const chunk = products.slice(i, i + chunkSize); + + for (const product of chunk) { + const externalId = product.id || product.externalId || product.product_id; + if (!externalId) continue; + + const result = await this.pool.query(` + INSERT INTO dutchie_products ( + dispensary_id, external_product_id, name, brand_name, type, + latest_raw_payload, updated_at + ) + VALUES ($1, $2, $3, $4, $5, $6, NOW()) + ON CONFLICT (dispensary_id, external_product_id) + DO UPDATE SET + name = EXCLUDED.name, + brand_name = EXCLUDED.brand_name, + type = EXCLUDED.type, + latest_raw_payload = EXCLUDED.latest_raw_payload, + updated_at = NOW() + RETURNING 
(xmax = 0) as is_insert + `, [ + dispensaryId, + externalId, + product.name || product.Name, + product.brand || product.Brand || product.brandName, + product.type || product.Type || product.category, + JSON.stringify(product), + ]); + + if (result.rows[0]?.is_insert) { + inserted++; + } else { + updated++; + } + } + } + + return { upserted: inserted + updated, inserted, updated }; + } + + /** + * Create product snapshots + */ + private async createSnapshots( + dispensaryId: number, + products: any[], + payloadId: string + ): Promise { + let created = 0; + + // Get product IDs + const externalIds = products + .map(p => p.id || p.externalId || p.product_id) + .filter(Boolean); + + if (externalIds.length === 0) return 0; + + const productsResult = await this.pool.query(` + SELECT id, external_product_id FROM dutchie_products + WHERE dispensary_id = $1 AND external_product_id = ANY($2) + `, [dispensaryId, externalIds]); + + const productIdMap = new Map(); + productsResult.rows.forEach(r => { + productIdMap.set(r.external_product_id, r.id); + }); + + // Insert snapshots in chunks + const chunkSize = 100; + for (let i = 0; i < products.length; i += chunkSize) { + const chunk = products.slice(i, i + chunkSize); + const values: any[] = []; + const placeholders: string[] = []; + let paramIndex = 1; + + for (const product of chunk) { + const externalId = product.id || product.externalId || product.product_id; + const productId = productIdMap.get(externalId); + if (!productId) continue; + + // Extract pricing + const prices = product.Prices || product.prices || []; + const recPrice = prices.find((p: any) => p.pricingType === 'rec' || !p.pricingType); + + values.push( + productId, + dispensaryId, + payloadId, + recPrice?.price ? Math.round(recPrice.price * 100) : null, + product.potencyCBD?.formatted || null, + product.potencyTHC?.formatted || null, + product.Status === 'Active' ? 'in_stock' : 'out_of_stock' + ); + + placeholders.push(`($${paramIndex++}, $${paramIndex++}, $${paramIndex++}, $${paramIndex++}, $${paramIndex++}, $${paramIndex++}, $${paramIndex++}, NOW())`); + } + + if (placeholders.length > 0) { + await this.pool.query(` + INSERT INTO dutchie_product_snapshots ( + dutchie_product_id, dispensary_id, crawl_run_id, + rec_min_price_cents, cbd_content, thc_content, stock_status, crawled_at + ) + VALUES ${placeholders.join(', ')} + `, values); + + created += placeholders.length; + } + } + + return created; + } + + /** + * Classify error type + */ + private classifyError(error: unknown): string { + const message = error instanceof Error ? 
error.message.toLowerCase() : String(error).toLowerCase();
+
+    if (message.includes('parse') || message.includes('json')) {
+      return 'INGESTION_PARSE_ERROR';
+    }
+    if (message.includes('normalize') || message.includes('transform')) {
+      return 'NORMALIZATION_ERROR';
+    }
+    if (message.includes('upsert') || message.includes('insert') || message.includes('duplicate')) {
+      return 'HYDRATION_UPSERT_ERROR';
+    }
+    if (message.includes('brand')) {
+      return 'MISSING_BRAND_MAP';
+    }
+    if (message.includes('category')) {
+      return 'MISSING_CATEGORY_MAP';
+    }
+    if (message.includes('state')) {
+      return 'STATE_MISMATCH';
+    }
+    if (message.includes('external_id') || message.includes('external_product_id')) {
+      return 'DUPLICATE_EXTERNAL_ID';
+    }
+
+    return 'HYDRATION_ERROR';
+  }
+
+  /**
+   * Trigger analytics precompute
+   */
+  private async triggerAnalyticsUpdate(): Promise<void> {
+    try {
+      // Capture brand snapshots
+      await this.pool.query(`SELECT capture_brand_snapshots()`);
+
+      // Capture category snapshots
+      await this.pool.query(`SELECT capture_category_snapshots()`);
+
+      // Refresh materialized views if they exist
+      try {
+        await this.pool.query(`REFRESH MATERIALIZED VIEW CONCURRENTLY v_brand_summary`);
+      } catch {
+        // View might not exist, ignore
+      }
+
+      console.log('[SyncOrchestrator] Analytics precompute completed');
+    } catch (error) {
+      console.warn('[SyncOrchestrator] Analytics precompute failed:', error);
+    }
+  }
+
+  /**
+   * Pause orchestrator
+   */
+  async pause(reason: string): Promise<void> {
+    await this.pool.query(`
+      UPDATE sync_orchestrator_state
+      SET is_paused = TRUE,
+          pause_reason = $1,
+          updated_at = NOW()
+      WHERE id = 1
+    `, [reason]);
+
+    await this.alerts.createAlert(
+      'ORCHESTRATOR_PAUSED',
+      'warning',
+      'Sync orchestrator paused',
+      reason,
+      'sync-orchestrator'
+    );
+  }
+
+  /**
+   * Resume orchestrator
+   */
+  async resume(): Promise<void> {
+    await this.pool.query(`
+      UPDATE sync_orchestrator_state
+      SET is_paused = FALSE,
+          pause_reason = NULL,
+          updated_at = NOW()
+      WHERE id = 1
+    `);
+
+    await this.alerts.resolveAlert('ORCHESTRATOR_PAUSED');
+  }
+
+  /**
+   * Get health status
+   */
+  async getHealth(): Promise<{
+    healthy: boolean;
+    checks: Record<string, { status: string; message: string }>;
+  }> {
+    const checks: Record<string, { status: string; message: string }> = {};
+
+    // Check database connection
+    try {
+      await this.pool.query('SELECT 1');
+      checks.database = { status: 'ok', message: 'Database connection healthy' };
+    } catch (error) {
+      checks.database = { status: 'error', message: `Database error: ${error}` };
+    }
+
+    // Check orchestrator state
+    const status = await this.getStatus();
+    if (status.isPaused) {
+      checks.orchestrator = { status: 'warning', message: `Paused: ${status.pauseReason}` };
+    } else if (status.consecutiveFailures > 5) {
+      checks.orchestrator = { status: 'error', message: `${status.consecutiveFailures} consecutive failures` };
+    } else {
+      checks.orchestrator = { status: 'ok', message: `Status: ${status.orchestratorStatus}` };
+    }
+
+    // Check queue depth
+    const queue = await this.getQueueDepth();
+    if (queue.unprocessed > 1000) {
+      checks.queue = { status: 'warning', message: `${queue.unprocessed} unprocessed payloads` };
+    } else {
+      checks.queue = { status: 'ok', message: `${queue.unprocessed} unprocessed payloads` };
+    }
+
+    // Check DLQ
+    const dlqStats = await this.dlq.getStats();
+    if (dlqStats.pending > 100) {
+      checks.dlq = { status: 'warning', message: `${dlqStats.pending} payloads in DLQ` };
+    } else if (dlqStats.pending > 0) {
+      checks.dlq = { status: 'ok', message: `${dlqStats.pending} payloads in DLQ` };
+    } else {
+      checks.dlq =
{ status: 'ok', message: 'DLQ empty' }; + } + + // Check latency + const latencyResult = await this.pool.query(` + SELECT metric_value FROM system_metrics_current + WHERE metric_name = 'ingestion_latency_avg_ms' + `); + const latency = parseFloat(latencyResult.rows[0]?.metric_value) || 0; + if (latency > 300000) { // 5 minutes + checks.latency = { status: 'error', message: `Ingestion latency: ${Math.round(latency / 1000)}s` }; + } else if (latency > 60000) { // 1 minute + checks.latency = { status: 'warning', message: `Ingestion latency: ${Math.round(latency / 1000)}s` }; + } else { + checks.latency = { status: 'ok', message: `Ingestion latency: ${Math.round(latency / 1000)}s` }; + } + + const healthy = Object.values(checks).every(c => c.status !== 'error'); + + return { healthy, checks }; + } + + /** + * Start continuous sync loop + */ + start(): void { + if (this.isRunning) return; + + this.isRunning = true; + console.log(`[SyncOrchestrator] Starting with worker ID: ${this.workerId}`); + + const poll = async () => { + if (!this.isRunning) return; + + try { + const status = await this.getStatus(); + + if (!status.isPaused && status.unprocessedPayloads > 0) { + await this.runSync(); + } + } catch (error) { + console.error('[SyncOrchestrator] Sync error:', error); + } + + if (this.isRunning) { + this.pollInterval = setTimeout(poll, DEFAULT_CONFIG.pollIntervalMs); + } + }; + + poll(); + } + + /** + * Stop continuous sync loop + */ + stop(): void { + this.isRunning = false; + if (this.pollInterval) { + clearTimeout(this.pollInterval); + this.pollInterval = null; + } + console.log('[SyncOrchestrator] Stopped'); + } +} diff --git a/backend/src/utils/GeoUtils.ts b/backend/src/utils/GeoUtils.ts new file mode 100644 index 00000000..fcb892c3 --- /dev/null +++ b/backend/src/utils/GeoUtils.ts @@ -0,0 +1,166 @@ +/** + * GeoUtils + * + * Simple geographic utility functions for distance calculations and coordinate validation. + * All calculations are done locally - no external API calls. + */ + +/** + * Earth's radius in kilometers + */ +const EARTH_RADIUS_KM = 6371; + +/** + * Calculate the Haversine distance between two points on Earth. + * Returns distance in kilometers. + * + * @param lat1 Latitude of point 1 in degrees + * @param lon1 Longitude of point 1 in degrees + * @param lat2 Latitude of point 2 in degrees + * @param lon2 Longitude of point 2 in degrees + * @returns Distance in kilometers + */ +export function haversineDistance( + lat1: number, + lon1: number, + lat2: number, + lon2: number +): number { + const toRad = (deg: number) => (deg * Math.PI) / 180; + + const dLat = toRad(lat2 - lat1); + const dLon = toRad(lon2 - lon1); + + const a = + Math.sin(dLat / 2) * Math.sin(dLat / 2) + + Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * + Math.sin(dLon / 2) * Math.sin(dLon / 2); + + const c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a)); + + return EARTH_RADIUS_KM * c; +} + +/** + * Check if coordinates are valid (basic bounds check). 
+ * + * @param lat Latitude in degrees + * @param lon Longitude in degrees + * @returns true if coordinates are within valid Earth bounds + */ +export function isCoordinateValid(lat: number | null, lon: number | null): boolean { + if (lat === null || lon === null) return false; + if (typeof lat !== 'number' || typeof lon !== 'number') return false; + if (isNaN(lat) || isNaN(lon)) return false; + + // Valid latitude: -90 to 90 + // Valid longitude: -180 to 180 + return lat >= -90 && lat <= 90 && lon >= -180 && lon <= 180; +} + +/** + * Calculate a bounding box around a point for a given radius. + * Useful for pre-filtering database queries before applying haversine. + * + * @param lat Center latitude in degrees + * @param lon Center longitude in degrees + * @param radiusKm Radius in kilometers + * @returns Bounding box with min/max lat/lon + */ +export function boundingBox( + lat: number, + lon: number, + radiusKm: number +): { minLat: number; maxLat: number; minLon: number; maxLon: number } { + // Approximate degrees per km + // 1 degree latitude ≈ 111 km + // 1 degree longitude ≈ 111 km * cos(latitude) + const latDelta = radiusKm / 111; + const lonDelta = radiusKm / (111 * Math.cos((lat * Math.PI) / 180)); + + return { + minLat: lat - latDelta, + maxLat: lat + latDelta, + minLon: lon - lonDelta, + maxLon: lon + lonDelta, + }; +} + +/** + * Check if a coordinate is within the continental US bounds (rough heuristic). + * Does NOT include Alaska or Hawaii for simplicity. + * + * @param lat Latitude in degrees + * @param lon Longitude in degrees + * @returns true if within rough continental US bounds + */ +export function isWithinContinentalUS(lat: number, lon: number): boolean { + // Rough bounds for continental US: + // Latitude: 24.5 (south Florida) to 49.5 (Canadian border) + // Longitude: -125 (west coast) to -66 (east coast) + return lat >= 24.5 && lat <= 49.5 && lon >= -125 && lon <= -66; +} + +/** + * Check if a coordinate is within Alaska bounds (rough heuristic). + * + * @param lat Latitude in degrees + * @param lon Longitude in degrees + * @returns true if within rough Alaska bounds + */ +export function isWithinAlaska(lat: number, lon: number): boolean { + // Rough bounds for Alaska: + // Latitude: 51 to 72 + // Longitude: -180 to -130 (wraps around international date line) + return lat >= 51 && lat <= 72 && lon >= -180 && lon <= -130; +} + +/** + * Check if a coordinate is within Hawaii bounds (rough heuristic). + * + * @param lat Latitude in degrees + * @param lon Longitude in degrees + * @returns true if within rough Hawaii bounds + */ +export function isWithinHawaii(lat: number, lon: number): boolean { + // Rough bounds for Hawaii: + // Latitude: 18.5 to 22.5 + // Longitude: -161 to -154 + return lat >= 18.5 && lat <= 22.5 && lon >= -161 && lon <= -154; +} + +/** + * Check if a coordinate is within US bounds (continental + Alaska + Hawaii). + * + * @param lat Latitude in degrees + * @param lon Longitude in degrees + * @returns true if within any US region + */ +export function isWithinUS(lat: number, lon: number): boolean { + return isWithinContinentalUS(lat, lon) || isWithinAlaska(lat, lon) || isWithinHawaii(lat, lon); +} + +/** + * Check if a coordinate is within Canada bounds (rough heuristic). 
+ *
+ * @param lat Latitude in degrees
+ * @param lon Longitude in degrees
+ * @returns true if within rough Canada bounds
+ */
+export function isWithinCanada(lat: number, lon: number): boolean {
+  // Rough bounds for Canada:
+  // Latitude: 41.7 (southern Ontario) to 83 (northern territories)
+  // Longitude: -141 (Yukon) to -52 (Newfoundland)
+  return lat >= 41.7 && lat <= 83 && lon >= -141 && lon <= -52;
+}
+
+export default {
+  haversineDistance,
+  isCoordinateValid,
+  boundingBox,
+  isWithinContinentalUS,
+  isWithinAlaska,
+  isWithinHawaii,
+  isWithinUS,
+  isWithinCanada,
+};
diff --git a/backend/src/utils/image-storage.ts b/backend/src/utils/image-storage.ts
index cabb4f6a..f99ac6d2 100644
--- a/backend/src/utils/image-storage.ts
+++ b/backend/src/utils/image-storage.ts
@@ -18,7 +18,18 @@ import * as path from 'path';
 import { createHash } from 'crypto';
 
 // Base path for image storage - configurable via env
-const IMAGES_BASE_PATH = process.env.IMAGES_PATH || '/app/public/images';
+// Uses project-relative paths by default, NOT /app or other privileged paths
+function getImagesBasePath(): string {
+  // Priority: IMAGES_PATH > STORAGE_BASE_PATH/images > ./storage/images
+  if (process.env.IMAGES_PATH) {
+    return process.env.IMAGES_PATH;
+  }
+  if (process.env.STORAGE_BASE_PATH) {
+    return path.join(process.env.STORAGE_BASE_PATH, 'images');
+  }
+  return './storage/images';
+}
+const IMAGES_BASE_PATH = getImagesBasePath();
 
 // Public URL base for serving images
 const IMAGES_PUBLIC_URL = process.env.IMAGES_PUBLIC_URL || '/images';
@@ -276,13 +287,29 @@ export async function deleteProductImages(
   }
 }
 
+// Track whether image storage is available
+let imageStorageReady = false;
+
+export function isImageStorageReady(): boolean {
+  return imageStorageReady;
+}
+
 /**
  * Initialize the image storage directories
+ * Does NOT throw on failure - logs warning and continues
  */
 export async function initializeImageStorage(): Promise<void> {
-  await ensureDir(path.join(IMAGES_BASE_PATH, 'products'));
-  await ensureDir(path.join(IMAGES_BASE_PATH, 'brands'));
-  console.log(`✅ Image storage initialized at ${IMAGES_BASE_PATH}`);
+  try {
+    await ensureDir(path.join(IMAGES_BASE_PATH, 'products'));
+    await ensureDir(path.join(IMAGES_BASE_PATH, 'brands'));
+    console.log(`✅ Image storage initialized at ${IMAGES_BASE_PATH}`);
+    imageStorageReady = true;
+  } catch (error: any) {
+    console.warn(`⚠️ WARNING: Could not initialize image storage at ${IMAGES_BASE_PATH}: ${error.message}`);
+    console.warn('   Image upload/processing is disabled.
Server will continue without image features.'); + imageStorageReady = false; + // Do NOT throw - server should still start + } } /** diff --git a/backend/src/utils/minio.ts b/backend/src/utils/minio.ts index d2e142f5..a68f1bc8 100755 --- a/backend/src/utils/minio.ts +++ b/backend/src/utils/minio.ts @@ -7,13 +7,32 @@ import * as path from 'path'; let minioClient: Minio.Client | null = null; +// Track whether image storage is available +let imageStorageAvailable = false; + // Check if MinIO is configured export function isMinioEnabled(): boolean { return !!process.env.MINIO_ENDPOINT; } +// Check if image storage (MinIO or local) is available +export function isImageStorageAvailable(): boolean { + return imageStorageAvailable; +} + // Local storage path for images when MinIO is not configured -const LOCAL_IMAGES_PATH = process.env.LOCAL_IMAGES_PATH || '/app/public/images'; +// Uses getter to allow dotenv to load before first access +// Defaults to project-relative path, NOT /app +function getLocalImagesPath(): string { + // Priority: LOCAL_IMAGES_PATH > STORAGE_BASE_PATH/images > ./storage/images + if (process.env.LOCAL_IMAGES_PATH) { + return process.env.LOCAL_IMAGES_PATH; + } + if (process.env.STORAGE_BASE_PATH) { + return path.join(process.env.STORAGE_BASE_PATH, 'images'); + } + return './storage/images'; +} function getMinioClient(): Minio.Client { if (!minioClient) { @@ -31,22 +50,51 @@ function getMinioClient(): Minio.Client { const BUCKET_NAME = process.env.MINIO_BUCKET || 'dutchie'; export async function initializeMinio() { - // Skip MinIO initialization if not configured - if (!isMinioEnabled()) { - console.log('ℹ️ MinIO not configured (MINIO_ENDPOINT not set), using local filesystem storage'); + // When STORAGE_DRIVER=local, skip MinIO initialization entirely + // The storage-adapter.ts and local-storage.ts handle all storage operations + if (process.env.STORAGE_DRIVER === 'local') { + console.log('ℹ️ Using local storage driver (STORAGE_DRIVER=local)'); - // Ensure local images directory exists + // Use STORAGE_BASE_PATH for images when in local mode + const storagePath = process.env.STORAGE_BASE_PATH || './storage'; + const imagesPath = path.join(storagePath, 'images'); try { - await fs.mkdir(LOCAL_IMAGES_PATH, { recursive: true }); - await fs.mkdir(path.join(LOCAL_IMAGES_PATH, 'products'), { recursive: true }); - console.log(`✅ Local images directory ready: ${LOCAL_IMAGES_PATH}`); - } catch (error) { - console.error('❌ Failed to create local images directory:', error); - throw error; + await fs.mkdir(imagesPath, { recursive: true }); + await fs.mkdir(path.join(imagesPath, 'products'), { recursive: true }); + console.log(`✅ Local images directory ready: ${imagesPath}`); + imageStorageAvailable = true; + } catch (error: any) { + console.warn(`⚠️ WARNING: Could not create local images directory at ${imagesPath}: ${error.message}`); + console.warn(' Image upload/processing is disabled. 
Server will continue without image features.'); + imageStorageAvailable = false; + // Do NOT throw - server should still start } return; } + // Skip MinIO initialization if not configured + if (!isMinioEnabled()) { + console.log('ℹ️ MinIO not configured (MINIO_ENDPOINT not set), using local filesystem storage'); + + // Ensure local images directory exists (use project-relative path) + const localPath = getLocalImagesPath(); + console.log(`ℹ️ Local image storage path: ${localPath}`); + + try { + await fs.mkdir(localPath, { recursive: true }); + await fs.mkdir(path.join(localPath, 'products'), { recursive: true }); + console.log(`✅ Local images directory ready: ${localPath}`); + imageStorageAvailable = true; + } catch (error: any) { + console.warn(`⚠️ WARNING: Could not create local images directory at ${localPath}: ${error.message}`); + console.warn(' Image upload/processing is disabled. Server will continue without image features.'); + imageStorageAvailable = false; + // Do NOT throw - server should still start + } + return; + } + + // MinIO is configured - try to initialize try { const client = getMinioClient(); // Check if bucket exists @@ -55,7 +103,7 @@ export async function initializeMinio() { if (!exists) { // Create bucket await client.makeBucket(BUCKET_NAME, 'us-east-1'); - console.log(`✅ Minio bucket created: ${BUCKET_NAME}`); + console.log(`✅ MinIO bucket created: ${BUCKET_NAME}`); // Set public read policy const policy = { @@ -73,11 +121,14 @@ export async function initializeMinio() { await client.setBucketPolicy(BUCKET_NAME, JSON.stringify(policy)); console.log(`✅ Bucket policy set to public read`); } else { - console.log(`✅ Minio bucket already exists: ${BUCKET_NAME}`); + console.log(`✅ MinIO bucket already exists: ${BUCKET_NAME}`); } - } catch (error) { - console.error('❌ Minio initialization error:', error); - throw error; + imageStorageAvailable = true; + } catch (error: any) { + console.warn(`⚠️ WARNING: MinIO initialization failed: ${error.message}`); + console.warn(' Image upload/processing is disabled. 
Server will continue without image features.'); + imageStorageAvailable = false; + // Do NOT throw - server should still start } } @@ -132,13 +183,13 @@ async function uploadToLocalFilesystem( // Ensure the target directory exists (in case initializeMinio wasn't called) // Extract directory from baseFilename (e.g., 'products/store-slug' or just 'products') - const targetDir = path.join(LOCAL_IMAGES_PATH, path.dirname(baseFilename)); + const targetDir = path.join(getLocalImagesPath(), path.dirname(baseFilename)); await fs.mkdir(targetDir, { recursive: true }); await Promise.all([ - fs.writeFile(path.join(LOCAL_IMAGES_PATH, thumbnailPath), thumbnailBuffer), - fs.writeFile(path.join(LOCAL_IMAGES_PATH, mediumPath), mediumBuffer), - fs.writeFile(path.join(LOCAL_IMAGES_PATH, fullPath), fullBuffer), + fs.writeFile(path.join(getLocalImagesPath(), thumbnailPath), thumbnailBuffer), + fs.writeFile(path.join(getLocalImagesPath(), mediumPath), mediumBuffer), + fs.writeFile(path.join(getLocalImagesPath(), fullPath), fullBuffer), ]); return { @@ -255,7 +306,7 @@ export async function deleteImage(imagePath: string): Promise { const client = getMinioClient(); await client.removeObject(BUCKET_NAME, imagePath); } else { - const fullPath = path.join(LOCAL_IMAGES_PATH, imagePath); + const fullPath = path.join(getLocalImagesPath(), imagePath); await fs.unlink(fullPath); } } catch (error) { diff --git a/backend/src/utils/provider-display.ts b/backend/src/utils/provider-display.ts new file mode 100644 index 00000000..ba3f1da5 --- /dev/null +++ b/backend/src/utils/provider-display.ts @@ -0,0 +1,65 @@ +/** + * Provider Display Names + * + * Maps internal provider identifiers to safe display labels. + * Internal identifiers (menu_type, product_provider, crawler_type) remain unchanged. + * Only the display label shown to users is transformed. + */ + +export const ProviderDisplayNames: Record = { + // All menu providers map to anonymous "Menu Feed" label + dutchie: 'Menu Feed', + treez: 'Menu Feed', + jane: 'Menu Feed', + iheartjane: 'Menu Feed', + blaze: 'Menu Feed', + flowhub: 'Menu Feed', + weedmaps: 'Menu Feed', + leafly: 'Menu Feed', + leaflogix: 'Menu Feed', + tymber: 'Menu Feed', + dispense: 'Menu Feed', + + // Catch-all + unknown: 'Menu Feed', + default: 'Menu Feed', + '': 'Menu Feed', +}; + +/** + * Get the display name for a provider + * @param provider - The internal provider identifier (e.g., 'dutchie', 'treez') + * @returns The safe display label (e.g., 'Embedded Menu') + */ +export function getProviderDisplayName(provider: string | null | undefined): string { + if (!provider) { + return ProviderDisplayNames.default; + } + const normalized = provider.toLowerCase().trim(); + return ProviderDisplayNames[normalized] || ProviderDisplayNames.default; +} + +/** + * Transform a store/dispensary object to include provider_display + * @param obj - Object with provider fields (product_provider, menu_type, etc.) 
+ * @returns Object with provider_raw and provider_display added + */ +export function addProviderDisplay( + obj: T +): T & { provider_raw: string | null; provider_display: string } { + const rawProvider = obj.product_provider || obj.menu_type || null; + return { + ...obj, + provider_raw: rawProvider, + provider_display: getProviderDisplayName(rawProvider), + }; +} + +/** + * Transform an array of store/dispensary objects to include provider_display + */ +export function addProviderDisplayToArray( + arr: T[] +): Array { + return arr.map(addProviderDisplay); +} diff --git a/backend/src/utils/proxyManager.ts b/backend/src/utils/proxyManager.ts index e2e4b4c1..492bd270 100644 --- a/backend/src/utils/proxyManager.ts +++ b/backend/src/utils/proxyManager.ts @@ -1,4 +1,4 @@ -import { pool } from '../db/migrate'; +import { pool } from '../db/pool'; import { logger } from '../services/logger'; interface ProxyConfig { diff --git a/backend/stop-local.sh b/backend/stop-local.sh new file mode 100755 index 00000000..b4d1ac09 --- /dev/null +++ b/backend/stop-local.sh @@ -0,0 +1,72 @@ +#!/bin/bash +# CannaiQ Local Development Shutdown +# +# Stops all local development services: +# - Backend API +# - CannaiQ Admin UI +# - FindADispo Consumer UI +# - Findagram Consumer UI +# +# Note: PostgreSQL container is left running by default. + +set -e + +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +NC='\033[0m' + +echo -e "${YELLOW}Stopping CannaiQ local services...${NC}" + +# Stop backend +if [ -f /tmp/cannaiq-backend.pid ]; then + PID=$(cat /tmp/cannaiq-backend.pid) + if kill -0 $PID 2>/dev/null; then + echo -e "${YELLOW}Stopping Backend API (PID: $PID)...${NC}" + kill $PID 2>/dev/null || true + fi + rm -f /tmp/cannaiq-backend.pid +fi + +# Stop CannaiQ Admin frontend +if [ -f /tmp/cannaiq-frontend.pid ]; then + PID=$(cat /tmp/cannaiq-frontend.pid) + if kill -0 $PID 2>/dev/null; then + echo -e "${YELLOW}Stopping CannaiQ Admin (PID: $PID)...${NC}" + kill $PID 2>/dev/null || true + fi + rm -f /tmp/cannaiq-frontend.pid +fi + +# Stop FindADispo frontend +if [ -f /tmp/findadispo-frontend.pid ]; then + PID=$(cat /tmp/findadispo-frontend.pid) + if kill -0 $PID 2>/dev/null; then + echo -e "${YELLOW}Stopping FindADispo (PID: $PID)...${NC}" + kill $PID 2>/dev/null || true + fi + rm -f /tmp/findadispo-frontend.pid +fi + +# Stop Findagram frontend +if [ -f /tmp/findagram-frontend.pid ]; then + PID=$(cat /tmp/findagram-frontend.pid) + if kill -0 $PID 2>/dev/null; then + echo -e "${YELLOW}Stopping Findagram (PID: $PID)...${NC}" + kill $PID 2>/dev/null || true + fi + rm -f /tmp/findagram-frontend.pid +fi + +# Kill any remaining node processes for this project +pkill -f "vite.*cannaiq" 2>/dev/null || true +pkill -f "vite.*findadispo" 2>/dev/null || true +pkill -f "vite.*findagram" 2>/dev/null || true +pkill -f "tsx.*backend" 2>/dev/null || true + +# Stop PostgreSQL (optional - uncomment if you want to stop DB too) +# docker compose -f docker-compose.local.yml down + +echo -e "${GREEN}All frontend services stopped.${NC}" +echo -e "${YELLOW}Note: PostgreSQL container is still running. 
To stop it:${NC}" +echo " cd backend && docker compose -f docker-compose.local.yml down" diff --git a/cannaiq/package.json b/cannaiq/package.json index ddfc6415..4d434907 100755 --- a/cannaiq/package.json +++ b/cannaiq/package.json @@ -4,6 +4,7 @@ "type": "module", "scripts": { "dev": "vite --host", + "dev:admin": "vite --host --port 8080", "build": "tsc && vite build", "preview": "vite preview" }, diff --git a/cannaiq/scripts/check-provider-names.sh b/cannaiq/scripts/check-provider-names.sh new file mode 100755 index 00000000..34d4fa45 --- /dev/null +++ b/cannaiq/scripts/check-provider-names.sh @@ -0,0 +1,108 @@ +#!/bin/bash +# Safety check: Block raw provider names from appearing in UI code +# This script should be run as part of CI or pre-commit hooks + +set -e + +RED='\033[0;31m' +GREEN='\033[0;32m' +NC='\033[0m' # No Color + +echo "Checking for raw provider names in UI code..." + +# Provider names that should NOT appear in user-facing strings +BLOCKED_PATTERNS=( + "'dutchie'" + '"dutchie"' + "'treez'" + '"treez"' + "'jane'" + '"jane"' + "'iheartjane'" + '"iheartjane"' + "'blaze'" + '"blaze"' + "'flowhub'" + '"flowhub"' + "'weedmaps'" + '"weedmaps"' + "'leafly'" + '"leafly"' + "'leaflogix'" + '"leaflogix"' + "'tymber'" + '"tymber"' + "'dispense'" + '"dispense"' +) + +# Files to check (React components and pages) +TARGET_DIRS="src/components src/pages" + +ERRORS=0 + +for pattern in "${BLOCKED_PATTERNS[@]}"; do + # Search for the pattern, excluding: + # - provider-display.ts (the mapping file itself) + # - Comments + # - Console logs + # - Variable assignments for internal logic (e.g., === 'dutchie') + + # Get matches that look like they're being displayed (not just compared) + matches=$(grep -rn "$pattern" $TARGET_DIRS 2>/dev/null | \ + grep -v "provider-display" | \ + grep -v "// " | \ + grep -v "menu_type ===" | \ + grep -v "provider_raw ===" | \ + grep -v "=== $pattern" | \ + grep -v "!== $pattern" | \ + grep -v "console\." | \ + grep -v "\.filter(" | \ + grep -v "\.find(" || true) + + if [ -n "$matches" ]; then + # Check if any remaining matches are in JSX context (likely display) + # Look for patterns like: >{provider} or {store.menu_type} + jsx_matches=$(echo "$matches" | grep -E "(>.*$pattern|{.*$pattern)" || true) + + if [ -n "$jsx_matches" ]; then + echo -e "${RED}FOUND potential raw provider name display:${NC}" + echo "$jsx_matches" + ERRORS=$((ERRORS + 1)) + fi + fi +done + +# Also check for direct display of menu_type without using getProviderDisplayName +direct_display=$(grep -rn "disp\.menu_type\}" $TARGET_DIRS 2>/dev/null | \ + grep -v "provider-display" | \ + grep -v "===" | \ + grep -v "!==" || true) + +if [ -n "$direct_display" ]; then + echo -e "${RED}FOUND direct display of menu_type (should use getProviderDisplayName):${NC}" + echo "$direct_display" + ERRORS=$((ERRORS + 1)) +fi + +# Check for store.provider without _display suffix +provider_display=$(grep -rn "store\.provider\}" $TARGET_DIRS 2>/dev/null | \ + grep -v "provider_display" | \ + grep -v "provider_raw" || true) + +if [ -n "$provider_display" ]; then + echo -e "${RED}FOUND direct display of store.provider (should use store.provider_display):${NC}" + echo "$provider_display" + ERRORS=$((ERRORS + 1)) +fi + +if [ $ERRORS -eq 0 ]; then + echo -e "${GREEN}No raw provider names found in UI code.${NC}" + exit 0 +else + echo "" + echo -e "${RED}SAFETY CHECK FAILED: Found $ERRORS potential issues.${NC}" + echo "Provider names should not be displayed directly to users." 
+ echo "Use getProviderDisplayName() or provider_display field instead." + exit 1 +fi diff --git a/cannaiq/src/components/OrchestratorTraceModal.tsx b/cannaiq/src/components/OrchestratorTraceModal.tsx new file mode 100644 index 00000000..1ab3abd0 --- /dev/null +++ b/cannaiq/src/components/OrchestratorTraceModal.tsx @@ -0,0 +1,357 @@ +import { useState, useEffect } from 'react'; +import { + X, + CheckCircle, + XCircle, + Clock, + AlertTriangle, + ChevronDown, + ChevronRight, + FileCode, + Loader2, +} from 'lucide-react'; +import { api } from '../lib/api'; + +interface TraceStep { + step: number; + action: string; + description: string; + timestamp: number; + duration_ms?: number; + input: Record; + output: Record | null; + what: string; + why: string; + where: string; + how: string; + when: string; + status: string; + error?: string; +} + +interface TraceSummary { + id: number; + dispensaryId: number; + runId: string; + profileId: number | null; + profileKey: string | null; + crawlerModule: string | null; + stateAtStart: string; + stateAtEnd: string; + totalSteps: number; + durationMs: number; + success: boolean; + errorMessage: string | null; + productsFound: number; + startedAt: string; + completedAt: string | null; + trace: TraceStep[]; +} + +interface OrchestratorTraceModalProps { + dispensaryId: number; + dispensaryName: string; + isOpen: boolean; + onClose: () => void; +} + +export function OrchestratorTraceModal({ + dispensaryId, + dispensaryName, + isOpen, + onClose, +}: OrchestratorTraceModalProps) { + const [trace, setTrace] = useState(null); + const [loading, setLoading] = useState(false); + const [error, setError] = useState(null); + const [expandedSteps, setExpandedSteps] = useState>(new Set()); + + useEffect(() => { + if (isOpen && dispensaryId) { + loadTrace(); + } + }, [isOpen, dispensaryId]); + + const loadTrace = async () => { + setLoading(true); + setError(null); + try { + const data = await api.getDispensaryTraceLatest(dispensaryId); + setTrace(data); + // Auto-expand failed steps + if (data?.trace) { + const failedSteps = data.trace + .filter((s: TraceStep) => s.status === 'failed') + .map((s: TraceStep) => s.step); + setExpandedSteps(new Set(failedSteps)); + } + } catch (err: any) { + setError(err.message || 'Failed to load trace'); + } finally { + setLoading(false); + } + }; + + const toggleStep = (stepNum: number) => { + setExpandedSteps((prev) => { + const next = new Set(prev); + if (next.has(stepNum)) { + next.delete(stepNum); + } else { + next.add(stepNum); + } + return next; + }); + }; + + const getStatusIcon = (status: string) => { + switch (status) { + case 'completed': + return ; + case 'failed': + return ; + case 'skipped': + return ; + case 'running': + return ; + default: + return ; + } + }; + + const getStatusBadge = (status: string) => { + switch (status) { + case 'completed': + return 'bg-green-100 text-green-800'; + case 'failed': + return 'bg-red-100 text-red-800'; + case 'skipped': + return 'bg-yellow-100 text-yellow-800'; + case 'running': + return 'bg-blue-100 text-blue-800'; + default: + return 'bg-gray-100 text-gray-800'; + } + }; + + const formatDuration = (ms?: number) => { + if (!ms) return '-'; + if (ms < 1000) return `${ms}ms`; + return `${(ms / 1000).toFixed(2)}s`; + }; + + const formatTimestamp = (ts: string) => { + return new Date(ts).toLocaleString(); + }; + + if (!isOpen) return null; + + return ( +
+
+ {/* Header */} +
+
+

Orchestrator Trace

+

{dispensaryName}

+
+ +
+ + {/* Content */} +
+ {loading ? ( +
+ + Loading trace... +
+ ) : error ? ( +
+ +

{error}

+ +
+ ) : trace ? ( +
+ {/* Summary */} +
+
+
+

Status

+

+ {trace.success ? 'Success' : 'Failed'} +

+
+
+

Duration

+

{formatDuration(trace.durationMs)}

+
+
+

Products Found

+

{trace.productsFound}

+
+
+

Total Steps

+

{trace.totalSteps}

+
+
+
+
+

Profile Key

+

{trace.profileKey || '-'}

+
+
+

State

+

+ {trace.stateAtStart} → {trace.stateAtEnd} +

+
+
+

Started At

+

{formatTimestamp(trace.startedAt)}

+
+
+

Run ID

+

+ {trace.runId.slice(0, 8)}... +

+
+
+ {trace.errorMessage && ( +
+

{trace.errorMessage}

+
+ )} +
+ + {/* Steps */} +
+

+ Execution Steps ({trace.trace.length}) +

+
+ {trace.trace.map((step) => ( +
+ {/* Step Header */} + + + {/* Step Details */} + {expandedSteps.has(step.step) && ( +
+
+
+

WHAT

+

{step.what}

+
+
+

WHY

+

{step.why}

+
+
+

WHERE

+
+ +

{step.where}

+
+
+
+

HOW

+

{step.how}

+
+
+

WHEN

+

{step.when}

+
+
+

ACTION

+

{step.action}

+
+
+ + {step.error && ( +
+ Error: {step.error} +
+ )} + + {Object.keys(step.input || {}).length > 0 && ( +
+

INPUT

+
+                                {JSON.stringify(step.input, null, 2)}
+                              
+
+ )} + + {step.output && Object.keys(step.output).length > 0 && ( +
+

OUTPUT

+
+                                {JSON.stringify(step.output, null, 2)}
+                              
+
+ )} +
+ )} +
+ ))} +
+
+
+ ) : ( +
+ +

No trace found for this dispensary

+

+ Run a crawl first to generate a trace +

+
+ )} +
+
+
+  );
+}
diff --git a/cannaiq/src/components/StateSelector.tsx b/cannaiq/src/components/StateSelector.tsx
new file mode 100644
index 00000000..c30ff37f
--- /dev/null
+++ b/cannaiq/src/components/StateSelector.tsx
@@ -0,0 +1,119 @@
+/**
+ * StateSelector Component
+ *
+ * Global state selector for multi-state navigation.
+ * Phase 4: Multi-State Expansion
+ */
+
+import { useEffect } from 'react';
+import { MapPin, Globe, ChevronDown } from 'lucide-react';
+import { useStateStore } from '../store/stateStore';
+import { api } from '../lib/api';
+
+interface StateSelectorProps {
+  className?: string;
+  showLabel?: boolean;
+}
+
+export function StateSelector({ className = '', showLabel = true }: StateSelectorProps) {
+  const {
+    selectedState,
+    availableStates,
+    isLoading,
+    setSelectedState,
+    setAvailableStates,
+    setLoading,
+    getStateName,
+  } = useStateStore();
+
+  // Fetch available states on mount
+  useEffect(() => {
+    const fetchStates = async () => {
+      setLoading(true);
+      try {
+        const response = await api.get('/api/states?active=true');
+        // Response: { data: { success, data: { states, count } } }
+        if (response.data?.data?.states) {
+          setAvailableStates(response.data.data.states);
+        } else if (response.data?.states) {
+          // Handle direct format
+          setAvailableStates(response.data.states);
+        }
+      } catch (error) {
+        console.error('Failed to fetch states:', error);
+      } finally {
+        setLoading(false);
+      }
+    };
+
+    fetchStates();
+  }, [setAvailableStates, setLoading]);
+
+  const handleStateChange = (e: React.ChangeEvent<HTMLSelectElement>) => {
+    const value = e.target.value;
+    setSelectedState(value === '' ? null : value);
+  };
+
+  return (
+
+ {showLabel && ( + + Region + + )} +
+
+ {selectedState === null ? ( + + ) : ( + + )} +
+ +
+ +
+
+ {selectedState && ( + + {getStateName(selectedState)} + + )} +
+ ); +} + +/** + * Compact state badge for use in headers/cards + */ +export function StateBadge({ className = '' }: { className?: string }) { + const { selectedState, getStateName } = useStateStore(); + + if (selectedState === null) { + return ( + + + National + + ); + } + + return ( + + + {getStateName(selectedState)} + + ); +} diff --git a/cannaiq/src/components/WorkflowStepper.tsx b/cannaiq/src/components/WorkflowStepper.tsx new file mode 100644 index 00000000..e0491953 --- /dev/null +++ b/cannaiq/src/components/WorkflowStepper.tsx @@ -0,0 +1,400 @@ +/** + * WorkflowStepper - Dynamic workflow visualization for crawl traces + * + * Displays crawl phases as a horizontal stepper/timeline. + * Maps trace actions to phases based on crawler_type. + */ + +import { + CheckCircle, + XCircle, + AlertTriangle, + Circle, + Loader2, +} from 'lucide-react'; + +// ===================================================== +// PHASE DEFINITIONS PER CRAWLER TYPE +// ===================================================== + +export interface WorkflowPhase { + key: string; + label: string; + shortLabel: string; + description: string; + // Actions that map to this phase + actions: string[]; +} + +// Dutchie crawler phases +const DUTCHIE_PHASES: WorkflowPhase[] = [ + { + key: 'init', + label: 'Initialize', + shortLabel: 'Init', + description: 'Start crawl run, validate inputs', + actions: ['init', 'start', 'validate_dispensary', 'check_config'], + }, + { + key: 'profile', + label: 'Load Profile', + shortLabel: 'Profile', + description: 'Load crawler profile and configuration', + actions: ['load_profile', 'resolve_module', 'load_config', 'check_profile'], + }, + { + key: 'sandbox', + label: 'Sandbox', + shortLabel: 'Sandbox', + description: 'Sandbox discovery and selector learning', + actions: [ + 'sandbox_discovery', + 'sandbox_test', + 'learn_selectors', + 'discover_products', + 'sandbox_run', + 'sandbox_validate', + ], + }, + { + key: 'fetch', + label: 'Fetch Products', + shortLabel: 'Fetch', + description: 'Fetch products from GraphQL API', + actions: [ + 'fetch_products', + 'fetch_html', + 'graphql_request', + 'api_call', + 'fetch_menu', + 'crawl_products', + 'mode_a_fetch', + 'mode_b_fetch', + ], + }, + { + key: 'write', + label: 'Write Data', + shortLabel: 'Write', + description: 'Write products and snapshots to database', + actions: [ + 'write_snapshots', + 'write_products', + 'upsert_products', + 'insert_snapshots', + 'save_data', + 'batch_write', + ], + }, + { + key: 'validate', + label: 'Validate', + shortLabel: 'Valid', + description: 'Validate crawl results and check thresholds', + actions: [ + 'validation', + 'validate_results', + 'check_thresholds', + 'compare_previous', + 'quality_check', + ], + }, + { + key: 'complete', + label: 'Complete', + shortLabel: 'Done', + description: 'Finalize crawl run and update status', + actions: ['complete', 'finalize', 'update_status', 'cleanup', 'finish'], + }, +]; + +// Generic phases for unknown crawlers +const GENERIC_PHASES: WorkflowPhase[] = [ + { + key: 'init', + label: 'Initialize', + shortLabel: 'Init', + description: 'Start crawl', + actions: ['init', 'start', 'validate'], + }, + { + key: 'profile', + label: 'Profile', + shortLabel: 'Profile', + description: 'Load configuration', + actions: ['load_profile', 'resolve_module', 'load_config'], + }, + { + key: 'fetch', + label: 'Fetch', + shortLabel: 'Fetch', + description: 'Fetch data', + actions: ['fetch', 'crawl', 'request', 'api_call'], + }, + { + key: 'write', + label: 'Write', + shortLabel: 'Write', + 
description: 'Save data', + actions: ['write', 'save', 'upsert', 'insert'], + }, + { + key: 'complete', + label: 'Complete', + shortLabel: 'Done', + description: 'Finish', + actions: ['complete', 'finalize', 'finish'], + }, +]; + +// Get phases based on crawler type +export function getPhasesForCrawlerType(crawlerType?: string | null): WorkflowPhase[] { + switch (crawlerType?.toLowerCase()) { + case 'dutchie': + return DUTCHIE_PHASES; + // Add more crawler types here + default: + return GENERIC_PHASES; + } +} + +// ===================================================== +// PHASE STATUS TYPES +// ===================================================== + +export type PhaseStatus = 'success' | 'warning' | 'failed' | 'running' | 'not_reached'; + +export interface PhaseResult { + phase: WorkflowPhase; + status: PhaseStatus; + stepCount: number; + failedSteps: number; + warningSteps: number; + firstStepIndex?: number; + lastStepIndex?: number; +} + +// ===================================================== +// TRACE STEP ANALYSIS +// ===================================================== + +interface TraceStep { + step: number; + action: string; + status: string; + error?: string; +} + +export function analyzeTracePhases( + trace: TraceStep[], + crawlerType?: string | null +): PhaseResult[] { + const phases = getPhasesForCrawlerType(crawlerType); + const results: PhaseResult[] = []; + + for (const phase of phases) { + // Find steps that match this phase's actions + const matchingSteps = trace.filter((step) => + phase.actions.some((action) => + step.action.toLowerCase().includes(action.toLowerCase()) + ) + ); + + const stepCount = matchingSteps.length; + const failedSteps = matchingSteps.filter((s) => s.status === 'failed').length; + const warningSteps = matchingSteps.filter( + (s) => s.status === 'skipped' || s.status === 'warning' + ).length; + + let status: PhaseStatus = 'not_reached'; + + if (stepCount > 0) { + if (failedSteps > 0) { + status = 'failed'; + } else if (warningSteps > 0) { + status = 'warning'; + } else { + // Check if all matching steps completed + const completedSteps = matchingSteps.filter( + (s) => s.status === 'completed' || s.status === 'success' + ).length; + status = completedSteps === stepCount ? 'success' : 'running'; + } + } + + results.push({ + phase, + status, + stepCount, + failedSteps, + warningSteps, + firstStepIndex: matchingSteps.length > 0 ? matchingSteps[0].step : undefined, + lastStepIndex: + matchingSteps.length > 0 ? 
matchingSteps[matchingSteps.length - 1].step : undefined, + }); + } + + return results; +} + +// ===================================================== +// COMPONENT PROPS +// ===================================================== + +interface WorkflowStepperProps { + trace: TraceStep[]; + crawlerType?: string | null; + stateAtStart?: string; + stateAtEnd?: string; + onPhaseClick?: (phaseKey: string, firstStepIndex?: number) => void; + compact?: boolean; +} + +// ===================================================== +// COMPONENT +// ===================================================== + +export function WorkflowStepper({ + trace, + crawlerType, + stateAtStart, + stateAtEnd, + onPhaseClick, + compact = false, +}: WorkflowStepperProps) { + const phaseResults = analyzeTracePhases(trace, crawlerType); + + const getStatusIcon = (status: PhaseStatus, size = 'w-5 h-5') => { + switch (status) { + case 'success': + return ; + case 'failed': + return ; + case 'warning': + return ; + case 'running': + return ; + default: + return ; + } + }; + + const getStatusBg = (status: PhaseStatus) => { + switch (status) { + case 'success': + return 'bg-green-100 border-green-300'; + case 'failed': + return 'bg-red-100 border-red-300'; + case 'warning': + return 'bg-yellow-100 border-yellow-300'; + case 'running': + return 'bg-blue-100 border-blue-300'; + default: + return 'bg-gray-50 border-gray-200'; + } + }; + + const getConnectorColor = (status: PhaseStatus) => { + switch (status) { + case 'success': + return 'bg-green-400'; + case 'failed': + return 'bg-red-400'; + case 'warning': + return 'bg-yellow-400'; + case 'running': + return 'bg-blue-400'; + default: + return 'bg-gray-200'; + } + }; + + // Determine if this is a sandbox or production run + const isSandboxRun = stateAtStart === 'sandbox' || stateAtEnd === 'sandbox'; + const isProductionRun = stateAtStart === 'production' || stateAtEnd === 'production'; + + return ( +
+ {/* Run Type Badge */} +
+
+ + {isSandboxRun ? 'Sandbox Run' : isProductionRun ? 'Production Run' : 'Crawl Run'} + + {stateAtStart && stateAtEnd && stateAtStart !== stateAtEnd && ( + + {stateAtStart} → {stateAtEnd} + + )} +
+ + {crawlerType || 'unknown'} crawler + +
+ + {/* Workflow Steps */} +
+ {phaseResults.map((result, index) => ( +
+ {/* Phase */} + + + {/* Connector */} + {index < phaseResults.length - 1 && ( +
+ )} +
+ ))} +
+ + {/* Sandbox-specific info */} + {isSandboxRun && ( +
+ Sandbox: Discovery & selector learning phases are highlighted. + {stateAtEnd === 'production' && ( + + ✓ Promoted to production + + )} +
+ )} + + {/* Production warning */} + {isProductionRun && stateAtEnd === 'sandbox' && ( +
+ Demoted: This run was demoted from production to sandbox. +
+ )} +
+ ); +} + +export default WorkflowStepper; diff --git a/cannaiq/src/lib/provider-display.ts b/cannaiq/src/lib/provider-display.ts new file mode 100644 index 00000000..6c6e567d --- /dev/null +++ b/cannaiq/src/lib/provider-display.ts @@ -0,0 +1,56 @@ +/** + * Provider Display Names + * + * Maps internal provider identifiers to safe display labels. + * Internal identifiers (menu_type, product_provider, crawler_type) remain unchanged. + * Only the display label shown to users is transformed. + * + * IMPORTANT: Raw provider names (dutchie, treez, jane, etc.) must NEVER + * be displayed directly in the UI. Always use this utility. + */ + +export const ProviderDisplayNames: Record = { + // All menu providers map to anonymous "Menu Feed" label + dutchie: 'Menu Feed', + treez: 'Menu Feed', + jane: 'Menu Feed', + iheartjane: 'Menu Feed', + blaze: 'Menu Feed', + flowhub: 'Menu Feed', + weedmaps: 'Menu Feed', + leafly: 'Menu Feed', + leaflogix: 'Menu Feed', + tymber: 'Menu Feed', + dispense: 'Menu Feed', + + // Catch-all + unknown: 'Menu Feed', + default: 'Menu Feed', + '': 'Menu Feed', +}; + +/** + * Get the display name for a provider + * @param provider - The internal provider identifier (e.g., 'dutchie', 'treez') + * @returns The safe display label (e.g., 'Embedded Menu') + */ +export function getProviderDisplayName(provider: string | null | undefined): string { + if (!provider) { + return ProviderDisplayNames.default; + } + const normalized = provider.toLowerCase().trim(); + return ProviderDisplayNames[normalized] || ProviderDisplayNames.default; +} + +/** + * Check if a provider string is a raw/internal identifier that should not be displayed + * @param value - The string to check + * @returns True if the value is a raw provider name that needs transformation + */ +export function isRawProviderName(value: string): boolean { + if (!value) return false; + const normalized = value.toLowerCase().trim(); + return Object.keys(ProviderDisplayNames).includes(normalized) && + normalized !== 'default' && + normalized !== ''; +} diff --git a/cannaiq/src/pages/ChainsDashboard.tsx b/cannaiq/src/pages/ChainsDashboard.tsx new file mode 100644 index 00000000..9597b799 --- /dev/null +++ b/cannaiq/src/pages/ChainsDashboard.tsx @@ -0,0 +1,192 @@ +import { useEffect, useState } from 'react'; +import { useNavigate } from 'react-router-dom'; +import { Layout } from '../components/Layout'; +import { api } from '../lib/api'; +import { + Building2, + MapPin, + Package, + ChevronRight, + RefreshCw, + Search, +} from 'lucide-react'; + +interface Chain { + id: number; + name: string; + stateCount: number; + storeCount: number; + productCount: number; +} + +export function ChainsDashboard() { + const navigate = useNavigate(); + const [chains, setChains] = useState([]); + const [loading, setLoading] = useState(true); + const [searchTerm, setSearchTerm] = useState(''); + + useEffect(() => { + loadChains(); + }, []); + + const loadChains = async () => { + try { + setLoading(true); + const data = await api.getOrchestratorChains(); + setChains(data.chains || []); + } catch (error) { + console.error('Failed to load chains:', error); + } finally { + setLoading(false); + } + }; + + const filteredChains = chains.filter(chain => + chain.name.toLowerCase().includes(searchTerm.toLowerCase()) + ); + + const handleChainClick = (chainId: number) => { + navigate(`/admin/orchestrator/stores?chainId=${chainId}`); + }; + + if (loading) { + return ( + +
+
+

Loading chains...

+
+
+ ); + } + + return ( + +
+ {/* Header */} +
+
+

Chains Dashboard

+

+ View dispensary chains across multiple states +

+
+ +
+ + {/* Summary Cards */} +
+
+
+ +
+

Total Chains

+

{chains.length}

+
+
+
+
+
+ +
+

Total Stores

+

+ {chains.reduce((sum, c) => sum + c.storeCount, 0).toLocaleString()} +

+
+
+
+
+
+ +
+

Total Products

+

+ {chains.reduce((sum, c) => sum + c.productCount, 0).toLocaleString()} +

+
+
+
+
+ + {/* Search */} +
+
+ + setSearchTerm(e.target.value)} + className="input input-bordered input-sm w-full pl-10" + /> +
+ + Showing {filteredChains.length} of {chains.length} chains + +
+ + {/* Chains Table */} +
+
+ + + + + + + + + + + + {filteredChains.length === 0 ? ( + + + + ) : ( + filteredChains.map((chain) => ( + handleChainClick(chain.id)} + > + + + + + + + )) + )} + +
Chain NameStatesStoresProducts
+ No chains found +
+
+ + {chain.name} +
+
+ {chain.stateCount} + + {chain.storeCount.toLocaleString()} + + {chain.productCount.toLocaleString()} + + +
+
+
+
+
+ ); +} + +export default ChainsDashboard; diff --git a/cannaiq/src/pages/CrossStateCompare.tsx b/cannaiq/src/pages/CrossStateCompare.tsx new file mode 100644 index 00000000..70259f42 --- /dev/null +++ b/cannaiq/src/pages/CrossStateCompare.tsx @@ -0,0 +1,469 @@ +/** + * Cross-State Compare + * + * Compare brands and categories across multiple states. + * Phase 4: Multi-State Expansion + */ + +import { useState, useEffect } from 'react'; +import { useNavigate } from 'react-router-dom'; +import { Layout } from '../components/Layout'; +import { useStateStore } from '../store/stateStore'; +import { api } from '../lib/api'; +import { + ArrowLeft, + TrendingUp, + TrendingDown, + Search, + Tag, + Package, + Store, + DollarSign, + Percent, + RefreshCw, + AlertCircle, + ChevronRight +} from 'lucide-react'; + +type CompareMode = 'brand' | 'category'; + +interface BrandComparison { + brandId: number; + brandName: string; + states: { + state: string; + stateName: string; + totalStores: number; + storesWithBrand: number; + penetrationPct: number; + productCount: number; + avgPrice: number | null; + }[]; + nationalPenetration: number; + nationalAvgPrice: number | null; + bestPerformingState: string | null; + worstPerformingState: string | null; +} + +interface CategoryComparison { + category: string; + states: { + state: string; + stateName: string; + productCount: number; + storeCount: number; + avgPrice: number | null; + marketShare: number; + }[]; + nationalProductCount: number; + nationalAvgPrice: number | null; + dominantState: string | null; +} + +interface SearchResult { + id: number; + name: string; + type: 'brand' | 'category'; +} + +function PenetrationBar({ value, label }: { value: number; label: string }) { + return ( +
+
+
+
+ + {value.toFixed(1)}% + +
+ ); +} + +function StateComparisonRow({ + state, + metric, + maxValue, + valueLabel, + subLabel, +}: { + state: string; + metric: number; + maxValue: number; + valueLabel: string; + subLabel?: string; +}) { + const percentage = (metric / maxValue) * 100; + + return ( +
+
{state}
+
+
+
+ {percentage > 30 && ( + {valueLabel} + )} +
+
+
+ {percentage <= 30 && ( + {valueLabel} + )} + {subLabel && ( + {subLabel} + )} +
+ ); +} + +export default function CrossStateCompare() { + const navigate = useNavigate(); + const { availableStates, setSelectedState } = useStateStore(); + const [loading, setLoading] = useState(false); + const [error, setError] = useState(null); + const [mode, setMode] = useState('brand'); + const [searchQuery, setSearchQuery] = useState(''); + const [searchResults, setSearchResults] = useState([]); + const [selectedBrandId, setSelectedBrandId] = useState(null); + const [selectedCategory, setSelectedCategory] = useState(null); + const [brandComparison, setBrandComparison] = useState(null); + const [categoryComparison, setCategoryComparison] = useState(null); + const [selectedStates, setSelectedStates] = useState([]); + + // Search brands/categories + useEffect(() => { + if (!searchQuery || searchQuery.length < 2) { + setSearchResults([]); + return; + } + + const timer = setTimeout(async () => { + try { + if (mode === 'brand') { + const response = await api.get(`/api/az/brands?search=${encodeURIComponent(searchQuery)}&limit=10`); + setSearchResults( + (response.data?.brands || []).map((b: any) => ({ + id: b.id, + name: b.name, + type: 'brand' as const, + })) + ); + } else { + const response = await api.get(`/api/az/categories`); + const filtered = (response.data?.categories || []) + .filter((c: any) => c.name?.toLowerCase().includes(searchQuery.toLowerCase())) + .slice(0, 10) + .map((c: any) => ({ + id: c.id || c.name, + name: c.name, + type: 'category' as const, + })); + setSearchResults(filtered); + } + } catch (err) { + console.error('Search failed:', err); + } + }, 300); + + return () => clearTimeout(timer); + }, [searchQuery, mode]); + + // Fetch comparison data + const fetchComparison = async () => { + if (mode === 'brand' && !selectedBrandId) return; + if (mode === 'category' && !selectedCategory) return; + + setLoading(true); + setError(null); + + try { + const statesParam = selectedStates.length > 0 + ? `?states=${selectedStates.join(',')}` + : ''; + + if (mode === 'brand' && selectedBrandId) { + const response = await api.get(`/api/analytics/compare/brand/${selectedBrandId}${statesParam}`); + setBrandComparison(response.data?.data); + } else if (mode === 'category' && selectedCategory) { + const response = await api.get(`/api/analytics/compare/category/${encodeURIComponent(selectedCategory)}${statesParam}`); + setCategoryComparison(response.data?.data); + } + } catch (err: any) { + setError(err.message || 'Failed to load comparison data'); + } finally { + setLoading(false); + } + }; + + useEffect(() => { + if (selectedBrandId || selectedCategory) { + fetchComparison(); + } + }, [selectedBrandId, selectedCategory, selectedStates]); + + const handleSelectItem = (item: SearchResult) => { + setSearchQuery(''); + setSearchResults([]); + if (item.type === 'brand') { + setSelectedBrandId(item.id); + setSelectedCategory(null); + setCategoryComparison(null); + } else { + setSelectedCategory(item.name); + setSelectedBrandId(null); + setBrandComparison(null); + } + }; + + const toggleState = (stateCode: string) => { + setSelectedStates(prev => + prev.includes(stateCode) + ? prev.filter(s => s !== stateCode) + : [...prev, stateCode] + ); + }; + + return ( + +
+ {/* Header */} +
+ +
+

Cross-State Compare

+

+ Compare brand penetration and category performance across states +

+
+
+ + {/* Mode & Search */} +
+
+ {/* Mode Toggle */} +
+ + +
+ + {/* Search */} +
+ + setSearchQuery(e.target.value)} + className="w-full pl-10 pr-4 py-2 border border-gray-200 rounded-lg text-sm focus:outline-none focus:ring-2 focus:ring-emerald-500 focus:border-emerald-500" + /> + {searchResults.length > 0 && ( +
+ {searchResults.map((result) => ( + + ))} +
+ )} +
+
+ + {/* State Filter */} +
+
Filter by states (optional):
+
+ {availableStates.map((state) => ( + + ))} + {selectedStates.length > 0 && ( + + )} +
+
+
+ + {/* Results */} + {loading ? ( +
+ +
+ ) : error ? ( +
+ +
{error}
+
+ ) : brandComparison ? ( +
+ {/* Brand Summary */} +
+

+ {brandComparison.brandName} +

+
+
+
National Penetration
+
+ {brandComparison.nationalPenetration.toFixed(1)}% +
+
+
+
Avg Price
+
+ {brandComparison.nationalAvgPrice + ? `$${brandComparison.nationalAvgPrice.toFixed(2)}` + : '-'} +
+
+
+
Best State
+
+ {brandComparison.bestPerformingState || '-'} +
+
+
+
Lowest Penetration
+
+ {brandComparison.worstPerformingState || '-'} +
+
+
+ + {/* State-by-State */} +

Penetration by State

+
+ {brandComparison.states + .sort((a, b) => b.penetrationPct - a.penetrationPct) + .map((state) => ( + + ))} +
+
+
+ ) : categoryComparison ? ( +
+ {/* Category Summary */} +
+

+ {categoryComparison.category} +

+
+
+
National Products
+
+ {categoryComparison.nationalProductCount.toLocaleString()} +
+
+
+
Avg Price
+
+ {categoryComparison.nationalAvgPrice + ? `$${categoryComparison.nationalAvgPrice.toFixed(2)}` + : '-'} +
+
+
+
Dominant State
+
+ {categoryComparison.dominantState || '-'} +
+
+
+ + {/* State-by-State */} +

Products by State

+
+ {categoryComparison.states + .sort((a, b) => b.productCount - a.productCount) + .map((state) => ( + s.productCount))} + valueLabel={state.productCount.toLocaleString()} + subLabel={state.avgPrice ? `$${state.avgPrice.toFixed(2)} avg` : undefined} + /> + ))} +
+
+
+ ) : ( +
+ +

Search for a {mode} to compare across states

+
+ )} +
+
+ ); +} diff --git a/cannaiq/src/pages/Discovery.tsx b/cannaiq/src/pages/Discovery.tsx new file mode 100644 index 00000000..0bd5ed8a --- /dev/null +++ b/cannaiq/src/pages/Discovery.tsx @@ -0,0 +1,674 @@ +import { useEffect, useState } from 'react'; +import { Layout } from '../components/Layout'; +import { api } from '../lib/api'; +import { + Search, + MapPin, + ExternalLink, + CheckCircle, + XCircle, + Link2, + RefreshCw, + Globe, + Clock, + ChevronDown, + Building2, + Truck, + ShoppingBag, + Plus, + Leaf, +} from 'lucide-react'; + +interface DiscoveryLocation { + id: number; + platform: string; + platformLocationId: string; + platformSlug: string; + platformMenuUrl: string; + name: string; + rawAddress: string | null; + addressLine1: string | null; + city: string | null; + stateCode: string | null; + postalCode: string | null; + countryCode: string | null; + latitude: number | null; + longitude: number | null; + status: string; + dispensaryId: number | null; + dispensaryName: string | null; + offersDelivery: boolean | null; + offersPickup: boolean | null; + isRecreational: boolean | null; + isMedical: boolean | null; + firstSeenAt: string; + lastSeenAt: string; + verifiedAt: string | null; + verifiedBy: string | null; +} + +interface DiscoveryStats { + locations: { + total: number; + discovered: number; + verified: number; + rejected: number; + merged: number; + byState: Array<{ stateCode: string; count: number }>; + }; +} + +interface MatchCandidate { + id: number; + name: string; + city: string; + state: string; + address: string; + menuType: string | null; + platformDispensaryId: string | null; + menuUrl: string | null; + matchType: string; + distanceMiles: number | null; +} + +export function Discovery() { + const [locations, setLocations] = useState([]); + const [stats, setStats] = useState(null); + const [loading, setLoading] = useState(true); + const [total, setTotal] = useState(0); + const [page, setPage] = useState(0); + const [limit] = useState(50); + + // Filters + const [statusFilter, setStatusFilter] = useState('discovered'); + const [stateFilter, setStateFilter] = useState(''); + const [countryFilter, setCountryFilter] = useState('US'); + const [searchFilter, setSearchFilter] = useState(''); + + // Modal state for linking + const [linkModal, setLinkModal] = useState<{ + isOpen: boolean; + location: DiscoveryLocation | null; + candidates: MatchCandidate[]; + loading: boolean; + }>({ + isOpen: false, + location: null, + candidates: [], + loading: false, + }); + + // Action loading state + const [actionLoading, setActionLoading] = useState(null); + + useEffect(() => { + loadData(); + }, [statusFilter, stateFilter, countryFilter, page]); + + // Platform slug for discovery API (dt = Dutchie) + const platformSlug = 'dt'; + + const loadData = async () => { + setLoading(true); + try { + const [locationsRes, statsRes] = await Promise.all([ + api.getPlatformDiscoveryLocations(platformSlug, { + status: statusFilter || undefined, + state_code: stateFilter || undefined, + country_code: countryFilter || undefined, + search: searchFilter || undefined, + limit, + offset: page * limit, + }), + api.getPlatformDiscoverySummary(platformSlug), + ]); + setLocations(locationsRes.locations); + setTotal(locationsRes.total); + // Map the summary response to match the DiscoveryStats interface + setStats({ + locations: { + total: statsRes.summary.total_locations, + discovered: statsRes.summary.discovered, + verified: statsRes.summary.verified, + rejected: statsRes.summary.rejected, + merged: 
statsRes.summary.merged, + byState: statsRes.by_state.map((s: { state_code: string; total: number }) => ({ + stateCode: s.state_code, + count: s.total, + })), + }, + }); + } catch (error) { + console.error('Failed to load discovery data:', error); + } finally { + setLoading(false); + } + }; + + const handleSearch = () => { + setPage(0); + loadData(); + }; + + const handleVerify = async (location: DiscoveryLocation) => { + if (!confirm(`Create a new dispensary from "${location.name}"?`)) return; + setActionLoading(location.id); + try { + const result = await api.verifyCreatePlatformLocation(platformSlug, location.id); + alert(result.message); + loadData(); + } catch (error: any) { + alert(`Error: ${error.message}`); + } finally { + setActionLoading(null); + } + }; + + const handleReject = async (location: DiscoveryLocation) => { + const reason = prompt(`Reason for rejecting "${location.name}":`); + if (reason === null) return; + setActionLoading(location.id); + try { + const result = await api.rejectPlatformLocation(platformSlug, location.id, reason); + alert(result.message); + loadData(); + } catch (error: any) { + alert(`Error: ${error.message}`); + } finally { + setActionLoading(null); + } + }; + + const handleUnreject = async (location: DiscoveryLocation) => { + if (!confirm(`Restore "${location.name}" to discovered status?`)) return; + setActionLoading(location.id); + try { + const result = await api.unrejectPlatformLocation(platformSlug, location.id); + alert(result.message); + loadData(); + } catch (error: any) { + alert(`Error: ${error.message}`); + } finally { + setActionLoading(null); + } + }; + + const openLinkModal = async (location: DiscoveryLocation) => { + setLinkModal({ isOpen: true, location, candidates: [], loading: true }); + try { + const result = await api.getPlatformLocationMatchCandidates(platformSlug, location.id); + setLinkModal((prev) => ({ ...prev, candidates: result.candidates, loading: false })); + } catch (error: any) { + console.error('Failed to load match candidates:', error); + setLinkModal((prev) => ({ ...prev, loading: false })); + } + }; + + const handleLink = async (dispensaryId: number) => { + if (!linkModal.location) return; + setActionLoading(linkModal.location.id); + try { + const result = await api.verifyLinkPlatformLocation(platformSlug, linkModal.location.id, dispensaryId); + alert(result.message); + setLinkModal({ isOpen: false, location: null, candidates: [], loading: false }); + loadData(); + } catch (error: any) { + alert(`Error: ${error.message}`); + } finally { + setActionLoading(null); + } + }; + + const getStatusBadge = (status: string) => { + switch (status) { + case 'discovered': + return Discovered; + case 'verified': + return Verified; + case 'rejected': + return Rejected; + case 'merged': + return Linked; + default: + return {status}; + } + }; + + const formatDate = (dateStr: string | null) => { + if (!dateStr) return '-'; + return new Date(dateStr).toLocaleDateString('en-US', { + month: 'short', + day: 'numeric', + year: 'numeric', + }); + }; + + return ( + +
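+        {/* Discovery workflow: filter the queue by status/state/country/search, then verify-as-new, link-to-existing, or reject each location */}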
+ {/* Header */} +
+
+

Store Discovery

+

+ Discover and verify dispensary locations from platform data +

+
+ +
+ + {/* Stats Cards */} + {stats && ( +
+
+
+
+ +
+
+

Discovered

+

{stats.locations.discovered}

+
+
+
+ +
+
+
+ +
+
+

Verified

+

{stats.locations.verified}

+
+
+
+ +
+
+
+ +
+
+

Linked

+

{stats.locations.merged}

+
+
+
+ +
+
+
+ +
+
+

Rejected

+

{stats.locations.rejected}

+
+
+
+ +
+
+
+ +
+
+

Total

+

{stats.locations.total}

+
+
+
+
+ )} + + {/* Filters */} +
+
+
+ + +
+ +
+ + +
+ +
+ + +
+ +
+ setSearchFilter(e.target.value)} + onKeyDown={(e) => e.key === 'Enter' && handleSearch()} + className="input input-bordered input-sm flex-1 max-w-xs" + /> + +
+
+
+ + {/* Locations List */} +
+
+

+ Discovered Locations ({total}) +

+
+              Showing {total === 0 ? 0 : page * limit + 1}-{Math.min((page + 1) * limit, total)} of {total} +
+
+ + {loading ? ( +
+
+

Loading...

+
+ ) : locations.length === 0 ? ( +
+ No locations found matching your filters. +
+ ) : ( +
+ + + + + + + + + + + + + + + {locations.map((loc) => ( + + + + + + + + + + + ))} + +
PlatformNameLocationMenu URLStatusFeaturesFirst SeenActions
+ {loc.platform} + +
+ +
+

{loc.name}

+ {loc.dispensaryName && ( +

+ Linked: {loc.dispensaryName} +

+ )} +
+
+
+
+ + {loc.city}, {loc.stateCode} + {loc.countryCode && loc.countryCode !== 'US' && ( + ({loc.countryCode}) + )} +
+
+ + + {loc.platformSlug} + + {getStatusBadge(loc.status)} +
+ {loc.offersPickup && ( + + + + )} + {loc.offersDelivery && ( + + + + )} + {loc.isRecreational && ( + + + + )} + {loc.isMedical && ( + + + + )} +
+
+
+ + {formatDate(loc.firstSeenAt)} +
+
+ {loc.status === 'discovered' && ( +
+ + + +
+ )} + {loc.status === 'rejected' && ( + + )} + {(loc.status === 'verified' || loc.status === 'merged') && loc.dispensaryId && ( + + View + + )} +
+
+ )} + + {/* Pagination */} + {total > limit && ( +
+ + + Page {page + 1} of {Math.ceil(total / limit)} + + +
+ )} +
+
+ + {/* Link Modal */} + {linkModal.isOpen && linkModal.location && ( +
+
+

Link to Existing Dispensary

+

+ Linking {linkModal.location.name} ({linkModal.location.city},{' '} + {linkModal.location.stateCode}) +

+ + {linkModal.loading ? ( +
+
+

Finding matches...

+
+ ) : linkModal.candidates.length === 0 ? ( +
+ No potential matches found. Consider verifying this as a new dispensary. +
+ ) : ( +
+ {linkModal.candidates.map((candidate) => ( +
+
+

{candidate.name}

+

+ {candidate.city}, {candidate.state} + {candidate.distanceMiles !== null && ( + + ({candidate.distanceMiles} mi) + + )} +

+
+ + {candidate.matchType.replace('_', ' ')} + + {candidate.menuType && ( + {candidate.menuType} + )} +
+
+ +
+ ))} +
+ )} + +
+ +
+
+
+ setLinkModal({ isOpen: false, location: null, candidates: [], loading: false }) + } + /> +
+ )} + + ); +} diff --git a/cannaiq/src/pages/DutchieAZSchedule.tsx b/cannaiq/src/pages/DutchieAZSchedule.tsx index e1288b79..6da1a5e2 100644 --- a/cannaiq/src/pages/DutchieAZSchedule.tsx +++ b/cannaiq/src/pages/DutchieAZSchedule.tsx @@ -1,6 +1,8 @@ import { useEffect, useState } from 'react'; import { Layout } from '../components/Layout'; import { api } from '../lib/api'; +import { getProviderDisplayName } from '../lib/provider-display'; +import { WorkerRoleBadge, formatScope } from '../components/WorkerRoleBadge'; interface JobSchedule { id: number; @@ -15,6 +17,8 @@ interface JobSchedule { lastDurationMs: number | null; nextRunAt: string | null; jobConfig: Record | null; + workerName: string | null; + workerRole: string | null; createdAt: string; updatedAt: string; } @@ -32,6 +36,10 @@ interface RunLog { items_succeeded: number | null; items_failed: number | null; metadata: any; + worker_name: string | null; + run_role: string | null; + schedule_worker_name: string | null; + schedule_worker_role: string | null; created_at: string; } @@ -410,7 +418,8 @@ export function DutchieAZSchedule() { - + + @@ -423,7 +432,7 @@ export function DutchieAZSchedule() { {schedules.map((schedule) => ( + + + + @@ -581,6 +596,14 @@ export function DutchieAZSchedule() {
{log.job_name}
Run #{log.id}
+ + + ))} @@ -676,7 +734,7 @@ export function DutchieAZSchedule() { fontSize: '14px', fontWeight: '600' }}> - {provider}: {count} + {getProviderDisplayName(provider)}: {count} ))} @@ -764,10 +822,10 @@ export function DutchieAZSchedule() {

About Menu Detection

  • - Detect All Unknown: Scans dispensaries with no menu_type set and detects the provider (dutchie, treez, jane, etc.) from their menu_url. + Detect All Unknown: Scans dispensaries with no menu_type set and detects the embedded menu provider from their menu_url.
  • - Resolve Missing Platform IDs: For dispensaries already detected as "dutchie", extracts the cName from menu_url and resolves the platform_dispensary_id via GraphQL. + Resolve Missing Platform IDs: For dispensaries whose embedded menu provider has already been detected, extracts the store identifier from menu_url and resolves the platform_dispensary_id.
  • Automatic scheduling: A "Menu Detection" job runs daily (24h +/- 1h jitter) to detect new dispensaries. diff --git a/cannaiq/src/pages/DutchieAZStores.tsx b/cannaiq/src/pages/DutchieAZStores.tsx index 070faf32..c3a183f0 100644 --- a/cannaiq/src/pages/DutchieAZStores.tsx +++ b/cannaiq/src/pages/DutchieAZStores.tsx @@ -1,6 +1,7 @@ import { useEffect, useState } from 'react'; import { useNavigate } from 'react-router-dom'; import { Layout } from '../components/Layout'; +import { OrchestratorTraceModal } from '../components/OrchestratorTraceModal'; import { api } from '../lib/api'; import { Building2, @@ -8,7 +9,8 @@ import { Package, RefreshCw, CheckCircle, - XCircle + XCircle, + FileText } from 'lucide-react'; export function DutchieAZStores() { @@ -17,6 +19,11 @@ export function DutchieAZStores() { const [totalStores, setTotalStores] = useState(0); const [loading, setLoading] = useState(true); const [dashboard, setDashboard] = useState(null); + const [traceModal, setTraceModal] = useState<{ isOpen: boolean; dispensaryId: number; dispensaryName: string }>({ + isOpen: false, + dispensaryId: 0, + dispensaryName: '', + }); useEffect(() => { loadData(); @@ -165,17 +172,17 @@ export function DutchieAZStores() {
))} @@ -208,6 +261,14 @@ export function DutchieAZStores() { + + {/* Orchestrator Trace Modal */} + setTraceModal({ isOpen: false, dispensaryId: 0, dispensaryName: '' })} + /> ); } diff --git a/cannaiq/src/pages/IntelligenceBrands.tsx b/cannaiq/src/pages/IntelligenceBrands.tsx new file mode 100644 index 00000000..1ece3c1f --- /dev/null +++ b/cannaiq/src/pages/IntelligenceBrands.tsx @@ -0,0 +1,286 @@ +import { useEffect, useState } from 'react'; +import { useNavigate } from 'react-router-dom'; +import { Layout } from '../components/Layout'; +import { api } from '../lib/api'; +import { + Building2, + MapPin, + Package, + DollarSign, + RefreshCw, + Search, + TrendingUp, + BarChart3, +} from 'lucide-react'; + +interface BrandData { + brandName: string; + states: string[]; + storeCount: number; + skuCount: number; + avgPriceRec: number | null; + avgPriceMed: number | null; +} + +export function IntelligenceBrands() { + const navigate = useNavigate(); + const [brands, setBrands] = useState([]); + const [loading, setLoading] = useState(true); + const [searchTerm, setSearchTerm] = useState(''); + const [sortBy, setSortBy] = useState<'stores' | 'skus' | 'name'>('stores'); + + useEffect(() => { + loadBrands(); + }, []); + + const loadBrands = async () => { + try { + setLoading(true); + const data = await api.getIntelligenceBrands({ limit: 500 }); + setBrands(data.brands || []); + } catch (error) { + console.error('Failed to load brands:', error); + } finally { + setLoading(false); + } + }; + + const filteredBrands = brands + .filter(brand => + brand.brandName.toLowerCase().includes(searchTerm.toLowerCase()) + ) + .sort((a, b) => { + switch (sortBy) { + case 'stores': + return b.storeCount - a.storeCount; + case 'skus': + return b.skuCount - a.skuCount; + case 'name': + return a.brandName.localeCompare(b.brandName); + default: + return 0; + } + }); + + const formatPrice = (price: number | null) => { + if (price === null) return '-'; + return `$${price.toFixed(2)}`; + }; + + if (loading) { + return ( + +
+
+

Loading brands...

+
+
+ ); + } + + // Top 10 brands for chart + const topBrands = [...brands] + .sort((a, b) => b.storeCount - a.storeCount) + .slice(0, 10); + const maxStoreCount = Math.max(...topBrands.map(b => b.storeCount), 1); + + return ( + +
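+        {/* Brand analytics layout: summary cards, top-10 store-count bar chart, then a searchable, sortable brand table */}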
+ {/* Header */} +
+
+

Brands Intelligence

+

+ Brand penetration and pricing analytics across markets +

+
+
+ + + +
+
+ + {/* Summary Cards */} +
+
+
+ +
+

Total Brands

+

{brands.length.toLocaleString()}

+
+
+
+
+
+ +
+

Total SKUs

+

+ {brands.reduce((sum, b) => sum + b.skuCount, 0).toLocaleString()} +

+
+
+
+
+
+ +
+

Multi-State Brands

+

+ {brands.filter(b => b.states.length > 1).length} +

+
+
+
+
+
+ +
+

Avg SKUs/Brand

+

+ {(brands.reduce((sum, b) => sum + b.skuCount, 0) / brands.length || 0).toFixed(1)} +

+
+
+
+
+ + {/* Top Brands Chart */} +
+

+ + Top 10 Brands by Store Count +

+
+ {topBrands.map((brand, idx) => ( +
+ {idx + 1}. + + {brand.brandName} + +
+
+
+ + {brand.storeCount} stores + +
+ ))} +
+
+ + {/* Filters */} +
+
+ + setSearchTerm(e.target.value)} + className="input input-bordered input-sm w-full pl-10" + /> +
+ + + Showing {filteredBrands.length} of {brands.length} brands + +
+ + {/* Brands Table */} +
+
+
Job NameWorkerRole Enabled Interval (Jitter) Last Run
-
{schedule.jobName}
+
{schedule.workerName || schedule.jobName}
{schedule.description && (
{schedule.description} @@ -431,10 +440,13 @@ export function DutchieAZSchedule() { )} {schedule.jobConfig && (
- Config: {JSON.stringify(schedule.jobConfig)} + Scope: {formatScope(schedule.jobConfig)}
)}
+ +
JobWorkerRole Status Started Duration Processed Succeeded FailedVisibility
+
+ {log.worker_name || log.schedule_worker_name || '-'} +
+
+ + {log.items_failed ?? '-'} + {log.metadata?.visibilityLostCount !== undefined || log.metadata?.visibilityRestoredCount !== undefined ? ( +
+ {log.metadata?.visibilityLostCount > 0 && ( + + -{log.metadata.visibilityLostCount} lost + + )} + {log.metadata?.visibilityRestoredCount > 0 && ( + + +{log.metadata.visibilityRestoredCount} restored + + )} + {log.metadata?.visibilityLostCount === 0 && log.metadata?.visibilityRestoredCount === 0 && ( + - + )} +
+ ) : ( + - + )} +
{store.menu_type ? ( - {store.menu_type} + {store.provider_display || 'Menu'} ) : ( - unknown + Menu )} @@ -186,20 +193,66 @@ export function DutchieAZStores() { )} - {store.platform_dispensary_id ? ( - Ready - ) : ( - Pending - )} +
+ {/* Crawler status pill */} + {store.crawler_status === 'production' ? ( + + PRODUCTION + + ) : store.crawler_status === 'sandbox' ? ( + + SANDBOX + + ) : store.crawler_status === 'needs_manual' ? ( + + NEEDS MANUAL + + ) : store.crawler_status === 'disabled' ? ( + + DISABLED + + ) : store.active_crawler_profile_id ? ( + + PROFILE + + ) : store.platform_dispensary_id ? ( + + LEGACY + + ) : ( + + PENDING + + )} + {/* Retry indicator */} + {store.crawler_status === 'sandbox' && store.next_retry_at && ( + + retry due + + )} +
- +
+ + +
+ + + + + + + + + + + + {filteredBrands.length === 0 ? ( + + + + ) : ( + filteredBrands.map((brand) => ( + + + + + + + + + )) + )} + +
Brand NameStatesStoresSKUsAvg Rec PriceAvg Med Price
+ No brands found +
+ {brand.brandName} + +
+ {brand.states.map(state => ( + + {state} + + ))} +
+
+ {brand.storeCount} + + {brand.skuCount} + + + {formatPrice(brand.avgPriceRec)} + + + + {formatPrice(brand.avgPriceMed)} + +
+
+
+
+ + ); +} + +export default IntelligenceBrands; diff --git a/cannaiq/src/pages/IntelligencePricing.tsx b/cannaiq/src/pages/IntelligencePricing.tsx new file mode 100644 index 00000000..68b97308 --- /dev/null +++ b/cannaiq/src/pages/IntelligencePricing.tsx @@ -0,0 +1,270 @@ +import { useEffect, useState } from 'react'; +import { useNavigate } from 'react-router-dom'; +import { Layout } from '../components/Layout'; +import { api } from '../lib/api'; +import { + DollarSign, + Building2, + MapPin, + Package, + RefreshCw, + TrendingUp, + TrendingDown, + BarChart3, +} from 'lucide-react'; + +interface CategoryPricing { + category: string; + avgPrice: number; + minPrice: number; + maxPrice: number; + medianPrice: number; + productCount: number; +} + +interface OverallPricing { + avgPrice: number; + minPrice: number; + maxPrice: number; + totalProducts: number; +} + +export function IntelligencePricing() { + const navigate = useNavigate(); + const [categories, setCategories] = useState([]); + const [overall, setOverall] = useState(null); + const [loading, setLoading] = useState(true); + + useEffect(() => { + loadPricing(); + }, []); + + const loadPricing = async () => { + try { + setLoading(true); + const data = await api.getIntelligencePricing(); + setCategories(data.byCategory || []); + setOverall(data.overall || null); + } catch (error) { + console.error('Failed to load pricing:', error); + } finally { + setLoading(false); + } + }; + + const formatPrice = (price: number | null | undefined) => { + if (price === null || price === undefined) return '-'; + return `$${price.toFixed(2)}`; + }; + + if (loading) { + return ( + +
+
+

Loading pricing data...

+
+
+ ); + } + + // Sort by average price for visualization + const sortedCategories = [...categories].sort((a, b) => b.avgPrice - a.avgPrice); + const maxAvgPrice = Math.max(...categories.map(c => c.avgPrice), 1); + + return ( + +
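+        {/* Pricing layout: overall stats, min/avg/max range bars per category, then a category detail table */}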
+ {/* Header */} +
+
+

Pricing Intelligence

+

+ Price distribution and trends by category +

+
+
+ + + +
+
+ + {/* Overall Stats */} + {overall && ( +
+
+
+ +
+

Average Price

+

+ {formatPrice(overall.avgPrice)} +

+
+
+
+
+
+ +
+

Minimum Price

+

+ {formatPrice(overall.minPrice)} +

+
+
+
+
+
+ +
+

Maximum Price

+

+ {formatPrice(overall.maxPrice)} +

+
+
+
+
+
+ +
+

Products Priced

+

+ {overall.totalProducts.toLocaleString()} +

+
+
+
+
+ )} + + {/* Price by Category Chart */} +
+

+ + Average Price by Category +

+
+ {sortedCategories.map((cat) => ( +
+ + {cat.category || 'Unknown'} + +
+ {/* Price range bar */} +
+ {/* Min-Max range */} +
+ {/* Average marker */} +
+
+
+
+ + Min: {formatPrice(cat.minPrice)} + + + Avg: {formatPrice(cat.avgPrice)} + + + Max: {formatPrice(cat.maxPrice)} + +
+
+ ))} +
+
+ + {/* Category Details Table */} +
+
+

Category Details

+
+
+ + + + + + + + + + + + + + {categories.length === 0 ? ( + + + + ) : ( + sortedCategories.map((cat) => ( + + + + + + + + + + )) + )} + +
CategoryProductsMin PriceMedian PriceAvg PriceMax PricePrice Spread
+ No pricing data available +
+ {cat.category || 'Unknown'} + + {cat.productCount.toLocaleString()} + + {formatPrice(cat.minPrice)} + + {formatPrice(cat.medianPrice)} + + {formatPrice(cat.avgPrice)} + + {formatPrice(cat.maxPrice)} + + + {formatPrice(cat.maxPrice - cat.minPrice)} + +
+
+
+
+ + ); +} + +export default IntelligencePricing; diff --git a/cannaiq/src/pages/IntelligenceStores.tsx b/cannaiq/src/pages/IntelligenceStores.tsx new file mode 100644 index 00000000..6eb78f6d --- /dev/null +++ b/cannaiq/src/pages/IntelligenceStores.tsx @@ -0,0 +1,287 @@ +import { useEffect, useState } from 'react'; +import { useNavigate } from 'react-router-dom'; +import { Layout } from '../components/Layout'; +import { api } from '../lib/api'; +import { + MapPin, + Building2, + DollarSign, + Package, + RefreshCw, + Search, + Clock, + Activity, + ChevronDown, +} from 'lucide-react'; + +interface StoreActivity { + id: number; + name: string; + state: string; + city: string; + chainName: string | null; + skuCount: number; + snapshotCount: number; + lastCrawl: string | null; + crawlFrequencyHours: number | null; +} + +export function IntelligenceStores() { + const navigate = useNavigate(); + const [stores, setStores] = useState([]); + const [loading, setLoading] = useState(true); + const [searchTerm, setSearchTerm] = useState(''); + const [stateFilter, setStateFilter] = useState('all'); + const [states, setStates] = useState([]); + + useEffect(() => { + loadStores(); + }, [stateFilter]); + + const loadStores = async () => { + try { + setLoading(true); + const data = await api.getIntelligenceStoreActivity({ + state: stateFilter !== 'all' ? stateFilter : undefined, + limit: 500, + }); + setStores(data.stores || []); + + // Extract unique states + const uniqueStates = [...new Set(data.stores.map((s: StoreActivity) => s.state))].sort(); + setStates(uniqueStates); + } catch (error) { + console.error('Failed to load stores:', error); + } finally { + setLoading(false); + } + }; + + const filteredStores = stores.filter(store => + store.name.toLowerCase().includes(searchTerm.toLowerCase()) || + store.city.toLowerCase().includes(searchTerm.toLowerCase()) + ); + + const formatTimeAgo = (dateStr: string | null) => { + if (!dateStr) return 'Never'; + const date = new Date(dateStr); + const now = new Date(); + const diff = now.getTime() - date.getTime(); + const hours = Math.floor(diff / (1000 * 60 * 60)); + const days = Math.floor(hours / 24); + + if (days > 0) return `${days}d ago`; + if (hours > 0) return `${hours}h ago`; + const minutes = Math.floor(diff / (1000 * 60)); + if (minutes > 0) return `${minutes}m ago`; + return 'Just now'; + }; + + const getCrawlFrequencyBadge = (hours: number | null) => { + if (hours === null) return Unknown; + if (hours <= 4) return High ({hours}h); + if (hours <= 12) return Medium ({hours}h); + return Low ({hours}h); + }; + + if (loading) { + return ( + +
+
+

Loading store activity...

+
+
+ ); + } + + // Calculate stats + const totalSKUs = stores.reduce((sum, s) => sum + s.skuCount, 0); + const totalSnapshots = stores.reduce((sum, s) => sum + s.snapshotCount, 0); + const avgFrequency = stores.filter(s => s.crawlFrequencyHours).length > 0 + ? stores.filter(s => s.crawlFrequencyHours).reduce((sum, s) => sum + (s.crawlFrequencyHours || 0), 0) / + stores.filter(s => s.crawlFrequencyHours).length + : 0; + + return ( + +
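+        {/* Store activity layout: summary cards, state/search filters, then a per-store table with crawl recency and frequency badges */}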
+ {/* Header */} +
+
+

Store Activity

+

+ Per-store SKU counts, snapshots, and crawl frequency +

+
+
+ + + +
+
+ + {/* Summary Cards */} +
+
+
+ +
+

Active Stores

+

{stores.length}

+
+
+
+
+
+ +
+

Total SKUs

+

{totalSKUs.toLocaleString()}

+
+
+
+
+
+ +
+

Total Snapshots

+

{totalSnapshots.toLocaleString()}

+
+
+
+
+
+ +
+

Avg Crawl Frequency

+

{avgFrequency.toFixed(1)}h

+
+
+
+
+ + {/* Filters */} +
+
+ + setSearchTerm(e.target.value)} + className="input input-bordered input-sm w-full pl-10" + /> +
+ + + Showing {filteredStores.length} of {stores.length} stores + +
+ + {/* Stores Table */} +
+
+ + + + + + + + + + + + + + {filteredStores.length === 0 ? ( + + + + ) : ( + filteredStores.map((store) => ( + navigate(`/admin/orchestrator/stores?storeId=${store.id}`)} + > + + + + + + + + + )) + )} + +
StoreLocationChainSKUsSnapshotsLast CrawlFrequency
+ No stores found +
+ {store.name} + + {store.city}, {store.state} + + {store.chainName ? ( + {store.chainName} + ) : ( + - + )} + + {store.skuCount.toLocaleString()} + + {store.snapshotCount.toLocaleString()} + + + {formatTimeAgo(store.lastCrawl)} + + + {getCrawlFrequencyBadge(store.crawlFrequencyHours)} +
+
+
+
+
+ ); +} + +export default IntelligenceStores; diff --git a/cannaiq/src/pages/OrchestratorBrands.tsx b/cannaiq/src/pages/OrchestratorBrands.tsx new file mode 100644 index 00000000..590af391 --- /dev/null +++ b/cannaiq/src/pages/OrchestratorBrands.tsx @@ -0,0 +1,70 @@ +import { Layout } from '../components/Layout'; +import { Building2, ArrowLeft } from 'lucide-react'; +import { useNavigate } from 'react-router-dom'; + +export function OrchestratorBrands() { + const navigate = useNavigate(); + + return ( + +
+ {/* Header */} +
+ +
+

Brands

+

Canonical brand catalog

+
+
+ + {/* Coming Soon Card */} +
+ +

+ Brand Catalog Coming Soon +

+

+ The canonical brand view will show all brands with store presence, + product counts, and portfolio brand tracking. +

+
+ +
+
+ + {/* Feature Preview */} +
+
+

Brand Normalization

+

+ Unified brand names across all provider feeds with alias detection. +

+
+
+

Store Presence

+

+            Track which brands are carried at which stores, including current availability. +

+
+
+

Portfolio Brands

+

+            Mark brands as portfolio brands for dedicated tracking and analytics. +

+
+
+
+
+ ); +} diff --git a/cannaiq/src/pages/OrchestratorProducts.tsx b/cannaiq/src/pages/OrchestratorProducts.tsx new file mode 100644 index 00000000..742d11e4 --- /dev/null +++ b/cannaiq/src/pages/OrchestratorProducts.tsx @@ -0,0 +1,76 @@ +import { Layout } from '../components/Layout'; +import { Package, ArrowLeft } from 'lucide-react'; +import { useNavigate } from 'react-router-dom'; + +export function OrchestratorProducts() { + const navigate = useNavigate(); + + return ( + +
+ {/* Header */} +
+ +
+

Products

+

Canonical product catalog

+
+
+ + {/* Coming Soon Card */} +
+ +

+ Product Catalog Coming Soon +

+

+ The canonical product view will show all products across all stores, + with deduplication, brand mapping, and category normalization. +

+
+ + +
+
+ + {/* Feature Preview */} +
+
+

Deduplication

+

+ Products matched across stores using name, brand, and SKU patterns. +

+
+
+

Brand Mapping

+

+ Canonical brand names with variant detection and normalization. +

+
+
+

Price History

+

+ Historical price tracking across all stores with change detection. +

+
+
+
+
+ ); +} diff --git a/cannaiq/src/pages/ProductDetail.tsx b/cannaiq/src/pages/ProductDetail.tsx index 017cb3c0..5932c191 100644 --- a/cannaiq/src/pages/ProductDetail.tsx +++ b/cannaiq/src/pages/ProductDetail.tsx @@ -2,7 +2,7 @@ import { useEffect, useState } from 'react'; import { useParams, useNavigate } from 'react-router-dom'; import { Layout } from '../components/Layout'; import { api } from '../lib/api'; -import { ArrowLeft, ExternalLink, Package } from 'lucide-react'; +import { ArrowLeft, ExternalLink, Package, Code, Copy, CheckCircle, FileJson } from 'lucide-react'; export function ProductDetail() { const { id } = useParams(); @@ -10,11 +10,21 @@ export function ProductDetail() { const [product, setProduct] = useState(null); const [loading, setLoading] = useState(true); const [error, setError] = useState(null); + const [activeTab, setActiveTab] = useState<'details' | 'raw'>('details'); + const [rawPayload, setRawPayload] = useState | null>(null); + const [rawPayloadLoading, setRawPayloadLoading] = useState(false); + const [copied, setCopied] = useState(false); useEffect(() => { loadProduct(); }, [id]); + useEffect(() => { + if (activeTab === 'raw' && !rawPayload && id) { + loadRawPayload(); + } + }, [activeTab, id]); + const loadProduct = async () => { if (!id) return; @@ -31,6 +41,30 @@ export function ProductDetail() { } }; + const loadRawPayload = async () => { + if (!id) return; + + setRawPayloadLoading(true); + try { + const data = await api.getProductRawPayload(parseInt(id)); + setRawPayload(data.product?.rawPayload || data.product?.metadata || null); + } catch (err: any) { + console.error('Failed to load raw payload:', err); + // Use product metadata as fallback + if (product?.metadata) { + setRawPayload(product.metadata); + } + } finally { + setRawPayloadLoading(false); + } + }; + + const copyToClipboard = (text: string) => { + navigator.clipboard.writeText(text); + setCopied(true); + setTimeout(() => setCopied(false), 2000); + }; + if (loading) { return ( @@ -82,6 +116,33 @@ export function ProductDetail() { Back + {/* Tab Navigation */} +
+ + +
+ + {activeTab === 'details' ? (
{/* Product Image */} @@ -263,6 +324,76 @@ export function ProductDetail() {
+ ) : ( + /* Raw Payload Tab */ +
+
+
+ +

Raw Payload / Hydration Data

+
+ {rawPayload && ( + + )} +
+
+ {rawPayloadLoading ? ( +
+
+
+ ) : rawPayload ? ( +
+
+                    {JSON.stringify(rawPayload, null, 2)}
+                  
+
+ ) : product?.metadata && Object.keys(product.metadata).length > 0 ? ( +
+
+ Showing product.metadata (raw_payload not available from API) +
+
+                    {JSON.stringify(product.metadata, null, 2)}
+                  
+
+ ) : ( +
+ +

No Raw Payload Available

+

+ This product does not have raw payload data stored. +
+ Raw payloads are captured during the hydration process. +

+
+ )} + + {/* Debug Info */} +
+

Product ID: {product.id}

+

External ID: {product.external_id || product.external_product_id || '-'}

+

Store: {product.store_name || product.dispensary_name || '-'}

+

First Seen: {product.first_seen_at ? new Date(product.first_seen_at).toLocaleString() : '-'}

+

Last Updated: {product.updated_at ? new Date(product.updated_at).toLocaleString() : '-'}

+
+
+
+ )}
); diff --git a/cannaiq/src/pages/ScraperMonitor.tsx b/cannaiq/src/pages/ScraperMonitor.tsx index e9f4967f..b21f86da 100644 --- a/cannaiq/src/pages/ScraperMonitor.tsx +++ b/cannaiq/src/pages/ScraperMonitor.tsx @@ -1,6 +1,7 @@ import { useEffect, useState } from 'react'; import { Layout } from '../components/Layout'; import { api } from '../lib/api'; +import { WorkerRoleBadge, formatScope } from '../components/WorkerRoleBadge'; export function ScraperMonitor() { const [activeScrapers, setActiveScrapers] = useState([]); @@ -233,12 +234,18 @@ export function ScraperMonitor() { }}>
-
- {job.job_name} +
+ + {job.worker_name || job.schedule_worker_name || job.job_name} + +
-
+
{job.job_description || 'Scheduled job'}
+
+ Scope: {formatScope(job.metadata)} +
Processed
@@ -290,7 +297,7 @@ export function ScraperMonitor() { Store - Worker + Enqueued By Page Products Snapshots @@ -306,9 +313,11 @@ export function ScraperMonitor() {
{job.city} | ID: {job.dispensary_id}
-
- {job.worker_id ? job.worker_id.substring(0, 8) : '-'} -
+ {job.enqueued_by_worker ? ( + {job.enqueued_by_worker} + ) : ( + - + )} {job.worker_hostname && (
{job.worker_hostname}
)} @@ -362,7 +371,8 @@ export function ScraperMonitor() { - + + @@ -371,9 +381,12 @@ export function ScraperMonitor() { {azSummary.nextRuns.map((run: any) => ( +
JobWorkerRole Next Run Last Status
-
{run.job_name}
+
{run.worker_name || run.job_name}
{run.description}
+ +
{run.next_run_at ? new Date(run.next_run_at).toLocaleString() : '-'} @@ -450,7 +463,8 @@ export function ScraperMonitor() { - + + @@ -461,8 +475,13 @@ export function ScraperMonitor() { {azRecentJobs.jobLogs.slice(0, 20).map((job: any) => ( + {/* Platform ID Column */}
JobWorkerRole Status Processed Duration
-
{job.job_name}
-
Log #{job.id}
+
{job.worker_name || job.schedule_worker_name || job.job_name}
+
+ {formatScope(job.metadata) !== '-' ? formatScope(job.metadata) : `Log #${job.id}`} +
+
+ {/* Menu Type Column */} - {disp.menu_type ? ( - - {disp.menu_type} - - ) : ( - - unknown - - )} + + {getProviderDisplayName(disp.menu_type)} + diff --git a/cannaiq/src/pages/StateHeatmap.tsx b/cannaiq/src/pages/StateHeatmap.tsx new file mode 100644 index 00000000..e829df0b --- /dev/null +++ b/cannaiq/src/pages/StateHeatmap.tsx @@ -0,0 +1,288 @@ +/** + * State Heatmap + * + * Visual representation of state metrics with interactive map. + * Phase 4: Multi-State Expansion + */ + +import { useState, useEffect } from 'react'; +import { useNavigate } from 'react-router-dom'; +import { Layout } from '../components/Layout'; +import { useStateStore } from '../store/stateStore'; +import { api } from '../lib/api'; +import { + Store, + Package, + Tag, + DollarSign, + TrendingUp, + RefreshCw, + AlertCircle, + ArrowLeft +} from 'lucide-react'; + +interface HeatmapData { + state: string; + stateName: string; + value: number; + label: string; +} + +type MetricType = 'stores' | 'products' | 'brands' | 'avgPrice'; + +const METRIC_OPTIONS: { value: MetricType; label: string; icon: any; color: string }[] = [ + { value: 'stores', label: 'Store Count', icon: Store, color: 'emerald' }, + { value: 'products', label: 'Product Count', icon: Package, color: 'blue' }, + { value: 'brands', label: 'Brand Count', icon: Tag, color: 'purple' }, + { value: 'avgPrice', label: 'Avg Price', icon: DollarSign, color: 'orange' }, +]; + +// US State positions for simplified grid layout +const STATE_POSITIONS: Record = { + WA: { row: 0, col: 0 }, OR: { row: 1, col: 0 }, CA: { row: 2, col: 0 }, NV: { row: 2, col: 1 }, + AZ: { row: 3, col: 1 }, UT: { row: 2, col: 2 }, CO: { row: 2, col: 3 }, NM: { row: 3, col: 2 }, + MT: { row: 0, col: 2 }, ID: { row: 1, col: 1 }, WY: { row: 1, col: 2 }, + ND: { row: 0, col: 4 }, SD: { row: 1, col: 4 }, NE: { row: 2, col: 4 }, KS: { row: 3, col: 4 }, + MN: { row: 0, col: 5 }, IA: { row: 1, col: 5 }, MO: { row: 2, col: 5 }, AR: { row: 3, col: 5 }, + WI: { row: 0, col: 6 }, IL: { row: 1, col: 6 }, IN: { row: 1, col: 7 }, OH: { row: 1, col: 8 }, + MI: { row: 0, col: 7 }, PA: { row: 1, col: 9 }, NY: { row: 0, col: 9 }, NJ: { row: 2, col: 9 }, + MA: { row: 0, col: 10 }, CT: { row: 1, col: 10 }, RI: { row: 0, col: 11 }, + ME: { row: 0, col: 12 }, NH: { row: 0, col: 11 }, VT: { row: 0, col: 10 }, + FL: { row: 4, col: 8 }, GA: { row: 3, col: 8 }, AL: { row: 3, col: 7 }, MS: { row: 3, col: 6 }, + LA: { row: 4, col: 6 }, TX: { row: 4, col: 4 }, OK: { row: 3, col: 3 }, + TN: { row: 2, col: 7 }, KY: { row: 2, col: 8 }, WV: { row: 2, col: 8 }, VA: { row: 2, col: 9 }, + NC: { row: 3, col: 9 }, SC: { row: 3, col: 9 }, MD: { row: 2, col: 10 }, DE: { row: 2, col: 10 }, + DC: { row: 2, col: 10 }, HI: { row: 5, col: 0 }, AK: { row: 5, col: 1 }, +}; + +function getHeatColor(value: number, max: number, colorScheme: string): string { + if (value === 0) return 'bg-gray-100'; + + const intensity = Math.min(value / max, 1); + const level = Math.ceil(intensity * 5); + + const colors: Record = { + emerald: ['bg-emerald-100', 'bg-emerald-200', 'bg-emerald-300', 'bg-emerald-400', 'bg-emerald-500'], + blue: ['bg-blue-100', 'bg-blue-200', 'bg-blue-300', 'bg-blue-400', 'bg-blue-500'], + purple: ['bg-purple-100', 'bg-purple-200', 'bg-purple-300', 'bg-purple-400', 'bg-purple-500'], + orange: ['bg-orange-100', 'bg-orange-200', 'bg-orange-300', 'bg-orange-400', 'bg-orange-500'], + }; + + return colors[colorScheme]?.[level - 1] || 'bg-gray-100'; +} + +function StateCell({ + state, + data, + maxValue, + color, + onClick, +}: { + 
state: string; + data?: HeatmapData; + maxValue: number; + color: string; + onClick: () => void; +}) { + const value = data?.value || 0; + const bgColor = getHeatColor(value, maxValue, color); + + return ( + + ); +} + +export default function StateHeatmap() { + const navigate = useNavigate(); + const { setSelectedState } = useStateStore(); + const [loading, setLoading] = useState(true); + const [error, setError] = useState(null); + const [heatmapData, setHeatmapData] = useState([]); + const [selectedMetric, setSelectedMetric] = useState('products'); + + const fetchData = async () => { + setLoading(true); + setError(null); + try { + const response = await api.get(`/api/analytics/national/heatmap?metric=${selectedMetric}`); + if (response.data?.heatmap) { + setHeatmapData(response.data.heatmap); + } + } catch (err: any) { + setError(err.message || 'Failed to load heatmap data'); + } finally { + setLoading(false); + } + }; + + useEffect(() => { + fetchData(); + }, [selectedMetric]); + + const handleStateClick = (stateCode: string) => { + setSelectedState(stateCode); + navigate('/dashboard'); + }; + + const currentMetricOption = METRIC_OPTIONS.find(m => m.value === selectedMetric)!; + const maxValue = Math.max(...heatmapData.map(d => d.value), 1); + const dataByState = new Map(heatmapData.map(d => [d.state, d])); + + // Create grid + const gridRows: string[][] = []; + Object.entries(STATE_POSITIONS).forEach(([state, pos]) => { + if (!gridRows[pos.row]) gridRows[pos.row] = []; + gridRows[pos.row][pos.col] = state; + }); + + return ( + +
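+        {/* Heatmap layout: metric selector, color-scaled state grid, intensity legend, then summary stats */}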
+ {/* Header */} +
+
+ +
+

State Heatmap

+

+ Visualize market presence across states +

+
+
+ +
+ + {/* Metric Selector */} +
+
+ Show: +
+ {METRIC_OPTIONS.map((option) => ( + + ))} +
+
+
+ + {/* Heatmap */} + {loading ? ( +
+
Loading heatmap...
+
+ ) : error ? ( +
+ +
{error}
+
+ ) : ( +
+
+ {gridRows.map((row, rowIdx) => ( +
+ {row.map((state, colIdx) => + state ? ( + dataByState.get(state) && handleStateClick(state)} + /> + ) : ( +
+ ) + )} +
+ ))} +
+ + {/* Legend */} +
+ Low +
+ {[1, 2, 3, 4, 5].map((level) => ( +
+ ))} +
+ High + + Max: {maxValue.toLocaleString()} + +
+
+ )} + + {/* Stats Summary */} + {!loading && !error && heatmapData.length > 0 && ( +
+
+
Active States
+
+ {heatmapData.filter(d => d.value > 0).length} +
+
+
+
Total {currentMetricOption.label}
+
+ {heatmapData.reduce((sum, d) => sum + d.value, 0).toLocaleString()} +
+
+
+
Top State
+
+ {heatmapData.sort((a, b) => b.value - a.value)[0]?.stateName || '-'} +
+
+
+
Average per State
+
+ {Math.round(heatmapData.reduce((sum, d) => sum + d.value, 0) / Math.max(heatmapData.filter(d => d.value > 0).length, 1)).toLocaleString()} +
+
+
+ )} +
+ + ); +} diff --git a/cannaiq/src/pages/StoreDetail.tsx b/cannaiq/src/pages/StoreDetail.tsx index f992e40d..d1734dd1 100644 --- a/cannaiq/src/pages/StoreDetail.tsx +++ b/cannaiq/src/pages/StoreDetail.tsx @@ -183,8 +183,8 @@ export function StoreDetail() {

{store.name}

- - {store.provider || 'Unknown'} + + {store.provider_display || 'Menu'}

Store ID: {store.id}

diff --git a/cannaiq/src/pages/SyncInfoPanel.tsx b/cannaiq/src/pages/SyncInfoPanel.tsx new file mode 100644 index 00000000..6d6d6cce --- /dev/null +++ b/cannaiq/src/pages/SyncInfoPanel.tsx @@ -0,0 +1,382 @@ +import { useEffect, useState } from 'react'; +import { useNavigate } from 'react-router-dom'; +import { Layout } from '../components/Layout'; +import { api } from '../lib/api'; +import { + RefreshCw, + Database, + ArrowRight, + CheckCircle, + XCircle, + Clock, + FileText, + Terminal, + AlertTriangle, + Info, + Copy, +} from 'lucide-react'; + +interface SyncInfo { + lastEtlRun: string | null; + rowsImported: number | null; + etlStatus: string; + envVars: { + cannaiqDbConfigured: boolean; + snapshotDbConfigured: boolean; + }; +} + +export function SyncInfoPanel() { + const navigate = useNavigate(); + const [syncInfo, setSyncInfo] = useState(null); + const [loading, setLoading] = useState(true); + const [copiedCommand, setCopiedCommand] = useState(null); + + useEffect(() => { + loadSyncInfo(); + }, []); + + const loadSyncInfo = async () => { + try { + setLoading(true); + const data = await api.getSyncInfo(); + setSyncInfo(data); + } catch (error) { + console.error('Failed to load sync info:', error); + } finally { + setLoading(false); + } + }; + + const copyCommand = (command: string, id: string) => { + navigator.clipboard.writeText(command); + setCopiedCommand(id); + setTimeout(() => setCopiedCommand(null), 2000); + }; + + const formatDate = (dateStr: string | null) => { + if (!dateStr) return 'Never'; + return new Date(dateStr).toLocaleString(); + }; + + const etlCommands = { + pgDump: `# Step 1: Export from remote production database +pg_dump -h $REMOTE_HOST -U $REMOTE_USER -d $REMOTE_DB \\ + --table=dutchie_products \\ + --table=dutchie_product_snapshots \\ + --table=dispensaries \\ + --data-only \\ + > remote_snapshot.sql`, + + pgRestore: `# Step 2: Import into local/staging database +psql -h localhost -U dutchie -d dutchie_menus \\ + < remote_snapshot.sql`, + + runEtl: `# Step 3: Run canonical hydration +DATABASE_URL="postgresql://dutchie:password@localhost:5432/dutchie_menus" \\ + npx tsx src/canonical-hydration/cli/products-only.ts`, + + backfill: `# Optional: Full backfill with date range +DATABASE_URL="postgresql://dutchie:password@localhost:5432/dutchie_menus" \\ + npx tsx src/canonical-hydration/cli/backfill.ts \\ + --start-date 2024-01-01 \\ + --end-date 2024-12-31`, + + incremental: `# Optional: Continuous incremental sync +DATABASE_URL="postgresql://dutchie:password@localhost:5432/dutchie_menus" \\ + npx tsx src/canonical-hydration/cli/incremental.ts --loop`, + }; + + if (loading) { + return ( + +
+
+

Loading sync info...

+
+
+ ); + } + + return ( + +
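+        {/* Read-only sync guide: surfaces last ETL run and env status, and documents the manual pg_dump -> psql -> hydration workflow */}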
+ {/* Header */} +
+
+

Snapshot Sync

+

+ Remote to local data synchronization guide +

+
+ +
+ + {/* Important Notice */} +
+
+ +
+

Manual Operation Required

+

+                Data sync operations must be performed manually by an operator. + This page provides instructions and visibility only; it does NOT execute any sync commands. +

+
+
+
+ + {/* Sync Status */} + {syncInfo && ( +
+

+ + Current Sync Status +

+
+
+

Last ETL Run

+

{formatDate(syncInfo.lastEtlRun)}

+
+
+

Rows Imported

+

{syncInfo.rowsImported?.toLocaleString() || '-'}

+
+
+

ETL Status

+ + {syncInfo.etlStatus || 'Unknown'} + +
+
+

Environment

+
+ {syncInfo.envVars.cannaiqDbConfigured ? ( + + ) : ( + + )} + CANNAIQ_DB +
+
+
+
+ )} + + {/* Sync Architecture */} +
+

+ + How Sync Works +

+
+
+
+ +
+

Production DB

+

Remote Server

+
+ +
+
+ +
+

pg_dump

+

SQL Export

+
+ +
+
+ +
+

Local DB

+

Staging/Dev

+
+ +
+
+ +
+

ETL Script

+

Hydration

+
+ +
+
+ +
+

Canonical Tables

+

store_products, etc.

+
+
+
+ + {/* Commands */} +
+

+ + ETL Commands +

+ +
+ {/* pg_dump */} +
+
+

1. Export from Production

+ +
+
+                {etlCommands.pgDump}
+              
+
+ + {/* pg_restore */} +
+
+

2. Import to Local

+ +
+
+                {etlCommands.pgRestore}
+              
+
+ + {/* Run ETL */} +
+
+

3. Run Canonical Hydration

+ +
+
+                {etlCommands.runEtl}
+              
+
+ + {/* Backfill */} +
+
+

Optional: Full Backfill

+ +
+
+                {etlCommands.backfill}
+              
+
+ + {/* Incremental */} +
+
+

Optional: Continuous Sync

+ +
+
+                {etlCommands.incremental}
+              
+
+
+
+ + {/* Environment Variables */} +
+

Environment Variables

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
VariableDescriptionExample
DATABASE_URLLocal/staging PostgreSQL connectionpostgresql://dutchie:pass@localhost:5432/dutchie_menus
REMOTE_HOSTProduction database hostprod-db.example.com
REMOTE_USERProduction database userreadonly_user
REMOTE_DBProduction database namecannaiq_production
+
+
+
+
+ ); +} + +export default SyncInfoPanel; diff --git a/cannaiq/src/store/stateStore.ts b/cannaiq/src/store/stateStore.ts new file mode 100644 index 00000000..fbef1f1a --- /dev/null +++ b/cannaiq/src/store/stateStore.ts @@ -0,0 +1,72 @@ +/** + * State Store + * + * Global state management for multi-state selection. + * Phase 4: Multi-State Expansion + */ + +import { create } from 'zustand'; +import { persist } from 'zustand/middleware'; + +export interface StateOption { + code: string; + name: string; +} + +interface StateStoreState { + // Currently selected state (null = "All States" national view) + selectedState: string | null; + + // Available states (loaded from API) + availableStates: StateOption[]; + + // Loading state + isLoading: boolean; + + // Actions + setSelectedState: (state: string | null) => void; + setAvailableStates: (states: StateOption[]) => void; + setLoading: (loading: boolean) => void; + + // Derived helpers + getStateName: (code: string) => string; + isNationalView: () => boolean; +} + +export const useStateStore = create()( + persist( + (set, get) => ({ + // Default to null (All States / National view) + selectedState: null, + + availableStates: [ + { code: 'AZ', name: 'Arizona' }, + { code: 'CA', name: 'California' }, + { code: 'CO', name: 'Colorado' }, + { code: 'MI', name: 'Michigan' }, + { code: 'NV', name: 'Nevada' }, + ], + + isLoading: false, + + setSelectedState: (state) => set({ selectedState: state }), + + setAvailableStates: (states) => set({ availableStates: states }), + + setLoading: (loading) => set({ isLoading: loading }), + + getStateName: (code) => { + const state = get().availableStates.find((s) => s.code === code); + return state?.name || code; + }, + + isNationalView: () => get().selectedState === null, + }), + { + name: 'cannaiq-state-selection', + partialize: (state) => ({ + selectedState: state.selectedState, + }), + } + ) +); diff --git a/cannaiq/vite.config.ts b/cannaiq/vite.config.ts index d1aed7cd..8d09a6c6 100755 --- a/cannaiq/vite.config.ts +++ b/cannaiq/vite.config.ts @@ -5,9 +5,20 @@ export default defineConfig({ plugins: [react()], server: { host: true, - port: 5173, + port: 8080, watch: { usePolling: true + }, + // Proxy API calls to backend + proxy: { + '/api': { + target: 'http://localhost:3010', + changeOrigin: true, + } } + }, + // Ensure SPA routing works for /admin and /admin/* + preview: { + port: 8080 } }); diff --git a/docker-compose.local.yml b/docker-compose.local.yml index e3a120a3..bfa66b13 100644 --- a/docker-compose.local.yml +++ b/docker-compose.local.yml @@ -18,7 +18,7 @@ services: image: postgres:15-alpine container_name: cannaiq-postgres environment: - POSTGRES_DB: dutchie_menus + POSTGRES_DB: dutchie_legacy POSTGRES_USER: dutchie POSTGRES_PASSWORD: dutchie_local_pass ports: @@ -39,7 +39,13 @@ services: environment: NODE_ENV: development PORT: 3000 - DATABASE_URL: "postgresql://dutchie:dutchie_local_pass@postgres:5432/dutchie_menus" + # CannaiQ database connection - individual env vars + # These match what postgres service uses (POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD) + CANNAIQ_DB_HOST: postgres + CANNAIQ_DB_PORT: "5432" + CANNAIQ_DB_NAME: dutchie_legacy + CANNAIQ_DB_USER: dutchie + CANNAIQ_DB_PASS: dutchie_local_pass # Local storage - NO MinIO STORAGE_DRIVER: local STORAGE_BASE_PATH: /app/storage diff --git a/docker-compose.yml b/docker-compose.yml index b496bb7d..c76a8893 100755 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -1,9 +1,9 @@ services: postgres: image: postgres:15-alpine - container_name: 
dutchie-postgres + container_name: cannaiq-postgres environment: - POSTGRES_DB: dutchie_menus + POSTGRES_DB: dutchie_legacy POSTGRES_USER: dutchie POSTGRES_PASSWORD: dutchie_local_pass ports: @@ -38,11 +38,12 @@ services: build: context: ./backend dockerfile: Dockerfile.dev - container_name: dutchie-backend + container_name: cannaiq-backend environment: NODE_ENV: development PORT: 3000 - DATABASE_URL: "postgresql://dutchie:dutchie_local_pass@postgres:5432/dutchie_menus" + # Canonical CannaiQ database connection (NOT DATABASE_URL) + CANNAIQ_DB_URL: "postgresql://dutchie:dutchie_local_pass@postgres:5432/dutchie_legacy" MINIO_ENDPOINT: minio MINIO_PORT: 9000 MINIO_ACCESS_KEY: minioadmin diff --git a/docs/legacy_mapping.md b/docs/legacy_mapping.md new file mode 100644 index 00000000..4c33e087 --- /dev/null +++ b/docs/legacy_mapping.md @@ -0,0 +1,324 @@ +# Legacy Data Mapping: dutchie_legacy → CannaiQ + +## Overview + +This document describes the ETL mapping from the legacy `dutchie_legacy` database +to the canonical CannaiQ schema. All imports are **INSERT-ONLY** with no deletions +or overwrites of existing data. + +## Database Locations + +| Database | Host | Purpose | +|----------|------|---------| +| `cannaiq` | localhost:54320 | Main CannaiQ application schema | +| `dutchie_legacy` | localhost:54320 | Imported historical data from old dutchie_menus | + +## Schema Comparison + +### Legacy Tables (dutchie_legacy) + +| Table | Row Purpose | Key Columns | +|-------|-------------|-------------| +| `dispensaries` | Store locations | id, name, slug, city, state, menu_url, menu_provider, product_provider | +| `products` | Legacy product records | id, dispensary_id, dutchie_product_id, name, brand, price, thc_percentage | +| `dutchie_products` | Dutchie-specific products | id, dispensary_id, external_product_id, name, brand_name, type, stock_status | +| `dutchie_product_snapshots` | Historical price/stock snapshots | dutchie_product_id, crawled_at, rec_min_price_cents, stock_status | +| `brands` | Brand entities | id, store_id, name, dispensary_id | +| `categories` | Product categories | id, store_id, name, slug | +| `price_history` | Legacy price tracking | product_id, price, recorded_at | +| `specials` | Deals/promotions | id, dispensary_id, name, discount_type | + +### CannaiQ Canonical Tables + +| Table | Purpose | Key Columns | +|-------|---------|-------------| +| `dispensaries` | Store locations | id, name, slug, city, state, platform_dispensary_id | +| `dutchie_products` | Canonical products | id, dispensary_id, external_product_id, name, brand_name, stock_status | +| `dutchie_product_snapshots` | Historical snapshots | dutchie_product_id, crawled_at, rec_min_price_cents | +| `brands` (view: v_brands) | Derived from products | brand_name, brand_id, product_count | +| `categories` (view: v_categories) | Derived from products | type, subcategory, product_count | + +--- + +## Mapping Plan + +### 1. 
Dispensaries + +**Source:** `dutchie_legacy.dispensaries` +**Target:** `cannaiq.dispensaries` + +| Legacy Column | Canonical Column | Notes | +|---------------|------------------|-------| +| id | - | Generate new ID, store legacy_id | +| name | name | Direct map | +| slug | slug | Direct map | +| city | city | Direct map | +| state | state | Direct map | +| address | address | Direct map | +| zip | postal_code | Rename | +| latitude | latitude | Direct map | +| longitude | longitude | Direct map | +| menu_url | menu_url | Direct map | +| menu_provider | - | Store in raw_metadata | +| product_provider | - | Store in raw_metadata | +| website | website | Direct map | +| dba_name | - | Store in raw_metadata | +| - | platform | Set to 'dutchie' | +| - | legacy_id | New column: original ID from legacy | + +**Conflict Resolution:** +- ON CONFLICT (slug, city, state) DO NOTHING +- Match on slug+city+state combination +- Never overwrite existing dispensary data + +**Staging Table:** `dispensaries_from_legacy` +```sql +CREATE TABLE IF NOT EXISTS dispensaries_from_legacy ( + id SERIAL PRIMARY KEY, + legacy_id INTEGER NOT NULL, + name VARCHAR(255) NOT NULL, + slug VARCHAR(255) NOT NULL, + city VARCHAR(100) NOT NULL, + state VARCHAR(10) NOT NULL, + postal_code VARCHAR(20), + address TEXT, + latitude DECIMAL(10,7), + longitude DECIMAL(10,7), + menu_url TEXT, + website TEXT, + legacy_metadata JSONB, -- All other legacy fields + imported_at TIMESTAMPTZ DEFAULT NOW(), + UNIQUE(legacy_id) +); +``` + +--- + +### 2. Products (Legacy products table) + +**Source:** `dutchie_legacy.products` +**Target:** `cannaiq.products_from_legacy` (new staging table) + +| Legacy Column | Canonical Column | Notes | +|---------------|------------------|-------| +| id | legacy_product_id | Original ID | +| dispensary_id | legacy_dispensary_id | FK to legacy dispensary | +| dutchie_product_id | external_product_id | Dutchie's _id | +| name | name | Direct map | +| brand | brand_name | Direct map | +| price | price_cents | Multiply by 100 | +| original_price | original_price_cents | Multiply by 100 | +| thc_percentage | thc | Direct map | +| cbd_percentage | cbd | Direct map | +| strain_type | strain_type | Direct map | +| weight | weight | Direct map | +| image_url | primary_image_url | Direct map | +| in_stock | stock_status | Map: true→'in_stock', false→'out_of_stock' | +| first_seen_at | first_seen_at | Direct map | +| last_seen_at | last_seen_at | Direct map | +| raw_data | latest_raw_payload | Direct map | + +**Staging Table:** `products_from_legacy` +```sql +CREATE TABLE IF NOT EXISTS products_from_legacy ( + id SERIAL PRIMARY KEY, + legacy_product_id INTEGER NOT NULL, + legacy_dispensary_id INTEGER, + external_product_id VARCHAR(255), + name VARCHAR(500) NOT NULL, + brand_name VARCHAR(255), + type VARCHAR(100), + subcategory VARCHAR(100), + strain_type VARCHAR(50), + thc DECIMAL(10,4), + cbd DECIMAL(10,4), + price_cents INTEGER, + original_price_cents INTEGER, + stock_status VARCHAR(20), + weight VARCHAR(100), + primary_image_url TEXT, + first_seen_at TIMESTAMPTZ, + last_seen_at TIMESTAMPTZ, + legacy_raw_payload JSONB, + imported_at TIMESTAMPTZ DEFAULT NOW(), + UNIQUE(legacy_product_id) +); +``` + +--- + +### 3. Dutchie Products + +**Source:** `dutchie_legacy.dutchie_products` +**Target:** `cannaiq.dutchie_products` + +These tables have nearly identical schemas. 
The mapping is direct: + +| Legacy Column | Canonical Column | Notes | +|---------------|------------------|-------| +| id | - | Generate new, store as legacy_dutchie_product_id | +| dispensary_id | dispensary_id | Map via dispensary slug lookup | +| external_product_id | external_product_id | Direct (Dutchie _id) | +| platform_dispensary_id | platform_dispensary_id | Direct | +| name | name | Direct | +| brand_name | brand_name | Direct | +| type | type | Direct | +| subcategory | subcategory | Direct | +| strain_type | strain_type | Direct | +| thc/thc_content | thc/thc_content | Direct | +| cbd/cbd_content | cbd/cbd_content | Direct | +| stock_status | stock_status | Direct | +| images | images | Direct (JSONB) | +| latest_raw_payload | latest_raw_payload | Direct | + +**Conflict Resolution:** +```sql +ON CONFLICT (dispensary_id, external_product_id) DO NOTHING +``` +- Never overwrite existing products +- Skip duplicates silently + +--- + +### 4. Dutchie Product Snapshots + +**Source:** `dutchie_legacy.dutchie_product_snapshots` +**Target:** `cannaiq.dutchie_product_snapshots` + +| Legacy Column | Canonical Column | Notes | +|---------------|------------------|-------| +| id | - | Generate new | +| dutchie_product_id | dutchie_product_id | Map via product lookup | +| dispensary_id | dispensary_id | Map via dispensary lookup | +| crawled_at | crawled_at | Direct | +| rec_min_price_cents | rec_min_price_cents | Direct | +| rec_max_price_cents | rec_max_price_cents | Direct | +| stock_status | stock_status | Direct | +| options | options | Direct (JSONB) | +| raw_payload | raw_payload | Direct (JSONB) | + +**Conflict Resolution:** +```sql +-- No unique constraint on snapshots - all are historical records +-- Just INSERT, no conflict handling needed +INSERT INTO dutchie_product_snapshots (...) VALUES (...) +``` + +--- + +### 5. Price History + +**Source:** `dutchie_legacy.price_history` +**Target:** `cannaiq.price_history_legacy` (new staging table) + +```sql +CREATE TABLE IF NOT EXISTS price_history_legacy ( + id SERIAL PRIMARY KEY, + legacy_product_id INTEGER NOT NULL, + price_cents INTEGER, + recorded_at TIMESTAMPTZ, + imported_at TIMESTAMPTZ DEFAULT NOW() +); +``` + +--- + +## ETL Process + +### Phase 1: Staging Tables (INSERT-ONLY) + +1. Create staging tables with `_from_legacy` or `_legacy` suffix +2. Read from `dutchie_legacy.*` tables in batches +3. INSERT into staging tables with ON CONFLICT DO NOTHING +4. Log counts: read, inserted, skipped + +### Phase 2: ID Mapping + +1. Build ID mapping tables: + - `legacy_dispensary_id` → `canonical_dispensary_id` + - `legacy_product_id` → `canonical_product_id` +2. Match on unique keys (slug+city+state for dispensaries, external_product_id for products) + +### Phase 3: Canonical Merge (Optional, User-Approved) + +Only if explicitly requested: +1. INSERT new records into canonical tables +2. Never UPDATE existing records +3. Never DELETE any records + +--- + +## Safety Rules + +1. **INSERT-ONLY**: No UPDATE, no DELETE, no TRUNCATE +2. **ON CONFLICT DO NOTHING**: Skip duplicates, never overwrite +3. **Batch Processing**: 500-1000 rows per batch to avoid memory issues +4. **Manual Invocation Only**: ETL script requires explicit user execution +5. **Logging**: Record all operations with counts and timestamps +6. 
**Dry Run Mode**: Support `--dry-run` flag to preview without writes + +--- + +## Validation Queries + +After import, verify with: + +```sql +-- Count imported dispensaries +SELECT COUNT(*) FROM dispensaries_from_legacy; + +-- Count imported products +SELECT COUNT(*) FROM products_from_legacy; + +-- Check for duplicates that were skipped +SELECT + (SELECT COUNT(*) FROM dutchie_legacy.dispensaries) as legacy_count, + (SELECT COUNT(*) FROM dispensaries_from_legacy) as imported_count; + +-- Verify no data loss +SELECT + l.id as legacy_id, + l.name as legacy_name, + c.id as canonical_id +FROM dutchie_legacy.dispensaries l +LEFT JOIN dispensaries c ON c.slug = l.slug AND c.city = l.city AND c.state = l.state +WHERE c.id IS NULL +LIMIT 10; +``` + +--- + +## Invocation + +```bash +# From backend directory +npx tsx src/scripts/etl/legacy-import.ts + +# With dry-run +npx tsx src/scripts/etl/legacy-import.ts --dry-run + +# Import specific tables only +npx tsx src/scripts/etl/legacy-import.ts --tables=dispensaries,products +``` + +--- + +## Environment Variables + +The ETL script expects these environment variables (user configures): + +```bash +# Connection to cannaiq-postgres (same host, different databases) +CANNAIQ_DB_HOST=localhost +CANNAIQ_DB_PORT=54320 +CANNAIQ_DB_USER=cannaiq +CANNAIQ_DB_PASSWORD= +CANNAIQ_DB_NAME=cannaiq + +# Legacy database (same host, different database) +LEGACY_DB_HOST=localhost +LEGACY_DB_PORT=54320 +LEGACY_DB_USER=dutchie +LEGACY_DB_PASSWORD= +LEGACY_DB_NAME=dutchie_legacy +``` diff --git a/docs/multi-state.md b/docs/multi-state.md new file mode 100644 index 00000000..713c7cb9 --- /dev/null +++ b/docs/multi-state.md @@ -0,0 +1,345 @@ +# Multi-State Support + +## Overview + +Phase 4 implements full multi-state support for CannaiQ, transforming it from an Arizona-only platform to a national cannabis intelligence system. This document covers schema updates, API structure, frontend usage, and operational guidelines. + +## Schema Updates + +### Core Tables Modified + +#### 1. `dispensaries` table +Already has `state` column: +```sql +state CHAR(2) DEFAULT 'AZ' -- State code (AZ, CA, CO, etc.) +state_id INTEGER REFERENCES states(id) -- FK to canonical states table +``` + +#### 2. `raw_payloads` table (Migration 047) +Added state column for query optimization: +```sql +state CHAR(2) -- Denormalized from dispensary for fast filtering +``` + +#### 3. `states` table +Canonical reference for all US states: +```sql +CREATE TABLE states ( + id SERIAL PRIMARY KEY, + code VARCHAR(2) NOT NULL UNIQUE, -- 'AZ', 'CA', etc. + name VARCHAR(100) NOT NULL, -- 'Arizona', 'California', etc. 
+ created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); +``` + +### New Indexes (Migration 047) + +```sql +-- State-based payload filtering +CREATE INDEX idx_raw_payloads_state ON raw_payloads(state); +CREATE INDEX idx_raw_payloads_state_unprocessed ON raw_payloads(state, processed) WHERE processed = FALSE; + +-- Dispensary state queries +CREATE INDEX idx_dispensaries_state_menu_type ON dispensaries(state, menu_type); +CREATE INDEX idx_dispensaries_state_crawl_status ON dispensaries(state, crawl_status); +CREATE INDEX idx_dispensaries_state_active ON dispensaries(state) WHERE crawl_status != 'disabled'; +``` + +### Materialized Views + +#### `mv_state_metrics` +Pre-aggregated state-level metrics for fast dashboard queries: +```sql +CREATE MATERIALIZED VIEW mv_state_metrics AS +SELECT + d.state, + s.name AS state_name, + COUNT(DISTINCT d.id) AS store_count, + COUNT(DISTINCT sp.id) AS total_products, + COUNT(DISTINCT sp.brand_id) AS unique_brands, + AVG(sp.price_rec) AS avg_price_rec, + -- ... more metrics +FROM dispensaries d +LEFT JOIN states s ON d.state = s.code +LEFT JOIN store_products sp ON d.id = sp.dispensary_id +GROUP BY d.state, s.name; +``` + +Refresh with: +```sql +SELECT refresh_state_metrics(); +``` + +### Views + +#### `v_brand_state_presence` +Brand presence and metrics per state: +```sql +SELECT brand_id, brand_name, state, store_count, product_count, avg_price +FROM v_brand_state_presence +WHERE state = 'AZ'; +``` + +#### `v_category_state_distribution` +Category distribution by state: +```sql +SELECT state, category, product_count, store_count, avg_price +FROM v_category_state_distribution +WHERE state = 'CA'; +``` + +### Functions + +#### `fn_national_price_comparison(category, brand_id)` +Compare prices across all states: +```sql +SELECT * FROM fn_national_price_comparison('Flower', NULL); +``` + +#### `fn_brand_state_penetration(brand_id)` +Get brand penetration across states: +```sql +SELECT * FROM fn_brand_state_penetration(123); +``` + +## API Structure + +### State List Endpoints + +``` +GET /api/states # All configured states +GET /api/states?active=true # Only states with dispensary data +``` + +Response: +```json +{ + "success": true, + "data": { + "states": [ + { "code": "AZ", "name": "Arizona" }, + { "code": "CA", "name": "California" } + ], + "count": 2 + } +} +``` + +### State-Specific Endpoints + +``` +GET /api/state/:state/summary # Full state summary with metrics +GET /api/state/:state/brands # Brands in state (paginated) +GET /api/state/:state/categories # Categories in state (paginated) +GET /api/state/:state/stores # Stores in state (paginated) +GET /api/state/:state/analytics/prices # Price distribution +``` + +Query parameters: +- `limit` - Results per page (default: 50) +- `offset` - Pagination offset +- `sortBy` - Sort field (e.g., `productCount`, `avgPrice`) +- `sortDir` - Sort direction (`asc` or `desc`) +- `includeInactive` - Include disabled stores + +### National Analytics Endpoints + +``` +GET /api/analytics/national/summary # National aggregate metrics +GET /api/analytics/national/prices # Price comparison across states +GET /api/analytics/national/heatmap # State heatmap data +GET /api/analytics/national/metrics # All state metrics +``` + +Heatmap metrics: +- `stores` - Store count per state +- `products` - Product count per state +- `brands` - Brand count per state +- `avgPrice` - Average price per state +- `penetration` - Brand penetration (requires `brandId`) + +### Cross-State Comparison Endpoints + +``` +GET 
/api/analytics/compare/brand/:brandId # Compare brand across states +GET /api/analytics/compare/category/:category # Compare category across states +GET /api/analytics/brand/:brandId/penetration # Brand penetration by state +GET /api/analytics/brand/:brandId/trend # Historical penetration trend +``` + +Query parameters: +- `states` - Comma-separated state codes to include (optional) +- `days` - Days of history for trends (default: 30) + +### Admin Endpoints + +``` +POST /api/admin/states/refresh-metrics # Refresh materialized views +``` + +## Frontend Usage + +### State Selector + +The global state selector is in the sidebar and persists selection via localStorage: + +```tsx +import { useStateStore } from '../store/stateStore'; + +function MyComponent() { + const { selectedState, setSelectedState, isNationalView } = useStateStore(); + + // null = All States / National view + // 'AZ' = Arizona only + + if (isNationalView()) { + // Show national data + } else { + // Filter by selectedState + } +} +``` + +### State Badge Component + +Show current state selection: +```tsx +import { StateBadge } from '../components/StateSelector'; + + // Shows "National" or state name +``` + +### API Calls with State Filter + +```tsx +import { api } from '../lib/api'; +import { useStateStore } from '../store/stateStore'; + +function useStateData() { + const { selectedState } = useStateStore(); + + useEffect(() => { + if (selectedState) { + // State-specific data + api.get(`/state/${selectedState}/summary`); + } else { + // National data + api.get('/analytics/national/summary'); + } + }, [selectedState]); +} +``` + +### Navigation Routes + +| Route | Component | Description | +|-------|-----------|-------------| +| `/national` | NationalDashboard | National overview with all states | +| `/national/heatmap` | StateHeatmap | Interactive state heatmap | +| `/national/compare` | CrossStateCompare | Brand/category cross-state comparison | + +## Ingestion Rules + +### State Assignment + +Every raw payload MUST include state: +1. State is looked up from `dispensaries.state` during payload storage +2. Stored on `raw_payloads.state` for query optimization +3. Inherited by all normalized products/snapshots via `dispensary_id` + +### Hydration Pipeline + +The hydration worker supports state filtering: +```typescript +// Process only AZ payloads +await getUnprocessedPayloads(pool, { state: 'AZ' }); + +// Process multiple states +await getUnprocessedPayloads(pool, { states: ['AZ', 'CA', 'NV'] }); +``` + +### Data Isolation + +Critical rules: +- **No cross-state contamination** - Product IDs are unique per (dispensary_id, provider_product_id) +- **No SKU merging** - Same SKU in AZ and CA are separate products +- **No store merging** - Same store name in different states are separate records +- Every dispensary maps to exactly ONE state + +## Constraints & Best Practices + +### Query Performance + +1. Use `mv_state_metrics` for dashboard queries (refreshed hourly) +2. Use indexed views for brand/category queries +3. Filter by state early in queries to leverage indexes +4. For cross-state queries, use the dedicated comparison functions + +### Cache Strategy + +API endpoints should be cached with Redis: +```typescript +// Cache key pattern +`state:${state}:summary` // State summary - 5 min TTL +`national:summary` // National summary - 5 min TTL +`heatmap:${metric}` // Heatmap data - 5 min TTL +``` + +### Adding New States + +1. Add state to `states` table (if not already present) +2. 
Import dispensary data with correct `state` code +3. Run menu detection for new dispensaries +4. Crawl dispensaries with resolved platform IDs +5. Refresh materialized views: `SELECT refresh_state_metrics()` + +## Migration Guide + +### From Arizona-Only to Multi-State + +1. Apply migration 047: + ```bash + DATABASE_URL="..." npm run migrate + ``` + +2. Existing AZ data requires no changes (already has `state='AZ'`) + +3. New states are added via: + - Manual dispensary import + - Menu detection crawl + - Platform ID resolution + +4. Frontend automatically shows state selector after update + +### Rollback + +Migration 047 is additive - no destructive changes: +- New columns have defaults +- Views can be dropped without data loss +- Indexes can be dropped for performance tuning + +## Monitoring + +### Key Metrics to Watch + +1. **Store count by state** - `SELECT state, COUNT(*) FROM dispensaries GROUP BY state` +2. **Product coverage** - `SELECT state, COUNT(DISTINCT sp.id) FROM store_products...` +3. **Crawl health by state** - Check `crawl_runs` by dispensary state +4. **Materialized view freshness** - `SELECT refreshed_at FROM mv_state_metrics` + +### Alerts + +Set up alerts for: +- Materialized view not refreshed in 2+ hours +- State with 0 products after having products +- Cross-state data appearing (should never happen) + +## Future Enhancements + +Planned for future phases: +1. **Redis caching** for all state endpoints +2. **Real-time refresh** of materialized views +3. **Geographic heatmaps** with actual US map visualization +4. **State-specific pricing rules** (tax rates, etc.) +5. **Multi-state brand portfolio tracking** diff --git a/docs/platform-slug-mapping.md b/docs/platform-slug-mapping.md new file mode 100644 index 00000000..42dd38e0 --- /dev/null +++ b/docs/platform-slug-mapping.md @@ -0,0 +1,162 @@ +# Platform Slug Mapping + +## Overview + +To avoid trademark issues in public-facing API URLs, CannaiQ uses neutral two-letter slugs instead of vendor names in route paths. + +**Important**: The actual `platform` value stored in the database remains the full name (e.g., `'dutchie'`). Only the URL paths use neutral slugs. 
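+
+A minimal sketch of the slug-to-platform translation, assuming a helper like the one below (the `SLUG_TO_PLATFORM` map and `resolvePlatformSlug` function are illustrative, not the actual backend implementation; the authoritative slug list is the reference table below):
+
+```typescript
+// Illustrative only: translate a neutral URL slug ('dt') into the full
+// platform name stored in the database ('dutchie').
+const SLUG_TO_PLATFORM: Record<string, string> = {
+  dt: 'dutchie',
+  jn: 'jane',
+  wm: 'weedmaps',
+  lf: 'leafly',
+  tz: 'treez',
+  bl: 'blaze',
+  fl: 'flowhub',
+};
+
+export function resolvePlatformSlug(slug: string): string | null {
+  return SLUG_TO_PLATFORM[slug.toLowerCase()] ?? null;
+}
+```
+
+Unknown slugs resolve to `null`, so a route handler can return a 404 instead of passing an unrecognized value to the database.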
+ +## Platform Slug Reference + +| Slug | Platform | DB Value | Status | +|------|----------|----------|--------| +| `dt` | Dutchie | `'dutchie'` | Active | +| `jn` | Jane | `'jane'` | Future | +| `wm` | Weedmaps | `'weedmaps'` | Future | +| `lf` | Leafly | `'leafly'` | Future | +| `tz` | Treez | `'treez'` | Future | +| `bl` | Blaze | `'blaze'` | Future | +| `fl` | Flowhub | `'flowhub'` | Future | + +## API Route Patterns + +### Discovery Routes + +``` +/api/discovery/platforms/:platformSlug/locations +/api/discovery/platforms/:platformSlug/locations/:id +/api/discovery/platforms/:platformSlug/locations/:id/verify-create +/api/discovery/platforms/:platformSlug/locations/:id/verify-link +/api/discovery/platforms/:platformSlug/locations/:id/reject +/api/discovery/platforms/:platformSlug/locations/:id/unreject +/api/discovery/platforms/:platformSlug/locations/:id/match-candidates +/api/discovery/platforms/:platformSlug/cities +/api/discovery/platforms/:platformSlug/summary +``` + +### Orchestrator Routes + +``` +/api/orchestrator/platforms/:platformSlug/promote/:id +``` + +## Example Usage + +### Fetch Discovered Locations (Dutchie) + +```bash +# Using neutral slug 'dt' instead of 'dutchie' +curl "https://api.cannaiq.co/api/discovery/platforms/dt/locations?status=discovered&state_code=AZ" +``` + +### Verify and Create Dispensary + +```bash +curl -X POST "https://api.cannaiq.co/api/discovery/platforms/dt/locations/123/verify-create" \ + -H "Content-Type: application/json" \ + -d '{"verifiedBy": "admin"}' +``` + +### Link to Existing Dispensary + +```bash +curl -X POST "https://api.cannaiq.co/api/discovery/platforms/dt/locations/123/verify-link" \ + -H "Content-Type: application/json" \ + -d '{"dispensaryId": 456, "verifiedBy": "admin"}' +``` + +### Promote to Crawlable + +```bash +curl -X POST "https://api.cannaiq.co/api/orchestrator/platforms/dt/promote/123" +``` + +### Get Discovery Summary + +```bash +curl "https://api.cannaiq.co/api/discovery/platforms/dt/summary" +``` + +## Migration Guide + +### Old Routes (DEPRECATED) + +| Old Route | New Route | +|-----------|-----------| +| `/api/discovery/dutchie/locations` | `/api/discovery/platforms/dt/locations` | +| `/api/discovery/dutchie/locations/:id` | `/api/discovery/platforms/dt/locations/:id` | +| `/api/discovery/dutchie/locations/:id/verify-create` | `/api/discovery/platforms/dt/locations/:id/verify-create` | +| `/api/discovery/dutchie/locations/:id/verify-link` | `/api/discovery/platforms/dt/locations/:id/verify-link` | +| `/api/discovery/dutchie/locations/:id/reject` | `/api/discovery/platforms/dt/locations/:id/reject` | +| `/api/discovery/dutchie/locations/:id/unreject` | `/api/discovery/platforms/dt/locations/:id/unreject` | +| `/api/discovery/dutchie/locations/:id/match-candidates` | `/api/discovery/platforms/dt/locations/:id/match-candidates` | +| `/api/discovery/dutchie/cities` | `/api/discovery/platforms/dt/cities` | +| `/api/discovery/dutchie/summary` | `/api/discovery/platforms/dt/summary` | +| `/api/discovery/dutchie/nearby` | `/api/discovery/platforms/dt/nearby` | +| `/api/discovery/dutchie/geo-stats` | `/api/discovery/platforms/dt/geo-stats` | +| `/api/discovery/dutchie/locations/:id/validate-geo` | `/api/discovery/platforms/dt/locations/:id/validate-geo` | +| `/api/orchestrator/dutchie/promote/:id` | `/api/orchestrator/platforms/dt/promote/:id` | + +### API Client Changes + +| Old Method | New Method | +|------------|------------| +| `getDutchieDiscoverySummary()` | `getPlatformDiscoverySummary('dt')` | +| 
`getDutchieDiscoveryLocations(params)` | `getPlatformDiscoveryLocations('dt', params)` | +| `getDutchieDiscoveryLocation(id)` | `getPlatformDiscoveryLocation('dt', id)` | +| `verifyCreateDutchieLocation(id)` | `verifyCreatePlatformLocation('dt', id)` | +| `verifyLinkDutchieLocation(id, dispId)` | `verifyLinkPlatformLocation('dt', id, dispId)` | +| `rejectDutchieLocation(id, reason)` | `rejectPlatformLocation('dt', id, reason)` | +| `unrejectDutchieLocation(id)` | `unrejectPlatformLocation('dt', id)` | +| `getDutchieLocationMatchCandidates(id)` | `getPlatformLocationMatchCandidates('dt', id)` | +| `getDutchieDiscoveryCities(params)` | `getPlatformDiscoveryCities('dt', params)` | +| `getDutchieNearbyLocations(lat, lon)` | `getPlatformNearbyLocations('dt', lat, lon)` | +| `getDutchieGeoStats()` | `getPlatformGeoStats('dt')` | +| `validateDutchieLocationGeo(id)` | `validatePlatformLocationGeo('dt', id)` | +| `promoteDutchieDiscoveryLocation(id)` | `promotePlatformDiscoveryLocation('dt', id)` | + +## Adding New Platforms + +When adding support for a new platform: + +1. **Assign a slug**: Choose a two-letter neutral slug +2. **Update validation**: Add to `validPlatforms` array in `backend/src/index.ts` +3. **Create routes**: Implement platform-specific discovery routes +4. **Update docs**: Add to this document + +### Example: Adding Jane Support + +```typescript +// backend/src/index.ts +const validPlatforms = ['dt', 'jn']; // Add 'jn' for Jane + +// Create Jane discovery routes +const jnDiscoveryRoutes = createJaneDiscoveryRoutes(getPool()); +app.use('/api/discovery/platforms/jn', jnDiscoveryRoutes); +``` + +## Database Schema + +The `platform` column in discovery tables stores the **full platform name** (not the slug): + +```sql +-- dutchie_discovery_locations table +SELECT * FROM dutchie_discovery_locations WHERE platform = 'dutchie'; + +-- dutchie_discovery_cities table +SELECT * FROM dutchie_discovery_cities WHERE platform = 'dutchie'; +``` + +This keeps the database schema clean and allows for future renaming of URL slugs without database migrations. 
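+
+Tying the two layers together, a discovery route handler might resolve the slug first and then filter rows by the stored full name. This is a sketch only: it reuses the illustrative `resolvePlatformSlug` helper from the Overview, and the `status`/`state_code` filters are assumptions based on the query parameters shown earlier, not a confirmed table definition.
+
+```typescript
+import { pool } from '../db/pool';
+
+// resolvePlatformSlug: see the illustrative helper in the Overview above.
+// The URL slug ('dt') never reaches the database; the query filters on
+// the full platform name ('dutchie') that the schema actually stores.
+async function listDiscoveredLocations(platformSlug: string, stateCode: string) {
+  const platform = resolvePlatformSlug(platformSlug); // 'dt' -> 'dutchie'
+  if (!platform) {
+    throw new Error(`Unknown platform slug: ${platformSlug}`);
+  }
+
+  const { rows } = await pool.query(
+    `SELECT *
+       FROM dutchie_discovery_locations
+      WHERE platform = $1
+        AND status = 'discovered'
+        AND state_code = $2`,
+    [platform, stateCode]
+  );
+  return rows;
+}
+```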
+ +## Safe Naming Conventions + +### DO +- Use neutral two-letter slugs in URLs: `dt`, `jn`, `wm` +- Use generic terms in user-facing text: "platform", "menu provider" +- Store full platform names in the database for clarity + +### DON'T +- Use trademarked names in URL paths +- Use vendor names in public-facing error messages +- Expose vendor-specific identifiers in consumer APIs diff --git a/k8s/cannaiq-frontend.yaml b/k8s/cannaiq-frontend.yaml new file mode 100644 index 00000000..3422d737 --- /dev/null +++ b/k8s/cannaiq-frontend.yaml @@ -0,0 +1,41 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: cannaiq-frontend + namespace: dispensary-scraper +spec: + replicas: 1 + selector: + matchLabels: + app: cannaiq-frontend + template: + metadata: + labels: + app: cannaiq-frontend + spec: + imagePullSecrets: + - name: regcred + containers: + - name: cannaiq-frontend + image: code.cannabrands.app/creationshop/cannaiq-frontend:latest + ports: + - containerPort: 80 + resources: + requests: + memory: "64Mi" + cpu: "50m" + limits: + memory: "128Mi" + cpu: "100m" +--- +apiVersion: v1 +kind: Service +metadata: + name: cannaiq-frontend + namespace: dispensary-scraper +spec: + selector: + app: cannaiq-frontend + ports: + - port: 80 + targetPort: 80 diff --git a/setup-local.sh b/setup-local.sh new file mode 100755 index 00000000..66f6675d --- /dev/null +++ b/setup-local.sh @@ -0,0 +1,4 @@ +#!/usr/bin/env bash +cd "$(dirname "$0")/backend" +./setup-local.sh + diff --git a/stop-local.sh b/stop-local.sh new file mode 100755 index 00000000..0fa8bf5e --- /dev/null +++ b/stop-local.sh @@ -0,0 +1,4 @@ +#!/usr/bin/env bash +cd "$(dirname "$0")/backend" +./stop-local.sh +