# CannaiQ Backend Codebase Map

**Last Updated:** 2025-12-12
**Purpose:** Help Claude and developers understand which code is current vs deprecated

---

## Quick Reference: What to Use

### For Crawling/Scraping

| Task | Use This | NOT This |
|------|----------|----------|
| Fetch products | `src/tasks/handlers/payload-fetch.ts` | `src/hydration/*` |
| Process products | `src/tasks/handlers/product-refresh.ts` | `src/scraper-v2/*` |
| GraphQL client | `src/platforms/dutchie/client.ts` | `src/dutchie-az/services/graphql-client.ts` |
| Worker system | `src/tasks/task-worker.ts` | `src/dutchie-az/services/worker.ts` |

### For Database

| Task | Use This | NOT This |
|------|----------|----------|
| Get DB pool | `src/db/pool.ts` | `src/dutchie-az/db/connection.ts` |
| Run migrations | `src/db/migrate.ts` (CLI only) | Never import at runtime |
| Query products | `store_products` table | `products`, `dutchie_products` |
| Query stores | `dispensaries` table | `stores` table |

### For Discovery

| Task | Use This |
|------|----------|
| Discover stores | `src/discovery/*.ts` |
| Run discovery | `npx tsx src/scripts/run-discovery.ts` |

---

## Directory Status

### ACTIVE DIRECTORIES (Use These)

```
src/
├── auth/            # JWT/session auth, middleware
├── db/              # Database pool, migrations
├── discovery/       # Dutchie store discovery pipeline
├── middleware/      # Express middleware
├── multi-state/     # Multi-state query support
├── platforms/       # Platform-specific clients (Dutchie, Jane, etc.)
│   └── dutchie/     # THE Dutchie client - use this one
├── routes/          # Express API routes
├── services/        # Core services (logger, scheduler, etc.)
├── tasks/           # Task system (workers, handlers, scheduler)
│   └── handlers/    # Task handlers (payload_fetch, product_refresh, etc.)
├── types/           # TypeScript types
└── utils/           # Utilities (storage, image processing)
```

### DEPRECATED DIRECTORIES (DO NOT USE)

```
src/
├── hydration/             # DEPRECATED - Old pipeline approach
├── scraper-v2/            # DEPRECATED - Old scraper engine
├── canonical-hydration/   # DEPRECATED - Merged into tasks/handlers
├── dutchie-az/            # PARTIAL - Some parts deprecated, some active
│   ├── db/                # DEPRECATED - Use src/db/pool.ts
│   └── services/          # PARTIAL - worker.ts still runs, graphql-client.ts deprecated
├── portals/               # FUTURE - Not yet implemented
├── seo/                   # PARTIAL - Settings work, templates WIP
└── system/                # DEPRECATED - Old orchestration system
```

### DEPRECATED FILES (DO NOT USE)

```
src/dutchie-az/db/connection.ts            # Use src/db/pool.ts instead
src/dutchie-az/services/graphql-client.ts  # Use src/platforms/dutchie/client.ts
src/hydration/*.ts                         # Entire directory deprecated
src/scraper-v2/*.ts                        # Entire directory deprecated
```

---

## Key Files Reference

### Entry Points

| File | Purpose | Status |
|------|---------|--------|
| `src/index.ts` | Main Express server | ACTIVE |
| `src/dutchie-az/services/worker.ts` | Legacy worker process entry (still runs; not for new work) | ACTIVE |
| `src/tasks/task-worker.ts` | Task worker (new system) | ACTIVE |

### Dutchie Integration

| File | Purpose | Status |
|------|---------|--------|
| `src/platforms/dutchie/client.ts` | GraphQL client, hashes, curl | **PRIMARY** |
| `src/platforms/dutchie/queries.ts` | High-level query functions | ACTIVE |
| `src/platforms/dutchie/index.ts` | Re-exports | ACTIVE |

### Task Handlers

| File | Purpose | Status |
|------|---------|--------|
| `src/tasks/handlers/payload-fetch.ts` | Fetch products from Dutchie | **PRIMARY** |
| `src/tasks/handlers/product-refresh.ts` | Process payload into DB | **PRIMARY** |
| `src/tasks/handlers/entry-point-discovery.ts` | Resolve platform IDs (auto-healing) | **PRIMARY** |
| `src/tasks/handlers/menu-detection.ts` | Detect menu type | ACTIVE |
| `src/tasks/handlers/id-resolution.ts` | Resolve platform IDs (legacy) | LEGACY |
| `src/tasks/handlers/image-download.ts` | Download product images | ACTIVE |

---

## Transport Rules (CRITICAL)

**Browser-based (Puppeteer) is the DEFAULT transport.
curl is ONLY allowed when explicitly specified.**

### Transport Selection

| `task.method` | Transport Used | Notes |
|---------------|----------------|-------|
| `null` | Browser (Puppeteer) | DEFAULT - use this for most tasks |
| `'http'` | Browser (Puppeteer) | Explicit browser request |
| `'curl'` | curl-impersonate | ONLY when explicitly needed |

### Why Browser-First?

1. **Anti-detection**: Puppeteer with StealthPlugin evades bot detection
2. **Session cookies**: Browser maintains session state automatically
3. **Fingerprinting**: Real browser fingerprint (TLS, headers, etc.)
4. **Age gates**: Browser can click through age verification

### Entry Point Discovery Auto-Healing

The `entry_point_discovery` handler uses a healing strategy:

```
1. FIRST: Check dutchie_discovery_locations for existing platform_location_id
   - By linked dutchie_discovery_id
   - By slug match in discovery data
   → If found, NO network call needed

2. SECOND: Browser-based GraphQL (Puppeteer)
   - 5x retries for network/proxy failures
   - On HTTP 403: rotate proxy and retry
   - On HTTP 404 after 2 attempts: mark as 'removed'

3. HARD FAILURE: After exhausting options → 'needs_investigation'
```

### DO NOT Use curl Unless:

- Task explicitly has `method = 'curl'`
- You're testing curl-impersonate binaries
- The API explicitly requires curl fingerprinting

### Files

| File | Transport | Purpose |
|------|-----------|---------|
| `src/services/puppeteer-preflight.ts` | Browser | Preflight check |
| `src/services/curl-preflight.ts` | curl | Preflight check |
| `src/tasks/handlers/entry-point-discovery.ts` | Browser | Platform ID resolution |
| `src/tasks/handlers/payload-fetch.ts` | Both | Product fetching |

### Database

| File | Purpose | Status |
|------|---------|--------|
| `src/db/pool.ts` | Canonical DB pool | **PRIMARY** |
| `src/db/migrate.ts` | Migration runner (CLI only) | CLI ONLY |
| `src/db/auto-migrate.ts` | Auto-run migrations on startup | ACTIVE |

### Configuration

| File | Purpose | Status |
|------|---------|--------|
| `.env` | Environment variables | ACTIVE |
| `package.json` | Dependencies | ACTIVE |
| `tsconfig.json` | TypeScript config | ACTIVE |

---

## GraphQL Hashes (CRITICAL)

The correct hashes are in `src/platforms/dutchie/client.ts`:

```typescript
export const GRAPHQL_HASHES = {
  FilteredProducts: 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0',
  GetAddressBasedDispensaryData: '13461f73abf7268770dfd05fe7e10c523084b2bb916a929c08efe3d87531977b',
  ConsumerDispensaries: '0a5bfa6ca1d64ae47bcccb7c8077c87147cbc4e6982c17ceec97a2a4948b311b',
  GetAllCitiesByState: 'ae547a0466ace5a48f91e55bf6699eacd87e3a42841560f0c0eabed5a0a920e6',
};
```

**ALWAYS** use `Status: 'Active'` for FilteredProducts (not `null` or `'All'`).
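
Here is a minimal sketch of how the FilteredProducts hash and the `Status: 'Active'` rule fit together in one request. It makes three assumptions that this map does not confirm. First, Dutchie accepts Apollo-style persisted queries over GET, with `operationName`, `variables`, and `extensions` parameters. Second, `Status` sits inside a `productsFilter` object. Third, the helper `buildFilteredProductsUrl` and the `page`/`perPage` fields are hypothetical. Check `src/platforms/dutchie/client.ts` for what the crawler actually sends.

```typescript
// Sketch only: builds a persisted-query GET URL for FilteredProducts.
// ASSUMPTIONS (not confirmed by this doc): Dutchie accepts Apollo-style
// persisted queries via GET, and `Status` lives under `productsFilter`.
// In real code, import GRAPHQL_HASHES from src/platforms/dutchie/client.ts
// instead of copying the hash here.
const GRAPHQL_HASHES = {
  FilteredProducts:
    'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0',
} as const;

interface FilteredProductsVariables {
  productsFilter: {
    dispensaryId: string;
    // Literal type: per this doc, null returns 0 products, so only 'Active' is allowed.
    Status: 'Active';
  };
  page: number; // hypothetical paging fields
  perPage: number;
}

// Hypothetical helper. The endpoint is passed in rather than hard-coded
// because this doc does not state it.
function buildFilteredProductsUrl(
  endpoint: string,
  dispensaryId: string,
  page = 0,
  perPage = 100,
): string {
  const variables: FilteredProductsVariables = {
    productsFilter: { dispensaryId, Status: 'Active' },
    page,
    perPage,
  };
  const extensions = {
    persistedQuery: { version: 1, sha256Hash: GRAPHQL_HASHES.FilteredProducts },
  };
  // URLSearchParams handles the URL-encoding of the JSON payloads.
  const params = new URLSearchParams({
    operationName: 'FilteredProducts',
    variables: JSON.stringify(variables),
    extensions: JSON.stringify(extensions),
  });
  return `${endpoint}?${params.toString()}`;
}
```

Typing `Status` as the literal `'Active'` makes the compiler reject `null` or `'All'` wherever the variables object is built. Under the Transport Rules above, the resulting request would still go through the browser transport unless the task explicitly sets `method = 'curl'`.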

---

## Scripts Reference

### Useful Scripts (in `src/scripts/`)

| Script | Purpose |
|--------|---------|
| `run-discovery.ts` | Run Dutchie discovery |
| `crawl-single-store.ts` | Test-crawl a single store |
| `test-dutchie-graphql.ts` | Test GraphQL queries |

### One-Off Scripts (rarely needed)

| Script | Purpose |
|--------|---------|
| `harmonize-az-dispensaries.ts` | One-time data cleanup |
| `bootstrap-stores-for-dispensaries.ts` | One-time migration |
| `backfill-*.ts` | Historical backfill scripts |

---

## API Routes

### Active Routes (in `src/routes/`)

| Route File | Mount Point | Purpose |
|------------|-------------|---------|
| `auth.ts` | `/api/auth` | Login/logout/session |
| `stores.ts` | `/api/stores` | Store CRUD |
| `dashboard.ts` | `/api/dashboard` | Dashboard stats |
| `workers.ts` | `/api/workers` | Worker monitoring |
| `pipeline.ts` | `/api/pipeline` | Crawl triggers |
| `discovery.ts` | `/api/discovery` | Discovery management |
| `analytics.ts` | `/api/analytics` | Analytics queries |
| `wordpress.ts` | `/api/v1/wordpress` | WordPress plugin API |

---

## Documentation Files

### Current Docs (in `backend/docs/`)

| Doc | Purpose | Status |
|-----|---------|--------|
| `TASK_WORKFLOW_2024-12-10.md` | Task system architecture | CURRENT |
| `WORKER_TASK_ARCHITECTURE.md` | Worker/task design | CURRENT |
| `CRAWL_PIPELINE.md` | Crawl pipeline overview | CURRENT |
| `ORGANIC_SCRAPING_GUIDE.md` | Browser-based scraping | CURRENT |
| `CODEBASE_MAP.md` | This file | CURRENT |
| `ANALYTICS_V2_EXAMPLES.md` | Analytics API examples | CURRENT |
| `BRAND_INTELLIGENCE_API.md` | Brand API docs | CURRENT |

### Root Docs

| Doc | Purpose | Status |
|-----|---------|--------|
| `CLAUDE.md` | Claude instructions | **PRIMARY** |
| `README.md` | Project overview | NEEDS UPDATE |

---

## Common Mistakes to Avoid

1. **Don't use `src/hydration/`** - It is an old approach, superseded by the task system.
2. **Don't use `src/dutchie-az/db/connection.ts`** - Use `src/db/pool.ts` instead.
3. **Don't import `src/db/migrate.ts` at runtime** - It will crash. Use it only for CLI migrations.
4. **Don't query the `stores` table** - It's empty. Use `dispensaries`.
5. **Don't query the `products` table** - It's empty. Use `store_products`.
6. **Don't use a wrong GraphQL hash** - Always get the hash from `GRAPHQL_HASHES` in `client.ts`.
7. **Don't use `Status: null`** - It returns 0 products. Use `Status: 'Active'`.

---

## When in Doubt

1. Check whether the file is imported in `src/index.ts` - if not, it may be deprecated
2. Check the last modified date - older files may be stale
3. Look for `DEPRECATED` comments in the code
4. Ask: "Is there a newer version of this in `src/tasks/` or `src/platforms/`?"
5. Read the relevant doc in `docs/` before modifying code
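
The first check under When in Doubt can be partly automated. The sketch below assumes `src/index.ts` imports local modules with relative specifiers such as `'./routes/auth'`. The helper `isDirectlyImported` and its regex are hypothetical, and the regex is only a heuristic: it ignores comments and `export ... from` re-exports.

```typescript
import * as path from 'node:path';

// Heuristic matcher for static imports (`import x from '...'`, `import '...'`),
// dynamic `import('...')`, and CommonJS `require('...')`. It does not skip
// comments and does not handle `export ... from '...'` re-exports.
const SPECIFIER_RE =
  /\b(?:import\s+(?:[^'"]*?\s+from\s+)?|import\s*\(\s*|require\s*\(\s*)['"]([^'"]+)['"]/g;

// Drop a trailing .ts/.js so './routes/auth' and './routes/auth.js' compare equal.
function stripExt(p: string): string {
  return p.replace(/\.(?:ts|js)$/, '');
}

/**
 * Hypothetical helper: returns true if `indexSource` (the text of
 * src/index.ts) directly imports `targetPath` (e.g. 'src/db/pool.ts').
 * Only relative specifiers are resolved, against the `src/` directory.
 */
function isDirectlyImported(indexSource: string, targetPath: string): boolean {
  const target = stripExt(path.posix.normalize(targetPath));
  for (const match of indexSource.matchAll(SPECIFIER_RE)) {
    const spec = match[1] ?? '';
    if (!spec.startsWith('.')) continue; // npm package, not a local file
    const resolved = stripExt(path.posix.join('src', spec));
    // A bare directory import like './tasks' resolves to src/tasks/index.ts.
    if (resolved === target || `${resolved}/index` === target) return true;
  }
  return false;
}
```

In practice you would pass `readFileSync('src/index.ts', 'utf8')` as `indexSource`. The check only sees direct imports. A file loaded transitively, or through a separate entry point such as `src/tasks/task-worker.ts`, will report `false`, so treat `false` as a reason to look closer rather than as proof the file is deprecated.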