feat: SEO template library, discovery pipeline, and orchestrator enhancements

## SEO Template Library
- Add complete template library with 7 page types (state, city, category, brand, product, search, regeneration)
- Add Template Library tab in SEO Orchestrator with accordion-based editors
- Add template preview, validation, and variable injection engine
- Add API endpoints: /api/seo/templates, preview, validate, generate, regenerate

## Discovery Pipeline
- Add promotion.ts for discovery location validation and promotion
- Add discover-all-states.ts script for multi-state discovery
- Add promotion log migration (067)
- Enhance discovery routes and types

## Orchestrator & Admin
- Add crawl_enabled filter to stores page
- Add API permissions page
- Add job queue management
- Add price analytics routes
- Add markets and intelligence routes
- Enhance dashboard and worker monitoring

## Infrastructure
- Add migrations for worker definitions, SEO settings, field alignment
- Add canonical pipeline for scraper v2
- Update hydration and sync orchestrator
- Enhance multi-state query service

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
106
CLAUDE.md
@@ -489,12 +489,78 @@ import { saveImage, getImageUrl } from '../utils/storage-adapter';

## UI ANONYMIZATION RULES

- No vendor names in forward-facing URLs: use `/api/az/...`, `/az`, `/az-schedule`
- No vendor names in forward-facing URLs
- No "dutchie", "treez", "jane", "weedmaps", "leafly" visible in consumer UIs
- Internal admin tools may show provider names for debugging

---

## DUTCHIE DISCOVERY PIPELINE (Added 2025-01)

### Overview
Automated discovery of Dutchie-powered dispensaries across all US states.

### Flow
```
1. getAllCitiesByState GraphQL → Get all cities for a state
2. ConsumerDispensaries GraphQL → Get stores for each city
3. Upsert to dutchie_discovery_locations (keyed by platform_location_id)
4. AUTO-VALIDATE: Check required fields
5. AUTO-PROMOTE: Create/update dispensaries with crawl_enabled=true
6. Log all actions to dutchie_promotion_log
```
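
The six steps above can be read as a single loop over cities and stores. The following is a minimal sketch of that control flow only, assuming each step is available as an injected function. The `DiscoveryDeps` interface, the `DiscoveredStore` shape, and `discoverState` are hypothetical names for illustration and are not taken from `src/discovery/`.

```typescript
// Hypothetical shapes for illustration; the real types may differ.
interface DiscoveredStore {
  platformLocationId: string;
  name: string;
  city: string;
}

// Each member stands in for one step of the flow (all names are assumptions).
interface DiscoveryDeps {
  fetchCities(stateCode: string): Promise<string[]>; // step 1
  fetchStores(stateCode: string, city: string): Promise<DiscoveredStore[]>; // step 2
  upsertLocation(store: DiscoveredStore): Promise<void>; // step 3
  validate(store: DiscoveredStore): string[]; // step 4, returns error messages
  promote(store: DiscoveredStore): Promise<void>; // step 5
  log(action: "promoted" | "rejected", store: DiscoveredStore, errors: string[]): Promise<void>; // step 6
}

interface DiscoverySummary {
  discovered: number;
  promoted: number;
  rejected: number;
}

// Runs steps 1-6 for one state and returns simple counters.
async function discoverState(stateCode: string, deps: DiscoveryDeps): Promise<DiscoverySummary> {
  const summary: DiscoverySummary = { discovered: 0, promoted: 0, rejected: 0 };
  for (const city of await deps.fetchCities(stateCode)) {
    for (const store of await deps.fetchStores(stateCode, city)) {
      summary.discovered += 1;
      await deps.upsertLocation(store);
      const errors = deps.validate(store);
      if (errors.length === 0) {
        await deps.promote(store);
        summary.promoted += 1;
        await deps.log("promoted", store, []);
      } else {
        summary.rejected += 1;
        await deps.log("rejected", store, errors);
      }
    }
  }
  return summary;
}
```

Passing the steps in as functions keeps the loop testable without network or database access.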

### Tables
| Table | Purpose |
|-------|---------|
| `dutchie_discovery_cities` | Cities known to have dispensaries |
| `dutchie_discovery_locations` | Raw discovered store data |
| `dispensaries` | Canonical stores (promoted from discovery) |
| `dutchie_promotion_log` | Audit trail for validation/promotion |

### Files
| File | Purpose |
|------|---------|
| `src/discovery/discovery-crawler.ts` | Main orchestrator |
| `src/discovery/location-discovery.ts` | GraphQL fetching |
| `src/discovery/promotion.ts` | Validation & promotion logic |
| `src/scripts/run-discovery.ts` | CLI interface |
| `migrations/067_promotion_log.sql` | Audit log table |

### GraphQL Hashes (in `src/platforms/dutchie/client.ts`)
| Query | Hash |
|-------|------|
| `GetAllCitiesByState` | `ae547a0466ace5a48f91e55bf6699eacd87e3a42841560f0c0eabed5a0a920e6` |
| `ConsumerDispensaries` | `0a5bfa6ca1d64ae47bcccb7c8077c87147cbc4e6982c17ceec97a2a4948b311b` |
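
The document does not say how these hashes are sent. One plausible reading is that they are SHA-256 identifiers in the Apollo persisted-query style, where the request carries an `extensions.persistedQuery.sha256Hash` field instead of the query text. The sketch below builds such a GET URL under that assumption. `buildPersistedQueryUrl` is a hypothetical helper, the variables object is left empty because the real variable names are not documented here, and the endpoint is the one this file lists for Dutchie GraphQL.

```typescript
// Endpoint as listed elsewhere in this file; the request shape is an assumption.
const DUTCHIE_GRAPHQL_ENDPOINT = "https://dutchie.com/api-3/graphql";

// Builds a GET URL in the Apollo persisted-query style:
// ?operationName=...&variables=<json>&extensions=<json containing sha256Hash>
function buildPersistedQueryUrl(
  operationName: string,
  sha256Hash: string,
  variables: Record<string, unknown>,
): string {
  const params = new URLSearchParams({
    operationName,
    variables: JSON.stringify(variables),
    extensions: JSON.stringify({ persistedQuery: { version: 1, sha256Hash } }),
  });
  return `${DUTCHIE_GRAPHQL_ENDPOINT}?${params.toString()}`;
}

// Example with the GetAllCitiesByState hash from the table above.
const citiesUrl = buildPersistedQueryUrl(
  "GetAllCitiesByState",
  "ae547a0466ace5a48f91e55bf6699eacd87e3a42841560f0c0eabed5a0a920e6",
  {}, // real variable names are not documented here, so none are invented
);
```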

### Usage
```bash
# Discover all stores in a state
npx tsx src/scripts/run-discovery.ts discover:state AZ
npx tsx src/scripts/run-discovery.ts discover:state CA

# Check stats
npx tsx src/scripts/run-discovery.ts stats
```

### Validation Rules
A discovery location must have:
- `platform_location_id` (MongoDB ObjectId, 24 hex chars)
- `name`
- `city`
- `state_code`
- `platform_menu_url`

Invalid records are marked `status='rejected'` with errors logged.
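
The rules above amount to a required-field check plus a format check on the ID. A minimal sketch follows, assuming a record shape that uses the field names from the list. `DiscoveryLocation` and `validateDiscoveryLocation` are hypothetical names, and the actual logic in `src/discovery/promotion.ts` may differ.

```typescript
// Field names follow the list above; the interface itself is hypothetical.
interface DiscoveryLocation {
  platform_location_id?: string | null;
  name?: string | null;
  city?: string | null;
  state_code?: string | null;
  platform_menu_url?: string | null;
}

// A MongoDB ObjectId rendered as text is exactly 24 hexadecimal characters.
const OBJECT_ID_PATTERN = /^[0-9a-f]{24}$/i;

const REQUIRED_FIELDS = [
  "platform_location_id",
  "name",
  "city",
  "state_code",
  "platform_menu_url",
] as const;

// Returns a list of error messages; an empty list means the record is valid.
// A non-empty list corresponds to status='rejected' in the text above.
function validateDiscoveryLocation(loc: DiscoveryLocation): string[] {
  const errors: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    const value = loc[field];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`missing ${field}`);
    }
  }
  const id = loc.platform_location_id;
  if (typeof id === "string" && id.trim() !== "" && !OBJECT_ID_PATTERN.test(id)) {
    errors.push("platform_location_id is not a 24-character hex ObjectId");
  }
  return errors;
}
```

Returning messages rather than a boolean lets the caller write the same strings to the promotion log.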

### Key Design Decisions
- `platform_location_id` MUST be MongoDB ObjectId (not slug)
- Old geo-based discovery stored slugs → deleted as garbage data
- Rate limit: 2 seconds between city requests to avoid API throttling
- Promotion is idempotent via `ON CONFLICT (platform_dispensary_id)`
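
To illustrate the idempotent promotion mentioned above, here is a sketch of a PostgreSQL upsert keyed on `platform_dispensary_id`. It assumes a unique constraint on that column. Apart from `platform_dispensary_id`, `crawl_enabled`, and `menu_url`, the column names are guesses, and `buildPromotionUpsert` is a hypothetical helper. The `{ text, values }` return shape matches what node-postgres accepts as a query config, though whether the project uses that client is also an assumption.

```typescript
// Hypothetical input shape, using the validated discovery field names.
interface PromotableLocation {
  platform_location_id: string;
  name: string;
  city: string;
  state_code: string;
  platform_menu_url: string;
}

// Builds a parameterized upsert. Running it twice with the same input
// leaves a single row, which is what makes promotion idempotent.
function buildPromotionUpsert(loc: PromotableLocation): { text: string; values: string[] } {
  const text = `
    INSERT INTO dispensaries
      (platform_dispensary_id, name, city, state, menu_url, crawl_enabled)
    VALUES ($1, $2, $3, $4, $5, true)
    ON CONFLICT (platform_dispensary_id) DO UPDATE SET
      name = EXCLUDED.name,
      city = EXCLUDED.city,
      state = EXCLUDED.state,
      menu_url = EXCLUDED.menu_url,
      crawl_enabled = true`;
  return {
    text,
    values: [loc.platform_location_id, loc.name, loc.city, loc.state_code, loc.platform_menu_url],
  };
}
```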

---

## FUTURE TODO / PENDING FEATURES

- [ ] Orchestrator observability dashboard
@@ -639,16 +705,19 @@ export default defineConfig({

- **DB**: Use the single CannaiQ database via `CANNAIQ_DB_*` env vars. No hardcoded names.
- **Images**: No MinIO. Save to local /images/products/<disp>/<prod>-<hash>.webp (and brands); preserve original URL; serve via backend static.
- **Dutchie GraphQL**: Endpoint https://dutchie.com/api-3/graphql. Variables must use productsFilter.dispensaryId (platform_dispensary_id). Mode A: Status="Active". Mode B: Status=null/activeOnly:false.
- **Dutchie GraphQL**: Endpoint https://dutchie.com/api-3/graphql. Variables must use productsFilter.dispensaryId (platform_dispensary_id). **CRITICAL: Use `Status: 'Active'`, NOT `null`** (null returns 0 products).
- **cName/slug**: Derive cName from each store's menu_url (/embedded-menu/<cName> or /dispensary/<slug>). No hardcoded defaults.
- **Dual-mode always**: useBothModes:true to get pricing (Mode A) + full coverage (Mode B).
- **Batch DB writes**: Chunk products/snapshots/missing (100–200) to avoid OOM.
- **OOS/missing**: Include inactive/OOS in Mode B. Union A+B, dedupe by external_product_id+dispensary_id.
- **API/Frontend**: Use /api/az/... endpoints (stores/products/brands/categories/summary/dashboard).
- **API/Frontend**: Use `/api/stores`, `/api/products`, `/api/workers`, `/api/pipeline` endpoints.
- **Scheduling**: Crawl only menu_type='dutchie' AND platform_dispensary_id IS NOT NULL. 4-hour crawl with jitter.
- **Monitor**: /scraper-monitor (and /az-schedule) should show active/recent jobs from job_run_logs/crawl_jobs.
- **THC/CBD values**: Clamp to ≤100 - some products report milligrams as percentages.
- **Column names**: Use `name_raw`, `brand_name_raw`, `category_raw`, `subcategory_raw` (NOT `name`, `brand_name`, etc.)

- **Monitor**: `/api/workers` shows active/recent jobs from job queue.
- **No slug guessing**: Never use defaults. Always derive per store from menu_url and resolve platform IDs per location.

**📖 Full Documentation: See `docs/DUTCHIE_CRAWL_WORKFLOW.md` for complete pipeline documentation.**
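
Three of the rules above (cName derivation, batched writes, and THC/CBD clamping) are small enough to sketch as pure helpers. The function names `deriveCName`, `chunk`, and `clampPercent` are hypothetical and are not taken from the codebase. The URL patterns, the 100 to 200 batch size, and the 100 cap come from the rules themselves.

```typescript
// Extracts the cName/slug from a menu URL of the form
// /embedded-menu/<cName> or /dispensary/<slug>.
// Returns null rather than guessing when the URL does not match.
function deriveCName(menuUrl: string): string | null {
  let pathname: string;
  try {
    pathname = new URL(menuUrl).pathname;
  } catch {
    return null; // not an absolute URL
  }
  const match = pathname.match(/^\/(?:embedded-menu|dispensary)\/([^/]+)/);
  return match?.[1] ?? null;
}

// Splits items into batches of at most `size` so each DB write stays small.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Caps a THC/CBD percentage at 100; missing values pass through as null.
function clampPercent(value: number | null | undefined): number | null {
  if (value == null || Number.isNaN(value)) return null;
  return Math.min(value, 100);
}
```

Returning `null` from `deriveCName` forces the caller to handle an unparseable `menu_url` explicitly, in line with the "no slug guessing" rule.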

---

### Detailed Rules
@@ -691,7 +760,7 @@ export default defineConfig({
- Use dutchie GraphQL pipeline only for `menu_type='dutchie'`.

6) **Frontend**
- Forward-facing URLs: `/api/az`, `/az`, `/az-schedule`; no vendor names.
- Forward-facing URLs should not contain vendor names.
- `/scraper-schedule`: add filters/search, keep as master view for all schedules; reflect platform ID/menu_type status and controls.

7) **No slug guessing**
@@ -740,18 +809,21 @@ export default defineConfig({

16) **API Route Semantics**

**Route Groups:**
- `/api/admin/...` = Admin/operator actions (crawl triggers, health checks)
- `/api/az/...` = Arizona data slice (stores, products, metrics)
**Route Groups (as registered in `src/index.ts`):**
- `/api/stores` = Store/dispensary CRUD and listing
- `/api/products` = Product listing and details
- `/api/workers` = Job queue monitoring (replaces legacy `/api/dutchie-az/...`)
- `/api/pipeline` = Crawl pipeline triggers
- `/api/admin/orchestrator` = Orchestrator admin actions
- `/api/discovery` = Platform discovery (Dutchie, etc.)
- `/api/v1/...` = Public API for external consumers (WordPress, etc.)

**Crawl Trigger (CANONICAL):**
```
POST /api/admin/crawl/:dispensaryId
```
**Crawl Trigger:**
Check `/api/pipeline` or `/api/admin/orchestrator` routes for crawl triggers.
The legacy `POST /api/admin/crawl/:dispensaryId` does NOT exist.

17) **Monitoring and logging**
- /scraper-monitor (and /az-schedule) should show active/recent jobs from job_run_logs/crawl_jobs
- `/api/workers` shows active/recent jobs from job queue
- Auto-refresh every 30 seconds
- System Logs page should show real log data, not just startup messages

@@ -783,8 +855,8 @@ export default defineConfig({
- **Job schedules** (managed in `job_schedules` table):
- `dutchie_az_menu_detection`: Runs daily with 60-min jitter
- `dutchie_az_product_crawl`: Runs every 4 hours with 30-min jitter
- **Trigger schedules**: `curl -X POST /api/az/admin/schedules/{id}/trigger`
- **Check schedule status**: `curl /api/az/admin/schedules`
- **Monitor jobs**: `GET /api/workers`
- **Trigger crawls**: Check `/api/pipeline` routes
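
The schedules above pair a base interval with a jitter window. The exact jitter semantics are not spelled out, so the sketch below assumes "N-min jitter" means a random delay in the range [0, N) minutes added on top of the interval. `nextRunAt` is a hypothetical helper, and the random source is injectable so the result can be made deterministic.

```typescript
const MS_PER_MINUTE = 60000;

// Computes the next run time as lastRun + interval + random jitter.
// Assumption: jitter is a one-sided delay in [0, jitterMinutes).
function nextRunAt(
  lastRun: Date,
  intervalMinutes: number,
  jitterMinutes: number,
  random: () => number = Math.random,
): Date {
  const jitterMs = Math.floor(random() * jitterMinutes * MS_PER_MINUTE);
  return new Date(lastRun.getTime() + intervalMinutes * MS_PER_MINUTE + jitterMs);
}

// Example: a product crawl every 4 hours with up to 30 minutes of jitter.
const exampleNextCrawl = nextRunAt(new Date(), 4 * 60, 30);
```

Spreading start times this way keeps many stores on the same schedule from hitting the upstream API at the same moment.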

21) **Frontend Architecture - AVOID OVER-ENGINEERING**