feat: Add v2 architecture with multi-state support and orchestrator services
Major additions:
- Multi-state expansion: states table, StateSelector, NationalDashboard, StateHeatmap, CrossStateCompare
- Orchestrator services: trace service, error taxonomy, retry manager, proxy rotator
- Discovery system: dutchie discovery service, geo validation, city seeding scripts
- Analytics infrastructure: analytics v2 routes, brand/pricing/stores intelligence pages
- Local development: setup-local.sh starts all 5 services (postgres, backend, cannaiq, findadispo, findagram)
- Migrations 037-056: crawler profiles, states, analytics indexes, worker metadata

Frontend pages added:
- Discovery, ChainsDashboard, IntelligenceBrands, IntelligencePricing, IntelligenceStores
- StateHeatmap, CrossStateCompare, SyncInfoPanel

Components added:
- StateSelector, OrchestratorTraceModal, WorkflowStepper

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
324
docs/legacy_mapping.md
Normal file
@@ -0,0 +1,324 @@
# Legacy Data Mapping: dutchie_legacy → CannaiQ

## Overview

This document describes the ETL mapping from the legacy `dutchie_legacy` database
to the canonical CannaiQ schema. All imports are **INSERT-ONLY** with no deletions
or overwrites of existing data.

## Database Locations

| Database | Host | Purpose |
|----------|------|---------|
| `cannaiq` | localhost:54320 | Main CannaiQ application schema |
| `dutchie_legacy` | localhost:54320 | Imported historical data from old dutchie_menus |

## Schema Comparison

### Legacy Tables (dutchie_legacy)

| Table | Purpose | Key Columns |
|-------|---------|-------------|
| `dispensaries` | Store locations | id, name, slug, city, state, menu_url, menu_provider, product_provider |
| `products` | Legacy product records | id, dispensary_id, dutchie_product_id, name, brand, price, thc_percentage |
| `dutchie_products` | Dutchie-specific products | id, dispensary_id, external_product_id, name, brand_name, type, stock_status |
| `dutchie_product_snapshots` | Historical price/stock snapshots | dutchie_product_id, crawled_at, rec_min_price_cents, stock_status |
| `brands` | Brand entities | id, store_id, name, dispensary_id |
| `categories` | Product categories | id, store_id, name, slug |
| `price_history` | Legacy price tracking | product_id, price, recorded_at |
| `specials` | Deals/promotions | id, dispensary_id, name, discount_type |

### CannaiQ Canonical Tables

| Table | Purpose | Key Columns |
|-------|---------|-------------|
| `dispensaries` | Store locations | id, name, slug, city, state, platform_dispensary_id |
| `dutchie_products` | Canonical products | id, dispensary_id, external_product_id, name, brand_name, stock_status |
| `dutchie_product_snapshots` | Historical snapshots | dutchie_product_id, crawled_at, rec_min_price_cents |
| `brands` (view: v_brands) | Derived from products | brand_name, brand_id, product_count |
| `categories` (view: v_categories) | Derived from products | type, subcategory, product_count |

---

## Mapping Plan

### 1. Dispensaries

**Source:** `dutchie_legacy.dispensaries`
**Target:** `cannaiq.dispensaries`

| Legacy Column | Canonical Column | Notes |
|---------------|------------------|-------|
| id | - | Generate new ID, store legacy_id |
| name | name | Direct map |
| slug | slug | Direct map |
| city | city | Direct map |
| state | state | Direct map |
| address | address | Direct map |
| zip | postal_code | Rename |
| latitude | latitude | Direct map |
| longitude | longitude | Direct map |
| menu_url | menu_url | Direct map |
| menu_provider | - | Store in raw_metadata |
| product_provider | - | Store in raw_metadata |
| website | website | Direct map |
| dba_name | - | Store in raw_metadata |
| - | platform | Set to 'dutchie' |
| - | legacy_id | New column: original ID from legacy |

**Conflict Resolution:**
- ON CONFLICT (slug, city, state) DO NOTHING
- Match on slug+city+state combination
- Never overwrite existing dispensary data

**Staging Table:** `dispensaries_from_legacy`
```sql
CREATE TABLE IF NOT EXISTS dispensaries_from_legacy (
  id SERIAL PRIMARY KEY,
  legacy_id INTEGER NOT NULL,
  name VARCHAR(255) NOT NULL,
  slug VARCHAR(255) NOT NULL,
  city VARCHAR(100) NOT NULL,
  state VARCHAR(10) NOT NULL,
  postal_code VARCHAR(20),
  address TEXT,
  latitude DECIMAL(10,7),
  longitude DECIMAL(10,7),
  menu_url TEXT,
  website TEXT,
  legacy_metadata JSONB, -- All other legacy fields
  imported_at TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE(legacy_id)
);
```

---

### 2. Products (Legacy products table)

**Source:** `dutchie_legacy.products`
**Target:** `cannaiq.products_from_legacy` (new staging table)

| Legacy Column | Staging Column | Notes |
|---------------|----------------|-------|
| id | legacy_product_id | Original ID |
| dispensary_id | legacy_dispensary_id | FK to legacy dispensary |
| dutchie_product_id | external_product_id | Dutchie's _id |
| name | name | Direct map |
| brand | brand_name | Direct map |
| price | price_cents | Multiply by 100, round to an integer |
| original_price | original_price_cents | Multiply by 100, round to an integer |
| thc_percentage | thc | Direct map |
| cbd_percentage | cbd | Direct map |
| strain_type | strain_type | Direct map |
| weight | weight | Direct map |
| image_url | primary_image_url | Direct map |
| in_stock | stock_status | Map: true→'in_stock', false→'out_of_stock' |
| first_seen_at | first_seen_at | Direct map |
| last_seen_at | last_seen_at | Direct map |
| raw_data | legacy_raw_payload | Direct map |

**Staging Table:** `products_from_legacy`
```sql
CREATE TABLE IF NOT EXISTS products_from_legacy (
  id SERIAL PRIMARY KEY,
  legacy_product_id INTEGER NOT NULL,
  legacy_dispensary_id INTEGER,
  external_product_id VARCHAR(255),
  name VARCHAR(500) NOT NULL,
  brand_name VARCHAR(255),
  type VARCHAR(100),
  subcategory VARCHAR(100),
  strain_type VARCHAR(50),
  thc DECIMAL(10,4),
  cbd DECIMAL(10,4),
  price_cents INTEGER,
  original_price_cents INTEGER,
  stock_status VARCHAR(20),
  weight VARCHAR(100),
  primary_image_url TEXT,
  first_seen_at TIMESTAMPTZ,
  last_seen_at TIMESTAMPTZ,
  legacy_raw_payload JSONB,
  imported_at TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE(legacy_product_id)
);
```
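
The two non-trivial conversions in the products mapping (price to cents, `in_stock` to `stock_status`) could be sketched as below. This is an illustration only: the function names are hypothetical, the legacy `price` is assumed to be a decimal dollar amount, and passing `null` through unchanged is an assumption rather than documented behaviour.

```typescript
// Hypothetical helpers for the products mapping; not taken from the ETL script.

// Convert a dollar amount (e.g. "19.99") to integer cents.
// Null, empty, or non-numeric input is kept as null (assumption).
function toCents(price: string | number | null): number | null {
  if (price === null || price === '') return null;
  const value = Number(price);
  if (!Number.isFinite(value)) return null;
  // Round so that floating-point noise (19.99 * 100 = 1998.99...) does not truncate.
  return Math.round(value * 100);
}

// Map the legacy boolean to the stock_status strings named in the mapping table.
function toStockStatus(inStock: boolean | null): 'in_stock' | 'out_of_stock' | null {
  if (inStock === null) return null;
  return inStock ? 'in_stock' : 'out_of_stock';
}
```

Rounding rather than truncating matters because `price_cents` is an `INTEGER` column and binary floating point cannot represent most decimal prices exactly.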

---

### 3. Dutchie Products

**Source:** `dutchie_legacy.dutchie_products`
**Target:** `cannaiq.dutchie_products`

These tables have nearly identical schemas. The mapping is direct:

| Legacy Column | Canonical Column | Notes |
|---------------|------------------|-------|
| id | - | Generate new, store as legacy_dutchie_product_id |
| dispensary_id | dispensary_id | Map via dispensary slug lookup |
| external_product_id | external_product_id | Direct (Dutchie _id) |
| platform_dispensary_id | platform_dispensary_id | Direct |
| name | name | Direct |
| brand_name | brand_name | Direct |
| type | type | Direct |
| subcategory | subcategory | Direct |
| strain_type | strain_type | Direct |
| thc/thc_content | thc/thc_content | Direct |
| cbd/cbd_content | cbd/cbd_content | Direct |
| stock_status | stock_status | Direct |
| images | images | Direct (JSONB) |
| latest_raw_payload | latest_raw_payload | Direct |

**Conflict Resolution:**
```sql
ON CONFLICT (dispensary_id, external_product_id) DO NOTHING
```
- Never overwrite existing products
- Skip duplicates silently

---

### 4. Dutchie Product Snapshots

**Source:** `dutchie_legacy.dutchie_product_snapshots`
**Target:** `cannaiq.dutchie_product_snapshots`

| Legacy Column | Canonical Column | Notes |
|---------------|------------------|-------|
| id | - | Generate new |
| dutchie_product_id | dutchie_product_id | Map via product lookup |
| dispensary_id | dispensary_id | Map via dispensary lookup |
| crawled_at | crawled_at | Direct |
| rec_min_price_cents | rec_min_price_cents | Direct |
| rec_max_price_cents | rec_max_price_cents | Direct |
| stock_status | stock_status | Direct |
| options | options | Direct (JSONB) |
| raw_payload | raw_payload | Direct (JSONB) |

**Conflict Resolution:**
```sql
-- No unique constraint on snapshots - all are historical records
-- Just INSERT, no conflict handling needed
-- Caveat: with no unique constraint, re-running the import inserts the same snapshots again
INSERT INTO dutchie_product_snapshots (...) VALUES (...)
```

---

### 5. Price History

**Source:** `dutchie_legacy.price_history`
**Target:** `cannaiq.price_history_legacy` (new staging table)

```sql
CREATE TABLE IF NOT EXISTS price_history_legacy (
  id SERIAL PRIMARY KEY,
  legacy_product_id INTEGER NOT NULL,
  price_cents INTEGER,
  recorded_at TIMESTAMPTZ,
  imported_at TIMESTAMPTZ DEFAULT NOW()
);
```

---

## ETL Process

### Phase 1: Staging Tables (INSERT-ONLY)

1. Create staging tables with `_from_legacy` or `_legacy` suffix
2. Read from `dutchie_legacy.*` tables in batches
3. INSERT into staging tables with ON CONFLICT DO NOTHING
4. Log counts: read, inserted, skipped

### Phase 2: ID Mapping

1. Build ID mapping tables:
   - `legacy_dispensary_id` → `canonical_dispensary_id`
   - `legacy_product_id` → `canonical_product_id`
2. Match on unique keys (slug+city+state for dispensaries, external_product_id for products)
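
A minimal in-memory sketch of the Phase 2 dispensary mapping, assuming both sides have already been loaded as arrays. The type and function names are hypothetical, and the real script may well do this matching in SQL instead.

```typescript
// Hypothetical row shapes; only the columns needed for matching are included.
interface DispensaryKeyFields {
  slug: string;
  city: string;
  state: string;
}
interface DispensaryRow extends DispensaryKeyFields {
  id: number;
}

// Build one lookup key from slug+city+state. The NUL separator avoids
// accidental collisions such as ("a-b", "c") vs ("a", "b-c").
function dispensaryKey(d: DispensaryKeyFields): string {
  return [d.slug, d.city, d.state].join('\u0000');
}

// Returns legacy_dispensary_id -> canonical_dispensary_id for every legacy
// row that has an exact (case-sensitive) match in the canonical table.
function buildDispensaryIdMap(
  legacy: DispensaryRow[],
  canonical: DispensaryRow[],
): Map<number, number> {
  const canonicalByKey = new Map<string, number>();
  for (const row of canonical) canonicalByKey.set(dispensaryKey(row), row.id);

  const idMap = new Map<number, number>();
  for (const row of legacy) {
    const canonicalId = canonicalByKey.get(dispensaryKey(row));
    if (canonicalId !== undefined) idMap.set(row.id, canonicalId);
  }
  return idMap;
}
```

Legacy rows without a match are simply absent from the map, which keeps "unmatched" distinguishable from "matched to ID 0".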

### Phase 3: Canonical Merge (Optional, User-Approved)

Only if explicitly requested:
1. INSERT new records into canonical tables
2. Never UPDATE existing records
3. Never DELETE any records

---

## Safety Rules

1. **INSERT-ONLY**: No UPDATE, no DELETE, no TRUNCATE
2. **ON CONFLICT DO NOTHING**: Skip duplicates, never overwrite
3. **Batch Processing**: 500-1000 rows per batch to avoid memory issues
4. **Manual Invocation Only**: ETL script requires explicit user execution
5. **Logging**: Record all operations with counts and timestamps
6. **Dry Run Mode**: Support `--dry-run` flag to preview without writes
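
Rules 3, 5, and 6 can be combined in one small loop, sketched below. Everything here is illustrative: `insertBatch` is a hypothetical callback standing in for the real INSERT and is assumed to return how many rows were actually inserted (the rest having been skipped by ON CONFLICT DO NOTHING).

```typescript
// Split rows into fixed-size batches (the last batch may be smaller).
function chunk<T>(rows: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < rows.length; i += size) {
    batches.push(rows.slice(i, i + size));
  }
  return batches;
}

interface ImportCounts {
  read: number;
  inserted: number;
  skipped: number;
}

// Process rows batch by batch. In dry-run mode rows are counted as read
// but insertBatch is never called, so nothing is written.
async function importInBatches<T>(
  rows: readonly T[],
  batchSize: number,
  dryRun: boolean,
  insertBatch: (batch: T[]) => Promise<number>,
): Promise<ImportCounts> {
  const counts: ImportCounts = { read: 0, inserted: 0, skipped: 0 };
  for (const batch of chunk(rows, batchSize)) {
    counts.read += batch.length;
    if (dryRun) continue;
    const inserted = await insertBatch(batch);
    counts.inserted += inserted;
    counts.skipped += batch.length - inserted;
  }
  return counts;
}
```

Returning the three counters from one place makes the "read, inserted, skipped" log line in Phase 1 straightforward to produce.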

---

## Validation Queries

After import, verify with:

```sql
-- Count imported dispensaries
SELECT COUNT(*) FROM dispensaries_from_legacy;

-- Count imported products
SELECT COUNT(*) FROM products_from_legacy;

-- Check for duplicates that were skipped
SELECT
  (SELECT COUNT(*) FROM dutchie_legacy.dispensaries) as legacy_count,
  (SELECT COUNT(*) FROM dispensaries_from_legacy) as imported_count;

-- Verify no data loss
SELECT
  l.id as legacy_id,
  l.name as legacy_name,
  c.id as canonical_id
FROM dutchie_legacy.dispensaries l
LEFT JOIN dispensaries c ON c.slug = l.slug AND c.city = l.city AND c.state = l.state
WHERE c.id IS NULL
LIMIT 10;
```

---

## Invocation

```bash
# From backend directory
npx tsx src/scripts/etl/legacy-import.ts

# With dry-run
npx tsx src/scripts/etl/legacy-import.ts --dry-run

# Import specific tables only
npx tsx src/scripts/etl/legacy-import.ts --tables=dispensaries,products
```
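
For illustration, the two flags shown above could be parsed roughly as follows. This is a sketch, not the script's actual parser: `parseEtlArgs` and `EtlOptions` are hypothetical names, and treating a missing `--tables` as "all tables" (represented by `null`) is an assumption.

```typescript
interface EtlOptions {
  dryRun: boolean;
  tables: string[] | null; // null = no --tables flag given
}

// Parse the flags documented above from an argv-style array.
// Unknown arguments are ignored in this sketch.
function parseEtlArgs(argv: readonly string[]): EtlOptions {
  const options: EtlOptions = { dryRun: false, tables: null };
  const tablesPrefix = '--tables=';
  for (const arg of argv) {
    if (arg === '--dry-run') {
      options.dryRun = true;
    } else if (arg.startsWith(tablesPrefix)) {
      options.tables = arg
        .slice(tablesPrefix.length)
        .split(',')
        .map((t) => t.trim())
        .filter((t) => t.length > 0);
    }
  }
  return options;
}
```

In a Node script this would typically be called as `parseEtlArgs(process.argv.slice(2))`.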

---

## Environment Variables

The ETL script expects these environment variables (user configures):

```bash
# Connection to cannaiq-postgres (same host, different databases)
CANNAIQ_DB_HOST=localhost
CANNAIQ_DB_PORT=54320
CANNAIQ_DB_USER=cannaiq
CANNAIQ_DB_PASSWORD=<password>
CANNAIQ_DB_NAME=cannaiq

# Legacy database (same host, different database)
LEGACY_DB_HOST=localhost
LEGACY_DB_PORT=54320
LEGACY_DB_USER=dutchie
LEGACY_DB_PASSWORD=<password>
LEGACY_DB_NAME=dutchie_legacy
```
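
A sketch of how the variables above might be read and validated before connecting. The helper name and the `DbConfig` shape are hypothetical; the field names are chosen to resemble common PostgreSQL client options, and which driver the script actually uses is not stated here.

```typescript
interface DbConfig {
  host: string;
  port: number;
  user: string;
  password: string;
  database: string;
}

type Env = Record<string, string | undefined>;

// Read one connection block (CANNAIQ_DB_* or LEGACY_DB_*) from an env-like
// object, failing fast on missing values or an invalid port.
function readDbConfig(env: Env, prefix: 'CANNAIQ' | 'LEGACY'): DbConfig {
  const get = (suffix: string): string => {
    const key = `${prefix}_DB_${suffix}`;
    const value = env[key];
    if (value === undefined || value === '') {
      throw new Error(`Missing environment variable ${key}`);
    }
    return value;
  };
  const port = Number(get('PORT'));
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid ${prefix}_DB_PORT`);
  }
  return {
    host: get('HOST'),
    port,
    user: get('USER'),
    password: get('PASSWORD'),
    database: get('NAME'),
  };
}
```

Passing the environment in as an argument (for example `readDbConfig(process.env, 'LEGACY')`) keeps the function testable without touching the real process environment.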
345
docs/multi-state.md
Normal file
@@ -0,0 +1,345 @@
# Multi-State Support

## Overview

Phase 4 implements full multi-state support for CannaiQ, transforming it from an Arizona-only platform to a national cannabis intelligence system. This document covers schema updates, API structure, frontend usage, and operational guidelines.

## Schema Updates

### Core Tables Modified

#### 1. `dispensaries` table
Already has `state` column:
```sql
state CHAR(2) DEFAULT 'AZ'  -- State code (AZ, CA, CO, etc.)
state_id INTEGER REFERENCES states(id)  -- FK to canonical states table
```

#### 2. `raw_payloads` table (Migration 047)
Added state column for query optimization:
```sql
state CHAR(2)  -- Denormalized from dispensary for fast filtering
```

#### 3. `states` table
Canonical reference for all US states:
```sql
CREATE TABLE states (
  id SERIAL PRIMARY KEY,
  code VARCHAR(2) NOT NULL UNIQUE,  -- 'AZ', 'CA', etc.
  name VARCHAR(100) NOT NULL,       -- 'Arizona', 'California', etc.
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
```

### New Indexes (Migration 047)

```sql
-- State-based payload filtering
CREATE INDEX idx_raw_payloads_state ON raw_payloads(state);
CREATE INDEX idx_raw_payloads_state_unprocessed ON raw_payloads(state, processed) WHERE processed = FALSE;

-- Dispensary state queries
CREATE INDEX idx_dispensaries_state_menu_type ON dispensaries(state, menu_type);
CREATE INDEX idx_dispensaries_state_crawl_status ON dispensaries(state, crawl_status);
CREATE INDEX idx_dispensaries_state_active ON dispensaries(state) WHERE crawl_status != 'disabled';
```

### Materialized Views

#### `mv_state_metrics`
Pre-aggregated state-level metrics for fast dashboard queries:
```sql
CREATE MATERIALIZED VIEW mv_state_metrics AS
SELECT
  d.state,
  s.name AS state_name,
  COUNT(DISTINCT d.id) AS store_count,
  COUNT(DISTINCT sp.id) AS total_products,
  COUNT(DISTINCT sp.brand_id) AS unique_brands,
  AVG(sp.price_rec) AS avg_price_rec,
  -- ... more metrics
FROM dispensaries d
LEFT JOIN states s ON d.state = s.code
LEFT JOIN store_products sp ON d.id = sp.dispensary_id
GROUP BY d.state, s.name;
```

Refresh with:
```sql
SELECT refresh_state_metrics();
```

### Views

#### `v_brand_state_presence`
Brand presence and metrics per state:
```sql
SELECT brand_id, brand_name, state, store_count, product_count, avg_price
FROM v_brand_state_presence
WHERE state = 'AZ';
```

#### `v_category_state_distribution`
Category distribution by state:
```sql
SELECT state, category, product_count, store_count, avg_price
FROM v_category_state_distribution
WHERE state = 'CA';
```

### Functions

#### `fn_national_price_comparison(category, brand_id)`
Compare prices across all states:
```sql
SELECT * FROM fn_national_price_comparison('Flower', NULL);
```

#### `fn_brand_state_penetration(brand_id)`
Get brand penetration across states:
```sql
SELECT * FROM fn_brand_state_penetration(123);
```

## API Structure

### State List Endpoints

```
GET /api/states              # All configured states
GET /api/states?active=true  # Only states with dispensary data
```

Response:
```json
{
  "success": true,
  "data": {
    "states": [
      { "code": "AZ", "name": "Arizona" },
      { "code": "CA", "name": "California" }
    ],
    "count": 2
  }
}
```

### State-Specific Endpoints

```
GET /api/state/:state/summary           # Full state summary with metrics
GET /api/state/:state/brands            # Brands in state (paginated)
GET /api/state/:state/categories        # Categories in state (paginated)
GET /api/state/:state/stores            # Stores in state (paginated)
GET /api/state/:state/analytics/prices  # Price distribution
```

Query parameters:
- `limit` - Results per page (default: 50)
- `offset` - Pagination offset
- `sortBy` - Sort field (e.g., `productCount`, `avgPrice`)
- `sortDir` - Sort direction (`asc` or `desc`)
- `includeInactive` - Include disabled stores
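
As an illustration of how the query parameters combine with the paginated endpoints, here is a small URL builder. The function and type names are hypothetical, and the parameter order in the output is just the order chosen in this sketch.

```typescript
interface StateListQuery {
  limit?: number;
  offset?: number;
  sortBy?: string;
  sortDir?: 'asc' | 'desc';
  includeInactive?: boolean;
}

// Build a URL for one of the paginated state endpoints listed above.
// Parameters that are not set are left out of the query string.
function buildStateUrl(
  state: string,
  resource: 'brands' | 'categories' | 'stores',
  query: StateListQuery = {},
): string {
  const pairs: string[] = [];
  const add = (key: string, value: string | number | boolean | undefined): void => {
    if (value !== undefined) {
      pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
    }
  };
  add('limit', query.limit);
  add('offset', query.offset);
  add('sortBy', query.sortBy);
  add('sortDir', query.sortDir);
  add('includeInactive', query.includeInactive);

  const path = `/api/state/${encodeURIComponent(state)}/${resource}`;
  return pairs.length > 0 ? `${path}?${pairs.join('&')}` : path;
}
```

For example, `buildStateUrl('AZ', 'brands', { limit: 25, sortBy: 'productCount', sortDir: 'desc' })` yields `/api/state/AZ/brands?limit=25&sortBy=productCount&sortDir=desc`.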

### National Analytics Endpoints

```
GET /api/analytics/national/summary  # National aggregate metrics
GET /api/analytics/national/prices   # Price comparison across states
GET /api/analytics/national/heatmap  # State heatmap data
GET /api/analytics/national/metrics  # All state metrics
```

Heatmap metrics:
- `stores` - Store count per state
- `products` - Product count per state
- `brands` - Brand count per state
- `avgPrice` - Average price per state
- `penetration` - Brand penetration (requires `brandId`)

### Cross-State Comparison Endpoints

```
GET /api/analytics/compare/brand/:brandId      # Compare brand across states
GET /api/analytics/compare/category/:category  # Compare category across states
GET /api/analytics/brand/:brandId/penetration  # Brand penetration by state
GET /api/analytics/brand/:brandId/trend        # Historical penetration trend
```

Query parameters:
- `states` - Comma-separated state codes to include (optional)
- `days` - Days of history for trends (default: 30)

### Admin Endpoints

```
POST /api/admin/states/refresh-metrics  # Refresh materialized views
```

## Frontend Usage

### State Selector

The global state selector is in the sidebar and persists selection via localStorage:

```tsx
import { useStateStore } from '../store/stateStore';

function MyComponent() {
  const { selectedState, setSelectedState, isNationalView } = useStateStore();

  // null = All States / National view
  // 'AZ' = Arizona only

  if (isNationalView()) {
    // Show national data
  } else {
    // Filter by selectedState
  }
}
```

### State Badge Component

Show current state selection:
```tsx
import { StateBadge } from '../components/StateSelector';

{/* Shows "National" or state name */}
<StateBadge />
```

### API Calls with State Filter

```tsx
import { useEffect } from 'react';
import { api } from '../lib/api';
import { useStateStore } from '../store/stateStore';

function useStateData() {
  const { selectedState } = useStateStore();

  useEffect(() => {
    if (selectedState) {
      // State-specific data
      api.get(`/state/${selectedState}/summary`);
    } else {
      // National data
      api.get('/analytics/national/summary');
    }
  }, [selectedState]);
}
```

### Navigation Routes

| Route | Component | Description |
|-------|-----------|-------------|
| `/national` | NationalDashboard | National overview with all states |
| `/national/heatmap` | StateHeatmap | Interactive state heatmap |
| `/national/compare` | CrossStateCompare | Brand/category cross-state comparison |

## Ingestion Rules

### State Assignment

Every raw payload MUST include state:
1. State is looked up from `dispensaries.state` during payload storage
2. Stored on `raw_payloads.state` for query optimization
3. Inherited by all normalized products/snapshots via `dispensary_id`

### Hydration Pipeline

The hydration worker supports state filtering:
```typescript
// Process only AZ payloads
await getUnprocessedPayloads(pool, { state: 'AZ' });

// Process multiple states
await getUnprocessedPayloads(pool, { states: ['AZ', 'CA', 'NV'] });
```
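
One way such a state filter could translate into a parameterised query is sketched below. This is not the real `getUnprocessedPayloads` implementation: `buildUnprocessedQuery` is a hypothetical helper, the `processed` and `state` column names are taken from the index definitions in this document, and `= ANY($1)` assumes a PostgreSQL client that binds arrays as a single parameter.

```typescript
interface StateFilter {
  state?: string;
  states?: string[];
}

interface SqlFragment {
  text: string;
  values: unknown[];
}

// Build a parameterised query for unprocessed payloads, optionally limited
// to one or more two-letter state codes. Codes are validated, never inlined.
function buildUnprocessedQuery(filter: StateFilter = {}): SqlFragment {
  const codes = filter.states ?? (filter.state ? [filter.state] : []);
  const normalized = codes.map((code) => code.trim().toUpperCase());
  for (const code of normalized) {
    if (!/^[A-Z]{2}$/.test(code)) {
      throw new Error(`Invalid state code: ${code}`);
    }
  }

  let text = 'SELECT * FROM raw_payloads WHERE processed = FALSE';
  const values: unknown[] = [];
  if (normalized.length > 0) {
    text += ' AND state = ANY($1)';
    values.push(normalized);
  }
  return { text, values };
}
```

Filtering on `state` together with `processed = FALSE` is the access pattern the partial index `idx_raw_payloads_state_unprocessed` is declared for.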

### Data Isolation

Critical rules:
- **No cross-state contamination** - Product IDs are unique per (dispensary_id, provider_product_id)
- **No SKU merging** - Same SKU in AZ and CA are separate products
- **No store merging** - Same store name in different states are separate records
- Every dispensary maps to exactly ONE state

## Constraints & Best Practices

### Query Performance

1. Use `mv_state_metrics` for dashboard queries (refreshed hourly)
2. Use the `v_brand_state_presence` and `v_category_state_distribution` views for brand/category queries
3. Filter by state early in queries to leverage indexes
4. For cross-state queries, use the dedicated comparison functions

### Cache Strategy

API endpoints should be cached with Redis:
```typescript
// Cache key pattern
`state:${state}:summary`  // State summary - 5 min TTL
`national:summary`        // National summary - 5 min TTL
`heatmap:${metric}`       // Heatmap data - 5 min TTL
```
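
The key patterns and the 5-minute TTL can be illustrated with a small in-memory stand-in. This sketch deliberately does not use Redis: `TtlCache` and `cacheKeys` are hypothetical names, and only the key strings and the TTL value come from the patterns above.

```typescript
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Key builders matching the cache key patterns above.
const cacheKeys = {
  stateSummary: (state: string): string => `state:${state}:summary`,
  nationalSummary: (): string => 'national:summary',
  heatmap: (metric: string): string => `heatmap:${metric}`,
};

// Minimal in-memory cache with a fixed TTL; a stand-in for a Redis client.
// The `now` parameters exist so expiry can be exercised deterministically.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = FIVE_MINUTES_MS) {
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

Keeping key construction in one object avoids the per-endpoint string drift that makes cache invalidation hard to reason about.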

### Adding New States

1. Add state to `states` table (if not already present)
2. Import dispensary data with correct `state` code
3. Run menu detection for new dispensaries
4. Crawl dispensaries with resolved platform IDs
5. Refresh materialized views: `SELECT refresh_state_metrics()`

## Migration Guide

### From Arizona-Only to Multi-State

1. Apply migration 047:
   ```bash
   DATABASE_URL="..." npm run migrate
   ```

2. Existing AZ data requires no changes (already has `state='AZ'`)

3. New states are added via:
   - Manual dispensary import
   - Menu detection crawl
   - Platform ID resolution

4. Frontend automatically shows state selector after update

### Rollback

Migration 047 is additive - no destructive changes:
- New columns have defaults
- Views can be dropped without data loss
- Indexes can be dropped for performance tuning

## Monitoring

### Key Metrics to Watch

1. **Store count by state** - `SELECT state, COUNT(*) FROM dispensaries GROUP BY state`
2. **Product coverage** - `SELECT state, COUNT(DISTINCT sp.id) FROM store_products...`
3. **Crawl health by state** - Check `crawl_runs` by dispensary state
4. **Materialized view freshness** - `SELECT refreshed_at FROM mv_state_metrics`

### Alerts

Set up alerts for:
- Materialized view not refreshed in 2+ hours
- State with 0 products after having products
- Cross-state data appearing (should never happen)
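
The first two alert conditions reduce to simple predicates, sketched here under stated assumptions: the function names are hypothetical, `refreshedAt` is assumed to come from the freshness query under Key Metrics, and the previous/current product counts are assumed to be collected by whatever monitoring job runs the checks.

```typescript
const TWO_HOURS_MS = 2 * 60 * 60 * 1000;

// True when the materialized view has not been refreshed within maxAgeMs.
// A missing timestamp is treated as stale (assumption).
function isViewStale(
  refreshedAt: Date | null,
  now: Date = new Date(),
  maxAgeMs: number = TWO_HOURS_MS,
): boolean {
  if (refreshedAt === null) return true;
  return now.getTime() - refreshedAt.getTime() >= maxAgeMs;
}

interface StateProductCount {
  state: string;
  previous: number; // product count at the previous check
  current: number;  // product count now
}

// States that had products before and now report zero.
function statesThatLostAllProducts(counts: StateProductCount[]): string[] {
  return counts
    .filter((c) => c.previous > 0 && c.current === 0)
    .map((c) => c.state);
}
```

States that never had products are intentionally not flagged, matching the "after having products" wording of the alert.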

## Future Enhancements

Planned for future phases:
1. **Redis caching** for all state endpoints
2. **Real-time refresh** of materialized views
3. **Geographic heatmaps** with actual US map visualization
4. **State-specific pricing rules** (tax rates, etc.)
5. **Multi-state brand portfolio tracking**
162
docs/platform-slug-mapping.md
Normal file
@@ -0,0 +1,162 @@
# Platform Slug Mapping

## Overview

To avoid trademark issues in public-facing API URLs, CannaiQ uses neutral two-letter slugs instead of vendor names in route paths.

**Important**: The actual `platform` value stored in the database remains the full name (e.g., `'dutchie'`). Only the URL paths use neutral slugs.

## Platform Slug Reference

| Slug | Platform | DB Value | Status |
|------|----------|----------|--------|
| `dt` | Dutchie | `'dutchie'` | Active |
| `jn` | Jane | `'jane'` | Future |
| `wm` | Weedmaps | `'weedmaps'` | Future |
| `lf` | Leafly | `'leafly'` | Future |
| `tz` | Treez | `'treez'` | Future |
| `bl` | Blaze | `'blaze'` | Future |
| `fl` | Flowhub | `'flowhub'` | Future |
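
The table above is effectively a two-way lookup between URL slug and stored `platform` value. A typed sketch of that lookup follows; the constant and function names are hypothetical, and only the slug/value pairs come from the table.

```typescript
// Slug -> database value, copied from the reference table.
const PLATFORM_BY_SLUG = {
  dt: 'dutchie',
  jn: 'jane',
  wm: 'weedmaps',
  lf: 'leafly',
  tz: 'treez',
  bl: 'blaze',
  fl: 'flowhub',
} as const;

type PlatformSlug = keyof typeof PLATFORM_BY_SLUG;
type PlatformName = (typeof PLATFORM_BY_SLUG)[PlatformSlug];

// Type guard; hasOwnProperty avoids matching inherited keys like "toString".
function isPlatformSlug(value: string): value is PlatformSlug {
  return Object.prototype.hasOwnProperty.call(PLATFORM_BY_SLUG, value);
}

// URL slug -> value to use in database queries (null if unknown).
function platformFromSlug(slug: string): PlatformName | null {
  return isPlatformSlug(slug) ? PLATFORM_BY_SLUG[slug] : null;
}

// Database value -> URL slug (null if unknown).
function slugFromPlatform(platform: string): PlatformSlug | null {
  for (const slug of Object.keys(PLATFORM_BY_SLUG) as PlatformSlug[]) {
    if (PLATFORM_BY_SLUG[slug] === platform) return slug;
  }
  return null;
}
```

Translating at the route boundary keeps vendor names out of URLs while queries continue to use the full `platform` value.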

## API Route Patterns

### Discovery Routes

```
/api/discovery/platforms/:platformSlug/locations
/api/discovery/platforms/:platformSlug/locations/:id
/api/discovery/platforms/:platformSlug/locations/:id/verify-create
/api/discovery/platforms/:platformSlug/locations/:id/verify-link
/api/discovery/platforms/:platformSlug/locations/:id/reject
/api/discovery/platforms/:platformSlug/locations/:id/unreject
/api/discovery/platforms/:platformSlug/locations/:id/match-candidates
/api/discovery/platforms/:platformSlug/cities
/api/discovery/platforms/:platformSlug/summary
```

### Orchestrator Routes

```
/api/orchestrator/platforms/:platformSlug/promote/:id
```

## Example Usage

### Fetch Discovered Locations (Dutchie)

```bash
# Using neutral slug 'dt' instead of 'dutchie'
curl "https://api.cannaiq.co/api/discovery/platforms/dt/locations?status=discovered&state_code=AZ"
```

### Verify and Create Dispensary

```bash
curl -X POST "https://api.cannaiq.co/api/discovery/platforms/dt/locations/123/verify-create" \
  -H "Content-Type: application/json" \
  -d '{"verifiedBy": "admin"}'
```

### Link to Existing Dispensary

```bash
curl -X POST "https://api.cannaiq.co/api/discovery/platforms/dt/locations/123/verify-link" \
  -H "Content-Type: application/json" \
  -d '{"dispensaryId": 456, "verifiedBy": "admin"}'
```

### Promote to Crawlable

```bash
curl -X POST "https://api.cannaiq.co/api/orchestrator/platforms/dt/promote/123"
```

### Get Discovery Summary

```bash
curl "https://api.cannaiq.co/api/discovery/platforms/dt/summary"
```

## Migration Guide

### Old Routes (DEPRECATED)

| Old Route | New Route |
|-----------|-----------|
| `/api/discovery/dutchie/locations` | `/api/discovery/platforms/dt/locations` |
| `/api/discovery/dutchie/locations/:id` | `/api/discovery/platforms/dt/locations/:id` |
| `/api/discovery/dutchie/locations/:id/verify-create` | `/api/discovery/platforms/dt/locations/:id/verify-create` |
| `/api/discovery/dutchie/locations/:id/verify-link` | `/api/discovery/platforms/dt/locations/:id/verify-link` |
| `/api/discovery/dutchie/locations/:id/reject` | `/api/discovery/platforms/dt/locations/:id/reject` |
| `/api/discovery/dutchie/locations/:id/unreject` | `/api/discovery/platforms/dt/locations/:id/unreject` |
| `/api/discovery/dutchie/locations/:id/match-candidates` | `/api/discovery/platforms/dt/locations/:id/match-candidates` |
| `/api/discovery/dutchie/cities` | `/api/discovery/platforms/dt/cities` |
| `/api/discovery/dutchie/summary` | `/api/discovery/platforms/dt/summary` |
| `/api/discovery/dutchie/nearby` | `/api/discovery/platforms/dt/nearby` |
| `/api/discovery/dutchie/geo-stats` | `/api/discovery/platforms/dt/geo-stats` |
| `/api/discovery/dutchie/locations/:id/validate-geo` | `/api/discovery/platforms/dt/locations/:id/validate-geo` |
| `/api/orchestrator/dutchie/promote/:id` | `/api/orchestrator/platforms/dt/promote/:id` |
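
Every row in the table follows one pattern: the `dutchie` path segment becomes `platforms/dt`. A client-side helper for rewriting stored or hard-coded paths could look like the sketch below; `toPlatformRoute` is a hypothetical name and is not part of the API client.

```typescript
// Matches /api/discovery/dutchie[/...] and /api/orchestrator/dutchie[/...].
// The segment must be exactly "dutchie", so "/dutchiex/" is left untouched.
const DEPRECATED_ROUTE = /^\/api\/(discovery|orchestrator)\/dutchie(\/.*)?$/;

// Rewrite a deprecated route to its platform-slug equivalent.
// Paths that do not match the deprecated pattern are returned unchanged.
function toPlatformRoute(path: string): string {
  const match = DEPRECATED_ROUTE.exec(path);
  if (match === null) return path;
  const area = match[1];
  const rest = match[2] ?? '';
  return `/api/${area}/platforms/dt${rest}`;
}
```

Because unmatched paths pass through unchanged, the helper can be applied to already-migrated routes without harm.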

### API Client Changes

| Old Method | New Method |
|------------|------------|
| `getDutchieDiscoverySummary()` | `getPlatformDiscoverySummary('dt')` |
| `getDutchieDiscoveryLocations(params)` | `getPlatformDiscoveryLocations('dt', params)` |
| `getDutchieDiscoveryLocation(id)` | `getPlatformDiscoveryLocation('dt', id)` |
| `verifyCreateDutchieLocation(id)` | `verifyCreatePlatformLocation('dt', id)` |
| `verifyLinkDutchieLocation(id, dispId)` | `verifyLinkPlatformLocation('dt', id, dispId)` |
| `rejectDutchieLocation(id, reason)` | `rejectPlatformLocation('dt', id, reason)` |
| `unrejectDutchieLocation(id)` | `unrejectPlatformLocation('dt', id)` |
| `getDutchieLocationMatchCandidates(id)` | `getPlatformLocationMatchCandidates('dt', id)` |
| `getDutchieDiscoveryCities(params)` | `getPlatformDiscoveryCities('dt', params)` |
| `getDutchieNearbyLocations(lat, lon)` | `getPlatformNearbyLocations('dt', lat, lon)` |
| `getDutchieGeoStats()` | `getPlatformGeoStats('dt')` |
| `validateDutchieLocationGeo(id)` | `validatePlatformLocationGeo('dt', id)` |
| `promoteDutchieDiscoveryLocation(id)` | `promotePlatformDiscoveryLocation('dt', id)` |

## Adding New Platforms

When adding support for a new platform:

1. **Assign a slug**: Choose a two-letter neutral slug
2. **Update validation**: Add to `validPlatforms` array in `backend/src/index.ts`
3. **Create routes**: Implement platform-specific discovery routes
4. **Update docs**: Add to this document

### Example: Adding Jane Support

```typescript
// backend/src/index.ts
const validPlatforms = ['dt', 'jn']; // Add 'jn' for Jane

// Create Jane discovery routes
const jnDiscoveryRoutes = createJaneDiscoveryRoutes(getPool());
app.use('/api/discovery/platforms/jn', jnDiscoveryRoutes);
```

## Database Schema

The `platform` column in discovery tables stores the **full platform name** (not the slug):

```sql
-- dutchie_discovery_locations table
SELECT * FROM dutchie_discovery_locations WHERE platform = 'dutchie';

-- dutchie_discovery_cities table
SELECT * FROM dutchie_discovery_cities WHERE platform = 'dutchie';
```

This keeps the database schema clean and allows for future renaming of URL slugs without database migrations.

## Safe Naming Conventions

### DO
- Use neutral two-letter slugs in URLs: `dt`, `jn`, `wm`
- Use generic terms in user-facing text: "platform", "menu provider"
- Store full platform names in the database for clarity

### DON'T
- Use trademarked names in URL paths
- Use vendor names in public-facing error messages
- Expose vendor-specific identifiers in consumer APIs