feat: Add v2 architecture with multi-state support and orchestrator services

Major additions:
- Multi-state expansion: states table, StateSelector, NationalDashboard, StateHeatmap, CrossStateCompare
- Orchestrator services: trace service, error taxonomy, retry manager, proxy rotator
- Discovery system: dutchie discovery service, geo validation, city seeding scripts
- Analytics infrastructure: analytics v2 routes, brand/pricing/stores intelligence pages
- Local development: setup-local.sh starts all 5 services (postgres, backend, cannaiq, findadispo, findagram)
- Migrations 037-056: crawler profiles, states, analytics indexes, worker metadata

Frontend pages added:
- Discovery, ChainsDashboard, IntelligenceBrands, IntelligencePricing, IntelligenceStores
- StateHeatmap, CrossStateCompare, SyncInfoPanel

Components added:
- StateSelector, OrchestratorTraceModal, WorkflowStepper

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Kelly
2025-12-07 11:30:57 -07:00
parent 8ac64ba077
commit b4a2fb7d03
248 changed files with 60714 additions and 666 deletions

docs/multi-state.md Normal file

@@ -0,0 +1,345 @@
# Multi-State Support
## Overview
Phase 4 implements full multi-state support for CannaiQ, transforming it from an Arizona-only platform to a national cannabis intelligence system. This document covers schema updates, API structure, frontend usage, and operational guidelines.
## Schema Updates
### Core Tables Modified
#### 1. `dispensaries` table
Already has a `state` column, shown here with the `state_id` foreign key to the canonical `states` table:
```sql
state CHAR(2) DEFAULT 'AZ', -- State code (AZ, CA, CO, etc.)
state_id INTEGER REFERENCES states(id) -- FK to canonical states table
```
#### 2. `raw_payloads` table (Migration 047)
Added a denormalized `state` column for query optimization:
```sql
state CHAR(2) -- Denormalized from dispensary for fast filtering
```
#### 3. `states` table
Canonical reference for all US states:
```sql
CREATE TABLE states (
  id SERIAL PRIMARY KEY,
  code VARCHAR(2) NOT NULL UNIQUE, -- 'AZ', 'CA', etc.
  name VARCHAR(100) NOT NULL, -- 'Arizona', 'California', etc.
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
```
### New Indexes (Migration 047)
```sql
-- State-based payload filtering
CREATE INDEX idx_raw_payloads_state ON raw_payloads(state);
CREATE INDEX idx_raw_payloads_state_unprocessed ON raw_payloads(state, processed) WHERE processed = FALSE;
-- Dispensary state queries
CREATE INDEX idx_dispensaries_state_menu_type ON dispensaries(state, menu_type);
CREATE INDEX idx_dispensaries_state_crawl_status ON dispensaries(state, crawl_status);
CREATE INDEX idx_dispensaries_state_active ON dispensaries(state) WHERE crawl_status != 'disabled';
```
### Materialized Views
#### `mv_state_metrics`
Pre-aggregated state-level metrics for fast dashboard queries:
```sql
CREATE MATERIALIZED VIEW mv_state_metrics AS
SELECT
  d.state,
  s.name AS state_name,
  COUNT(DISTINCT d.id) AS store_count,
  COUNT(DISTINCT sp.id) AS total_products,
  COUNT(DISTINCT sp.brand_id) AS unique_brands,
  AVG(sp.price_rec) AS avg_price_rec,
  -- ... more metrics
FROM dispensaries d
LEFT JOIN states s ON d.state = s.code
LEFT JOIN store_products sp ON d.id = sp.dispensary_id
GROUP BY d.state, s.name;
```
Refresh with:
```sql
SELECT refresh_state_metrics();
```
### Views
#### `v_brand_state_presence`
Brand presence and metrics per state:
```sql
SELECT brand_id, brand_name, state, store_count, product_count, avg_price
FROM v_brand_state_presence
WHERE state = 'AZ';
```
#### `v_category_state_distribution`
Category distribution by state:
```sql
SELECT state, category, product_count, store_count, avg_price
FROM v_category_state_distribution
WHERE state = 'CA';
```
### Functions
#### `fn_national_price_comparison(category, brand_id)`
Compare prices across all states:
```sql
SELECT * FROM fn_national_price_comparison('Flower', NULL);
```
#### `fn_brand_state_penetration(brand_id)`
Get brand penetration across states:
```sql
SELECT * FROM fn_brand_state_penetration(123);
```
## API Structure
### State List Endpoints
```
GET /api/states # All configured states
GET /api/states?active=true # Only states with dispensary data
```
Response:
```json
{
  "success": true,
  "data": {
    "states": [
      { "code": "AZ", "name": "Arizona" },
      { "code": "CA", "name": "California" }
    ],
    "count": 2
  }
}
```
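The response envelope above can be described with TypeScript types and checked at runtime before use. This is a minimal sketch: the names `StateEntry`, `StatesListResponse`, and `isStatesListResponse` are hypothetical and not taken from the codebase, and only the fields shown in the sample response are validated.

```typescript
// Hypothetical types mirroring the sample /api/states response shown above.
interface StateEntry {
  code: string; // two-letter state code, e.g. 'AZ'
  name: string; // display name, e.g. 'Arizona'
}

interface StatesListResponse {
  success: boolean;
  data: {
    states: StateEntry[];
    count: number;
  };
}

// Runtime type guard: returns true only if the value has the documented shape.
function isStatesListResponse(value: unknown): value is StatesListResponse {
  if (typeof value !== 'object' || value === null) return false;
  const body = value as { success?: unknown; data?: unknown };
  if (typeof body.success !== 'boolean') return false;
  if (typeof body.data !== 'object' || body.data === null) return false;
  const data = body.data as { states?: unknown; count?: unknown };
  if (!Array.isArray(data.states) || typeof data.count !== 'number') return false;
  return data.states.every(
    (entry) =>
      typeof entry === 'object' &&
      entry !== null &&
      typeof (entry as StateEntry).code === 'string' &&
      typeof (entry as StateEntry).name === 'string'
  );
}
```

A guard like this lets calling code narrow an untyped JSON body to `StatesListResponse` before reading `data.states`.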
### State-Specific Endpoints
```
GET /api/state/:state/summary # Full state summary with metrics
GET /api/state/:state/brands # Brands in state (paginated)
GET /api/state/:state/categories # Categories in state (paginated)
GET /api/state/:state/stores # Stores in state (paginated)
GET /api/state/:state/analytics/prices # Price distribution
```
Query parameters:
- `limit` - Results per page (default: 50)
- `offset` - Pagination offset
- `sortBy` - Sort field (e.g., `productCount`, `avgPrice`)
- `sortDir` - Sort direction (`asc` or `desc`)
- `includeInactive` - Include disabled stores
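The pagination and sorting parameters above could be assembled into a request URL as follows. This is an illustrative sketch only: `buildStateListUrl` and its types are hypothetical helpers, and normalizing the state code to upper case is an assumption about what the API expects.

```typescript
// Hypothetical helper; not part of the project's actual API client.
type SortDir = 'asc' | 'desc';
type StateResource = 'brands' | 'categories' | 'stores';

interface StateListQuery {
  limit?: number; // results per page (documented default: 50)
  offset?: number; // pagination offset
  sortBy?: string; // e.g. 'productCount', 'avgPrice'
  sortDir?: SortDir;
  includeInactive?: boolean; // include disabled stores
}

function buildStateListUrl(
  state: string,
  resource: StateResource,
  query: StateListQuery = {}
): string {
  // Assumption: state codes are two upper-case letters.
  const code = state.trim().toUpperCase();
  if (!/^[A-Z]{2}$/.test(code)) {
    throw new Error(`Invalid state code: ${state}`);
  }
  const params: Array<[string, string | number | boolean | undefined]> = [
    ['limit', query.limit],
    ['offset', query.offset],
    ['sortBy', query.sortBy],
    ['sortDir', query.sortDir],
    ['includeInactive', query.includeInactive],
  ];
  // Only parameters that were actually provided end up in the query string.
  const qs = params
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => `${key}=${encodeURIComponent(String(value))}`)
    .join('&');
  const path = `/api/state/${code}/${resource}`;
  return qs ? `${path}?${qs}` : path;
}
```

Omitting a parameter leaves it out of the URL, so the server-side defaults (such as `limit` of 50) apply.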
### National Analytics Endpoints
```
GET /api/analytics/national/summary # National aggregate metrics
GET /api/analytics/national/prices # Price comparison across states
GET /api/analytics/national/heatmap # State heatmap data
GET /api/analytics/national/metrics # All state metrics
```
Heatmap metrics:
- `stores` - Store count per state
- `products` - Product count per state
- `brands` - Brand count per state
- `avgPrice` - Average price per state
- `penetration` - Brand penetration (requires `brandId`)
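The rule that `penetration` requires a `brandId` can be enforced on the client before the request is sent. The sketch below is hypothetical: `buildHeatmapUrl` is not an existing helper, and the `metric` query parameter name is an assumption (the document lists the metric values but not how they are passed).

```typescript
// The five heatmap metrics listed above.
type HeatmapMetric = 'stores' | 'products' | 'brands' | 'avgPrice' | 'penetration';

// Hypothetical URL builder; 'metric' as the query parameter name is an assumption.
function buildHeatmapUrl(metric: HeatmapMetric, brandId?: number): string {
  if (metric === 'penetration' && brandId === undefined) {
    throw new Error('The penetration metric requires a brandId');
  }
  let url = `/api/analytics/national/heatmap?metric=${metric}`;
  // brandId is only meaningful for the penetration metric.
  if (metric === 'penetration') {
    url += `&brandId=${brandId}`;
  }
  return url;
}
```

Using a union type for the metric means an unsupported value is rejected at compile time, while the `brandId` check covers the one rule the type alone cannot express.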
### Cross-State Comparison Endpoints
```
GET /api/analytics/compare/brand/:brandId # Compare brand across states
GET /api/analytics/compare/category/:category # Compare category across states
GET /api/analytics/brand/:brandId/penetration # Brand penetration by state
GET /api/analytics/brand/:brandId/trend # Historical penetration trend
```
Query parameters:
- `states` - Comma-separated state codes to include (optional)
- `days` - Days of history for trends (default: 30)
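As a rough illustration of the `states` and `days` parameters, the following sketch builds a trend URL with a normalized, de-duplicated state list. `normalizeStates` and `buildBrandTrendUrl` are hypothetical names, and always sending `days` explicitly (rather than relying on the server default of 30) is a design choice of this example.

```typescript
interface CompareQuery {
  states?: string[]; // optional subset of state codes
  days?: number; // days of history (documented default: 30)
}

// Upper-cases, validates, and de-duplicates state codes, then joins them with commas.
function normalizeStates(states: string[]): string {
  const codes = states.map((s) => s.trim().toUpperCase());
  for (const code of codes) {
    if (!/^[A-Z]{2}$/.test(code)) {
      throw new Error(`Invalid state code: ${code}`);
    }
  }
  return codes.filter((code, index) => codes.indexOf(code) === index).join(',');
}

// Hypothetical helper for the brand trend endpoint.
function buildBrandTrendUrl(brandId: number, query: CompareQuery = {}): string {
  const parts = [`days=${query.days ?? 30}`];
  if (query.states && query.states.length > 0) {
    parts.push(`states=${normalizeStates(query.states)}`);
  }
  return `/api/analytics/brand/${brandId}/trend?${parts.join('&')}`;
}
```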
### Admin Endpoints
```
POST /api/admin/states/refresh-metrics # Refresh materialized views
```
## Frontend Usage
### State Selector
The global state selector is in the sidebar and persists selection via localStorage:
```tsx
import { useStateStore } from '../store/stateStore';
function MyComponent() {
  const { selectedState, setSelectedState, isNationalView } = useStateStore();
  // null = All States / National view
  // 'AZ' = Arizona only
  if (isNationalView()) {
    // Show national data
  } else {
    // Filter by selectedState
  }
}
```
### State Badge Component
Show current state selection:
```tsx
import { StateBadge } from '../components/StateSelector';
<StateBadge /> // Shows "National" or state name
```
### API Calls with State Filter
```tsx
import { useEffect } from 'react';
import { api } from '../lib/api';
import { useStateStore } from '../store/stateStore';

function useStateData() {
  const { selectedState } = useStateStore();
  useEffect(() => {
    if (selectedState) {
      // State-specific data
      api.get(`/state/${selectedState}/summary`);
    } else {
      // National data
      api.get('/analytics/national/summary');
    }
  }, [selectedState]);
}
```
### Navigation Routes
| Route | Component | Description |
|-------|-----------|-------------|
| `/national` | NationalDashboard | National overview with all states |
| `/national/heatmap` | StateHeatmap | Interactive state heatmap |
| `/national/compare` | CrossStateCompare | Brand/category cross-state comparison |
## Ingestion Rules
### State Assignment
Every raw payload MUST include a state:
1. State is looked up from `dispensaries.state` during payload storage
2. Stored on `raw_payloads.state` for query optimization
3. Inherited by all normalized products/snapshots via `dispensary_id`
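The assignment steps above might look like the following in code. This is a simplified sketch, not the actual ingestion code: `attachState` and the row types are hypothetical, and the dispensary lookup is modeled as an in-memory record rather than a database query.

```typescript
// Hypothetical shapes; the real tables have more columns.
interface DispensaryRow {
  id: number;
  state: string; // dispensaries.state, e.g. 'AZ'
}

interface RawPayloadInput {
  dispensaryId: number;
  payload: unknown;
}

interface RawPayloadRecord extends RawPayloadInput {
  state: string; // denormalized copy stored on raw_payloads.state
}

// Copies the dispensary's state onto the payload and rejects payloads with no usable state.
function attachState(
  input: RawPayloadInput,
  dispensaries: Record<number, DispensaryRow>
): RawPayloadRecord {
  const dispensary = dispensaries[input.dispensaryId];
  if (!dispensary || !/^[A-Z]{2}$/.test(dispensary.state)) {
    throw new Error(`No valid state for dispensary ${input.dispensaryId}`);
  }
  return {
    dispensaryId: input.dispensaryId,
    payload: input.payload,
    state: dispensary.state,
  };
}
```

Failing loudly when the state is missing keeps the "MUST include a state" rule from being silently violated at storage time.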
### Hydration Pipeline
The hydration worker supports state filtering:
```typescript
// Process only AZ payloads
await getUnprocessedPayloads(pool, { state: 'AZ' });
// Process multiple states
await getUnprocessedPayloads(pool, { states: ['AZ', 'CA', 'NV'] });
```
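One way such filter options could translate into a parameterized query condition is sketched below. This is not the actual `getUnprocessedPayloads` implementation: `buildPayloadWhere` is a hypothetical helper, and the `$1` placeholder style assumes a PostgreSQL client that accepts positional parameters and array values.

```typescript
interface PayloadFilter {
  state?: string; // single state, e.g. 'AZ'
  states?: string[]; // several states, e.g. ['AZ', 'CA', 'NV']
}

interface SqlFragment {
  where: string; // condition text with positional placeholders
  values: unknown[]; // parameter values in placeholder order
}

// Builds a WHERE condition for unprocessed payloads, optionally restricted by state.
function buildPayloadWhere(filter: PayloadFilter): SqlFragment {
  const clauses = ['processed = FALSE'];
  const values: unknown[] = [];
  if (filter.state) {
    values.push(filter.state);
    clauses.push(`state = $${values.length}`);
  } else if (filter.states && filter.states.length > 0) {
    // The whole array is passed as one parameter and matched with ANY.
    values.push(filter.states);
    clauses.push(`state = ANY($${values.length})`);
  }
  return { where: clauses.join(' AND '), values };
}
```

Both forms filter on `(state, processed)`, which matches the columns of the partial index `idx_raw_payloads_state_unprocessed` defined earlier.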
### Data Isolation
Critical rules:
- **No cross-state contamination** - Product IDs are unique per (dispensary_id, provider_product_id)
- **No SKU merging** - The same SKU in AZ and in CA is stored as two separate products
- **No store merging** - Stores that share a name in different states are separate records
- Every dispensary maps to exactly ONE state
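The isolation rules above can be expressed as a composite key plus a consistency check. The sketch below is illustrative only: `productKey` and `findMultiStateDispensaries` are hypothetical helpers, and the check runs over in-memory rows rather than the database.

```typescript
interface ProductRef {
  dispensaryId: number;
  providerProductId: string;
}

interface DispensaryStateRow {
  dispensaryId: number;
  state: string;
}

// Identity is (dispensary_id, provider_product_id), so the same SKU at two stores never collides.
function productKey(ref: ProductRef): string {
  return `${ref.dispensaryId}:${ref.providerProductId}`;
}

// Returns the IDs of dispensaries that appear with more than one state (should always be empty).
function findMultiStateDispensaries(rows: DispensaryStateRow[]): number[] {
  const firstSeen: Record<number, string> = {};
  const offenders: number[] = [];
  for (const row of rows) {
    const known = firstSeen[row.dispensaryId];
    if (known === undefined) {
      firstSeen[row.dispensaryId] = row.state;
    } else if (known !== row.state && offenders.indexOf(row.dispensaryId) === -1) {
      offenders.push(row.dispensaryId);
    }
  }
  return offenders;
}
```

A non-empty result from the second function would indicate a violation of the "exactly ONE state" rule.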
## Constraints & Best Practices
### Query Performance
1. Use `mv_state_metrics` for dashboard queries (refreshed hourly)
2. Use the `v_brand_state_presence` and `v_category_state_distribution` views for brand/category queries
3. Filter by state early in queries to leverage indexes
4. For cross-state queries, use the dedicated comparison functions
### Cache Strategy
API endpoint responses should be cached in Redis, using the following key patterns and TTLs:
```typescript
// Cache key pattern
`state:${state}:summary` // State summary - 5 min TTL
`national:summary` // National summary - 5 min TTL
`heatmap:${metric}` // Heatmap data - 5 min TTL
```
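The key patterns and 5-minute TTL above can be prototyped without a Redis connection. In the sketch below, `cacheKeys` and `TtlCache` are hypothetical names, and the in-memory class is only a stand-in that mimics key expiry; it is not a Redis client.

```typescript
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Key builders following the patterns listed above.
const cacheKeys = {
  stateSummary: (state: string): string => `state:${state}:summary`,
  nationalSummary: (): string => 'national:summary',
  heatmap: (metric: string): string => `heatmap:${metric}`,
};

// Minimal in-memory TTL cache (a stand-in for Redis, for illustration only).
class TtlCache<T> {
  private entries: Record<string, { value: T; expiresAt: number }> = Object.create(null);

  // The clock is injectable so expiry can be exercised deterministically.
  constructor(private ttlMs: number, private now: () => number = () => Date.now()) {}

  get(key: string): T | undefined {
    const entry = this.entries[key];
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      delete this.entries[key]; // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries[key] = { value, expiresAt: this.now() + this.ttlMs };
  }
}
```

Centralizing the key builders keeps the patterns consistent between the code that writes a cache entry and the code that reads or invalidates it.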
### Adding New States
1. Add state to `states` table (if not already present)
2. Import dispensary data with correct `state` code
3. Run menu detection for new dispensaries
4. Crawl dispensaries with resolved platform IDs
5. Refresh materialized views: `SELECT refresh_state_metrics()`
## Migration Guide
### From Arizona-Only to Multi-State
1. Apply migration 047:
   ```bash
   DATABASE_URL="..." npm run migrate
   ```
2. Existing AZ data requires no changes (already has `state='AZ'`)
3. New states are added via:
   - Manual dispensary import
   - Menu detection crawl
   - Platform ID resolution
4. Frontend automatically shows state selector after update
### Rollback
Migration 047 is additive - no destructive changes:
- New columns have defaults
- Views can be dropped without data loss
- Indexes can be dropped for performance tuning
## Monitoring
### Key Metrics to Watch
1. **Store count by state** - `SELECT state, COUNT(*) FROM dispensaries GROUP BY state`
2. **Product coverage** - `SELECT state, COUNT(DISTINCT sp.id) FROM store_products...`
3. **Crawl health by state** - Check `crawl_runs` by dispensary state
4. **Materialized view freshness** - `SELECT refreshed_at FROM mv_state_metrics`
### Alerts
Set up alerts for:
- Materialized view not refreshed in 2+ hours
- State with 0 products after having products
- Cross-state data appearing (should never happen)
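The first two alert conditions can be written as simple predicates over monitoring data. This is a sketch under stated assumptions: `isViewStale` and `statesThatLostProducts` are hypothetical helpers, and the inputs (a refresh timestamp and per-state product counts) are assumed to come from queries like those listed under Key Metrics.

```typescript
const MAX_VIEW_AGE_MS = 2 * 60 * 60 * 1000; // the 2-hour threshold from the alert list

// True when the materialized view was last refreshed two or more hours ago.
function isViewStale(refreshedAt: Date, now: Date): boolean {
  return now.getTime() - refreshedAt.getTime() >= MAX_VIEW_AGE_MS;
}

// States that previously had products but now report zero (or are missing entirely).
function statesThatLostProducts(
  previous: Record<string, number>,
  current: Record<string, number>
): string[] {
  return Object.keys(previous)
    .filter((state) => (previous[state] ?? 0) > 0 && (current[state] ?? 0) === 0)
    .sort();
}
```

Comparing against the previous snapshot, rather than checking for zero alone, avoids alerting on newly added states that have not been crawled yet.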
## Future Enhancements
Planned for future phases:
1. **Redis caching** for all state endpoints
2. **Real-time refresh** of materialized views
3. **Geographic heatmaps** with actual US map visualization
4. **State-specific pricing rules** (tax rates, etc.)
5. **Multi-state brand portfolio tracking**