Files
cannaiq/docs/multi-state.md
Kelly b4a2fb7d03 feat: Add v2 architecture with multi-state support and orchestrator services
Major additions:
- Multi-state expansion: states table, StateSelector, NationalDashboard, StateHeatmap, CrossStateCompare
- Orchestrator services: trace service, error taxonomy, retry manager, proxy rotator
- Discovery system: dutchie discovery service, geo validation, city seeding scripts
- Analytics infrastructure: analytics v2 routes, brand/pricing/stores intelligence pages
- Local development: setup-local.sh starts all 5 services (postgres, backend, cannaiq, findadispo, findagram)
- Migrations 037-056: crawler profiles, states, analytics indexes, worker metadata

Frontend pages added:
- Discovery, ChainsDashboard, IntelligenceBrands, IntelligencePricing, IntelligenceStores
- StateHeatmap, CrossStateCompare, SyncInfoPanel

Components added:
- StateSelector, OrchestratorTraceModal, WorkflowStepper

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 11:30:57 -07:00

9.3 KiB

Multi-State Support

Overview

Phase 4 implements full multi-state support for CannaiQ, transforming it from an Arizona-only platform to a national cannabis intelligence system. This document covers schema updates, API structure, frontend usage, and operational guidelines.

Schema Updates

Core Tables Modified

1. dispensaries table

Already has state column:

state CHAR(2) DEFAULT 'AZ'  -- State code (AZ, CA, CO, etc.)
state_id INTEGER REFERENCES states(id)  -- FK to canonical states table

2. raw_payloads table (Migration 047)

Added state column for query optimization:

state CHAR(2)  -- Denormalized from dispensary for fast filtering

3. states table

Canonical reference for all US states:

CREATE TABLE states (
  id SERIAL PRIMARY KEY,
  code VARCHAR(2) NOT NULL UNIQUE,  -- 'AZ', 'CA', etc.
  name VARCHAR(100) NOT NULL,       -- 'Arizona', 'California', etc.
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

New Indexes (Migration 047)

-- State-based payload filtering
CREATE INDEX idx_raw_payloads_state ON raw_payloads(state);
CREATE INDEX idx_raw_payloads_state_unprocessed ON raw_payloads(state, processed) WHERE processed = FALSE;

-- Dispensary state queries
CREATE INDEX idx_dispensaries_state_menu_type ON dispensaries(state, menu_type);
CREATE INDEX idx_dispensaries_state_crawl_status ON dispensaries(state, crawl_status);
CREATE INDEX idx_dispensaries_state_active ON dispensaries(state) WHERE crawl_status != 'disabled';

Materialized Views

mv_state_metrics

Pre-aggregated state-level metrics for fast dashboard queries:

CREATE MATERIALIZED VIEW mv_state_metrics AS
SELECT
  d.state,
  s.name AS state_name,
  COUNT(DISTINCT d.id) AS store_count,
  COUNT(DISTINCT sp.id) AS total_products,
  COUNT(DISTINCT sp.brand_id) AS unique_brands,
  AVG(sp.price_rec) AS avg_price_rec,
  -- ... more metrics
FROM dispensaries d
LEFT JOIN states s ON d.state = s.code
LEFT JOIN store_products sp ON d.id = sp.dispensary_id
GROUP BY d.state, s.name;

Refresh with:

SELECT refresh_state_metrics();

Views

v_brand_state_presence

Brand presence and metrics per state:

SELECT brand_id, brand_name, state, store_count, product_count, avg_price
FROM v_brand_state_presence
WHERE state = 'AZ';

v_category_state_distribution

Category distribution by state:

SELECT state, category, product_count, store_count, avg_price
FROM v_category_state_distribution
WHERE state = 'CA';

Functions

fn_national_price_comparison(category, brand_id)

Compare prices across all states:

SELECT * FROM fn_national_price_comparison('Flower', NULL);

fn_brand_state_penetration(brand_id)

Get brand penetration across states:

SELECT * FROM fn_brand_state_penetration(123);

API Structure

State List Endpoints

GET /api/states                    # All configured states
GET /api/states?active=true        # Only states with dispensary data

Response:

{
  "success": true,
  "data": {
    "states": [
      { "code": "AZ", "name": "Arizona" },
      { "code": "CA", "name": "California" }
    ],
    "count": 2
  }
}

State-Specific Endpoints

GET /api/state/:state/summary      # Full state summary with metrics
GET /api/state/:state/brands       # Brands in state (paginated)
GET /api/state/:state/categories   # Categories in state (paginated)
GET /api/state/:state/stores       # Stores in state (paginated)
GET /api/state/:state/analytics/prices  # Price distribution

Query parameters:

  • limit - Results per page (default: 50)
  • offset - Pagination offset
  • sortBy - Sort field (e.g., productCount, avgPrice)
  • sortDir - Sort direction (asc or desc)
  • includeInactive - Include disabled stores

National Analytics Endpoints

GET /api/analytics/national/summary    # National aggregate metrics
GET /api/analytics/national/prices     # Price comparison across states
GET /api/analytics/national/heatmap    # State heatmap data
GET /api/analytics/national/metrics    # All state metrics

Heatmap metrics:

  • stores - Store count per state
  • products - Product count per state
  • brands - Brand count per state
  • avgPrice - Average price per state
  • penetration - Brand penetration (requires brandId)

Cross-State Comparison Endpoints

GET /api/analytics/compare/brand/:brandId        # Compare brand across states
GET /api/analytics/compare/category/:category    # Compare category across states
GET /api/analytics/brand/:brandId/penetration    # Brand penetration by state
GET /api/analytics/brand/:brandId/trend          # Historical penetration trend

Query parameters:

  • states - Comma-separated state codes to include (optional)
  • days - Days of history for trends (default: 30)

Admin Endpoints

POST /api/admin/states/refresh-metrics  # Refresh materialized views

Frontend Usage

State Selector

The global state selector is in the sidebar and persists selection via localStorage:

import { useStateStore } from '../store/stateStore';

function MyComponent() {
  const { selectedState, setSelectedState, isNationalView } = useStateStore();

  // null = All States / National view
  // 'AZ' = Arizona only

  if (isNationalView()) {
    // Show national data
  } else {
    // Filter by selectedState
  }
}

State Badge Component

Show current state selection:

import { StateBadge } from '../components/StateSelector';

<StateBadge />  // Shows "National" or state name

API Calls with State Filter

import { api } from '../lib/api';
import { useStateStore } from '../store/stateStore';

function useStateData() {
  const { selectedState } = useStateStore();

  useEffect(() => {
    if (selectedState) {
      // State-specific data
      api.get(`/state/${selectedState}/summary`);
    } else {
      // National data
      api.get('/analytics/national/summary');
    }
  }, [selectedState]);
}

Navigation Routes

Route Component Description
/national NationalDashboard National overview with all states
/national/heatmap StateHeatmap Interactive state heatmap
/national/compare CrossStateCompare Brand/category cross-state comparison

Ingestion Rules

State Assignment

Every raw payload MUST include state:

  1. State is looked up from dispensaries.state during payload storage
  2. Stored on raw_payloads.state for query optimization
  3. Inherited by all normalized products/snapshots via dispensary_id

Hydration Pipeline

The hydration worker supports state filtering:

// Process only AZ payloads
await getUnprocessedPayloads(pool, { state: 'AZ' });

// Process multiple states
await getUnprocessedPayloads(pool, { states: ['AZ', 'CA', 'NV'] });

Data Isolation

Critical rules:

  • No cross-state contamination - Product IDs are unique per (dispensary_id, provider_product_id)
  • No SKU merging - Same SKU in AZ and CA are separate products
  • No store merging - Same store name in different states are separate records
  • Every dispensary maps to exactly ONE state

Constraints & Best Practices

Query Performance

  1. Use mv_state_metrics for dashboard queries (refreshed hourly)
  2. Use indexed views for brand/category queries
  3. Filter by state early in queries to leverage indexes
  4. For cross-state queries, use the dedicated comparison functions

Cache Strategy

API endpoints should be cached with Redis:

// Cache key pattern
`state:${state}:summary` // State summary - 5 min TTL
`national:summary`       // National summary - 5 min TTL
`heatmap:${metric}`      // Heatmap data - 5 min TTL

Adding New States

  1. Add state to states table (if not already present)
  2. Import dispensary data with correct state code
  3. Run menu detection for new dispensaries
  4. Crawl dispensaries with resolved platform IDs
  5. Refresh materialized views: SELECT refresh_state_metrics()

Migration Guide

From Arizona-Only to Multi-State

  1. Apply migration 047:

    DATABASE_URL="..." npm run migrate
    
  2. Existing AZ data requires no changes (already has state='AZ')

  3. New states are added via:

    • Manual dispensary import
    • Menu detection crawl
    • Platform ID resolution
  4. Frontend automatically shows state selector after update

Rollback

Migration 047 is additive - no destructive changes:

  • New columns have defaults
  • Views can be dropped without data loss
  • Indexes can be dropped for performance tuning

Monitoring

Key Metrics to Watch

  1. Store count by state - SELECT state, COUNT(*) FROM dispensaries GROUP BY state
  2. Product coverage - SELECT state, COUNT(DISTINCT sp.id) FROM store_products...
  3. Crawl health by state - Check crawl_runs by dispensary state
  4. Materialized view freshness - SELECT refreshed_at FROM mv_state_metrics

Alerts

Set up alerts for:

  • Materialized view not refreshed in 2+ hours
  • State with 0 products after having products
  • Cross-state data appearing (should never happen)

Future Enhancements

Planned for future phases:

  1. Redis caching for all state endpoints
  2. Real-time refresh of materialized views
  3. Geographic heatmaps with actual US map visualization
  4. State-specific pricing rules (tax rates, etc.)
  5. Multi-state brand portfolio tracking