chore: Clean up deprecated code and docs

- Move deprecated directories to src/_deprecated/:
  - hydration/ (old pipeline approach)
  - scraper-v2/ (old Puppeteer scraper)
  - canonical-hydration/ (merged into tasks)
  - Unused services: availability, crawler-logger, geolocation, etc.
  - Unused utils: age-gate-playwright, HomepageValidator, stealthBrowser
- Archive outdated docs to docs/_archive/:
  - ANALYTICS_RUNBOOK.md
  - ANALYTICS_V2_EXAMPLES.md
  - BRAND_INTELLIGENCE_API.md
  - CRAWL_PIPELINE.md
  - TASK_WORKFLOW_2024-12-10.md
  - WORKER_TASK_ARCHITECTURE.md
  - ORGANIC_SCRAPING_GUIDE.md
- Add docs/CODEBASE_MAP.md as single source of truth
- Add warning files to deprecated/archived directories
- Slim down CLAUDE.md to essential rules only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 22:17:40 -07:00
parent f2864bd2ad
commit a35976b9e9
61 changed files with 856 additions and 1281 deletions
--- a/backend/docs/_archive/ANALYTICS_RUNBOOK.md
+++ b/backend/docs/_archive/ANALYTICS_RUNBOOK.md
@@ -0,0 +1,712 @@
+# CannaiQ Analytics Runbook
+
+Phase 3: Analytics Engine - Complete Implementation Guide
+
+## Overview
+
+The CannaiQ Analytics Engine provides real-time insights into cannabis market data across price trends, brand penetration, category performance, store changes, and competitive positioning.
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                        API Layer                                 │
+│  /api/az/analytics/*                                            │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                     Analytics Services                          │
+│  ┌──────────────┐ ┌────────────────┐ ┌──────────────────────┐  │
+│  │PriceTrend    │ │Penetration     │ │CategoryAnalytics     │  │
+│  │Service       │ │Service         │ │Service               │  │
+│  └──────────────┘ └────────────────┘ └──────────────────────┘  │
+│  ┌──────────────┐ ┌────────────────┐ ┌──────────────────────┐  │
+│  │StoreChange   │ │BrandOpportunity│ │AnalyticsCache        │  │
+│  │Service       │ │Service         │ │(15-min TTL)          │  │
+│  └──────────────┘ └────────────────┘ └──────────────────────┘  │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                     Canonical Tables                            │
+│  store_products │ store_product_snapshots │ brands │ categories │
+│  dispensaries   │ brand_snapshots         │ category_snapshots  │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## Services
+
+### 1. PriceTrendService
+
+Provides time-series price analytics.
+
+**Key Methods:**
+| Method | Description |
+|--------|-------------|
+| `getProductPriceTrend(productId, storeId?, days)` | Price history for a product |
+| `getBrandPriceTrend(brandName, filters)` | Average prices for a brand |
+| `getCategoryPriceTrend(category, filters)` | Category-level price trends |
+| `getPriceSummary(filters)` | 7d/30d/90d price averages |
+| `detectPriceCompression(category, state?)` | Price war detection |
+| `getGlobalPriceStats()` | Market-wide pricing overview |
+
+**Filters:**
+```typescript
+interface PriceFilters {
+  storeId?: number;
+  brandName?: string;
+  category?: string;
+  state?: string;
+  days?: number; // default: 30
+}
+```
+
+**Price Compression Detection:**
+- Calculates standard deviation of prices within category
+- Returns compression score 0-100 (higher = more compressed)
+- Identifies brands converging toward mean price
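+
+The compression heuristic can be sketched as follows. This is not the `PriceTrendService` implementation. It assumes the score is derived from the coefficient of variation (standard deviation divided by mean) of current prices in the category, scaled so that identical prices score 100. `compressionScore` is a hypothetical name.
+
+```typescript
+// Hypothetical sketch: 0-100 price compression score for one category.
+// ASSUMPTION: score = 100 * (1 - min(cv, 1)), where cv = stddev / mean.
+function compressionScore(prices: number[]): number | null {
+  // Ignore missing, zero, or non-numeric prices
+  const valid = prices.filter((p) => Number.isFinite(p) && p > 0);
+  if (valid.length < 2) return null; // too few prices to measure spread
+
+  const mean = valid.reduce((sum, p) => sum + p, 0) / valid.length;
+  // Population variance of the category's prices
+  const variance = valid.reduce((sum, p) => sum + (p - mean) ** 2, 0) / valid.length;
+  const cv = Math.sqrt(variance) / mean;
+
+  // Tight clustering (low cv) gives a high score; cv >= 1 is clamped to 0
+  return Math.round(100 * (1 - Math.min(cv, 1)));
+}
+
+compressionScore([20, 60]); // 50: stddev 20 on a mean of 40
+```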
+
+---
+
+### 2. PenetrationService
+
+Tracks brand market presence across stores and states.
+
+**Key Methods:**
+| Method | Description |
+|--------|-------------|
+| `getBrandPenetration(brandName, filters)` | Store count, SKU count, coverage |
+| `getTopBrandsByPenetration(limit, filters)` | Leaderboard of dominant brands |
+| `getPenetrationTrend(brandName, days)` | Historical penetration growth |
+| `getShelfShareByCategory(brandName)` | % of shelf per category |
+| `getBrandPresenceByState(brandName)` | Multi-state presence map |
+| `getStoresCarryingBrand(brandName)` | List of stores carrying brand |
+| `getPenetrationHeatmap(brandName?)` | Geographic distribution |
+
+**Penetration Calculation:**
+```
+Penetration % = (Stores with Brand / Total Stores in Market) × 100
+```
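+
+The formula can be expressed as a minimal sketch. It assumes "stores in market" means every dispensary that appears in the input rows, and it counts a store once no matter how many of the brand's SKUs it lists. `StoreProductRow` is a hypothetical shape, not the `store_products` schema. Brand names are compared case-sensitively.
+
+```typescript
+// Hypothetical input row: one listed product at one dispensary.
+interface StoreProductRow {
+  dispensaryId: number;
+  brandName: string | null;
+}
+
+// Penetration % = (stores with brand / total stores in market) x 100
+function brandPenetrationPercent(rows: StoreProductRow[], brandName: string): number {
+  const allStores = new Set<number>();
+  const brandStores = new Set<number>();
+  for (const row of rows) {
+    allStores.add(row.dispensaryId);
+    if (row.brandName === brandName) brandStores.add(row.dispensaryId); // case-sensitive match
+  }
+  if (allStores.size === 0) return 0; // empty market: avoid dividing by zero
+  return (brandStores.size / allStores.size) * 100;
+}
+```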
+
+---
+
+### 3. CategoryAnalyticsService
+
+Analyzes category performance and trends.
+
+**Key Methods:**
+| Method | Description |
+|--------|-------------|
+| `getCategorySummary(category?, filters)` | SKU count, avg price, stores |
+| `getCategoryGrowth(days, filters)` | 7d/30d/90d growth rates |
+| `getCategoryGrowthTrend(category, days)` | Time-series category growth |
+| `getCategoryHeatmap(metric, periods)` | Visual heatmap data |
+| `getTopMovers(limit, days)` | Fastest growing/declining categories |
+| `getSubcategoryBreakdown(category)` | Drill-down into subcategories |
+
+**Time Windows:**
+- 7 days: Short-term volatility
+- 30 days: Monthly trends
+- 90 days: Seasonal patterns
+
+---
+
+### 4. StoreChangeService
+
+Tracks product adds/drops, brand changes, and price movements per store.
+
+**Key Methods:**
+| Method | Description |
+|--------|-------------|
+| `getStoreChangeSummary(storeId)` | Overview of recent changes |
+| `getStoreChangeEvents(storeId, filters)` | Event log (add, drop, price, OOS) |
+| `getNewBrands(storeId, days)` | Brands added to store |
+| `getLostBrands(storeId, days)` | Brands dropped from store |
+| `getProductChanges(storeId, type, days)` | Filtered product changes |
+| `getCategoryLeaderboard(category, limit)` | Top stores for category |
+| `getMostActiveStores(days, limit)` | Stores with most changes |
+| `compareStores(store1, store2)` | Side-by-side store comparison |
+
+**Event Types:**
+- `added` - New product appeared
+- `discontinued` - Product removed
+- `price_drop` - Price decreased
+- `price_increase` - Price increased
+- `restocked` - OOS → In Stock
+- `out_of_stock` - In Stock → OOS
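+
+One way to derive these events is to compare a product's state in two consecutive snapshots. The sketch below assumes a simplified `ProductState` (price plus an in-stock flag). Those field names are illustrative, not the `store_product_snapshots` columns.
+
+```typescript
+type StoreEventType =
+  | 'added'
+  | 'discontinued'
+  | 'price_drop'
+  | 'price_increase'
+  | 'restocked'
+  | 'out_of_stock';
+
+// Hypothetical per-product state captured at one crawl.
+interface ProductState {
+  price: number | null;
+  inStock: boolean;
+}
+
+// Compare previous vs current state (undefined = product absent in that snapshot).
+function classifyChange(prev: ProductState | undefined, curr: ProductState | undefined): StoreEventType[] {
+  if (!prev && curr) return ['added'];
+  if (prev && !curr) return ['discontinued'];
+  if (!prev || !curr) return [];
+
+  const events: StoreEventType[] = [];
+  // Only compare prices when both snapshots have one
+  if (prev.price !== null && curr.price !== null) {
+    if (curr.price < prev.price) events.push('price_drop');
+    else if (curr.price > prev.price) events.push('price_increase');
+  }
+  if (!prev.inStock && curr.inStock) events.push('restocked');
+  if (prev.inStock && !curr.inStock) events.push('out_of_stock');
+  return events;
+}
+```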
+
+---
+
+### 5. BrandOpportunityService
+
+Competitive intelligence and opportunity identification.
+
+**Key Methods:**
+| Method | Description |
+|--------|-------------|
+| `getBrandOpportunity(brandName)` | Full opportunity analysis |
+| `getMarketPositionSummary(brandName)` | Market position vs competitors |
+| `getAlerts(filters)` | Analytics-generated alerts |
+| `markAlertsRead(alertIds)` | Mark alerts as read |
+
+**Opportunity Analysis Includes:**
+- White space stores (potential targets)
+- Competitive threats (brands gaining share)
+- Pricing opportunities (underpriced vs market)
+- Missing SKU recommendations
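+
+White space stores reduce to a set difference: stores in the target market minus stores that already carry the brand. The following is a hypothetical sketch (`whiteSpaceStores` is an illustrative name). It assumes both lists of dispensary IDs have already been loaded.
+
+```typescript
+// Hypothetical helper: dispensaries in the market that do not yet carry the brand.
+function whiteSpaceStores(marketStoreIds: number[], storesCarryingBrand: number[]): number[] {
+  const carrying = new Set(storesCarryingBrand);
+  const candidates = Array.from(new Set(marketStoreIds)); // de-duplicate market IDs
+  return candidates.filter((id) => !carrying.has(id)).sort((a, b) => a - b);
+}
+
+whiteSpaceStores([101, 102, 103, 104], [102, 104]); // [101, 103]
+```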
+
+---
+
+### 6. AnalyticsCache
+
+In-memory caching with database fallback.
+
+**Configuration:**
+```typescript
+const cache = new AnalyticsCache(pool, {
+  defaultTtlMinutes: 15,
+});
+```
+
+**Usage Pattern:**
+```typescript
+const data = await cache.getOrCompute(cacheKey, async () => {
+  // Expensive query here
+  return result;
+});
+```
+
+**Cache Management:**
+- `GET /api/az/analytics/cache/stats` - View cache stats
+- `POST /api/az/analytics/cache/clear?pattern=price*` - Clear by pattern
+- Auto-cleanup of expired entries every 5 minutes
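+
+The `getOrCompute` pattern can be illustrated with a small in-memory TTL cache. This is a simplified stand-in, not `AnalyticsCache` itself. It has no database fallback and takes an injectable clock for testing. It also shares one in-flight computation between concurrent callers for the same key, which is an assumption about desirable behavior and is not documented here.
+
+```typescript
+// Simplified stand-in for AnalyticsCache: in-memory only, no DB fallback.
+class TtlCache {
+  private readonly entries = new Map<string, { value: unknown; expiresAt: number }>();
+  private readonly inFlight = new Map<string, Promise<unknown>>();
+  private readonly defaultTtlMs: number;
+  private readonly now: () => number;
+
+  constructor(defaultTtlMinutes = 15, now: () => number = Date.now) {
+    this.defaultTtlMs = defaultTtlMinutes * 60_000;
+    this.now = now;
+  }
+
+  async getOrCompute<T>(key: string, compute: () => Promise<T>): Promise<T> {
+    const hit = this.entries.get(key);
+    if (hit && hit.expiresAt > this.now()) return hit.value as T; // fresh cache hit
+
+    // Reuse a computation that is already running for this key
+    const pending = this.inFlight.get(key);
+    if (pending) return pending as Promise<T>;
+
+    const promise = compute()
+      .then((value) => {
+        this.entries.set(key, { value, expiresAt: this.now() + this.defaultTtlMs });
+        return value;
+      })
+      .finally(() => this.inFlight.delete(key));
+    this.inFlight.set(key, promise);
+    return promise;
+  }
+
+  // Remove expired entries and return how many were dropped (call from a timer)
+  cleanup(): number {
+    const t = this.now();
+    let removed = 0;
+    this.entries.forEach((entry, key) => {
+      if (entry.expiresAt <= t) {
+        this.entries.delete(key);
+        removed++;
+      }
+    });
+    return removed;
+  }
+}
+```
+
+A periodic sweep such as `setInterval(() => cache.cleanup(), 5 * 60_000)` would match the 5-minute cleanup cadence described above.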
+
+---
+
+## API Endpoints Reference
+
+### Price Endpoints
+
+```bash
+# Product price trend (last 30 days)
+GET /api/az/analytics/price/product/12345?days=30
+
+# Brand price trend with filters
+GET /api/az/analytics/price/brand/Cookies?storeId=101&category=Flower&days=90
+
+# Category median price
+GET /api/az/analytics/price/category/Vaporizers?state=AZ
+
+# Price summary (7d/30d/90d)
+GET /api/az/analytics/price/summary?brand=Stiiizy&state=AZ
+
+# Detect price wars
+GET /api/az/analytics/price/compression/Flower?state=AZ
+
+# Global stats
+GET /api/az/analytics/price/global
+```
+
+### Penetration Endpoints
+
+```bash
+# Brand penetration
+GET /api/az/analytics/penetration/brand/Cookies
+
+# Top brands leaderboard
+GET /api/az/analytics/penetration/top?limit=20&state=AZ&category=Flower
+
+# Penetration trend
+GET /api/az/analytics/penetration/trend/Cookies?days=90
+
+# Shelf share by category
+GET /api/az/analytics/penetration/shelf-share/Cookies
+
+# Multi-state presence
+GET /api/az/analytics/penetration/by-state/Cookies
+
+# Stores carrying brand
+GET /api/az/analytics/penetration/stores/Cookies
+
+# Heatmap data
+GET /api/az/analytics/penetration/heatmap?brand=Cookies
+```
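+
+When these endpoints are called from code, brand names such as `Raw Garden` need URL encoding. The helper below is a hypothetical sketch: the `analyticsUrl` name and signature are illustrative, and `http://localhost:3010` is the local port used in the scheduling examples of this runbook. It relies only on the standard WHATWG `URL` class.
+
+```typescript
+// Hypothetical helper: build an /api/az/analytics URL with encoded path segments.
+function analyticsUrl(
+  baseUrl: string,
+  path: string[],
+  query: Record<string, string | number | boolean | undefined> = {},
+): string {
+  const encodedPath = path.map((segment) => encodeURIComponent(segment)).join('/');
+  const url = new URL('/api/az/analytics/' + encodedPath, baseUrl);
+  for (const [key, value] of Object.entries(query)) {
+    if (value !== undefined) url.searchParams.set(key, String(value)); // skip unset filters
+  }
+  return url.toString();
+}
+
+analyticsUrl('http://localhost:3010', ['penetration', 'top'], { limit: 20, state: 'AZ', category: 'Flower' });
+// http://localhost:3010/api/az/analytics/penetration/top?limit=20&state=AZ&category=Flower
+```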
+
+### Category Endpoints
+
+```bash
+# Category summary
+GET /api/az/analytics/category/summary?category=Flower&state=AZ
+
+# Category growth (7d/30d/90d)
+GET /api/az/analytics/category/growth?days=30&state=AZ
+
+# Category trend
+GET /api/az/analytics/category/trend/Concentrates?days=90
+
+# Heatmap
+GET /api/az/analytics/category/heatmap?metric=growth&periods=12
+
+# Top movers (growing/declining)
+GET /api/az/analytics/category/top-movers?limit=5&days=30
+
+# Subcategory breakdown
+GET /api/az/analytics/category/Edibles/subcategories
+```
+
+### Store Endpoints
+
+```bash
+# Store change summary
+GET /api/az/analytics/store/101/summary
+
+# Event log
+GET /api/az/analytics/store/101/events?type=price_drop&days=7&limit=50
+
+# New brands
+GET /api/az/analytics/store/101/brands/new?days=30
+
+# Lost brands
+GET /api/az/analytics/store/101/brands/lost?days=30
+
+# Product changes by type
+GET /api/az/analytics/store/101/products/changes?type=added&days=7
+
+# Category leaderboard
+GET /api/az/analytics/store/leaderboard/Flower?limit=20
+
+# Most active stores
+GET /api/az/analytics/store/most-active?days=7&limit=10
+
+# Compare two stores
+GET /api/az/analytics/store/compare?store1=101&store2=102
+```
+
+### Brand Opportunity Endpoints
+
+```bash
+# Full opportunity analysis
+GET /api/az/analytics/brand/Cookies/opportunity
+
+# Market position summary
+GET /api/az/analytics/brand/Cookies/position
+
+# Get alerts
+GET /api/az/analytics/alerts?brand=Cookies&type=competitive&unreadOnly=true
+
+# Mark alerts read
+POST /api/az/analytics/alerts/mark-read
+Body: { "alertIds": [1, 2, 3] }
+```
+
+### Maintenance Endpoints
+
+```bash
+# Capture daily snapshots (run by scheduler)
+POST /api/az/analytics/snapshots/capture
+
+# Cache statistics
+GET /api/az/analytics/cache/stats
+
+# Clear cache (admin)
+POST /api/az/analytics/cache/clear?pattern=price*
+```
+
+---
+
+## Incremental Computation
+
+Analytics are designed for real-time queries without full recomputation:
+
+### Snapshot Strategy
+
+1. **Raw Data**: `store_products` (current state)
+2. **Historical**: `store_product_snapshots` (time-series)
+3. **Aggregated**: `brand_snapshots`, `category_snapshots` (daily rollups)
+
+### Window Calculations
+
+```sql
+-- 7-day window
+WHERE crawled_at >= NOW() - INTERVAL '7 days'
+
+-- 30-day window
+WHERE crawled_at >= NOW() - INTERVAL '30 days'
+
+-- 90-day window
+WHERE crawled_at >= NOW() - INTERVAL '90 days'
+```
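+
+When the window comes from a request parameter, binding the day count as a query parameter avoids interpolating user input into SQL. The following sketch assumes PostgreSQL (`make_interval` with a named `days` argument) and uses the `crawled_at` column from the examples above. `windowFilter` is a hypothetical name.
+
+```typescript
+type AnalyticsWindow = '7d' | '30d' | '90d';
+
+const WINDOW_DAYS: Record<AnalyticsWindow, number> = { '7d': 7, '30d': 30, '90d': 90 };
+
+// Build a parameterized WHERE fragment for the requested window.
+// paramIndex lets the fragment slot into a query that already uses other $n parameters.
+function windowFilter(w: AnalyticsWindow, paramIndex = 1): { sql: string; params: number[] } {
+  return {
+    sql: `crawled_at >= NOW() - make_interval(days => $${paramIndex})`,
+    params: [WINDOW_DAYS[w]],
+  };
+}
+
+// Usage with node-postgres (illustrative):
+// const f = windowFilter('30d');
+// await pool.query(`SELECT ... FROM store_product_snapshots WHERE ${f.sql}`, f.params);
+```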
+
+### Materialized Views (Optional)
+
+For heavy queries, create materialized views:
+
+```sql
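+CREATE MATERIALIZED VIEW mv_brand_daily_metrics AS
+SELECT
+  DATE(sps.captured_at) as date,
+  sp.brand_id,
+  COUNT(DISTINCT sp.dispensary_id) as store_count,
+  COUNT(DISTINCT sps.store_product_id) as sku_count, -- one count per SKU even with several snapshots per day
+  AVG(sp.price_rec) as avg_price
+FROM store_product_snapshots sps
+JOIN store_products sp ON sps.store_product_id = sp.id
+WHERE sps.captured_at >= NOW() - INTERVAL '90 days'
+GROUP BY DATE(sps.captured_at), sp.brand_id;
+
+-- REFRESH ... CONCURRENTLY requires a unique index (plain columns, no WHERE clause) on the view
+CREATE UNIQUE INDEX mv_brand_daily_metrics_date_brand_idx
+  ON mv_brand_daily_metrics (date, brand_id);
+
+-- Refresh daily
+REFRESH MATERIALIZED VIEW CONCURRENTLY mv_brand_daily_metrics;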
+```
+
+---
+
+## Scheduled Jobs
+
+### Daily Snapshot Capture
+
+Trigger via cron or scheduler:
+
+```bash
+curl -X POST http://localhost:3010/api/az/analytics/snapshots/capture
+```
+
+This calls:
+- `capture_brand_snapshots()` - Captures brand metrics
+- `capture_category_snapshots()` - Captures category metrics
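+
+If no external cron is available, an in-process timer can call the same endpoint once a day. The following is a hypothetical helper for computing the delay. The 02:00 local run time used in the usage comment is an assumption; the runbook does not specify one.
+
+```typescript
+// Milliseconds from `now` until the next hour:minute in local time.
+function msUntilNextRun(now: Date, hour: number, minute = 0): number {
+  const next = new Date(now.getTime());
+  next.setHours(hour, minute, 0, 0);
+  // If today's run time has already passed (or is right now), schedule for tomorrow
+  if (next.getTime() <= now.getTime()) next.setDate(next.getDate() + 1);
+  return next.getTime() - now.getTime();
+}
+
+// Usage (Node 18+ global fetch), re-arming the timer after every run:
+// const run = () => setTimeout(async () => {
+//   try { await fetch('http://localhost:3010/api/az/analytics/snapshots/capture', { method: 'POST' }); }
+//   finally { run(); }
+// }, msUntilNextRun(new Date(), 2));
+// run();
+```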
+
+### Cache Cleanup
+
+Automatic cleanup every 5 minutes via in-memory timer.
+
+For manual cleanup:
+```bash
+curl -X POST http://localhost:3010/api/az/analytics/cache/clear
+```
+
+---
+
+## Extending Analytics (Future Phases)
+
+### Phase 6: Intelligence Engine
+- Automated alert generation
+- Recommendation engine
+- Price prediction
+
+### Phase 7: Orders Integration
+- Sales velocity analytics
+- Reorder predictions
+- Inventory turnover
+
+### Phase 8: Advanced ML
+- Demand forecasting
+- Price elasticity modeling
+- Customer segmentation
+
+---
+
+## Troubleshooting
+
+### Common Issues
+
+**1. Slow queries**
+- Check cache stats: `GET /api/az/analytics/cache/stats`
+- Increase cache TTL if data doesn't need real-time freshness
+- Add indexes on frequently filtered columns
+
+**2. Empty results**
+- Verify data exists in source tables
+- Check filter parameters (case-sensitive brand names)
+- Verify state codes are valid
+
+**3. Stale data**
+- Run snapshot capture: `POST /api/az/analytics/snapshots/capture`
+- Clear cache: `POST /api/az/analytics/cache/clear`
+
+### Debugging
+
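+Enable query logging by setting `ANALYTICS_DEBUG=true` in the environment. Each service reads the flag in its constructor: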
+```typescript
+// In service constructor
+this.debug = process.env.ANALYTICS_DEBUG === 'true';
+```
+
+---
+
+## Data Contracts
+
+### Price Trend Response
+```typescript
+interface PriceTrend {
+  productId?: number;
+  storeId?: number;
+  brandName?: string;
+  category?: string;
+  dataPoints: Array<{
+    date: string;
+    minPrice: number | null;
+    maxPrice: number | null;
+    avgPrice: number | null;
+    wholesalePrice: number | null;
+    sampleSize: number;
+  }>;
+  summary: {
+    currentAvg: number | null;
+    previousAvg: number | null;
+    changePercent: number | null;
+    trend: 'up' | 'down' | 'stable';
+    volatilityScore: number | null;
+  };
+}
+```
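+
+The `summary` fields can be derived from `dataPoints`. This sketch makes several assumptions: `previousAvg` is the first non-null `avgPrice` in the series, `currentAvg` is the last, and moves smaller than ±2% count as `'stable'`. The threshold and the choice of endpoints are guesses. `volatilityScore` is left out because its formula isn't documented here.
+
+```typescript
+interface PricePoint {
+  date: string;
+  avgPrice: number | null;
+}
+
+interface TrendSummary {
+  currentAvg: number | null;
+  previousAvg: number | null;
+  changePercent: number | null;
+  trend: 'up' | 'down' | 'stable';
+}
+
+// ASSUMPTION: compare first and last non-null averages; |change| < threshold => 'stable'.
+function summarizeTrend(points: PricePoint[], stableThresholdPct = 2): TrendSummary {
+  const prices = points
+    .map((p) => p.avgPrice)
+    .filter((p): p is number => p !== null);
+  if (prices.length === 0) {
+    return { currentAvg: null, previousAvg: null, changePercent: null, trend: 'stable' };
+  }
+  const previousAvg = prices[0];
+  const currentAvg = prices[prices.length - 1];
+  // A zero baseline has no meaningful percent change
+  const changePercent = previousAvg === 0 ? null : ((currentAvg - previousAvg) / previousAvg) * 100;
+
+  let trend: TrendSummary['trend'] = 'stable';
+  if (changePercent !== null && Math.abs(changePercent) >= stableThresholdPct) {
+    trend = changePercent > 0 ? 'up' : 'down';
+  }
+  return { currentAvg, previousAvg, changePercent, trend };
+}
+```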
+
+### Brand Penetration Response
+```typescript
+interface BrandPenetration {
+  brandName: string;
+  totalStores: number;
+  storesWithBrand: number;
+  penetrationPercent: number;
+  skuCount: number;
+  avgPrice: number | null;
+  priceRange: { min: number; max: number } | null;
+  topCategories: Array<{ category: string; count: number }>;
+  stateBreakdown?: Array<{ state: string; storeCount: number }>;
+}
+```
+
+### Category Growth Response
+```typescript
+interface CategoryGrowth {
+  category: string;
+  currentCount: number;
+  previousCount: number;
+  growthPercent: number;
+  growthTrend: 'up' | 'down' | 'stable';
+  avgPrice: number | null;
+  priceChange: number | null;
+  topBrands: Array<{ brandName: string; count: number }>;
+}
+```
+
+---
+
+## Files Reference
+
+| File | Purpose |
+|------|---------|
+| `src/dutchie-az/services/analytics/price-trends.ts` | Price analytics |
+| `src/dutchie-az/services/analytics/penetration.ts` | Brand penetration |
+| `src/dutchie-az/services/analytics/category-analytics.ts` | Category metrics |
+| `src/dutchie-az/services/analytics/store-changes.ts` | Store event tracking |
+| `src/dutchie-az/services/analytics/brand-opportunity.ts` | Competitive intel |
+| `src/dutchie-az/services/analytics/cache.ts` | Caching layer |
+| `src/dutchie-az/services/analytics/index.ts` | Module exports |
+| `src/dutchie-az/routes/analytics.ts` | API routes (680 LOC) |
+| `src/multi-state/state-query-service.ts` | Cross-state analytics |
+
+---
+
+## Analytics V2: Rec/Med State Segmentation
+
+Phase 3 enhancement: analytics that segment and compare recreational vs. medical-only states.
+
+### V2 API Endpoints
+
+All V2 endpoints are prefixed with `/api/analytics/v2`
+
+#### V2 Price Analytics
+
+```bash
+# Price trends for a specific product
+GET /api/analytics/v2/price/product/12345?window=30d
+
+# Price by category and state (with rec/med segmentation)
+GET /api/analytics/v2/price/category/Flower?state=AZ
+
+# Price by brand and state
+GET /api/analytics/v2/price/brand/Cookies?state=AZ
+
+# Most volatile products
+GET /api/analytics/v2/price/volatile?window=30d&limit=50&state=AZ
+
+# Rec vs Med price comparison by category
+GET /api/analytics/v2/price/rec-vs-med?category=Flower
+```
+
+#### V2 Brand Penetration
+
+```bash
+# Brand penetration metrics with state breakdown
+GET /api/analytics/v2/brand/Cookies/penetration?window=30d
+
+# Brand market position within categories
+GET /api/analytics/v2/brand/Cookies/market-position?category=Flower&state=AZ
+
+# Brand presence in rec vs med-only states
+GET /api/analytics/v2/brand/Cookies/rec-vs-med
+
+# Top brands by penetration
+GET /api/analytics/v2/brand/top?limit=25&state=AZ
+
+# Brands expanding or contracting
+GET /api/analytics/v2/brand/expansion-contraction?window=30d&limit=25
+```
+
+#### V2 Category Analytics
+
+```bash
+# Category growth metrics
+GET /api/analytics/v2/category/Flower/growth?window=30d
+
+# Category growth trend over time
+GET /api/analytics/v2/category/Flower/trend?window=30d
+
+# Top brands in category
+GET /api/analytics/v2/category/Flower/top-brands?limit=25&state=AZ
+
+# All categories with metrics
+GET /api/analytics/v2/category/all?state=AZ&limit=50
+
+# Rec vs Med category comparison
+GET /api/analytics/v2/category/rec-vs-med?category=Flower
+
+# Fastest growing categories
+GET /api/analytics/v2/category/fastest-growing?window=30d&limit=25
+```
+
+#### V2 Store Analytics
+
+```bash
+# Store change summary
+GET /api/analytics/v2/store/101/summary?window=30d
+
+# Product change events
+GET /api/analytics/v2/store/101/events?window=7d&limit=100
+
+# Store inventory composition
+GET /api/analytics/v2/store/101/inventory
+
+# Store price positioning vs market
+GET /api/analytics/v2/store/101/price-position
+
+# Most active stores by changes
+GET /api/analytics/v2/store/most-active?window=7d&limit=25&state=AZ
+```
+
+#### V2 State Analytics
+
+```bash
+# State market summary
+GET /api/analytics/v2/state/AZ/summary
+
+# All states with coverage metrics
+GET /api/analytics/v2/state/all
+
+# Legal state breakdown (rec, med-only, no program)
+GET /api/analytics/v2/state/legal-breakdown
+
+# Rec vs Med pricing by category
+GET /api/analytics/v2/state/rec-vs-med-pricing?category=Flower
+
+# States with coverage gaps
+GET /api/analytics/v2/state/coverage-gaps
+
+# Cross-state pricing comparison
+GET /api/analytics/v2/state/price-comparison
+```
+
+### V2 Services Architecture
+
+```
+src/services/analytics/
+├── index.ts                    # Exports all V2 services
+├── types.ts                    # Shared type definitions
+├── PriceAnalyticsService.ts    # Price trends and volatility
+├── BrandPenetrationService.ts  # Brand market presence
+├── CategoryAnalyticsService.ts # Category growth analysis
+├── StoreAnalyticsService.ts    # Store change tracking
+└── StateAnalyticsService.ts    # State-level analytics
+
+src/routes/analytics-v2.ts      # V2 API route handlers
+```
+
+### Key V2 Features
+
+1. **Rec/Med State Segmentation**: All analytics can be filtered and compared by legal status
+2. **State Coverage Gaps**: Identify legal states with missing or stale data
+3. **Cross-State Pricing**: Compare prices across recreational and medical-only markets
+4. **Brand Footprint Analysis**: Track brand presence in rec vs med states
+5. **Category Comparison**: Compare category performance by legal status
+
+### V2 Migration Path
+
+1. Run migration 052 for state cannabis flags:
+   ```bash
+   psql "$DATABASE_URL" -f migrations/052_add_state_cannabis_flags.sql
+   ```
+
+2. Run migration 053 for analytics indexes:
+   ```bash
+   psql "$DATABASE_URL" -f migrations/053_analytics_indexes.sql
+   ```
+
+3. Restart the backend to pick up the new routes.
+
+### V2 Response Examples
+
+**Rec vs Med Price Comparison:**
+```json
+{
+  "category": "Flower",
+  "recreational": {
+    "state_count": 15,
+    "product_count": 12500,
+    "avg_price": 35.50,
+    "median_price": 32.00
+  },
+  "medical_only": {
+    "state_count": 8,
+    "product_count": 5200,
+    "avg_price": 42.00,
+    "median_price": 40.00
+  },
+  "price_diff_percent": -15.48
+}
+```
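+
+`price_diff_percent` is not defined in this runbook. One formula that reproduces the example value is the difference between the recreational and medical-only averages, taken relative to the medical-only average and rounded to two decimals: (35.50 − 42.00) / 42.00 × 100 ≈ −15.48. The sketch below is inferred from the example, not taken from the service code.
+
+```typescript
+interface SegmentStats {
+  avg_price: number;
+  median_price: number;
+}
+
+// Inferred from the example values: percent difference of the rec average
+// relative to the medical-only average, rounded to two decimal places.
+function priceDiffPercent(recreational: SegmentStats, medicalOnly: SegmentStats): number | null {
+  if (medicalOnly.avg_price === 0) return null; // no meaningful baseline
+  const pct = ((recreational.avg_price - medicalOnly.avg_price) / medicalOnly.avg_price) * 100;
+  return Math.round(pct * 100) / 100;
+}
+
+priceDiffPercent({ avg_price: 35.5, median_price: 32 }, { avg_price: 42, median_price: 40 }); // -15.48
+```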
+
+**Legal State Breakdown:**
+```json
+{
+  "recreational_states": {
+    "count": 24,
+    "dispensary_count": 850,
+    "product_count": 125000,
+    "states": [
+      { "code": "CA", "name": "California", "dispensary_count": 250 },
+      { "code": "CO", "name": "Colorado", "dispensary_count": 150 }
+    ]
+  },
+  "medical_only_states": {
+    "count": 18,
+    "dispensary_count": 320,
+    "product_count": 45000,
+    "states": [
+      { "code": "FL", "name": "Florida", "dispensary_count": 120 }
+    ]
+  },
+  "no_program_states": {
+    "count": 9,
+    "states": [
+      { "code": "ID", "name": "Idaho" }
+    ]
+  }
+}
+```
+
+---
+
+*Phase 3 Analytics Engine - Fully Implemented*
+*V2 Rec/Med State Analytics - Added December 2024*
--- a/backend/docs/_archive/ANALYTICS_V2_EXAMPLES.md
+++ b/backend/docs/_archive/ANALYTICS_V2_EXAMPLES.md
@@ -0,0 +1,594 @@
+# Analytics V2 API Examples
+
+## Overview
+
+All endpoints are prefixed with `/api/analytics/v2`
+
+### Filtering Options
+
+**Time Windows:**
+- `?window=7d` - Last 7 days
+- `?window=30d` - Last 30 days (default)
+- `?window=90d` - Last 90 days
+
+**Legal Type Filtering:**
+- `?legalType=recreational` - Recreational states only
+- `?legalType=medical_only` - Medical-only states (not recreational)
+- `?legalType=no_program` - States with no cannabis program
+
+---
+
+## 1. Price Analytics
+
+### GET /price/product/:id
+
+Get price trends for a specific store product.
+
+**Request:**
+```bash
+GET /api/analytics/v2/price/product/12345?window=30d
+```
+
+**Response:**
+```json
+{
+  "store_product_id": 12345,
+  "product_name": "Blue Dream 3.5g",
+  "brand_name": "Cookies",
+  "category": "Flower",
+  "dispensary_id": 101,
+  "dispensary_name": "Green Leaf Dispensary",
+  "state_code": "AZ",
+  "data_points": [
+    {
+      "date": "2024-11-06",
+      "price_rec": 45.00,
+      "price_med": 40.00,
+      "price_rec_special": null,
+      "price_med_special": null,
+      "is_on_special": false
+    },
+    {
+      "date": "2024-11-07",
+      "price_rec": 42.00,
+      "price_med": 38.00,
+      "price_rec_special": null,
+      "price_med_special": null,
+      "is_on_special": false
+    }
+  ],
+  "summary": {
+    "current_price": 42.00,
+    "min_price": 40.00,
+    "max_price": 48.00,
+    "avg_price": 43.50,
+    "price_change_count": 3,
+    "volatility_percent": 8.2
+  }
+}
+```
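+
+Most of `summary` can be recomputed from `data_points`. This sketch assumes the summary is based on `price_rec` and that `price_change_count` counts consecutive observations whose price differs. `volatility_percent` is omitted because its formula isn't documented. The response above lists only two data points, so its summary values can't be reproduced from them alone.
+
+```typescript
+interface ProductPricePoint {
+  date: string;
+  price_rec: number | null;
+}
+
+interface ProductPriceSummary {
+  current_price: number;
+  min_price: number;
+  max_price: number;
+  avg_price: number;
+  price_change_count: number;
+}
+
+// ASSUMPTION: summary uses price_rec; days without a price (null) are skipped.
+function summarizeProductPrices(points: ProductPricePoint[]): ProductPriceSummary | null {
+  const prices = points
+    .map((p) => p.price_rec)
+    .filter((p): p is number => p !== null);
+  if (prices.length === 0) return null;
+
+  // Count transitions between consecutive observed prices
+  let changes = 0;
+  for (let i = 1; i < prices.length; i++) {
+    if (prices[i] !== prices[i - 1]) changes++;
+  }
+
+  return {
+    current_price: prices[prices.length - 1],
+    min_price: Math.min(...prices),
+    max_price: Math.max(...prices),
+    avg_price: prices.reduce((sum, p) => sum + p, 0) / prices.length,
+    price_change_count: changes,
+  };
+}
+```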
+
+### GET /price/rec-vs-med
+
+Get recreational vs medical-only price comparison by category.
+
+**Request:**
+```bash
+GET /api/analytics/v2/price/rec-vs-med?category=Flower
+```
+
+**Response:**
+```json
+[
+  {
+    "category": "Flower",
+    "rec_avg": 38.50,
+    "rec_median": 35.00,
+    "med_avg": 42.00,
+    "med_median": 40.00
+  },
+  {
+    "category": "Concentrates",
+    "rec_avg": 45.00,
+    "rec_median": 42.00,
+    "med_avg": 48.00,
+    "med_median": 45.00
+  }
+]
+```
+
+---
+
+## 2. Brand Analytics
+
+### GET /brand/:name/penetration
+
+Get brand penetration metrics with state breakdown.
+
+**Request:**
+```bash
+GET /api/analytics/v2/brand/Cookies/penetration?window=30d
+```
+
+**Response:**
+```json
+{
+  "brand_name": "Cookies",
+  "total_dispensaries": 125,
+  "total_skus": 450,
+  "avg_skus_per_dispensary": 3.6,
+  "states_present": ["AZ", "CA", "CO", "NV", "MI"],
+  "state_breakdown": [
+    {
+      "state_code": "CA",
+      "state_name": "California",
+      "legal_type": "recreational",
+      "dispensary_count": 45,
+      "sku_count": 180,
+      "avg_skus_per_dispensary": 4.0,
+      "market_share_percent": 12.5
+    },
+    {
+      "state_code": "AZ",
+      "state_name": "Arizona",
+      "legal_type": "recreational",
+      "dispensary_count": 32,
+      "sku_count": 128,
+      "avg_skus_per_dispensary": 4.0,
+      "market_share_percent": 15.2
+    }
+  ],
+  "penetration_trend": [
+    {
+      "date": "2024-11-01",
+      "dispensary_count": 120,
+      "new_dispensaries": 0,
+      "dropped_dispensaries": 0
+    },
+    {
+      "date": "2024-11-08",
+      "dispensary_count": 123,
+      "new_dispensaries": 3,
+      "dropped_dispensaries": 0
+    },
+    {
+      "date": "2024-11-15",
+      "dispensary_count": 125,
+      "new_dispensaries": 2,
+      "dropped_dispensaries": 0
+    }
+  ]
+}
+```
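+
+The `penetration_trend` entries can be derived by comparing the set of dispensaries carrying the brand at each checkpoint with the set at the previous checkpoint. This sketch assumes checkpoints are sorted by date. `Checkpoint` is a hypothetical input shape. As in the example, the first checkpoint has no baseline, so it reports zero new and zero dropped stores.
+
+```typescript
+interface Checkpoint {
+  date: string;
+  dispensaryIds: number[]; // dispensaries carrying the brand on this date
+}
+
+interface PenetrationTrendPoint {
+  date: string;
+  dispensary_count: number;
+  new_dispensaries: number;
+  dropped_dispensaries: number;
+}
+
+function penetrationTrend(checkpoints: Checkpoint[]): PenetrationTrendPoint[] {
+  const result: PenetrationTrendPoint[] = [];
+  let previous = new Set<number>();
+  checkpoints.forEach((cp, i) => {
+    const current = new Set(cp.dispensaryIds);
+    // The first checkpoint has no baseline, so it reports 0 new / 0 dropped
+    const added = i === 0 ? 0 : Array.from(current).filter((id) => !previous.has(id)).length;
+    const dropped = i === 0 ? 0 : Array.from(previous).filter((id) => !current.has(id)).length;
+    result.push({
+      date: cp.date,
+      dispensary_count: current.size,
+      new_dispensaries: added,
+      dropped_dispensaries: dropped,
+    });
+    previous = current;
+  });
+  return result;
+}
+```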
+
+### GET /brand/:name/rec-vs-med
+
+Get brand presence in recreational vs medical-only states.
+
+**Request:**
+```bash
+GET /api/analytics/v2/brand/Cookies/rec-vs-med
+```
+
+**Response:**
+```json
+{
+  "brand_name": "Cookies",
+  "rec_states_count": 4,
+  "rec_states": ["AZ", "CA", "CO", "NV"],
+  "rec_dispensary_count": 110,
+  "rec_avg_skus": 3.8,
+  "med_only_states_count": 2,
+  "med_only_states": ["FL", "OH"],
+  "med_only_dispensary_count": 15,
+  "med_only_avg_skus": 2.5
+}
+```
+
+---
+
+## 3. Category Analytics
+
+### GET /category/:name/growth
+
+Get category growth metrics with state breakdown.
+
+**Request:**
+```bash
+GET /api/analytics/v2/category/Flower/growth?window=30d
+```
+
+**Response:**
+```json
+{
+  "category": "Flower",
+  "current_sku_count": 5200,
+  "current_dispensary_count": 320,
+  "avg_price": 38.50,
+  "growth_data": [
+    {
+      "date": "2024-11-01",
+      "sku_count": 4800,
+      "dispensary_count": 310,
+      "avg_price": 39.00
+    },
+    {
+      "date": "2024-11-15",
+      "sku_count": 5000,
+      "dispensary_count": 315,
+      "avg_price": 38.75
+    },
+    {
+      "date": "2024-12-01",
+      "sku_count": 5200,
+      "dispensary_count": 320,
+      "avg_price": 38.50
+    }
+  ],
+  "state_breakdown": [
+    {
+      "state_code": "CA",
+      "state_name": "California",
+      "legal_type": "recreational",
+      "sku_count": 2100,
+      "dispensary_count": 145,
+      "avg_price": 36.00
+    },
+    {
+      "state_code": "AZ",
+      "state_name": "Arizona",
+      "legal_type": "recreational",
+      "sku_count": 950,
+      "dispensary_count": 85,
+      "avg_price": 40.00
+    }
+  ]
+}
+```
+
+### GET /category/rec-vs-med
+
+Get category comparison between recreational and medical-only states.
+
+**Request:**
+```bash
+GET /api/analytics/v2/category/rec-vs-med
+```
+
+**Response:**
+```json
+[
+  {
+    "category": "Flower",
+    "recreational": {
+      "state_count": 15,
+      "dispensary_count": 650,
+      "sku_count": 12500,
+      "avg_price": 35.50,
+      "median_price": 32.00
+    },
+    "medical_only": {
+      "state_count": 8,
+      "dispensary_count": 220,
+      "sku_count": 4200,
+      "avg_price": 42.00,
+      "median_price": 40.00
+    },
+    "price_diff_percent": -15.48
+  },
+  {
+    "category": "Concentrates",
+    "recreational": {
+      "state_count": 15,
+      "dispensary_count": 600,
+      "sku_count": 8500,
+      "avg_price": 42.00,
+      "median_price": 40.00
+    },
+    "medical_only": {
+      "state_count": 8,
+      "dispensary_count": 200,
+      "sku_count": 3100,
+      "avg_price": 48.00,
+      "median_price": 45.00
+    },
+    "price_diff_percent": -12.50
+  }
+]
+```
+
+---
+
+## 4. Store Analytics
+
+### GET /store/:id/summary
+
+Get change summary for a store over a time window.
+
+**Request:**
+```bash
+GET /api/analytics/v2/store/101/summary?window=30d
+```
+
+**Response:**
+```json
+{
+  "dispensary_id": 101,
+  "dispensary_name": "Green Leaf Dispensary",
+  "state_code": "AZ",
+  "window": "30d",
+  "products_added": 45,
+  "products_dropped": 12,
+  "brands_added": ["Alien Labs", "Connected"],
+  "brands_dropped": ["House Brand"],
+  "price_changes": 156,
+  "avg_price_change_percent": 3.2,
+  "stock_in_events": 89,
+  "stock_out_events": 34,
+  "current_product_count": 512,
+  "current_in_stock_count": 478
+}
+```
+
+### GET /store/:id/events
+
+Get recent product change events for a store.
+
+**Request:**
+```bash
+GET /api/analytics/v2/store/101/events?window=7d&limit=50
+```
+
+**Response:**
+```json
+[
+  {
+    "store_product_id": 12345,
+    "product_name": "Blue Dream 3.5g",
+    "brand_name": "Cookies",
+    "category": "Flower",
+    "event_type": "price_change",
+    "event_date": "2024-12-05T14:30:00.000Z",
+    "old_value": "45.00",
+    "new_value": "42.00"
+  },
+  {
+    "store_product_id": 12346,
+    "product_name": "OG Kush 1g",
+    "brand_name": "Alien Labs",
+    "category": "Flower",
+    "event_type": "added",
+    "event_date": "2024-12-04T10:00:00.000Z",
+    "old_value": null,
+    "new_value": null
+  },
+  {
+    "store_product_id": 12300,
+    "product_name": "Sour Diesel Cart",
+    "brand_name": "Select",
+    "category": "Vaporizers",
+    "event_type": "stock_out",
+    "event_date": "2024-12-03T16:45:00.000Z",
+    "old_value": "true",
+    "new_value": "false"
+  }
+]
+```
+
+---
+
+## 5. State Analytics
+
+### GET /state/:code/summary
+
+Get market summary for a specific state with rec/med breakdown.
+
+**Request:**
+```bash
+GET /api/analytics/v2/state/AZ/summary
+```
+
+**Response:**
+```json
+{
+  "state_code": "AZ",
+  "state_name": "Arizona",
+  "legal_status": {
+    "recreational_legal": true,
+    "rec_year": 2020,
+    "medical_legal": true,
+    "med_year": 2010
+  },
+  "coverage": {
+    "dispensary_count": 145,
+    "product_count": 18500,
+    "brand_count": 320,
+    "category_count": 12,
+    "snapshot_count": 2450000,
+    "last_crawl_at": "2024-12-06T02:30:00.000Z"
+  },
+  "pricing": {
+    "avg_price": 42.50,
+    "median_price": 38.00,
+    "min_price": 5.00,
+    "max_price": 250.00
+  },
+  "top_categories": [
+    { "category": "Flower", "count": 5200 },
+    { "category": "Concentrates", "count": 3800 },
+    { "category": "Vaporizers", "count": 2950 },
+    { "category": "Edibles", "count": 2400 },
+    { "category": "Pre-Rolls", "count": 1850 }
+  ],
+  "top_brands": [
+    { "brand": "Cookies", "count": 450 },
+    { "brand": "Alien Labs", "count": 380 },
+    { "brand": "Connected", "count": 320 },
+    { "brand": "Stiiizy", "count": 290 },
+    { "brand": "Raw Garden", "count": 275 }
+  ]
+}
+```
+
+### GET /state/legal-breakdown
+
+Get breakdown by legal status (recreational, medical-only, no program).
+
+**Request:**
+```bash
+GET /api/analytics/v2/state/legal-breakdown
+```
+
+**Response:**
+```json
+{
+  "recreational_states": {
+    "count": 24,
+    "dispensary_count": 850,
+    "product_count": 125000,
+    "snapshot_count": 15000000,
+    "states": [
+      { "code": "CA", "name": "California", "dispensary_count": 250 },
+      { "code": "CO", "name": "Colorado", "dispensary_count": 150 },
+      { "code": "AZ", "name": "Arizona", "dispensary_count": 145 },
+      { "code": "MI", "name": "Michigan", "dispensary_count": 120 }
+    ]
+  },
+  "medical_only_states": {
+    "count": 18,
+    "dispensary_count": 320,
+    "product_count": 45000,
+    "snapshot_count": 5000000,
+    "states": [
+      { "code": "FL", "name": "Florida", "dispensary_count": 120 },
+      { "code": "OH", "name": "Ohio", "dispensary_count": 85 },
+      { "code": "PA", "name": "Pennsylvania", "dispensary_count": 75 }
+    ]
+  },
+  "no_program_states": {
+    "count": 9,
+    "states": [
+      { "code": "ID", "name": "Idaho" },
+      { "code": "WY", "name": "Wyoming" },
+      { "code": "KS", "name": "Kansas" }
+    ]
+  }
+}
+```
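+
+The following sketch shows how a state could fall into one of the three buckets. It assumes the `recreational_legal` and `medical_legal` flags on `states`, which are the same columns used for rec/med segmentation. Treating a state with neither flag as `no_program` is an assumption. `legalTypeOf` and `groupByLegalType` are hypothetical names.
+
+```typescript
+type LegalType = 'recreational' | 'medical_only' | 'no_program';
+
+interface StateFlags {
+  code: string;
+  recreational_legal: boolean;
+  medical_legal: boolean;
+}
+
+function legalTypeOf(s: StateFlags): LegalType {
+  if (s.recreational_legal) return 'recreational'; // rec wins even when medical is also legal
+  if (s.medical_legal) return 'medical_only';
+  return 'no_program'; // ASSUMPTION: neither flag set
+}
+
+// Group state codes into the buckets returned by /state/legal-breakdown.
+function groupByLegalType(states: StateFlags[]): Record<LegalType, string[]> {
+  const groups: Record<LegalType, string[]> = { recreational: [], medical_only: [], no_program: [] };
+  for (const s of states) groups[legalTypeOf(s)].push(s.code);
+  return groups;
+}
+```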
+
+### GET /state/recreational
+
+Get list of recreational state codes.
+
+**Request:**
+```bash
+GET /api/analytics/v2/state/recreational
+```
+
+**Response:**
+```json
+{
+  "legal_type": "recreational",
+  "states": ["AK", "AZ", "CA", "CO", "CT", "DE", "IL", "MA", "MD", "ME", "MI", "MN", "MO", "MT", "NJ", "NM", "NV", "NY", "OH", "OR", "RI", "VA", "VT", "WA"],
+  "count": 24
+}
+```
+
+### GET /state/medical-only
+
+Get list of medical-only state codes (not recreational).
+
+**Request:**
+```bash
+GET /api/analytics/v2/state/medical-only
+```
+
+**Response:**
+```json
+{
+  "legal_type": "medical_only",
+  "states": ["AR", "FL", "HI", "LA", "MS", "ND", "NH", "OK", "PA", "SD", "UT", "WV"],
+  "count": 12
+}
+```
+
+### GET /state/rec-vs-med-pricing
+
+Get rec vs med price comparison by category.
+
+**Request:**
+```bash
+GET /api/analytics/v2/state/rec-vs-med-pricing?category=Flower
+```
+
+**Response:**
+```json
+[
+  {
+    "category": "Flower",
+    "recreational": {
+      "state_count": 15,
+      "product_count": 12500,
+      "avg_price": 35.50,
+      "median_price": 32.00
+    },
+    "medical_only": {
+      "state_count": 8,
+      "product_count": 5200,
+      "avg_price": 42.00,
+      "median_price": 40.00
+    },
+    "price_diff_percent": -15.48
+  }
+]
+```
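+
+The sample figures are consistent with `price_diff_percent` being the recreational average expressed as a percentage difference from the medical-only average, rounded to two decimals: (35.50 - 42.00) / 42.00 × 100 ≈ -15.48. That formula is inferred from the example rather than stated by the API, so treat this sketch (the `priceDiffPercent` name is hypothetical) as an approximation:
+
+```typescript
+// Inferred formula: (rec - med) / med * 100, rounded to 2 decimals.
+// Returns null when either average is missing or the medical-only average is 0.
+function priceDiffPercent(recAvg: number | null, medAvg: number | null): number | null {
+  if (recAvg === null || medAvg === null || medAvg === 0) return null;
+  const pct = ((recAvg - medAvg) / medAvg) * 100;
+  return Math.round(pct * 100) / 100;
+}
+
+console.log(priceDiffPercent(35.5, 42.0)); // -15.48
+```
+
+Under this reading, a negative value means recreational prices for the category are lower than medical-only prices.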
+
+---
+
+## How These Endpoints Support Portals
+
+### Brand Portal Use Cases
+
+1. **Track brand penetration**: Use `/brand/:name/penetration` to see how many stores carry the brand
+2. **Compare rec vs med markets**: Use `/brand/:name/rec-vs-med` to understand footprint by legal status
+3. **Identify expansion opportunities**: Use `/state/coverage-gaps` to find underserved markets
+4. **Monitor pricing**: Use `/price/brand/:brand` to track pricing by state
+
+### Buyer Portal Use Cases
+
+1. **Compare stores**: Use `/store/:id/summary` to see activity levels
+2. **Track price changes**: Use `/store/:id/events` to monitor competitor pricing
+3. **Analyze categories**: Use `/category/:name/growth` to identify trending products
+4. **State-level insights**: Use `/state/:code/summary` for market overview
+
+---
+
+## Time Window Filtering
+
+All time-based endpoints support the `window` query parameter:
+
+| Value | Description |
+|-------|-------------|
+| `7d` | Last 7 days |
+| `30d` | Last 30 days (default) |
+| `90d` | Last 90 days |
+
+The window affects:
+- `store_product_snapshots.captured_at` for historical data
+- `store_products.first_seen_at` / `last_seen_at` for product lifecycle
+- `crawl_runs.started_at` for crawl-based metrics
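+
+A minimal sketch of how a handler might turn `window` into a cutoff timestamp for the columns above, assuming unrecognized values fall back to the documented `30d` default (the real endpoints may reject them instead). `windowCutoff` and `isAnalyticsWindow` are hypothetical names:
+
+```typescript
+type AnalyticsWindow = '7d' | '30d' | '90d';
+
+const WINDOW_DAYS: Record<AnalyticsWindow, number> = { '7d': 7, '30d': 30, '90d': 90 };
+const DAY_MS = 24 * 60 * 60 * 1000;
+
+// Own-property check, so inherited keys such as 'toString' are not accepted.
+function isAnalyticsWindow(raw: string): raw is AnalyticsWindow {
+  return Object.prototype.hasOwnProperty.call(WINDOW_DAYS, raw);
+}
+
+// Falls back to the documented default (30d) for missing or unrecognized values.
+function windowCutoff(raw: string | undefined, now: Date = new Date()): Date {
+  const win: AnalyticsWindow = raw !== undefined && isAnalyticsWindow(raw) ? raw : '30d';
+  return new Date(now.getTime() - WINDOW_DAYS[win] * DAY_MS);
+}
+
+// Rows whose timestamp (e.g. captured_at) is >= the cutoff fall inside the window.
+console.log(windowCutoff('7d', new Date('2025-01-08T00:00:00Z')).toISOString()); // 2025-01-01T00:00:00.000Z
+```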
+
+---
+
+## Rec/Med Segmentation
+
+All state-level endpoints automatically segment by:
+
+- **Recreational**: `states.recreational_legal = TRUE`
+- **Medical-only**: `states.medical_legal = TRUE AND states.recreational_legal = FALSE`
+- **No program**: Both flags are FALSE or NULL
+
+This segmentation appears in:
+- `legal_type` field in responses
+- State breakdown arrays
+- Price comparison endpoints
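+
+The two SQL predicates above leave one combination unassigned: `medical_legal = TRUE` with `recreational_legal` NULL fails the medical-only test (which requires FALSE), yet it is not "both FALSE or NULL" either. The sketch below resolves that by treating NULL as FALSE, mirroring `COALESCE(flag, FALSE)`; that choice is an assumption, not confirmed behavior:
+
+```typescript
+type LegalType = 'recreational' | 'medical_only' | 'no_program';
+
+interface StateFlags {
+  recreational_legal: boolean | null;
+  medical_legal: boolean | null;
+}
+
+// NULL flags are treated as FALSE (assumption, like COALESCE(flag, FALSE)).
+function classifyState({ recreational_legal, medical_legal }: StateFlags): LegalType {
+  if (recreational_legal === true) return 'recreational';
+  if (medical_legal === true) return 'medical_only';
+  return 'no_program';
+}
+
+console.log(classifyState({ recreational_legal: null, medical_legal: true })); // medical_only
+```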
--- a/backend/docs/_archive/BRAND_INTELLIGENCE_API.md
+++ b/backend/docs/_archive/BRAND_INTELLIGENCE_API.md
@@ -0,0 +1,394 @@
+# Brand Intelligence API
+
+## Endpoint
+
+```
+GET /api/analytics/v2/brand/:name/intelligence
+```
+
+## Query Parameters
+
+| Param | Type | Default | Description |
+|-------|------|---------|-------------|
+| `window` | `7d\|30d\|90d` | `30d` | Time window for trend calculations |
+| `state` | string | - | Filter by state code (e.g., `AZ`) |
+| `category` | string | - | Filter by category (e.g., `Flower`) |
+
+## Response Payload Schema
+
+```typescript
+interface BrandIntelligenceResult {
+  brand_name: string;
+  window: '7d' | '30d' | '90d';
+  generated_at: string;  // ISO timestamp when data was computed
+
+  performance_snapshot: PerformanceSnapshot;
+  alerts: Alerts;
+  sku_performance: SkuPerformance[];
+  retail_footprint: RetailFootprint;
+  competitive_landscape: CompetitiveLandscape;
+  inventory_health: InventoryHealth;
+  promo_performance: PromoPerformance;
+}
+```
+
+---
+
+## Section 1: Performance Snapshot
+
+Summary cards with key brand metrics.
+
+```typescript
+interface PerformanceSnapshot {
+  active_skus: number;              // Total products in catalog
+  total_revenue_30d: number | null; // Estimated from qty × price
+  total_stores: number;             // Active retail partners
+  new_stores_30d: number;           // New distribution in window
+  market_share: number | null;      // % of category SKUs
+  avg_wholesale_price: number | null;
+  price_position: 'premium' | 'value' | 'competitive';
+}
+```
+
+**UI Label Mapping:**
+| Field | User-Facing Label | Helper Text |
+|-------|-------------------|-------------|
+| `active_skus` | Active Products | X total in catalog |
+| `total_revenue_30d` | Monthly Revenue | Estimated from sales |
+| `total_stores` | Retail Distribution | Active retail partners |
+| `new_stores_30d` | New Opportunities | X new in last 30 days |
+| `market_share` | Category Position | % of category |
+| `avg_wholesale_price` | Avg Wholesale | Per unit |
+| `price_position` | Pricing Tier | Premium/Value/Market Rate |
+
+---
+
+## Section 2: Alerts
+
+Issues requiring attention.
+
+```typescript
+interface Alerts {
+  lost_stores_30d_count: number;
+  lost_skus_30d_count: number;
+  competitor_takeover_count: number;
+  avg_oos_duration_days: number | null;
+  avg_reorder_lag_days: number | null;
+  items: AlertItem[];
+}
+
+interface AlertItem {
+  type: 'lost_store' | 'delisted_sku' | 'shelf_loss' | 'extended_oos';
+  severity: 'critical' | 'warning';
+  store_name?: string;
+  product_name?: string;
+  competitor_brand?: string;
+  days_since?: number;
+  state_code?: string;
+}
+```
+
+**UI Label Mapping:**
+| Field | User-Facing Label |
+|-------|-------------------|
+| `lost_stores_30d_count` | Accounts at Risk |
+| `lost_skus_30d_count` | Delisted SKUs |
+| `competitor_takeover_count` | Shelf Losses |
+| `avg_oos_duration_days` | Avg Stockout Length |
+| `avg_reorder_lag_days` | Avg Restock Time |
+| `severity: critical` | Urgent |
+| `severity: warning` | Watch |
+
+---
+
+## Section 3: SKU Performance (Product Velocity)
+
+How fast each SKU sells.
+
+```typescript
+interface SkuPerformance {
+  store_product_id: number;
+  product_name: string;
+  category: string | null;
+  daily_velocity: number;        // Units/day estimate
+  velocity_status: 'hot' | 'steady' | 'slow' | 'stale';
+  retail_price: number | null;
+  on_sale: boolean;
+  stores_carrying: number;
+  stock_status: 'in_stock' | 'low_stock' | 'out_of_stock';
+}
+```
+
+**UI Label Mapping:**
+| Field | User-Facing Label |
+|-------|-------------------|
+| `daily_velocity` | Daily Rate |
+| `velocity_status` | Momentum |
+| `velocity_status: hot` | Hot |
+| `velocity_status: steady` | Steady |
+| `velocity_status: slow` | Slow |
+| `velocity_status: stale` | Stale |
+| `retail_price` | Retail Price |
+| `on_sale` | Promo (badge) |
+
+**Velocity Thresholds:**
+- `hot`: >= 5 units/day
+- `steady`: >= 1 unit/day
+- `slow`: >= 0.1 units/day
+- `stale`: < 0.1 units/day
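+
+Because the thresholds are checked from the top down and use `>=`, each boundary value belongs to the higher tier (exactly 5 units/day is `hot`, exactly 0.1 is `slow`). A small sketch; the function name is hypothetical:
+
+```typescript
+type VelocityStatus = 'hot' | 'steady' | 'slow' | 'stale';
+
+// Thresholds from the list above, checked from highest to lowest (units/day).
+function velocityStatus(dailyVelocity: number): VelocityStatus {
+  if (dailyVelocity >= 5) return 'hot';
+  if (dailyVelocity >= 1) return 'steady';
+  if (dailyVelocity >= 0.1) return 'slow';
+  return 'stale';
+}
+
+console.log([7, 1, 0.5, 0.05].map((v) => velocityStatus(v))); // [ 'hot', 'steady', 'slow', 'stale' ]
+```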
+
+---
+
+## Section 4: Retail Footprint
+
+Store placement and coverage.
+
+```typescript
+interface RetailFootprint {
+  total_stores: number;
+  in_stock_count: number;
+  out_of_stock_count: number;
+  penetration_by_region: RegionPenetration[];
+  whitespace_stores: WhitespaceStore[];
+}
+
+interface RegionPenetration {
+  state_code: string;
+  store_count: number;
+  percent_reached: number;    // % of state's dispensaries
+  in_stock: number;
+  out_of_stock: number;
+}
+
+interface WhitespaceStore {
+  store_id: number;
+  store_name: string;
+  state_code: string;
+  city: string | null;
+  category_fit: number;       // How many competing brands they carry
+  competitor_brands: string[];
+}
+```
+
+**UI Label Mapping:**
+| Field | User-Facing Label |
+|-------|-------------------|
+| `penetration_by_region` | Market Coverage by Region |
+| `percent_reached` | X% reached |
+| `in_stock` | X stocked |
+| `out_of_stock` | X out |
+| `whitespace_stores` | Expansion Opportunities |
+| `category_fit` | X fit |
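+
+One plausible way to derive `whitespace_stores` is to keep stores that carry at least one competing brand but not the target brand, using the number of competitors carried as `category_fit`. The input shape (`StoreBrands`), the explicit competitor list, and the sort order are assumptions for illustration; the source does not describe the actual query:
+
+```typescript
+interface StoreBrands {
+  store_id: number;
+  store_name: string;
+  state_code: string;
+  city: string | null;
+  brands: string[]; // hypothetical: brands currently on this store's menu
+}
+
+interface WhitespaceCandidate {
+  store_id: number;
+  store_name: string;
+  state_code: string;
+  city: string | null;
+  category_fit: number;
+  competitor_brands: string[];
+}
+
+// A store is "whitespace" if it carries at least one competitor but not the target brand.
+function findWhitespace(target: string, competitors: string[], stores: StoreBrands[]): WhitespaceCandidate[] {
+  const competitorSet = new Set(competitors);
+  return stores
+    .filter((s) => !s.brands.includes(target))
+    .map((s) => {
+      const carried = s.brands.filter((b) => competitorSet.has(b));
+      return {
+        store_id: s.store_id,
+        store_name: s.store_name,
+        state_code: s.state_code,
+        city: s.city,
+        category_fit: carried.length,
+        competitor_brands: carried,
+      };
+    })
+    .filter((w) => w.category_fit > 0)
+    .sort((a, b) => b.category_fit - a.category_fit); // best fit first
+}
+```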
+
+---
+
+## Section 5: Competitive Landscape
+
+Market positioning vs competitors.
+
+```typescript
+interface CompetitiveLandscape {
+  brand_price_position: 'premium' | 'value' | 'competitive';
+  market_share_trend: MarketSharePoint[];
+  competitors: Competitor[];
+  head_to_head_skus: HeadToHead[];
+}
+
+interface MarketSharePoint {
+  date: string;
+  share_percent: number;
+}
+
+interface Competitor {
+  brand_name: string;
+  store_overlap_percent: number;
+  price_position: 'premium' | 'value' | 'competitive';
+  avg_price: number | null;
+  sku_count: number;
+}
+
+interface HeadToHead {
+  product_name: string;
+  brand_price: number;
+  competitor_brand: string;
+  competitor_price: number;
+  price_diff_percent: number;
+}
+```
+
+**UI Label Mapping:**
+| Field | User-Facing Label |
+|-------|-------------------|
+| `price_position: premium` | Premium Tier |
+| `price_position: value` | Value Leader |
+| `price_position: competitive` | Market Rate |
+| `market_share_trend` | Share of Shelf Trend |
+| `head_to_head_skus` | Price Comparison |
+| `store_overlap_percent` | X% store overlap |
+
+---
+
+## Section 6: Inventory Health
+
+Stock projections and risk levels.
+
+```typescript
+interface InventoryHealth {
+  critical_count: number;      // <7 days stock
+  warning_count: number;       // 7-14 days stock
+  healthy_count: number;       // 14-90 days stock
+  overstocked_count: number;   // >90 days stock
+  skus: InventorySku[];
+  overstock_alert: OverstockItem[];
+}
+
+interface InventorySku {
+  store_product_id: number;
+  product_name: string;
+  store_name: string;
+  days_of_stock: number | null;
+  risk_level: 'critical' | 'elevated' | 'moderate' | 'healthy';
+  current_quantity: number | null;
+  daily_sell_rate: number | null;
+}
+
+interface OverstockItem {
+  product_name: string;
+  store_name: string;
+  excess_units: number;
+  days_of_stock: number;
+}
+```
+
+**UI Label Mapping:**
+| Field | User-Facing Label |
+|-------|-------------------|
+| `risk_level: critical` | Reorder Now |
+| `risk_level: elevated` | Low Stock |
+| `risk_level: moderate` | Monitor |
+| `risk_level: healthy` | Healthy |
+| `critical_count` | Urgent (<7 days) |
+| `warning_count` | Low (7-14 days) |
+| `overstocked_count` | Excess (>90 days) |
+| `days_of_stock` | X days remaining |
+| `overstock_alert` | Overstock Alert |
+| `excess_units` | X excess units |
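+
+The bucket comments on `InventoryHealth` give thresholds but not which side owns each boundary, and the doc does not say how the four buckets relate to the four `risk_level` values. The sketch below assumes `days_of_stock = current_quantity / daily_sell_rate` and half-open buckets (exactly 7 days counts as warning, exactly 14 and exactly 90 as healthy); it does not attempt the `risk_level` mapping:
+
+```typescript
+type StockBucket = 'critical' | 'warning' | 'healthy' | 'overstocked';
+
+// Assumption: days_of_stock = current_quantity / daily_sell_rate.
+// Returns null when either input is missing or the sell rate is not positive.
+function daysOfStock(quantity: number | null, dailySellRate: number | null): number | null {
+  if (quantity === null || dailySellRate === null || dailySellRate <= 0) return null;
+  return quantity / dailySellRate;
+}
+
+// Assumed boundaries: [0, 7) critical, [7, 14) warning, [14, 90] healthy, > 90 overstocked.
+function stockBucket(days: number): StockBucket {
+  if (days < 7) return 'critical';
+  if (days < 14) return 'warning';
+  if (days <= 90) return 'healthy';
+  return 'overstocked';
+}
+
+const days = daysOfStock(30, 5); // 6 days of stock left
+console.log(days, days === null ? null : stockBucket(days)); // 6 critical
+```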
+
+---
+
+## Section 7: Promotion Effectiveness
+
+How promotions impact sales.
+
+```typescript
+interface PromoPerformance {
+  avg_baseline_velocity: number | null;
+  avg_promo_velocity: number | null;
+  avg_velocity_lift: number | null;     // % increase during promo
+  avg_efficiency_score: number | null;  // ROI proxy
+  promotions: Promotion[];
+}
+
+interface Promotion {
+  product_name: string;
+  store_name: string;
+  status: 'active' | 'scheduled' | 'ended';
+  start_date: string;
+  end_date: string | null;
+  regular_price: number;
+  promo_price: number;
+  discount_percent: number;
+  baseline_velocity: number | null;
+  promo_velocity: number | null;
+  velocity_lift: number | null;
+  efficiency_score: number | null;
+}
+```
+
+**UI Label Mapping:**
+| Field | User-Facing Label |
+|-------|-------------------|
+| `avg_baseline_velocity` | Normal Rate |
+| `avg_promo_velocity` | During Promos |
+| `avg_velocity_lift` | Avg Sales Lift |
+| `avg_efficiency_score` | ROI Score |
+| `velocity_lift` | Sales Lift |
+| `efficiency_score` | ROI Score |
+| `status: active` | Live |
+| `status: scheduled` | Scheduled |
+| `status: ended` | Ended |
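+
+The doc describes `velocity_lift` only as a "% increase during promo". Assuming the conventional definitions (lift relative to baseline velocity, discount relative to regular price), the per-promotion fields could be computed as below; `efficiency_score` is left out because its formula is not documented:
+
+```typescript
+// Assumed formula: percentage off the regular price.
+function discountPercent(regularPrice: number, promoPrice: number): number {
+  return ((regularPrice - promoPrice) / regularPrice) * 100;
+}
+
+// Assumed formula: percentage change in daily velocity versus the pre-promo baseline.
+// Returns null when either velocity is unknown or the baseline is 0.
+function velocityLift(baseline: number | null, promo: number | null): number | null {
+  if (baseline === null || promo === null || baseline === 0) return null;
+  return ((promo - baseline) / baseline) * 100;
+}
+
+console.log(discountPercent(40, 30), velocityLift(2, 3)); // 25 50
+```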
+
+---
+
+## Example Queries
+
+### Get full payload
+```javascript
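+const brand = 'Wyld';
+const response = await fetch(`/api/analytics/v2/brand/${encodeURIComponent(brand)}/intelligence?window=30d`);
+if (!response.ok) throw new Error(`Intelligence request failed: ${response.status}`);
+const data = await response.json();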
+```
+
+### Extract summary cards (flattened)
+```javascript
+const { performance_snapshot: ps, alerts } = data;
+
+const summaryCards = {
+  activeProducts: ps.active_skus,
+  monthlyRevenue: ps.total_revenue_30d,
+  retailDistribution: ps.total_stores,
+  newOpportunities: ps.new_stores_30d,
+  categoryPosition: ps.market_share,
+  avgWholesale: ps.avg_wholesale_price,
+  pricingTier: ps.price_position,
+  accountsAtRisk: alerts.lost_stores_30d_count,
+  delistedSkus: alerts.lost_skus_30d_count,
+  shelfLosses: alerts.competitor_takeover_count,
+};
+```
+
+### Get top 10 fastest selling SKUs
+```javascript
+const topSkus = data.sku_performance
+  .filter(sku => sku.velocity_status === 'hot' || sku.velocity_status === 'steady')
+  .sort((a, b) => b.daily_velocity - a.daily_velocity)
+  .slice(0, 10);
+```
+
+### Get critical inventory alerts only
+```javascript
+const criticalInventory = data.inventory_health.skus
+  .filter(sku => sku.risk_level === 'critical');
+```
+
+### Get states with <50% penetration
+```javascript
+const underPenetrated = data.retail_footprint.penetration_by_region
+  .filter(region => region.percent_reached < 50)
+  .sort((a, b) => a.percent_reached - b.percent_reached);
+```
+
+### Get active promotions with positive lift
+```javascript
+const effectivePromos = data.promo_performance.promotions
+  .filter(p => p.status === 'active' && p.velocity_lift > 0)
+  .sort((a, b) => b.velocity_lift - a.velocity_lift);
+```
+
+### Build chart data for market share trend
+```javascript
+const chartData = data.competitive_landscape.market_share_trend.map(point => ({
+  x: new Date(point.date),
+  y: point.share_percent,
+}));
+```
+
+---
+
+## Notes for Frontend Implementation
+
+1. **All fields are snake_case** - transform to camelCase if needed
+2. **Null values are possible** - handle gracefully in UI
+3. **Arrays may be empty** - show appropriate empty states
+4. **Timestamps are ISO format** - parse with `new Date()`
+5. **Percentages are already computed** - no need to multiply by 100
+6. **The `window` parameter affects trend calculations** - 7d/30d/90d
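+
+For note 1, a recursive key transform is enough. It renames keys only, so enum values such as `'in_stock'` stay snake_case. This is a generic sketch, not part of any CannaiQ client library:
+
+```typescript
+type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
+
+// total_revenue_30d -> totalRevenue30d
+const toCamel = (key: string): string =>
+  key.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
+
+// Walks arrays and nested objects, renaming keys; leaf values are returned unchanged.
+function camelizeKeys(value: Json): Json {
+  if (Array.isArray(value)) return value.map((v) => camelizeKeys(v));
+  if (value !== null && typeof value === 'object') {
+    const out: { [key: string]: Json } = {};
+    for (const [k, v] of Object.entries(value)) out[toCamel(k)] = camelizeKeys(v);
+    return out;
+  }
+  return value;
+}
+
+console.log(camelizeKeys({ total_revenue_30d: 100, sku_performance: [{ daily_velocity: 2 }] }));
+```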
--- a/backend/docs/_archive/CRAWL_PIPELINE.md
+++ b/backend/docs/_archive/CRAWL_PIPELINE.md
@@ -0,0 +1,539 @@
+# Crawl Pipeline Documentation
+
+## Overview
+
+The crawl pipeline fetches product data from Dutchie dispensary menus and stores it in the canonical database. This document covers the complete flow from task scheduling to data storage.
+
+---
+
+## Pipeline Stages
+
+```
+┌───────────────────────┐
+│  store_discovery      │  Find new dispensaries
+└─────────┬─────────────┘
+          │
+          ▼
+┌───────────────────────┐
+│ entry_point_discovery │  Resolve slug → platform_dispensary_id
+└─────────┬─────────────┘
+          │
+          ▼
+┌───────────────────────┐
+│  product_discovery    │  Initial product crawl
+└─────────┬─────────────┘
+          │
+          ▼
+┌───────────────────────┐
+│   product_resync      │  Recurring crawl (every 4 hours)
+└───────────────────────┘
+```
+
+---
+
+## Stage Details
+
+### 1. Store Discovery
+**Purpose:** Find new dispensaries to crawl
+
+**Handler:** `src/tasks/handlers/store-discovery.ts`
+
+**Flow:**
+1. Query Dutchie `ConsumerDispensaries` GraphQL for cities/states
+2. Extract dispensary info (name, address, menu_url)
+3. Insert into `dutchie_discovery_locations`
+4. Queue `entry_point_discovery` for each new location
+
+---
+
+### 2. Entry Point Discovery
+**Purpose:** Resolve menu URL slug to platform_dispensary_id (MongoDB ObjectId)
+
+**Handler:** `src/tasks/handlers/entry-point-discovery.ts`
+
+**Flow:**
+1. Load dispensary from database
+2. Extract slug from `menu_url`:
+   - `/embedded-menu/<slug>` or `/dispensary/<slug>`
+3. Start stealth session (fingerprint + proxy)
+4. Query `resolveDispensaryIdWithDetails(slug)` via GraphQL
+5. Update dispensary with `platform_dispensary_id`
+6. Queue `product_discovery` task
+
+**Example:**
+```
+menu_url: https://dutchie.com/embedded-menu/deeply-rooted
+slug: deeply-rooted
+platform_dispensary_id: 6405ef617056e8014d79101b
+```
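+
+Step 2 of the flow can be sketched as a small parser for the two URL shapes listed. `extractSlug` is a hypothetical name, and how the real handler treats other URL shapes is not documented here:
+
+```typescript
+// Returns the slug from /embedded-menu/<slug> or /dispensary/<slug>, else null.
+function extractSlug(menuUrl: string): string | null {
+  let pathname: string;
+  try {
+    pathname = new URL(menuUrl).pathname; // drops host and query string
+  } catch {
+    return null; // not an absolute URL
+  }
+  const match = pathname.match(/^\/(?:embedded-menu|dispensary)\/([^/]+)/);
+  return match ? decodeURIComponent(match[1]) : null;
+}
+
+console.log(extractSlug('https://dutchie.com/embedded-menu/deeply-rooted')); // deeply-rooted
+```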
+
+---
+
+### 3. Product Discovery
+**Purpose:** Initial crawl of a new dispensary
+
+**Handler:** `src/tasks/handlers/product-discovery.ts`
+
+Same as product_resync but for first-time crawls.
+
+---
+
+### 4. Product Resync
+**Purpose:** Recurring crawl to capture price/stock changes
+
+**Handler:** `src/tasks/handlers/product-resync.ts`
+
+**Flow:**
+
+#### Step 1: Load Dispensary Info
+```sql
+SELECT id, name, platform_dispensary_id, menu_url, state
+FROM dispensaries
+WHERE id = $1 AND crawl_enabled = true
+```
+
+#### Step 2: Start Stealth Session
+- Generate random browser fingerprint
+- Set locale/timezone matching state
+- Optional proxy rotation
+
+#### Step 3: Fetch Products via GraphQL
+**Endpoint:** `https://dutchie.com/api-3/graphql`
+
+**Variables:**
+```javascript
+{
+  includeEnterpriseSpecials: false,
+  productsFilter: {
+    dispensaryId: "<platform_dispensary_id>",
+    pricingType: "rec",
+    Status: "All",
+    types: [],
+    useCache: false,
+    isDefaultSort: true,
+    sortBy: "popularSortIdx",
+    sortDirection: 1,
+    bypassOnlineThresholds: true,
+    isKioskMenu: false,
+    removeProductsBelowOptionThresholds: false
+  },
+  page: 0,
+  perPage: 100
+}
+```
+
+**Key Notes:**
+- `Status: "All"` returns all products (Active returns same count)
+- `Status: null` returns 0 products (broken)
+- `pricingType: "rec"` returns BOTH rec and med prices
+- Paginate until `products.length < perPage` or `allProducts.length >= totalCount`
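+
+The stop condition in the last note can be written as a generic loop. `fetchPage` stands in for the `FilteredProducts` GraphQL request and is hypothetical; the `maxPages` safety cap is an addition not described above:
+
+```typescript
+interface ProductPage<T> {
+  products: T[];
+  totalCount: number; // total reported by the API
+}
+
+async function fetchAllProducts<T>(
+  fetchPage: (page: number, perPage: number) => Promise<ProductPage<T>>,
+  perPage = 100,
+  maxPages = 30, // safety cap so a misbehaving API cannot loop forever
+): Promise<T[]> {
+  const allProducts: T[] = [];
+  for (let page = 0; page < maxPages; page++) {
+    const { products, totalCount } = await fetchPage(page, perPage);
+    allProducts.push(...products);
+    // Stop on a short page or once the reported total has been reached.
+    if (products.length < perPage || allProducts.length >= totalCount) break;
+  }
+  return allProducts;
+}
+
+// Example with an in-memory fake of 250 products: pages of 100, 100, then 50.
+const fakeCatalog = Array.from({ length: 250 }, (_, i) => ({ id: i }));
+fetchAllProducts(async (page, perPage) => ({
+  products: fakeCatalog.slice(page * perPage, (page + 1) * perPage),
+  totalCount: fakeCatalog.length,
+})).then((all) => console.log(all.length)); // 250
+```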
+
+#### Step 4: Normalize Data
+Transform raw Dutchie payload to canonical format via `DutchieNormalizer`.
+
+#### Step 5: Upsert Products
+Insert/update `store_products` table with normalized data.
+
+#### Step 6: Create Snapshots
+Insert point-in-time record to `store_product_snapshots`.
+
+#### Step 7: Track Missing Products (OOS Detection)
+```sql
+-- Reset consecutive_misses for products IN the feed
+UPDATE store_products
+SET consecutive_misses = 0, last_seen_at = NOW()
+WHERE dispensary_id = $1
+  AND provider = 'dutchie'
+  AND provider_product_id = ANY($2);
+
+-- Increment for products NOT in feed
+UPDATE store_products
+SET consecutive_misses = consecutive_misses + 1
+WHERE dispensary_id = $1
+  AND provider = 'dutchie'
+  AND provider_product_id NOT IN (...)
+  AND consecutive_misses < 3;
+
+-- Mark OOS at 3 consecutive misses
+UPDATE store_products
+SET stock_status = 'oos', is_in_stock = false
+WHERE dispensary_id = $1
+  AND consecutive_misses >= 3
+  AND stock_status != 'oos';
+```
+
+#### Step 8: Download Images
+For new products, download and store images locally.
+
+#### Step 9: Update Dispensary
+```sql
+UPDATE dispensaries SET last_crawl_at = NOW() WHERE id = $1
+```
+
+---
+
+## GraphQL Payload Structure
+
+### Product Fields (from filteredProducts.products[])
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `_id` / `id` | string | MongoDB ObjectId (24 hex chars) |
+| `Name` | string | Product display name |
+| `brandName` | string | Brand name |
+| `brand.name` | string | Brand name (nested) |
+| `brand.description` | string | Brand description |
+| `type` | string | Category (Flower, Edible, Concentrate, etc.) |
+| `subcategory` | string | Subcategory |
+| `strainType` | string | Hybrid, Indica, Sativa, N/A |
+| `Status` | string | Always "Active" in feed |
+| `Image` | string | Primary image URL |
+| `images[]` | array | All product images |
+
+### Pricing Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `Prices[]` | number[] | Rec prices per option |
+| `recPrices[]` | number[] | Rec prices |
+| `medicalPrices[]` | number[] | Medical prices |
+| `recSpecialPrices[]` | number[] | Rec sale prices |
+| `medicalSpecialPrices[]` | number[] | Medical sale prices |
+| `Options[]` | string[] | Size options ("1/8oz", "1g", etc.) |
+| `rawOptions[]` | string[] | Raw weight options ("3.5g") |
+
+### Inventory Fields (POSMetaData.children[])
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `quantity` | number | Total inventory count |
+| `quantityAvailable` | number | Available for online orders |
+| `kioskQuantityAvailable` | number | Available for kiosk orders |
+| `option` | string | Which size option this is for |
+
+### Potency Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `THCContent.range[]` | number[] | THC percentage |
+| `CBDContent.range[]` | number[] | CBD percentage |
+| `cannabinoidsV2[]` | array | Detailed cannabinoid breakdown |
+
+### Specials (specialData.bogoSpecials[])
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `specialName` | string | Deal name |
+| `specialType` | string | "bogo", "sale", etc. |
+| `itemsForAPrice.value` | string | Bundle price |
+| `bogoRewards[].totalQuantity.quantity` | number | Required quantity |
+
+---
+
+## OOS Detection Logic
+
+Products disappear from the Dutchie feed when they go out of stock. We track this via `consecutive_misses`:
+
+| Scenario | Action |
+|----------|--------|
+| Product in feed | `consecutive_misses = 0` |
+| Product missing 1st time | `consecutive_misses = 1` |
+| Product missing 2nd time | `consecutive_misses = 2` |
+| Product missing 3rd time | `consecutive_misses = 3`, mark `stock_status = 'oos'` |
+| Product returns to feed | `consecutive_misses = 0`, update stock_status |
+
+**Why 3 misses?**
+- Protects against false positives from crawl failures
+- Single bad crawl doesn't trigger mass OOS alerts
+- Balances detection speed vs accuracy
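+
+The counter behaves like a small state machine. The sketch applies one crawl result at a time and, like the SQL in Step 7, caps the counter at 3. The table only says "update stock_status" when a product returns, so resetting it to `'in_stock'` is an assumption:
+
+```typescript
+type StockStatus = 'in_stock' | 'oos'; // 'in_stock' is an assumed value
+
+interface TrackedProduct {
+  consecutive_misses: number;
+  stock_status: StockStatus;
+}
+
+const OOS_THRESHOLD = 3;
+
+// Applies one crawl's outcome to a product, following the table above.
+function applyCrawl(p: TrackedProduct, seenInFeed: boolean): TrackedProduct {
+  if (seenInFeed) return { consecutive_misses: 0, stock_status: 'in_stock' };
+  const misses = Math.min(p.consecutive_misses + 1, OOS_THRESHOLD);
+  return {
+    consecutive_misses: misses,
+    stock_status: misses >= OOS_THRESHOLD ? 'oos' : p.stock_status,
+  };
+}
+
+let p: TrackedProduct = { consecutive_misses: 0, stock_status: 'in_stock' };
+for (const seen of [false, false, false]) p = applyCrawl(p, seen);
+console.log(p); // { consecutive_misses: 3, stock_status: 'oos' }
+```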
+
+---
+
+## Database Tables
+
+### store_products
+Current state of each product:
+- `provider_product_id` - Dutchie's MongoDB ObjectId
+- `name_raw`, `brand_name_raw` - Raw values from feed
+- `price_rec`, `price_med` - Current prices
+- `is_in_stock`, `stock_status` - Availability
+- `consecutive_misses` - OOS detection counter
+- `last_seen_at` - Last time product was in feed
+
+### store_product_snapshots
+Point-in-time records for historical analysis:
+- One row per product per crawl
+- Captures price, stock, potency at that moment
+- Used for price history, analytics
+
+### dispensaries
+Store metadata:
+- `platform_dispensary_id` - MongoDB ObjectId for GraphQL
+- `menu_url` - Source URL
+- `last_crawl_at` - Last successful crawl
+- `crawl_enabled` - Whether to crawl
+
+---
+
+## Worker Roles
+
+Workers pull tasks from the `worker_tasks` queue based on their assigned role.
+
+| Role | Name | Description | Handler |
+|------|------|-------------|---------|
+| `product_resync` | Product Resync | Re-crawl dispensary products for price/stock changes | `handleProductResync` |
+| `product_discovery` | Product Discovery | Initial product discovery for new dispensaries | `handleProductDiscovery` |
+| `store_discovery` | Store Discovery | Discover new dispensary locations | `handleStoreDiscovery` |
+| `entry_point_discovery` | Entry Point Discovery | Resolve platform IDs from menu URLs | `handleEntryPointDiscovery` |
+| `analytics_refresh` | Analytics Refresh | Refresh materialized views and analytics | `handleAnalyticsRefresh` |
+
+**API Endpoint:** `GET /api/worker-registry/roles`
+
+---
+
+## Scheduling
+
+Crawls are scheduled via `worker_tasks` table:
+
+| Role | Frequency | Description |
+|------|-----------|-------------|
+| `product_resync` | Every 4 hours | Regular product refresh |
+| `product_discovery` | On-demand | First crawl for new stores |
+| `entry_point_discovery` | On-demand | New store setup |
+| `store_discovery` | Daily | Find new stores |
+| `analytics_refresh` | Daily | Refresh analytics materialized views |
+
+---
+
+## Priority & On-Demand Tasks
+
+Tasks are claimed by workers in order of **priority DESC, created_at ASC**.
+
+### Priority Levels
+
+| Priority | Use Case | Example |
+|----------|----------|---------|
+| 0 | Scheduled/batch tasks | Daily product_resync generation |
+| 10 | On-demand/chained tasks | entry_point → product_discovery |
+| Higher | Urgent/manual triggers | Admin-triggered immediate crawl |
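+
+The claim ordering is equivalent to the comparator below. It only illustrates `ORDER BY priority DESC, created_at ASC` in memory; the real claim runs in SQL inside `claim_task()`, which also handles locking:
+
+```typescript
+interface QueuedTask {
+  id: number;
+  priority: number;
+  created_at: Date;
+}
+
+// Higher priority first; ties broken by oldest created_at.
+function claimOrder(a: QueuedTask, b: QueuedTask): number {
+  if (a.priority !== b.priority) return b.priority - a.priority;
+  return a.created_at.getTime() - b.created_at.getTime();
+}
+
+const queue: QueuedTask[] = [
+  { id: 1, priority: 0, created_at: new Date('2025-01-01T00:00:00Z') },
+  { id: 2, priority: 10, created_at: new Date('2025-01-01T02:00:00Z') },
+  { id: 3, priority: 10, created_at: new Date('2025-01-01T01:00:00Z') },
+];
+console.log([...queue].sort(claimOrder).map((t) => t.id)); // [ 3, 2, 1 ]
+```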
+
+### Task Chaining
+
+When a task completes, the system automatically creates follow-up tasks:
+
+```
+store_discovery (completed)
+    └─► entry_point_discovery (priority: 10) for each new store
+
+entry_point_discovery (completed, success)
+    └─► product_discovery (priority: 10) for that store
+
+product_discovery (completed)
+    └─► [no chain] Store enters regular resync schedule
+```
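+
+The chaining rules reduce to a pure function from a completed task to its follow-ups. The `CompletedTask` shape (including `new_dispensary_ids`) is hypothetical; the real system creates these tasks through `taskService` rather than returning them:
+
+```typescript
+type Role =
+  | 'store_discovery'
+  | 'entry_point_discovery'
+  | 'product_discovery'
+  | 'product_resync'
+  | 'analytics_refresh';
+
+interface CompletedTask {
+  role: Role;
+  success: boolean;
+  dispensary_id?: number;
+  new_dispensary_ids?: number[]; // hypothetical: stores found by store_discovery
+}
+
+interface FollowUpTask {
+  role: Role;
+  dispensary_id: number;
+  priority: number;
+}
+
+const CHAINED_PRIORITY = 10;
+
+function followUps(task: CompletedTask): FollowUpTask[] {
+  switch (task.role) {
+    case 'store_discovery':
+      // One entry_point_discovery per newly found store.
+      return (task.new_dispensary_ids ?? []).map(
+        (id): FollowUpTask => ({ role: 'entry_point_discovery', dispensary_id: id, priority: CHAINED_PRIORITY }),
+      );
+    case 'entry_point_discovery':
+      // Only a successful slug resolution leads to product_discovery.
+      return task.success && task.dispensary_id !== undefined
+        ? [{ role: 'product_discovery', dispensary_id: task.dispensary_id, priority: CHAINED_PRIORITY }]
+        : [];
+    default:
+      // product_discovery ends the chain; the store joins the regular resync schedule.
+      return [];
+  }
+}
+```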
+
+### On-Demand Task Creation
+
+Use the task service to create high-priority tasks:
+
+```typescript
+// Create immediate product resync for a store
+await taskService.createTask({
+  role: 'product_resync',
+  dispensary_id: 123,
+  platform: 'dutchie',
+  priority: 20, // Higher than batch tasks
+});
+
+// Convenience methods with default high priority (10)
+await taskService.createEntryPointTask(dispensaryId, 'dutchie');
+await taskService.createProductDiscoveryTask(dispensaryId, 'dutchie');
+await taskService.createStoreDiscoveryTask('dutchie', 'AZ');
+```
+
+### Claim Function
+
+The `claim_task()` SQL function atomically claims tasks:
+- Respects priority ordering (higher = first)
+- Uses `FOR UPDATE SKIP LOCKED` for concurrency
+- Prevents multiple active tasks per store
+
+---
+
+## Image Storage
+
+Images are downloaded from Dutchie's AWS S3 and stored locally with on-demand resizing.
+
+### Storage Path
+```
+/storage/images/products/<state>/<store>/<brand>/<product_id>/image-<hash>.webp
+/storage/images/brands/<brand>/logo-<hash>.webp
+```
+
+**Example:**
+```
+/storage/images/products/az/az-deeply-rooted/bud-bros/6913e3cd444eac3935e928b9/image-ae38b1f9.webp
+```
+
+### Image Proxy API
+Served via `/img/*` with on-demand resizing using **sharp**:
+
+```
+GET /img/products/az/az-deeply-rooted/bud-bros/6913e3cd444eac3935e928b9/image-ae38b1f9.webp?w=200
+```
+
+| Param | Description |
+|-------|-------------|
+| `w` | Width in pixels (max 4000) |
+| `h` | Height in pixels (max 4000) |
+| `q` | Quality 1-100 (default 80) |
+| `fit` | cover, contain, fill, inside, outside |
+| `blur` | Blur sigma (0.3-1000) |
+| `gray` | Grayscale (1 = enabled) |
+| `format` | webp, jpeg, png, avif (default webp) |
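+
+A sketch of validating these query parameters before handing them to sharp, whose `resize`, `blur`, `grayscale`, and `toFormat` methods cover each option. Whether the real `/img` route clamps out-of-range values (as done here) or rejects them is an assumption, and the lower bound of 1 for `w`/`h` is illustrative:
+
+```typescript
+type Fit = 'cover' | 'contain' | 'fill' | 'inside' | 'outside';
+type ImageFormat = 'webp' | 'jpeg' | 'png' | 'avif';
+
+interface ResizeOptions {
+  width?: number;
+  height?: number;
+  quality: number;
+  fit?: Fit;
+  blur?: number;
+  grayscale: boolean;
+  format: ImageFormat;
+}
+
+const FITS: readonly string[] = ['cover', 'contain', 'fill', 'inside', 'outside'];
+const FORMATS: readonly string[] = ['webp', 'jpeg', 'png', 'avif'];
+
+const clamp = (n: number, lo: number, hi: number): number => Math.min(Math.max(n, lo), hi);
+
+// Parses a positive number; missing, non-numeric, zero, or negative input gives undefined.
+function positive(raw: string | null): number | undefined {
+  if (raw === null) return undefined;
+  const n = Number(raw);
+  return Number.isFinite(n) && n > 0 ? n : undefined;
+}
+
+function parseResizeParams(params: URLSearchParams): ResizeOptions {
+  const w = positive(params.get('w'));
+  const h = positive(params.get('h'));
+  const q = positive(params.get('q'));
+  const blur = positive(params.get('blur'));
+  const fit = params.get('fit');
+  const format = params.get('format');
+  return {
+    width: w === undefined ? undefined : Math.round(clamp(w, 1, 4000)),
+    height: h === undefined ? undefined : Math.round(clamp(h, 1, 4000)),
+    quality: q === undefined ? 80 : Math.round(clamp(q, 1, 100)), // default 80
+    fit: fit !== null && FITS.includes(fit) ? (fit as Fit) : undefined,
+    blur: blur === undefined ? undefined : clamp(blur, 0.3, 1000),
+    grayscale: params.get('gray') === '1',
+    format: format !== null && FORMATS.includes(format) ? (format as ImageFormat) : 'webp', // default webp
+  };
+}
+
+// width 200, quality clamped from 150 to 100, format defaults to webp
+console.log(parseResizeParams(new URLSearchParams('w=200&q=150')));
+```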
+
+### Key Files
+| File | Purpose |
+|------|---------|
+| `src/utils/image-storage.ts` | Download & save images to local filesystem |
+| `src/routes/image-proxy.ts` | On-demand resize/transform at `/img/*` |
+
+### Download Rules
+
+| Scenario | Image Action |
+|----------|--------------|
+| **New product (first crawl)** | Download if `primaryImageUrl` exists |
+| **Existing product (refresh)** | Download only if `local_image_path` is NULL (backfill) |
+| **Product already has local image** | Skip download entirely |
+
+**Logic:**
+- Images are downloaded **once** and never re-downloaded on subsequent crawls
+- `skipIfExists: true` - filesystem check prevents re-download even if queued
+- First crawl: all products get images
+- Refresh crawl: only new products or products missing local images
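+
+Taken together, the table and the rules above reduce to one condition. In this sketch `fileAlreadyOnDisk` stands in for the `skipIfExists` filesystem check, and the field names are hypothetical camelCase versions of the columns:
+
+```typescript
+interface ImageCandidate {
+  primaryImageUrl: string | null;
+  localImagePath: string | null; // value of the local_image_path column
+}
+
+function shouldDownloadImage(p: ImageCandidate, fileAlreadyOnDisk: boolean): boolean {
+  if (!p.primaryImageUrl) return false; // nothing to download
+  if (p.localImagePath !== null) return false; // already stored: never re-download
+  return !fileAlreadyOnDisk; // skipIfExists guard
+}
+```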
+
+### Storage Rules
+- **NO MinIO** - local filesystem only (`STORAGE_DRIVER=local`)
+- Store full resolution, resize on-demand via `/img` proxy
+- Convert to webp for consistency using **sharp**
+- Preserve original Dutchie URL as fallback in `image_url` column
+- Local path stored in `local_image_path` column
+
+---
+
+## Stealth & Anti-Detection
+
+**PROXIES ARE REQUIRED** - Workers will fail to start if no active proxies are available in the database. All HTTP requests to Dutchie go through a proxy.
+
+Workers automatically initialize anti-detection systems on startup.
+
+### Components
+
+| Component | Purpose | Source |
+|-----------|---------|--------|
+| **CrawlRotator** | Coordinates proxy + UA rotation | `src/services/crawl-rotator.ts` |
+| **ProxyRotator** | Round-robin proxy selection, health tracking | `src/services/crawl-rotator.ts` |
+| **UserAgentRotator** | Cycles through realistic browser fingerprints | `src/services/crawl-rotator.ts` |
+| **Dutchie Client** | Curl-based HTTP with auto-retry on 403 | `src/platforms/dutchie/client.ts` |
+
+### Initialization Flow
+
+```
+Worker Start
+    │
+    ├─► initializeStealth()
+    │       │
+    │       ├─► CrawlRotator.initialize()
+    │       │       └─► Load proxies from `proxies` table
+    │       │
+    │       └─► setCrawlRotator(rotator)
+    │               └─► Wire to Dutchie client
+    │
+    └─► Process tasks...
+```
+
+### Stealth Session (per task)
+
+Each crawl task starts a stealth session:
+
+```typescript
+// In product-resync.ts, entry-point-discovery.ts
+const session = startSession(dispensary.state || 'AZ', 'America/Phoenix');
+```
+
+This creates a new identity with:
+- **Random fingerprint:** Chrome/Firefox/Safari/Edge on Win/Mac/Linux
+- **Accept-Language:** Matches timezone (e.g., `America/Phoenix` → `en-US,en;q=0.9`)
+- **sec-ch-ua headers:** Proper Client Hints for the browser profile
+
+### On 403 Block
+
+When Dutchie returns 403, the client automatically:
+
+1. Records failure on current proxy (increments `failure_count`)
+2. If proxy has 5+ failures, deactivates it
+3. Rotates to next healthy proxy
+4. Rotates fingerprint
+5. Retries the request
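+
+The five steps can be sketched as a retry loop over a round-robin pool. `send` is a hypothetical transport standing in for the curl-based Dutchie client, and the fingerprint is modeled as a counter; the thresholds (deactivate at 5 failures, up to 3 retries) come from this document:
+
+```typescript
+interface ProxyRow {
+  id: number;
+  host: string;
+  port: number;
+  is_active: boolean;
+  failure_count: number;
+}
+
+const MAX_FAILURES = 5; // deactivate at 5+ failures
+const MAX_RETRIES = 3; // retries after the first attempt
+
+// Round-robin over active proxies only.
+class ProxyPool {
+  private index = 0;
+  private readonly proxies: ProxyRow[];
+
+  constructor(proxies: ProxyRow[]) {
+    this.proxies = proxies;
+  }
+
+  current(): ProxyRow | undefined {
+    const active = this.proxies.filter((p) => p.is_active);
+    return active.length > 0 ? active[this.index % active.length] : undefined;
+  }
+
+  recordFailure(proxy: ProxyRow): void {
+    proxy.failure_count += 1;
+    if (proxy.failure_count >= MAX_FAILURES) proxy.is_active = false;
+  }
+
+  rotate(): void {
+    this.index += 1;
+  }
+}
+
+// Hypothetical transport: one request through `proxy` using fingerprint number `fp`.
+type Send = (proxy: ProxyRow, fp: number) => Promise<{ status: number }>;
+
+async function requestWithRotation(pool: ProxyPool, send: Send): Promise<{ status: number }> {
+  let fp = 0;
+  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
+    const proxy = pool.current();
+    if (!proxy) throw new Error('No active proxies');
+    const res = await send(proxy, fp);
+    if (res.status !== 403) return res;
+    pool.recordFailure(proxy); // steps 1-2: count the failure, deactivate at 5+
+    pool.rotate(); // step 3: move to the next healthy proxy
+    fp += 1; // step 4: new fingerprint (modeled as a counter)
+  } // step 5: the loop retries
+  throw new Error(`Still blocked after ${MAX_RETRIES} retries`);
+}
+```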
+
+### Proxy Table Schema
+
+```sql
+CREATE TABLE proxies (
+  id SERIAL PRIMARY KEY,
+  host VARCHAR(255) NOT NULL,
+  port INTEGER NOT NULL,
+  username VARCHAR(100),
+  password VARCHAR(100),
+  protocol VARCHAR(10) DEFAULT 'http',  -- http, https, socks5
+  is_active BOOLEAN DEFAULT true,
+  last_used_at TIMESTAMPTZ,
+  failure_count INTEGER DEFAULT 0,
+  success_count INTEGER DEFAULT 0,
+  avg_response_time_ms INTEGER,
+  last_failure_at TIMESTAMPTZ,
+  last_error TEXT
+);
+```
+
+### Configuration
+
+Proxies are mandatory. There is no environment variable to disable them. Workers will refuse to start without active proxies in the database.
+
+### User-Agent Generation
+
+See `workflow-12102025.md` for full specification.
+
+**Summary:**
+- Uses `intoli/user-agents` library (daily-updated market share data)
+- Device distribution: Mobile 62%, Desktop 36%, Tablet 2%
+- Browser whitelist: Chrome, Safari, Edge, Firefox only
+- UA sticks until IP rotates (403 or manual rotation)
+- Failure = alert admin + stop crawl (no fallback)
+
+Each fingerprint includes proper `sec-ch-ua`, `sec-ch-ua-platform`, and `sec-ch-ua-mobile` headers.
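+
+The 62/36/2 device split amounts to a weighted random choice. This sketch shows only that selection and does not use the `user-agents` library's API; injecting `rand` keeps it testable:
+
+```typescript
+type DeviceCategory = 'mobile' | 'desktop' | 'tablet';
+
+// Target distribution from the summary above (percent).
+const DEVICE_WEIGHTS: ReadonlyArray<[DeviceCategory, number]> = [
+  ['mobile', 62],
+  ['desktop', 36],
+  ['tablet', 2],
+];
+
+function pickDevice(rand: () => number = Math.random): DeviceCategory {
+  const total = DEVICE_WEIGHTS.reduce((sum, [, w]) => sum + w, 0);
+  let r = rand() * total; // uniform in [0, total)
+  for (const [device, weight] of DEVICE_WEIGHTS) {
+    if (r < weight) return device;
+    r -= weight;
+  }
+  return DEVICE_WEIGHTS[DEVICE_WEIGHTS.length - 1][0]; // guards against float edge cases
+}
+
+console.log(pickDevice());
+```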
+
+---
+
+## Error Handling
+
+- **GraphQL errors:** Logged, task marked failed, retried later
+- **Normalization errors:** Logged as warnings, continue with valid products
+- **Image download errors:** Non-fatal, logged, continue
+- **Database errors:** Task fails, will be retried
+- **403 blocks:** Auto-rotate proxy + fingerprint, retry (up to 3 retries)
+
+---
+
+## Files
+
+| File | Purpose |
+|------|---------|
+| `src/tasks/handlers/product-resync.ts` | Main crawl handler |
+| `src/tasks/handlers/entry-point-discovery.ts` | Slug → ID resolution |
+| `src/platforms/dutchie/index.ts` | GraphQL client, session management |
+| `src/hydration/normalizers/dutchie.ts` | Payload normalization |
+| `src/hydration/canonical-upsert.ts` | Database upsert logic |
+| `src/utils/image-storage.ts` | Image download and local storage |
+| `src/routes/image-proxy.ts` | On-demand image resizing |
+| `migrations/075_consecutive_misses.sql` | OOS tracking column |
--- a/backend/docs/_archive/ORGANIC_SCRAPING_GUIDE.md
+++ b/backend/docs/_archive/ORGANIC_SCRAPING_GUIDE.md
@@ -0,0 +1,297 @@
+# Organic Browser-Based Scraping Guide
+
+**Last Updated:** 2025-12-12
+**Status:** Production-ready proof of concept
+
+---
+
+## Overview
+
+This document describes the "organic" browser-based approach to scraping Dutchie dispensary menus. Unlike direct curl/axios requests, this method uses a real browser session to make API calls, making requests appear natural and reducing detection risk.
+
+---
+
+## Why Organic Scraping?
+
+| Approach | Detection Risk | Speed | Complexity |
+|----------|---------------|-------|------------|
+| Direct curl | Higher | Fast | Low |
+| curl-impersonate | Medium | Fast | Medium |
+| **Browser-based (organic)** | **Lowest** | Slower | Higher |
+
+Direct curl requests can be fingerprinted via:
+- TLS fingerprint (cipher suites, extensions)
+- Header order and values
+- Missing cookies/session data
+- Request patterns
+
+Browser-based requests inherit:
+- Real Chrome TLS fingerprint
+- Session cookies from page visit
+- Natural header order
+- JavaScript execution environment
+
+---
+
+## Implementation
+
+### Dependencies
+
+```bash
+npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
+```
+
+### Core Script: `test-intercept.js`
+
+Located at: `backend/test-intercept.js`
+
+```javascript
+const puppeteer = require('puppeteer-extra');
+const StealthPlugin = require('puppeteer-extra-plugin-stealth');
+const fs = require('fs');
+
+puppeteer.use(StealthPlugin());
+
+async function capturePayload(config) {
+  const { dispensaryId, platformId, cName, outputPath } = config;
+
+  const browser = await puppeteer.launch({
+    headless: 'new',
+    args: ['--no-sandbox', '--disable-setuid-sandbox']
+  });
+
+  let result;
+  try {
+    const page = await browser.newPage();
+
+    // STEP 1: Establish session by visiting the menu
+    const embedUrl = `https://dutchie.com/embedded-menu/${cName}?menuType=rec`;
+    await page.goto(embedUrl, { waitUntil: 'networkidle2', timeout: 60000 });
+
+    // STEP 2: Fetch ALL products using GraphQL from browser context
+    result = await page.evaluate(async (platformId) => {
+      const allProducts = [];
+      let pageNum = 0;
+      const perPage = 100;
+      let totalCount = 0;
+      const sessionId = 'browser-session-' + Date.now();
+
+      // Safety cap: at most 30 pages (3,000 products at perPage = 100)
+      while (pageNum < 30) {
+        const variables = {
+          includeEnterpriseSpecials: false,
+          productsFilter: {
+            dispensaryId: platformId,
+            pricingType: 'rec',
+            Status: 'Active',  // CRITICAL: Must be 'Active', not null
+            types: [],
+            useCache: true,
+            isDefaultSort: true,
+            sortBy: 'popularSortIdx',
+            sortDirection: 1,
+            bypassOnlineThresholds: true,
+            isKioskMenu: false,
+            removeProductsBelowOptionThresholds: false,
+          },
+          page: pageNum,
+          perPage: perPage,
+        };
+
+        const extensions = {
+          persistedQuery: {
+            version: 1,
+            sha256Hash: 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0'
+          }
+        };
+
+        const qs = new URLSearchParams({
+          operationName: 'FilteredProducts',
+          variables: JSON.stringify(variables),
+          extensions: JSON.stringify(extensions)
+        });
+
+        const response = await fetch(`https://dutchie.com/api-3/graphql?${qs}`, {
+          method: 'GET',
+          headers: {
+            'Accept': 'application/json',
+            'content-type': 'application/json',
+            'x-dutchie-session': sessionId,
+            'apollographql-client-name': 'Marketplace (production)',
+          },
+          credentials: 'include'
+        });
+
+        // A wrong hash (or Status: 'All') yields HTTP 400; fail loudly instead of parsing an error body
+        if (!response.ok) {
+          throw new Error(`FilteredProducts page ${pageNum} failed: HTTP ${response.status}`);
+        }
+
+        const json = await response.json();
+        const data = json?.data?.filteredProducts;
+        if (!data?.products?.length) break; // Missing or empty page: nothing left to fetch
+
+        allProducts.push(...data.products);
+        if (pageNum === 0) totalCount = data.queryInfo?.totalCount || 0;
+        // totalCount 0 means "unknown": keep paging until an empty page or the cap
+        if (totalCount > 0 && allProducts.length >= totalCount) break;
+
+        pageNum++;
+        await new Promise(r => setTimeout(r, 200)); // Polite delay
+      }
+
+      return { products: allProducts, totalCount };
+    }, platformId);
+  } finally {
+    // Close the browser even if navigation or a GraphQL page fails
+    await browser.close();
+  }
+
+  // STEP 3: Save payload
+  const payload = {
+    dispensaryId,
+    platformId,
+    cName,
+    fetchedAt: new Date().toISOString(),
+    productCount: result.products.length,
+    products: result.products,
+  };
+
+  fs.writeFileSync(outputPath, JSON.stringify(payload, null, 2));
+  return payload;
+}
+```
+
+---
+
+## Critical Parameters
+
+### GraphQL Hash (FilteredProducts)
+
+```
+ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0
+```
+
+**WARNING:** Using the wrong hash returns HTTP 400.
+
+### Status Parameter
+
+| Value | Result |
+|-------|--------|
+| `'Active'` | Returns in-stock products (1019 in test) |
+| `null` | Returns 0 products |
+| `'All'` | Returns HTTP 400 |
+
+**ALWAYS use `Status: 'Active'`**
+
+### Required Headers
+
+```javascript
+{
+  'Accept': 'application/json',
+  'content-type': 'application/json',
+  'x-dutchie-session': 'unique-session-id',
+  'apollographql-client-name': 'Marketplace (production)',
+}
+```
+
+### Endpoint
+
+```
+https://dutchie.com/api-3/graphql
+```
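+
+As a cross-check of the parameters above, the sketch below rebuilds the same GET URL that `test-intercept.js` constructs inside `page.evaluate`. `buildFilteredProductsUrl` is a hypothetical helper, not part of the codebase; it only builds the URL and does not supply the required headers or the browser session cookies.
+
+```typescript
+// Hypothetical helper mirroring the variables used in test-intercept.js.
+const FILTERED_PRODUCTS_HASH =
+  "ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0";
+const GRAPHQL_ENDPOINT = "https://dutchie.com/api-3/graphql";
+
+function buildFilteredProductsUrl(platformId: string, page: number, perPage = 100): string {
+  const variables = {
+    includeEnterpriseSpecials: false,
+    productsFilter: {
+      dispensaryId: platformId,
+      pricingType: "rec",
+      Status: "Active", // null returns 0 products; 'All' returns HTTP 400
+      types: [] as string[],
+      useCache: true,
+      isDefaultSort: true,
+      sortBy: "popularSortIdx",
+      sortDirection: 1,
+      bypassOnlineThresholds: true,
+      isKioskMenu: false,
+      removeProductsBelowOptionThresholds: false,
+    },
+    page,
+    perPage,
+  };
+  // Persisted query: the server looks the operation up by its SHA-256 hash.
+  const extensions = { persistedQuery: { version: 1, sha256Hash: FILTERED_PRODUCTS_HASH } };
+  const qs = new URLSearchParams({
+    operationName: "FilteredProducts",
+    variables: JSON.stringify(variables),
+    extensions: JSON.stringify(extensions),
+  });
+  return `${GRAPHQL_ENDPOINT}?${qs.toString()}`;
+}
+```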
+
+---
+
+## Performance Benchmarks
+
+Test store: AZ-Deeply-Rooted (1019 products)
+
+| Metric | Value |
+|--------|-------|
+| Total products | 1019 |
+| Time | 18.5 seconds |
+| Payload size | 11.8 MB |
+| Pages fetched | 11 (100 per page) |
+| Success rate | 100% |
+
+---
+
+## Payload Format
+
+The output matches the existing `payload-fetch.ts` handler format:
+
+```json
+{
+  "dispensaryId": 123,
+  "platformId": "6405ef617056e8014d79101b",
+  "cName": "AZ-Deeply-Rooted",
+  "fetchedAt": "2025-12-12T05:05:19.837Z",
+  "productCount": 1019,
+  "products": [
+    {
+      "id": "6927508db4851262f629a869",
+      "Name": "Product Name",
+      "brand": { "name": "Brand Name", ... },
+      "type": "Flower",
+      "THC": "25%",
+      "Prices": [...],
+      "Options": [...],
+      ...
+    }
+  ]
+}
+```
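+
+Since downstream processing reads these files back, a consumer could validate the top-level shape before using it. A sketch assuming the fields shown above are all required; `OrganicPayload` and `isOrganicPayload` are illustrative names, and product objects are only checked for a string `id`:
+
+```typescript
+// Hypothetical shape of the saved file; product fields are loosely typed
+// because only a subset of them is shown above.
+interface OrganicPayload {
+  dispensaryId: number;
+  platformId: string;
+  cName: string;
+  fetchedAt: string; // ISO-8601 timestamp
+  productCount: number;
+  products: Array<{ id: string; [key: string]: unknown }>;
+}
+
+// Runtime check before handing a file to downstream processing.
+function isOrganicPayload(value: unknown): value is OrganicPayload {
+  if (typeof value !== "object" || value === null) return false;
+  const p = value as Record<string, unknown>;
+  return (
+    typeof p.dispensaryId === "number" &&
+    typeof p.platformId === "string" &&
+    typeof p.cName === "string" &&
+    typeof p.fetchedAt === "string" &&
+    !Number.isNaN(Date.parse(p.fetchedAt)) &&
+    Array.isArray(p.products) &&
+    p.productCount === p.products.length && // count must agree with the array
+    p.products.every(
+      (x) => typeof x === "object" && x !== null && typeof (x as { id?: unknown }).id === "string",
+    )
+  );
+}
+```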
+
+---
+
+## Integration Points
+
+### As a Task Handler
+
+The organic approach can be integrated as an alternative to curl-based fetching:
+
+```typescript
+// In src/tasks/handlers/organic-payload-fetch.ts
+export async function handleOrganicPayloadFetch(ctx: TaskContext): Promise<TaskResult> {
+  // Use puppeteer-based capture
+  // Save to same payload storage
+  // Queue product_refresh task
+}
+```
+
+### Worker Configuration
+
+Add to job_schedules:
+```sql
+INSERT INTO job_schedules (name, role, cron_expression)
+VALUES ('organic_product_crawl', 'organic_payload_fetch', '0 */6 * * *');
+```
+
+---
+
+## Troubleshooting
+
+### HTTP 400 Bad Request
+- Check that the persisted-query hash is correct: `ee29c060...`
+- Verify that `Status` is `'Active'`; `'All'` returns HTTP 400 (and `null` returns 0 products)
+
+### 0 Products Returned
+- `Status` was likely `null`; use `'Active'`
+- Check that `platformId` is a valid MongoDB ObjectId
+
+### Session Not Established
+- Increase the timeout on the initial `page.goto()`
+- Check that `cName` is valid (it must match the slug in the embedded-menu URL)
+
+### Detection/Blocking
+- StealthPlugin should handle most cases
+- Add random delays between pages
+- Use headless: 'new' (not true/false)
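+
+One way to add random delays is to replace the fixed 200 ms wait with a jittered one. A sketch with illustrative names; the 200 ms base and up to 800 ms of jitter are assumptions, not tuned values. Because the page loop runs inside `page.evaluate`, the helper would have to be defined inside the evaluated function; functions declared in the Node script are not visible in the browser context.
+
+```typescript
+// Delay in ms drawn uniformly from [baseMs, baseMs + jitterMs).
+// `rng` is injectable so the result can be tested deterministically.
+function politeDelayMs(baseMs: number, jitterMs: number, rng: () => number = Math.random): number {
+  return baseMs + Math.floor(rng() * jitterMs);
+}
+
+function sleep(ms: number): Promise<void> {
+  return new Promise((resolve) => setTimeout(resolve, ms));
+}
+
+// Example: wait between 200 and 999 ms before requesting the next page.
+async function waitBetweenPages(): Promise<void> {
+  await sleep(politeDelayMs(200, 800));
+}
+```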
+
+---
+
+## Files Reference
+
+| File | Purpose |
+|------|---------|
+| `backend/test-intercept.js` | Proof of concept script |
+| `backend/src/platforms/dutchie/client.ts` | GraphQL hashes, curl implementation |
+| `backend/src/tasks/handlers/payload-fetch.ts` | Current curl-based handler |
+| `backend/src/utils/payload-storage.ts` | Payload save/load utilities |
+
+---
+
+## See Also
+
+- `DUTCHIE_CRAWL_WORKFLOW.md` - Full crawl pipeline documentation
+- `TASK_WORKFLOW_2024-12-10.md` - Task system architecture
+- `CLAUDE.md` - Project rules and constraints
--- a/backend/docs/_archive/README.md
+++ b/backend/docs/_archive/README.md
@@ -0,0 +1,25 @@
+# ARCHIVED DOCUMENTATION
+
+**WARNING: These docs may be outdated or inaccurate.**
+
+The code has evolved significantly. These docs are kept for historical reference only.
+
+## What to Use Instead
+
+**The single source of truth is:**
+- `CLAUDE.md` (root) - Essential rules and quick reference
+- `docs/CODEBASE_MAP.md` - Current file/directory reference
+
+## Why Archive?
+
+These docs were written during development iterations and may reference:
+- Old file paths that no longer exist
+- Deprecated approaches (hydration, scraper-v2)
+- APIs that have changed
+- Database schemas that evolved
+
+## If You Need Details
+
+1. First check CODEBASE_MAP.md for current file locations
+2. Then read the actual source code
+3. Only use archive docs as a last resort for historical context
--- a/backend/docs/_archive/TASK_WORKFLOW_2024-12-10.md
+++ b/backend/docs/_archive/TASK_WORKFLOW_2024-12-10.md
@@ -0,0 +1,584 @@
+# Task Workflow Documentation
+**Date: 2024-12-10**
+
+This document describes the complete task/job processing architecture after the 2024-12-10 rewrite.
+
+---
+
+## Complete Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                              KUBERNETES CLUSTER                                  │
+├─────────────────────────────────────────────────────────────────────────────────┤
+│                                                                                  │
+│  ┌─────────────────────────────────────────────────────────────────────────┐    │
+│  │                         API SERVER POD (scraper)                         │    │
+│  │                                                                          │    │
+│  │   ┌──────────────────┐     ┌────────────────────────────────────────┐   │    │
+│  │   │   Express API    │     │         TaskScheduler                   │   │    │
+│  │   │                  │     │   (src/services/task-scheduler.ts)      │   │    │
+│  │   │  /api/job-queue  │     │                                         │   │    │
+│  │   │  /api/tasks      │     │   • Polls every 60s                     │   │    │
+│  │   │  /api/schedules  │     │   • Checks task_schedules table         │   │    │
+│  │   └────────┬─────────┘     │   • SELECT FOR UPDATE SKIP LOCKED       │   │    │
+│  │            │               │   • Generates tasks when due            │   │    │
+│  │            │               └──────────────────┬─────────────────────┘   │    │
+│  │            │                                  │                          │    │
+│  └────────────┼──────────────────────────────────┼──────────────────────────┘    │
+│               │                                  │                               │
+│               │         ┌────────────────────────┘                               │
+│               │         │                                                        │
+│               ▼         ▼                                                        │
+│  ┌─────────────────────────────────────────────────────────────────────────┐    │
+│  │                          POSTGRESQL DATABASE                             │    │
+│  │                                                                          │    │
+│  │   ┌─────────────────────┐        ┌─────────────────────┐                │    │
+│  │   │   task_schedules    │        │    worker_tasks     │                │    │
+│  │   │                     │        │                     │                │    │
+│  │   │ • product_refresh   │───────►│ • pending tasks     │                │    │
+│  │   │ • store_discovery   │ create │ • claimed tasks     │                │    │
+│  │   │ • analytics_refresh │ tasks  │ • running tasks     │                │    │
+│  │   │                     │        │ • completed tasks   │                │    │
+│  │   │ next_run_at         │        │                     │                │    │
+│  │   │ last_run_at         │        │ role, dispensary_id │                │    │
+│  │   │ interval_hours      │        │ priority, status    │                │    │
+│  │   └─────────────────────┘        └──────────┬──────────┘                │    │
+│  │                                             │                            │    │
+│  └─────────────────────────────────────────────┼────────────────────────────┘    │
+│                                                │                                  │
+│                         ┌──────────────────────┘                                  │
+│                         │ Workers poll for tasks                                  │
+│                         │ (SELECT FOR UPDATE SKIP LOCKED)                         │
+│                         ▼                                                         │
+│  ┌─────────────────────────────────────────────────────────────────────────┐    │
+│  │                    WORKER PODS (StatefulSet: scraper-worker)             │    │
+│  │                                                                          │    │
+│  │   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │    │
+│  │   │  Worker 0   │  │  Worker 1   │  │  Worker 2   │  │  Worker N   │    │    │
+│  │   │             │  │             │  │             │  │             │    │    │
+│  │   │ task-worker │  │ task-worker │  │ task-worker │  │ task-worker │    │    │
+│  │   │     .ts     │  │     .ts     │  │     .ts     │  │     .ts     │    │    │
+│  │   └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘    │    │
+│  │                                                                          │    │
+│  └──────────────────────────────────────────────────────────────────────────┘    │
+│                                                                                  │
+└──────────────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Startup Sequence
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                        API SERVER STARTUP                                    │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                              │
+│   1. Express app initializes                                                 │
+│                    │                                                         │
+│                    ▼                                                         │
+│   2. runAutoMigrations()                                                     │
+│      • Runs pending migrations (including 079_task_schedules.sql)           │
+│                    │                                                         │
+│                    ▼                                                         │
+│   3. initializeMinio() / initializeImageStorage()                           │
+│                    │                                                         │
+│                    ▼                                                         │
+│   4. cleanupOrphanedJobs()                                                   │
+│                    │                                                         │
+│                    ▼                                                         │
+│   5. taskScheduler.start()  ◄─── NEW (per TASK_WORKFLOW_2024-12-10.md)      │
+│      │                                                                       │
+│      ├── Recover stale tasks (workers that died)                            │
+│      ├── Ensure default schedules exist in task_schedules                   │
+│      ├── Check and run any due schedules immediately                        │
+│      └── Start 60-second poll interval                                      │
+│                    │                                                         │
+│                    ▼                                                         │
+│   6. app.listen(PORT)                                                        │
+│                                                                              │
+└─────────────────────────────────────────────────────────────────────────────┘
+
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                        WORKER POD STARTUP                                    │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                              │
+│   1. K8s starts pod from StatefulSet                                        │
+│                    │                                                         │
+│                    ▼                                                         │
+│   2. TaskWorker.constructor()                                               │
+│      • Create DB pool                                                        │
+│      • Create CrawlRotator                                                   │
+│                    │                                                         │
+│                    ▼                                                         │
+│   3. initializeStealth()                                                    │
+│      • Load proxies from DB (REQUIRED - fails if none)                      │
+│      • Wire rotator to Dutchie client                                       │
+│                    │                                                         │
+│                    ▼                                                         │
+│   4. register() with API                                                    │
+│      • Optional - continues if fails                                         │
+│                    │                                                         │
+│                    ▼                                                         │
+│   5. startRegistryHeartbeat() every 30s                                     │
+│                    │                                                         │
+│                    ▼                                                         │
+│   6. processNextTask() loop                                                 │
+│      │                                                                       │
+│      ├── Poll for pending task (FOR UPDATE SKIP LOCKED)                     │
+│      ├── Claim task atomically                                              │
+│      ├── Execute handler (product_refresh, store_discovery, etc.)           │
+│      ├── Mark complete/failed                                               │
+│      ├── Chain next task if applicable                                      │
+│      └── Loop                                                               │
+│                                                                              │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Schedule Flow
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                     SCHEDULER POLL (every 60 seconds)                        │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                              │
+│   BEGIN TRANSACTION                                                          │
+│         │                                                                    │
+│         ▼                                                                    │
+│   SELECT * FROM task_schedules                                              │
+│   WHERE enabled = true AND next_run_at <= NOW()                             │
+│   FOR UPDATE SKIP LOCKED  ◄─── Prevents duplicate execution across replicas │
+│         │                                                                    │
+│         ▼                                                                    │
+│   For each due schedule:                                                     │
+│         │                                                                    │
+│         ├── product_refresh_all                                             │
+│         │   └─► Query dispensaries needing crawl                            │
+│         │   └─► Create product_refresh tasks in worker_tasks                │
+│         │                                                                    │
+│         ├── store_discovery_dutchie                                         │
+│         │   └─► Create single store_discovery task                          │
+│         │                                                                    │
+│         └── analytics_refresh                                                │
+│             └─► Create single analytics_refresh task                        │
+│         │                                                                    │
+│         ▼                                                                    │
+│   UPDATE task_schedules SET                                                  │
+│     last_run_at = NOW(),                                                     │
+│     next_run_at = NOW() + interval_hours                                    │
+│         │                                                                    │
+│         ▼                                                                    │
+│   COMMIT                                                                     │
+│                                                                              │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
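+
+PostgreSQL cannot add an integer to a timestamp, so the UPDATE in the diagram is shorthand; a working form is `next_run_at = NOW() + interval_hours * INTERVAL '1 hour'` (or `make_interval(hours => interval_hours)`). The in-memory sketch below mirrors the due check and the update for illustration only; the camelCase fields are stand-ins for the `task_schedules` columns, and in the described system both steps run in SQL inside the locked transaction.
+
+```typescript
+// In-memory stand-in for a task_schedules row (field names are assumptions).
+interface TaskSchedule {
+  name: string;
+  enabled: boolean;
+  intervalHours: number;
+  lastRunAt: Date | null;
+  nextRunAt: Date | null;
+}
+
+const HOUR_MS = 60 * 60 * 1000;
+
+// Mirrors: WHERE enabled = true AND next_run_at <= NOW()
+// (a NULL next_run_at never matches, as in SQL).
+function isDue(s: TaskSchedule, now: Date): boolean {
+  return s.enabled && s.nextRunAt !== null && s.nextRunAt.getTime() <= now.getTime();
+}
+
+// Mirrors the UPDATE: last_run_at = NOW(), next_run_at = NOW() + interval_hours hours.
+function markRan(s: TaskSchedule, now: Date): TaskSchedule {
+  return { ...s, lastRunAt: now, nextRunAt: new Date(now.getTime() + s.intervalHours * HOUR_MS) };
+}
+```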
+
+---
+
+## Task Lifecycle
+
+```
+                                    ┌──────────┐
+                                    │ SCHEDULE │
+                                    │   DUE    │
+                                    └────┬─────┘
+                                         │
+                                         ▼
+┌──────────────┐    claim    ┌──────────────┐    start    ┌──────────────┐
+│   PENDING    │────────────►│   CLAIMED    │────────────►│   RUNNING    │
+└──────────────┘             └──────────────┘             └──────┬───────┘
+       ▲                                                        │
+       │                                         ┌──────────────┼──────────────┐
+       │ retry                                   │              │              │
+       │ (if retries < max)                      ▼              ▼              ▼
+       │                                  ┌──────────┐   ┌──────────┐   ┌──────────┐
+       └──────────────────────────────────│  FAILED  │   │ COMPLETED│   │  STALE   │
+                                          └──────────┘   └──────────┘   └────┬─────┘
+                                                                              │
+                                                              recover_stale_tasks()
+                                                                              │
+                                                                              ▼
+                                                                        ┌──────────┐
+                                                                        │ PENDING  │
+                                                                        └──────────┘
+```
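+
+The diagram's transitions can be encoded as a lookup table plus the retry rule. The status names below come from the diagram; the actual `task_status` enum values may differ, and `canTransition` is an illustrative helper rather than code from `task-service.ts`.
+
+```typescript
+type TaskStatus = "pending" | "claimed" | "running" | "completed" | "failed" | "stale";
+
+// Edges drawn in the lifecycle diagram.
+const TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
+  pending: ["claimed"],
+  claimed: ["running"],
+  running: ["completed", "failed", "stale"],
+  failed: ["pending"], // retry, only while retries remain
+  stale: ["pending"], // recover_stale_tasks()
+  completed: [],
+};
+
+function canTransition(from: TaskStatus, to: TaskStatus, retryCount: number, maxRetries: number): boolean {
+  if (!TRANSITIONS[from].includes(to)) return false;
+  // A failed task is only re-queued while retry_count < max_retries.
+  if (from === "failed") return retryCount < maxRetries;
+  return true;
+}
+```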
+
+---
+
+## Database Tables
+
+### task_schedules (NEW - migration 079)
+
+Stores schedule definitions. Survives restarts.
+
+```sql
+CREATE TABLE task_schedules (
+  id SERIAL PRIMARY KEY,
+  name VARCHAR(100) NOT NULL UNIQUE,
+  role VARCHAR(50) NOT NULL,        -- product_refresh, store_discovery, etc.
+  enabled BOOLEAN DEFAULT TRUE,
+  interval_hours INTEGER NOT NULL,  -- How often to run
+  priority INTEGER DEFAULT 0,       -- Task priority when created
+  state_code VARCHAR(2),            -- Optional filter
+  last_run_at TIMESTAMPTZ,          -- When it last ran
+  next_run_at TIMESTAMPTZ,          -- When it's due next
+  last_task_count INTEGER,          -- Tasks created last run
+  last_error TEXT                   -- Error message if failed
+);
+```
+
+### worker_tasks (migration 074)
+
+The task queue. Workers pull from here.
+
+```sql
+CREATE TABLE worker_tasks (
+  id SERIAL PRIMARY KEY,
+  role task_role NOT NULL,          -- What type of work
+  dispensary_id INTEGER,            -- Which store (if applicable)
+  platform VARCHAR(50),             -- Which platform
+  status task_status DEFAULT 'pending',
+  priority INTEGER DEFAULT 0,       -- Higher = process first
+  scheduled_for TIMESTAMP,          -- Don't process before this time
+  worker_id VARCHAR(100),           -- Which worker claimed it
+  claimed_at TIMESTAMP,
+  started_at TIMESTAMP,
+  completed_at TIMESTAMP,
+  last_heartbeat_at TIMESTAMP,      -- For stale detection
+  result JSONB,
+  error_message TEXT,
+  retry_count INTEGER DEFAULT 0,
+  max_retries INTEGER DEFAULT 3
+);
+```
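+
+The `last_heartbeat_at` column is what makes stale detection possible: a claimed or running task whose worker has stopped heartbeating can be returned to `pending`. The sketch below assumes a 10-minute threshold, falls back to `claimed_at` when no heartbeat has been recorded, and clears `worker_id`; none of these details are taken from `recover_stale_tasks()`.
+
+```typescript
+interface WorkerTaskRow {
+  id: number;
+  status: string;
+  workerId: string | null;
+  claimedAt: Date | null;
+  lastHeartbeatAt: Date | null;
+}
+
+// Assumed threshold, not the production value.
+const STALE_AFTER_MS = 10 * 60 * 1000;
+
+function isStale(t: WorkerTaskRow, now: Date, staleAfterMs = STALE_AFTER_MS): boolean {
+  if (t.status !== "claimed" && t.status !== "running") return false;
+  // Use the last heartbeat, or the claim time if the worker never heartbeated.
+  const lastSeen = t.lastHeartbeatAt ?? t.claimedAt;
+  return lastSeen !== null && now.getTime() - lastSeen.getTime() > staleAfterMs;
+}
+
+// Return stale tasks to the queue and release the dead worker's claim.
+function recoverStale(tasks: WorkerTaskRow[], now: Date): WorkerTaskRow[] {
+  return tasks.map((t) => (isStale(t, now) ? { ...t, status: "pending", workerId: null } : t));
+}
+```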
+
+---
+
+## Default Schedules
+
+| Name | Role | Interval | Priority | Description |
+|------|------|----------|----------|-------------|
+| `payload_fetch_all` | payload_fetch | 4 hours | 0 | Fetch payloads from Dutchie API (chains to product_refresh) |
+| `store_discovery_dutchie` | store_discovery | 24 hours | 5 | Find new Dutchie stores |
+| `analytics_refresh` | analytics_refresh | 6 hours | 0 | Refresh materialized views (MVs) |
+
+---
+
+## Task Roles
+
+| Role | Description | Creates Tasks For |
+|------|-------------|-------------------|
+| `payload_fetch` | **NEW** - Fetch from Dutchie API, save to disk | Each dispensary needing crawl |
+| `product_refresh` | **CHANGED** - Read local payload, normalize, upsert to DB | Chained from payload_fetch |
+| `store_discovery` | Find new dispensaries, returns newStoreIds[] | Single task per platform |
+| `entry_point_discovery` | **DEPRECATED** - Resolve platform IDs | No longer used |
+| `product_discovery` | Initial product fetch for new stores | Chained from store_discovery |
+| `analytics_refresh` | Refresh materialized views (MVs) | Single global task |
+
+### Payload/Refresh Separation (2024-12-10)
+
+The crawl workflow is now split into two phases:
+
+```
+payload_fetch (scheduled every 4h)
+  └─► Hit Dutchie GraphQL API
+  └─► Save raw JSON to /storage/payloads/{year}/{month}/{day}/store_{id}_{ts}.json.gz
+  └─► Record metadata in raw_crawl_payloads table
+  └─► Queue product_refresh task with payload_id
+
+product_refresh (chained from payload_fetch)
+  └─► Load payload from filesystem (NOT from API)
+  └─► Normalize via DutchieNormalizer
+  └─► Upsert to store_products
+  └─► Create snapshots
+  └─► Track missing products
+  └─► Download images
+```
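+
+A sketch of building a storage path in the documented layout. The exact formatting is an assumption (UTC date parts, zero-padded month and day, `{ts}` as epoch milliseconds); `payloadPath` is a hypothetical name, and the actual implementation in `src/utils/payload-storage.ts` may differ.
+
+```typescript
+// /storage/payloads/{year}/{month}/{day}/store_{id}_{ts}.json.gz
+function payloadPath(dispensaryId: number, fetchedAt: Date, root = "/storage/payloads"): string {
+  const pad = (n: number) => String(n).padStart(2, "0");
+  const year = fetchedAt.getUTCFullYear();
+  const month = pad(fetchedAt.getUTCMonth() + 1); // getUTCMonth() is 0-based
+  const day = pad(fetchedAt.getUTCDate());
+  return `${root}/${year}/${month}/${day}/store_${dispensaryId}_${fetchedAt.getTime()}.json.gz`;
+}
+```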
+
+**Benefits:**
+- **Retry-friendly**: If normalize fails, re-run product_refresh without re-crawling
+- **Replay-able**: Run product_refresh against any historical payload
+- **Faster refreshes**: Local file read vs network call
+- **Historical diffs**: Compare payloads to see what changed between crawls
+- **Less API pressure**: Only payload_fetch hits Dutchie
+
+---
+
+## Task Chaining
+
+Tasks automatically queue follow-up tasks upon successful completion. This creates two main flows:
+
+### Discovery Flow (New Stores)
+
+When `store_discovery` finds new dispensaries, they automatically get their initial product data:
+
+```
+store_discovery
+  └─► Discovers new locations via Dutchie GraphQL
+  └─► Auto-promotes valid locations to dispensaries table
+  └─► Collects newDispensaryIds[] from promotions
+  └─► Returns { newStoreIds: [...] } in result
+
+chainNextTask() detects newStoreIds
+  └─► Creates product_discovery task for each new store
+
+product_discovery
+  └─► Calls handlePayloadFetch() internally
+  └─► payload_fetch hits Dutchie API
+  └─► Saves raw JSON to /storage/payloads/
+  └─► Queues product_refresh task with payload_id
+
+product_refresh
+  └─► Loads payload from filesystem
+  └─► Normalizes and upserts to store_products
+  └─► Creates snapshots, downloads images
+```
+
+**Complete Discovery Chain:**
+```
+store_discovery → product_discovery → payload_fetch → product_refresh
+                        (internal call)    (queues next)
+```
+
+### Scheduled Flow (Existing Stores)
+
+For existing stores, `payload_fetch_all` schedule runs every 4 hours:
+
+```
+TaskScheduler (every 60s)
+  └─► Checks task_schedules for due schedules
+  └─► payload_fetch_all is due
+  └─► Generates payload_fetch task for each dispensary
+
+payload_fetch
+  └─► Hits Dutchie GraphQL API
+  └─► Saves raw JSON to /storage/payloads/
+  └─► Queues product_refresh task with payload_id
+
+product_refresh
+  └─► Loads payload from filesystem (NOT API)
+  └─► Normalizes via DutchieNormalizer
+  └─► Upserts to store_products
+  └─► Creates snapshots
+```
+
+**Complete Scheduled Chain:**
+```
+payload_fetch → product_refresh
+  (queues)        (reads local)
+```
+
+### Chaining Implementation
+
+Task chaining is handled in three places:
+
+1. **Internal chaining (handler calls handler):**
+   - `product_discovery` calls `handlePayloadFetch()` directly
+
+2. **External chaining (chainNextTask() in task-service.ts):**
+   - Called after task completion
+   - `store_discovery` → queues `product_discovery` for each newStoreId
+
+3. **Queue-based chaining (taskService.createTask):**
+   - `payload_fetch` queues `product_refresh` with `payload: { payload_id }`
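+
+The chaining rules above can be summarized as a pure function from a completed task to the tasks it should enqueue. This is a hypothetical sketch: `followUpTasks`, the `payloadId` result field, and the type shapes are illustrative; in the described system these hand-offs are split between `chainNextTask()` and the `payload_fetch` handler rather than living in one function.
+
+```typescript
+type TaskRole =
+  | "store_discovery"
+  | "product_discovery"
+  | "payload_fetch"
+  | "product_refresh"
+  | "analytics_refresh";
+
+interface CompletedTask {
+  role: TaskRole;
+  dispensaryId: number | null;
+  result: { newStoreIds?: number[]; payloadId?: number };
+}
+
+interface NewTask {
+  role: TaskRole;
+  dispensaryId: number | null;
+  payload?: { payload_id: number };
+}
+
+// Maps a successfully completed task to the follow-up tasks it implies.
+function followUpTasks(done: CompletedTask): NewTask[] {
+  switch (done.role) {
+    case "store_discovery":
+      // One product_discovery task per newly promoted store.
+      return (done.result.newStoreIds ?? []).map(
+        (id): NewTask => ({ role: "product_discovery", dispensaryId: id }),
+      );
+    case "payload_fetch": {
+      // Hand the saved payload to product_refresh via its id.
+      const payloadId = done.result.payloadId;
+      return payloadId === undefined
+        ? []
+        : [{ role: "product_refresh", dispensaryId: done.dispensaryId, payload: { payload_id: payloadId } }];
+    }
+    default:
+      // product_discovery chains internally by calling handlePayloadFetch(), which
+      // queues product_refresh itself; product_refresh and analytics_refresh end the chain.
+      return [];
+  }
+}
+```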
+
+---
+
+## Payload API Endpoints
+
+Raw crawl payloads can be accessed via the Payloads API:
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `GET /api/payloads` | GET | List payload metadata (paginated) |
+| `GET /api/payloads/:id` | GET | Get payload metadata by ID |
+| `GET /api/payloads/:id/data` | GET | Get full payload JSON (decompressed) |
+| `GET /api/payloads/store/:dispensaryId` | GET | List payloads for a store |
+| `GET /api/payloads/store/:dispensaryId/latest` | GET | Get latest payload for a store |
+| `GET /api/payloads/store/:dispensaryId/diff` | GET | Diff two payloads for changes |
+
+### Payload Diff Response
+
+The diff endpoint returns:
+```json
+{
+  "success": true,
+  "from": { "id": 123, "fetchedAt": "...", "productCount": 100 },
+  "to": { "id": 456, "fetchedAt": "...", "productCount": 105 },
+  "diff": {
+    "added": 10,
+    "removed": 5,
+    "priceChanges": 8,
+    "stockChanges": 12
+  },
+  "details": {
+    "added": [...],
+    "removed": [...],
+    "priceChanges": [...],
+    "stockChanges": [...]
+  }
+}
+```
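+
+A sketch of how such a diff could be computed by matching products on `id` between the two payloads. It assumes each product reduces to a single `price` and an `inStock` flag, which is a simplification (real products carry `Prices` and `Options` arrays); `ProductSnapshot` and `diffPayloads` are hypothetical names. In this sketch the summary counts would be the lengths of the four arrays.
+
+```typescript
+interface ProductSnapshot {
+  id: string;
+  price: number | null;
+  inStock: boolean;
+}
+
+interface PayloadDiff {
+  added: ProductSnapshot[];
+  removed: ProductSnapshot[];
+  priceChanges: Array<{ id: string; from: number | null; to: number | null }>;
+  stockChanges: Array<{ id: string; from: boolean; to: boolean }>;
+}
+
+function diffPayloads(from: ProductSnapshot[], to: ProductSnapshot[]): PayloadDiff {
+  const before = new Map(from.map((p) => [p.id, p] as const));
+  const after = new Map(to.map((p) => [p.id, p] as const));
+  const diff: PayloadDiff = { added: [], removed: [], priceChanges: [], stockChanges: [] };
+
+  // Products in the newer payload: either new, or compared field by field.
+  for (const [id, next] of after) {
+    const prev = before.get(id);
+    if (!prev) {
+      diff.added.push(next);
+      continue;
+    }
+    if (prev.price !== next.price) diff.priceChanges.push({ id, from: prev.price, to: next.price });
+    if (prev.inStock !== next.inStock) diff.stockChanges.push({ id, from: prev.inStock, to: next.inStock });
+  }
+  // Products that disappeared between the two crawls.
+  for (const [id, prev] of before) {
+    if (!after.has(id)) diff.removed.push(prev);
+  }
+  return diff;
+}
+```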
+
+---
+
+## API Endpoints
+
+### Schedules (NEW)
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `GET /api/schedules` | GET | List all schedules |
+| `PUT /api/schedules/:id` | PUT | Update schedule |
+| `POST /api/schedules/:id/trigger` | POST | Run schedule immediately |
+
+### Task Creation (rewired 2024-12-10)
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `POST /api/job-queue/enqueue` | POST | Create single task |
+| `POST /api/job-queue/enqueue-batch` | POST | Create batch tasks |
+| `POST /api/job-queue/enqueue-state` | POST | Create tasks for state |
+| `POST /api/tasks` | POST | Direct task creation |
+
+### Task Management
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `GET /api/tasks` | GET | List tasks |
+| `GET /api/tasks/:id` | GET | Get single task |
+| `GET /api/tasks/counts` | GET | Task counts by status |
+| `POST /api/tasks/recover-stale` | POST | Recover stale tasks |
+
+---
+
+## Key Files
+
+| File | Purpose |
+|------|---------|
+| `src/services/task-scheduler.ts` | **NEW** - DB-driven scheduler |
+| `src/tasks/task-worker.ts` | Worker that processes tasks |
+| `src/tasks/task-service.ts` | Task CRUD operations |
+| `src/tasks/handlers/payload-fetch.ts` | **NEW** - Fetches from API, saves to disk |
+| `src/tasks/handlers/product-refresh.ts` | **CHANGED** - Reads from disk, processes to DB |
+| `src/utils/payload-storage.ts` | **NEW** - Payload save/load utilities |
+| `src/routes/tasks.ts` | Task API endpoints |
+| `src/routes/job-queue.ts` | Job Queue UI endpoints (rewired) |
+| `migrations/079_task_schedules.sql` | Schedule table |
+| `migrations/080_raw_crawl_payloads.sql` | Payload metadata table |
+| `migrations/081_payload_fetch_columns.sql` | payload, last_fetch_at columns |
+| `migrations/074_worker_task_queue.sql` | Task queue table |
+
+---
+
+## Legacy Code (DEPRECATED)
+
+| File | Status | Replacement |
+|------|--------|-------------|
+| `src/services/scheduler.ts` | DEPRECATED | `task-scheduler.ts` |
+| `dispensary_crawl_jobs` table | ORPHANED | `worker_tasks` |
+| `job_schedules` table | LEGACY | `task_schedules` |
+
+---
+
+## Dashboard Integration
+
+Both pages remain wired to the dashboard:
+
+| Page | Data Source | Actions |
+|------|-------------|---------|
+| **Job Queue** | `worker_tasks`, `task_schedules` | Create tasks, view schedules |
+| **Task Queue** | `worker_tasks` | View tasks, recover stale |
+
+---
+
+## Multi-Replica Safety
+
+The scheduler uses `SELECT FOR UPDATE SKIP LOCKED` to ensure:
+
+1. **Only one replica** executes a schedule at a time
+2. **No duplicate tasks** created
+3. **Survives pod restarts** - state in DB, not memory
+4. **Self-healing** - recovers stale tasks on startup
+
+```sql
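+-- Run inside a transaction: locked rows stay locked until COMMIT, so other
+-- API server replicas skip them (SKIP LOCKED) instead of processing them twice
+SELECT * FROM task_schedules
+WHERE enabled = true AND next_run_at <= NOW()
+FOR UPDATE SKIP LOCKED;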
+```
+
+---
+
+## Worker Scaling (K8s)
+
+Workers run as a StatefulSet in Kubernetes. You can scale from the admin UI or CLI.
+
+### From Admin UI
+
+The Workers page (`/admin/workers`) provides:
+- Current replica count display
+- Scale up/down buttons
+- Target replica input
+
+### API Endpoints
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `GET /api/workers/k8s/replicas` | GET | Get current/desired replica counts |
+| `POST /api/workers/k8s/scale` | POST | Scale to N replicas (body: `{ replicas: N }`) |
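+
+A minimal sketch of calling the scale endpoint with the global `fetch` (Node 18+). The base URL and the non-negative-integer check are assumptions. The response body is not documented here, so the sketch returns it as untyped JSON.
+
+```typescript
+interface JsonPost {
+  method: "POST";
+  headers: Record<string, string>;
+  body: string;
+}
+
+// Builds the request for POST /api/workers/k8s/scale (body: { replicas: N }).
+function buildScaleRequest(baseUrl: string, replicas: number): { url: string; init: JsonPost } {
+  // Assumed guard: replica counts must be whole, non-negative numbers.
+  if (!Number.isInteger(replicas) || replicas < 0) {
+    throw new RangeError(`replicas must be a non-negative integer, got ${replicas}`);
+  }
+  return {
+    url: new URL("/api/workers/k8s/scale", baseUrl).toString(),
+    init: {
+      method: "POST",
+      headers: { "Content-Type": "application/json" },
+      body: JSON.stringify({ replicas }),
+    },
+  };
+}
+
+// Sends the request; baseUrl is whatever host serves the API (placeholder).
+async function scaleWorkers(baseUrl: string, replicas: number): Promise<unknown> {
+  const { url, init } = buildScaleRequest(baseUrl, replicas);
+  const res = await fetch(url, init);
+  if (!res.ok) throw new Error(`scale failed: HTTP ${res.status}`);
+  return res.json();
+}
+```
+
+Example: `await scaleWorkers("https://api.example.com", 10)` sends `{"replicas":10}`.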
+
+### From CLI
+
+```bash
+# View current replicas
+kubectl get statefulset scraper-worker -n dispensary-scraper
+
+# Scale to 10 workers
+kubectl scale statefulset scraper-worker -n dispensary-scraper --replicas=10
+
+# Scale down to 3 workers
+kubectl scale statefulset scraper-worker -n dispensary-scraper --replicas=3
+```
+
+### Configuration
+
+Environment variables for the API server:
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `K8S_NAMESPACE` | `dispensary-scraper` | Kubernetes namespace |
+| `K8S_WORKER_STATEFULSET` | `scraper-worker` | StatefulSet name |
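+
+A small sketch of resolving these variables with the documented defaults. Treating an empty string as unset (via `||`) is an assumption about the real server's behavior.
+
+```typescript
+interface K8sWorkerConfig {
+  namespace: string;
+  statefulSet: string;
+}
+
+// Reads the two documented variables, falling back to the documented defaults.
+function loadK8sConfig(env: Record<string, string | undefined>): K8sWorkerConfig {
+  return {
+    namespace: env.K8S_NAMESPACE || "dispensary-scraper",
+    statefulSet: env.K8S_WORKER_STATEFULSET || "scraper-worker",
+  };
+}
+
+// Usage: const cfg = loadK8sConfig(process.env);
+```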
+
+### RBAC Requirements
+
+The API server pod needs these K8s permissions:
+
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: worker-scaler
+  namespace: dispensary-scraper
+rules:
+- apiGroups: ["apps"]
+  resources: ["statefulsets"]
+  verbs: ["get", "patch"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: scraper-worker-scaler
+  namespace: dispensary-scraper
+subjects:
+- kind: ServiceAccount
+  name: default
+  namespace: dispensary-scraper
+roleRef:
+  kind: Role
+  name: worker-scaler
+  apiGroup: rbac.authorization.k8s.io
+```
--- a/backend/docs/_archive/WORKER_TASK_ARCHITECTURE.md
+++ b/backend/docs/_archive/WORKER_TASK_ARCHITECTURE.md
@@ -0,0 +1,542 @@
+# Worker Task Architecture
+
+This document describes the unified task-based worker system that replaces the legacy fragmented job systems.
+
+## Overview
+
+The task worker architecture provides a single, unified system for managing all background work in CannaiQ:
+
+- **Store discovery** - Find new dispensaries on platforms
+- **Entry point discovery** - Resolve platform IDs from menu URLs
+- **Product discovery** - Initial product fetch for new stores
+- **Product resync** - Regular price/stock updates for existing stores
+- **Analytics refresh** - Refresh materialized views and analytics
+
+## Architecture
+
+### Database Tables
+
+**`worker_tasks`** - Central task queue
+```sql
+CREATE TABLE worker_tasks (
+  id SERIAL PRIMARY KEY,
+  role task_role NOT NULL,           -- What type of work
+  dispensary_id INTEGER,              -- Which store (if applicable)
+  platform VARCHAR(50),               -- Which platform (dutchie, etc.)
+  status task_status DEFAULT 'pending',
+  priority INTEGER DEFAULT 0,         -- Higher = process first
+  scheduled_for TIMESTAMP,            -- Don't process before this time
+  worker_id VARCHAR(100),             -- Which worker claimed it
+  claimed_at TIMESTAMP,
+  started_at TIMESTAMP,
+  completed_at TIMESTAMP,
+  last_heartbeat_at TIMESTAMP,        -- For stale detection
+  result JSONB,                       -- Output from handler
+  error_message TEXT,
+  retry_count INTEGER DEFAULT 0,
+  max_retries INTEGER DEFAULT 3,
+  created_at TIMESTAMP DEFAULT NOW(),
+  updated_at TIMESTAMP DEFAULT NOW()
+);
+```
+
+**Key indexes:**
+- `idx_worker_tasks_pending_priority` - For efficient task claiming
+- `idx_worker_tasks_active_dispensary` - Prevents concurrent tasks per store (partial unique index)
+
+### Task Roles
+
+| Role | Purpose | Per-Store | Scheduled |
+|------|---------|-----------|-----------|
+| `store_discovery` | Find new stores on a platform | No | Daily |
+| `entry_point_discovery` | Resolve platform IDs | Yes | On-demand |
+| `product_discovery` | Initial product fetch | Yes | After entry_point |
+| `product_resync` | Price/stock updates | Yes | Every 4 hours |
+| `analytics_refresh` | Refresh MVs | No | Daily |
+
+### Task Lifecycle
+
+```
+pending → claimed → running → completed
+                  ↓
+                failed
+```
+
+1. **pending** - Task is waiting to be picked up
+2. **claimed** - Worker has claimed it (atomic via SELECT FOR UPDATE SKIP LOCKED)
+3. **running** - Worker is actively processing
+4. **completed** - Task finished successfully
+5. **failed** - Task encountered an error
+6. **stale** - Task lost its worker (recovered automatically)
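+
+One reading of this lifecycle as a transition table, which could be used to validate status updates. `running → failed` comes from the diagram. The following edges are assumptions: a task whose worker dies while claimed or running becomes stale, stale tasks are requeued as pending, and the retry query under "Error Handling" moves failed tasks back to pending.
+
+```typescript
+type TaskStatus = "pending" | "claimed" | "running" | "completed" | "failed" | "stale";
+
+// Allowed next statuses for each status (partly assumed, see above).
+const TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
+  pending: ["claimed"],
+  claimed: ["running", "stale"],
+  running: ["completed", "failed", "stale"],
+  completed: [],
+  failed: ["pending"], // manual retry
+  stale: ["pending"],  // stale recovery
+};
+
+function canTransition(from: TaskStatus, to: TaskStatus): boolean {
+  return TRANSITIONS[from].includes(to);
+}
+```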
+
+## Files
+
+### Core Files
+
+| File | Purpose |
+|------|---------|
+| `src/tasks/task-service.ts` | TaskService - CRUD, claiming, capacity metrics |
+| `src/tasks/task-worker.ts` | TaskWorker - Main worker loop |
+| `src/tasks/index.ts` | Module exports |
+| `src/routes/tasks.ts` | API endpoints |
+| `migrations/074_worker_task_queue.sql` | Database schema |
+
+### Task Handlers
+
+| File | Role |
+|------|------|
+| `src/tasks/handlers/store-discovery.ts` | `store_discovery` |
+| `src/tasks/handlers/entry-point-discovery.ts` | `entry_point_discovery` |
+| `src/tasks/handlers/product-discovery.ts` | `product_discovery` |
+| `src/tasks/handlers/product-resync.ts` | `product_resync` |
+| `src/tasks/handlers/analytics-refresh.ts` | `analytics_refresh` |
+
+## Running Workers
+
+### Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `WORKER_ROLE` | (required) | Which task role to process |
+| `WORKER_ID` | auto-generated | Custom worker identifier |
+| `POLL_INTERVAL_MS` | 5000 | How often to check for tasks |
+| `HEARTBEAT_INTERVAL_MS` | 30000 | How often to update heartbeat |
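+
+A sketch of parsing these variables. The auto-generated ID format `worker-<role>-<8 hex chars>` is inferred from the log line under "Monitoring". The random-suffix generator and the fallback for invalid numbers are assumptions.
+
+```typescript
+import { randomBytes } from "node:crypto";
+
+interface WorkerConfig {
+  role: string;
+  workerId: string;
+  pollIntervalMs: number;
+  heartbeatIntervalMs: number;
+}
+
+// Parses an optional positive integer, falling back to the documented default.
+function intFromEnv(value: string | undefined, fallback: number): number {
+  const n = Number(value);
+  return Number.isInteger(n) && n > 0 ? n : fallback;
+}
+
+function loadWorkerConfig(env: Record<string, string | undefined>): WorkerConfig {
+  const role = env.WORKER_ROLE;
+  if (!role) throw new Error("WORKER_ROLE is required");
+  return {
+    role,
+    // Assumed generator; mirrors IDs like worker-product_resync-a1b2c3d4.
+    workerId: env.WORKER_ID || `worker-${role}-${randomBytes(4).toString("hex")}`,
+    pollIntervalMs: intFromEnv(env.POLL_INTERVAL_MS, 5000),
+    heartbeatIntervalMs: intFromEnv(env.HEARTBEAT_INTERVAL_MS, 30000),
+  };
+}
+```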
+
+### Starting a Worker
+
+```bash
+# Start a product resync worker
+WORKER_ROLE=product_resync npx tsx src/tasks/task-worker.ts
+
+# Start with custom ID
+WORKER_ROLE=product_resync WORKER_ID=resync-1 npx tsx src/tasks/task-worker.ts
+
+# Start multiple workers for different roles
+WORKER_ROLE=store_discovery npx tsx src/tasks/task-worker.ts &
+WORKER_ROLE=product_resync npx tsx src/tasks/task-worker.ts &
+```
+
+### Kubernetes Deployment
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: task-worker-resync
+spec:
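+  replicas: 3
+  selector:
+    matchLabels:
+      app: task-worker-resync
+  template:
+    metadata:
+      labels:
+        app: task-worker-resync
+    spec: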
+      containers:
+      - name: worker
+        image: code.cannabrands.app/creationshop/dispensary-scraper:latest
+        command: ["npx", "tsx", "src/tasks/task-worker.ts"]
+        env:
+        - name: WORKER_ROLE
+          value: "product_resync"
+```
+
+## API Endpoints
+
+### Task Management
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/tasks` | GET | List tasks with filters |
+| `/api/tasks` | POST | Create a new task |
+| `/api/tasks/:id` | GET | Get task by ID |
+| `/api/tasks/counts` | GET | Get counts by status |
+| `/api/tasks/capacity` | GET | Get capacity metrics |
+| `/api/tasks/capacity/:role` | GET | Get role-specific capacity |
+| `/api/tasks/recover-stale` | POST | Recover tasks from dead workers |
+
+### Task Generation
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/tasks/generate/resync` | POST | Generate daily resync tasks |
+| `/api/tasks/generate/discovery` | POST | Create store discovery task |
+
+### Migration (from legacy systems)
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/tasks/migration/status` | GET | Compare old vs new systems |
+| `/api/tasks/migration/disable-old-schedules` | POST | Disable job_schedules |
+| `/api/tasks/migration/cancel-pending-crawl-jobs` | POST | Cancel old crawl jobs |
+| `/api/tasks/migration/create-resync-tasks` | POST | Create tasks for all stores |
+| `/api/tasks/migration/full-migrate` | POST | One-click migration |
+
+### Role-Specific Endpoints
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/tasks/role/:role/last-completion` | GET | Last completion time |
+| `/api/tasks/role/:role/recent` | GET | Recent completions |
+| `/api/tasks/store/:id/active` | GET | Check if store has active task |
+
+## Capacity Planning
+
+The `v_worker_capacity` view provides real-time metrics:
+
+```sql
+SELECT * FROM v_worker_capacity;
+```
+
+Returns:
+- `pending_tasks` - Tasks waiting to be claimed
+- `ready_tasks` - Tasks ready now (scheduled_for is null or past)
+- `claimed_tasks` - Tasks claimed but not started
+- `running_tasks` - Tasks actively processing
+- `completed_last_hour` - Recent completions
+- `failed_last_hour` - Recent failures
+- `active_workers` - Workers with recent heartbeats
+- `avg_duration_sec` - Average task duration
+- `tasks_per_worker_hour` - Throughput estimate
+- `estimated_hours_to_drain` - Time to clear queue
+
+### Scaling Recommendations
+
+`GET /api/tasks/capacity/:role` returns, for example:
+
+```json
+{
+  "role": "product_resync",
+  "pending_tasks": 500,
+  "active_workers": 3,
+  "workers_needed": {
+    "for_1_hour": 10,
+    "for_4_hours": 3,
+    "for_8_hours": 2
+  }
+}
+```
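+
+The sample numbers are consistent with dividing the backlog by per-worker throughput over each window and rounding up, if `tasks_per_worker_hour` is about 50: 500 / (50 × 1) = 10, 500 / (50 × 4) = 2.5 → 3, and 500 / (50 × 8) = 1.25 → 2. A sketch of that arithmetic. The real view and endpoint may compute it differently.
+
+```typescript
+// Workers required to drain `pendingTasks` within `hours`, rounded up.
+function workersNeeded(pendingTasks: number, tasksPerWorkerHour: number, hours: number): number {
+  if (tasksPerWorkerHour <= 0 || hours <= 0) {
+    throw new RangeError("throughput and hours must be positive");
+  }
+  return Math.ceil(pendingTasks / (tasksPerWorkerHour * hours));
+}
+
+// Assumed analogue of estimated_hours_to_drain at the current worker count.
+function estimatedHoursToDrain(pendingTasks: number, activeWorkers: number, tasksPerWorkerHour: number): number {
+  if (activeWorkers === 0 || tasksPerWorkerHour <= 0) return Infinity;
+  return pendingTasks / (activeWorkers * tasksPerWorkerHour);
+}
+
+// [1, 4, 8].map((h) => workersNeeded(500, 50, h)) gives [10, 3, 2]
+```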
+
+## Task Chaining
+
+Tasks can automatically create follow-up tasks:
+
+```
+store_discovery → entry_point_discovery → product_discovery
+                              ↓
+                     (store has platform_dispensary_id)
+                              ↓
+                     Daily resync tasks
+```
+
+The `chainNextTask()` method handles this automatically.
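+
+One plausible reading of the chain as a lookup. This is a hypothetical sketch, not the actual `chainNextTask()`. It assumes that product discovery follows only when entry point discovery resolved a `platform_dispensary_id`. From that point on, the store is covered by the generated daily resync tasks rather than by a chained task.
+
+```typescript
+type Role =
+  | "store_discovery"
+  | "entry_point_discovery"
+  | "product_discovery"
+  | "product_resync"
+  | "analytics_refresh";
+
+// Hypothetical: the follow-up role for a completed task, or null to stop.
+function nextRole(completed: Role, hasPlatformDispensaryId: boolean): Role | null {
+  switch (completed) {
+    case "store_discovery":
+      return "entry_point_discovery";
+    case "entry_point_discovery":
+      // Only stores with a resolved platform ID can fetch products.
+      return hasPlatformDispensaryId ? "product_discovery" : null;
+    default:
+      return null; // later stages are driven by scheduled resync tasks
+  }
+}
+```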
+
+## Stale Task Recovery
+
+Tasks are considered stale if `last_heartbeat_at` is older than the threshold (default 10 minutes).
+
+```sql
+SELECT recover_stale_tasks(10); -- 10 minute threshold
+```
+
+Or via API:
+```bash
+# API_BASE_URL: base URL of the API server (placeholder)
+curl -X POST "$API_BASE_URL/api/tasks/recover-stale" \
+  -H 'Content-Type: application/json' \
+  -d '{"threshold_minutes": 10}'
+```
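+
+A sketch of the staleness test described above, useful for reasoning about which tasks a given threshold will recover. Falling back to `claimed_at` when no heartbeat has been recorded yet is an assumption. So is limiting the check to `claimed` and `running` tasks.
+
+```typescript
+interface HeartbeatFields {
+  status: string;
+  claimed_at: Date | null;
+  last_heartbeat_at: Date | null;
+}
+
+// True when an active task's last sign of life is older than the threshold.
+function isStale(task: HeartbeatFields, now: Date, thresholdMinutes = 10): boolean {
+  if (task.status !== "claimed" && task.status !== "running") return false;
+  const last = task.last_heartbeat_at ?? task.claimed_at; // assumed fallback
+  if (!last) return false;
+  return now.getTime() - last.getTime() > thresholdMinutes * 60000;
+}
+```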
+
+## Migration from Legacy Systems
+
+### Legacy Systems Replaced
+
+1. **job_schedules + job_run_logs** - Scheduled job definitions
+2. **dispensary_crawl_jobs** - Per-dispensary crawl queue
+3. **SyncOrchestrator + HydrationWorker** - Raw payload processing
+
+### Migration Steps
+
+**Option 1: One-Click Migration**
+```bash
+# API_BASE_URL: base URL of the API server (placeholder)
+curl -X POST "$API_BASE_URL/api/tasks/migration/full-migrate"
+```
+
+This will:
+1. Disable all job_schedules
+2. Cancel pending dispensary_crawl_jobs
+3. Generate resync tasks for all stores
+4. Create discovery and analytics tasks
+
+**Option 2: Manual Migration**
+```bash
+# API_BASE_URL: base URL of the API server (placeholder)
+
+# 1. Check current status
+curl "$API_BASE_URL/api/tasks/migration/status"
+
+# 2. Disable old schedules
+curl -X POST "$API_BASE_URL/api/tasks/migration/disable-old-schedules"
+
+# 3. Cancel pending crawl jobs
+curl -X POST "$API_BASE_URL/api/tasks/migration/cancel-pending-crawl-jobs"
+
+# 4. Create resync tasks
+curl -X POST "$API_BASE_URL/api/tasks/migration/create-resync-tasks" \
+  -H 'Content-Type: application/json' \
+  -d '{"state_code": "AZ"}'
+
+# 5. Generate daily resync schedule
+curl -X POST "$API_BASE_URL/api/tasks/generate/resync" \
+  -H 'Content-Type: application/json' \
+  -d '{"batches_per_day": 6}'
+```
+
+## Per-Store Locking
+
+The system prevents concurrent tasks for the same store using a partial unique index:
+
+```sql
+CREATE UNIQUE INDEX idx_worker_tasks_active_dispensary
+ON worker_tasks (dispensary_id)
+WHERE dispensary_id IS NOT NULL
+AND status IN ('claimed', 'running');
+```
+
+This ensures only one task can be active per store at any time.
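+
+If a worker tries to move a second task for the same store into `claimed` or `running`, PostgreSQL rejects the write with a unique violation (SQLSTATE `23505`) naming this index. A sketch of recognizing that case, so a worker can skip the task instead of failing it. node-postgres exposes `code` and `constraint` on its errors, but how the real `TaskService` handles this case is not documented here.
+
+```typescript
+// Fields node-postgres sets on a DatabaseError.
+interface PgErrorLike {
+  code?: string;       // SQLSTATE, e.g. "23505" for unique_violation
+  constraint?: string; // name of the violated constraint or unique index
+}
+
+// True when a write failed because the store already has an active task.
+function isActiveTaskConflict(err: unknown): boolean {
+  if (typeof err !== "object" || err === null) return false;
+  const e = err as PgErrorLike;
+  return e.code === "23505" && e.constraint === "idx_worker_tasks_active_dispensary";
+}
+```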
+
+## Task Priority
+
+Tasks are claimed in priority order (higher first), then by creation time:
+
+```sql
+ORDER BY priority DESC, created_at ASC
+```
+
+Default priorities:
+- `store_discovery`: 0
+- `entry_point_discovery`: 10 (high - new stores)
+- `product_discovery`: 10 (high - new stores)
+- `product_resync`: 0
+- `analytics_refresh`: 0
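+
+The same ordering as an in-memory comparator, for example to preview claim order in tests or UI code. This is a sketch; only the SQL `ORDER BY` above is authoritative.
+
+```typescript
+interface QueuedTask {
+  id: number;
+  priority: number;
+  created_at: Date;
+}
+
+// Equivalent to ORDER BY priority DESC, created_at ASC.
+function claimOrder(a: QueuedTask, b: QueuedTask): number {
+  return b.priority - a.priority || a.created_at.getTime() - b.created_at.getTime();
+}
+
+// Usage: [...tasks].sort(claimOrder)
+```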
+
+## Scheduled Tasks
+
+Tasks can be scheduled for future execution:
+
+```javascript
+await taskService.createTask({
+  role: 'product_resync',
+  dispensary_id: 123,
+  scheduled_for: new Date('2025-01-10T06:00:00Z'),
+});
+```
+
+The `generate_resync_tasks()` function creates staggered tasks throughout the day:
+
+```sql
+SELECT generate_resync_tasks(6, '2025-01-10'); -- 6 batches = every 4 hours
+```
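+
+The comment implies batches are spaced 24 / N hours apart. A sketch of that spacing, assuming batches start at 00:00 UTC on the given date. The real `generate_resync_tasks()` may use different offsets or a different time zone.
+
+```typescript
+// Start times of `batchesPerDay` evenly spaced batches on a UTC date (YYYY-MM-DD).
+function resyncBatchTimes(batchesPerDay: number, date: string): Date[] {
+  if (!Number.isInteger(batchesPerDay) || batchesPerDay < 1) {
+    throw new RangeError("batchesPerDay must be a positive integer");
+  }
+  const start = Date.parse(`${date}T00:00:00Z`);
+  if (Number.isNaN(start)) throw new RangeError(`invalid date: ${date}`);
+  const stepMs = (24 * 60 * 60 * 1000) / batchesPerDay;
+  return Array.from({ length: batchesPerDay }, (_, i) => new Date(start + i * stepMs));
+}
+```
+
+With 6 batches, the sketch yields 00:00, 04:00, 08:00, 12:00, 16:00 and 20:00 UTC.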
+
+## Dashboard Integration
+
+The admin dashboard shows task queue status in the main overview:
+
+```
+Task Queue Summary
+------------------
+Pending:   45
+Running:   3
+Completed: 1,234
+Failed:    12
+```
+
+Full task management is available at `/admin/tasks`.
+
+## Error Handling
+
+Failed tasks include the error message in `error_message` and can be retried:
+
+```sql
+-- View failed tasks
+SELECT id, role, dispensary_id, error_message, retry_count
+FROM worker_tasks
+WHERE status = 'failed'
+ORDER BY completed_at DESC
+LIMIT 20;
+
+-- Retry failed tasks
+UPDATE worker_tasks
+SET status = 'pending', retry_count = retry_count + 1
+WHERE status = 'failed' AND retry_count < max_retries;
+```
+
+## Concurrent Task Processing (Added 2024-12)
+
+Workers can now process multiple tasks concurrently within a single worker instance. This improves throughput by utilizing async I/O efficiently.
+
+### Architecture
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                         Pod (K8s)                           │
+│                                                             │
+│  ┌─────────────────────────────────────────────────────┐   │
+│  │                    TaskWorker                        │   │
+│  │                                                      │   │
+│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐             │   │
+│  │  │ Task 1  │  │ Task 2  │  │ Task 3  │  (concurrent)│   │
+│  │  └─────────┘  └─────────┘  └─────────┘             │   │
+│  │                                                      │   │
+│  │  Resource Monitor                                    │   │
+│  │  ├── Memory: 65% (threshold: 85%)                   │   │
+│  │  ├── CPU: 45% (threshold: 90%)                      │   │
+│  │  └── Status: Normal                                  │   │
+│  └─────────────────────────────────────────────────────┘   │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `MAX_CONCURRENT_TASKS` | 3 | Maximum tasks a worker will run concurrently |
+| `MEMORY_BACKOFF_THRESHOLD` | 0.85 | Back off when heap memory exceeds 85% |
+| `CPU_BACKOFF_THRESHOLD` | 0.90 | Back off when CPU exceeds 90% |
+| `BACKOFF_DURATION_MS` | 10000 | How long to wait when backing off (10s) |
+
+### How It Works
+
+1. **Main Loop**: Worker continuously tries to fill up to `MAX_CONCURRENT_TASKS`
+2. **Resource Monitoring**: Before claiming a new task, worker checks memory and CPU
+3. **Backoff**: If resources exceed thresholds, worker pauses and stops claiming new tasks
+4. **Concurrent Execution**: Tasks run concurrently as independent promises on the Node.js event loop, so a task waiting on I/O doesn't block the others
+5. **Graceful Shutdown**: On SIGTERM/decommission, worker stops claiming but waits for active tasks
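+
+A simplified sketch of this loop. The claim, handler, resource check and stop flag are injected as hypothetical hooks, not `TaskWorker`'s real method names. In-flight tasks are tracked as a `Set` of promises, and `Promise.race` waits for a free slot.
+
+```typescript
+// Hypothetical hooks; names are illustrative only.
+interface WorkerHooks<T> {
+  claimTask(): Promise<T | null>;  // atomically claim one ready task, or null
+  runTask(task: T): Promise<void>; // execute the task's handler
+  shouldBackOff(): boolean;        // resource check before each claim
+  isStopping(): boolean;           // set on SIGTERM/decommission
+  sleep(ms: number): Promise<void>;
+}
+
+async function mainLoop<T>(
+  hooks: WorkerHooks<T>,
+  maxConcurrent: number,
+  pollMs: number,
+  backoffMs: number,
+): Promise<void> {
+  const active = new Set<Promise<void>>();
+
+  while (!hooks.isStopping()) {
+    if (active.size >= maxConcurrent) {
+      await Promise.race(active); // wait until a slot frees up
+      continue;
+    }
+    if (hooks.shouldBackOff()) {
+      await hooks.sleep(backoffMs); // stop claiming; running tasks continue
+      continue;
+    }
+    const task = await hooks.claimTask();
+    if (task === null) {
+      await hooks.sleep(pollMs); // nothing ready; poll again later
+      continue;
+    }
+    // Start without awaiting so tasks overlap on the event loop.
+    // The real worker records failures on the task row; this sketch only logs.
+    const p: Promise<void> = hooks
+      .runTask(task)
+      .catch((err) => console.error("task failed", err))
+      .finally(() => active.delete(p));
+    active.add(p);
+  }
+
+  await Promise.all(active); // graceful shutdown: drain in-flight tasks
+}
+```
+
+Because each tracked promise removes itself in `finally` before it settles, `Promise.race` resolves only after the slot is already free.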
+
+### Resource Monitoring
+
+```typescript
+// ResourceStats interface
+interface ResourceStats {
+  memoryPercent: number;    // Current heap usage as decimal (0.0-1.0)
+  memoryMb: number;         // Current heap used in MB
+  memoryTotalMb: number;    // Total heap available in MB
+  cpuPercent: number;       // CPU usage as percentage (0-100)
+  isBackingOff: boolean;    // True if worker is in backoff state
+  backoffReason: string;    // Why the worker is backing off
+}
+```
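+
+The units are mixed: `memoryPercent` is a 0.0-1.0 fraction, while `cpuPercent` is 0-100, and both thresholds are fractions. A sketch of the threshold check under that reading. It is a hypothetical stand-in for `shouldBackOff()`, and its reason strings copy the log format shown under "Backoff Behavior".
+
+```typescript
+interface BackoffThresholds {
+  memory: number; // MEMORY_BACKOFF_THRESHOLD as a fraction, default 0.85
+  cpu: number;    // CPU_BACKOFF_THRESHOLD as a fraction, default 0.90
+}
+
+// Returns why the worker should stop claiming new tasks, or null if it may claim.
+function backoffReason(
+  stats: { memoryPercent: number; cpuPercent: number },
+  t: BackoffThresholds = { memory: 0.85, cpu: 0.9 },
+): string | null {
+  // memoryPercent is already a 0.0-1.0 fraction.
+  if (stats.memoryPercent > t.memory) {
+    return `Memory at ${(stats.memoryPercent * 100).toFixed(1)}% (threshold: ${Math.round(t.memory * 100)}%)`;
+  }
+  // cpuPercent is 0-100, so convert it to a fraction before comparing.
+  if (stats.cpuPercent / 100 > t.cpu) {
+    return `CPU at ${stats.cpuPercent.toFixed(1)}% (threshold: ${Math.round(t.cpu * 100)}%)`;
+  }
+  return null;
+}
+```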
+
+### Heartbeat Data
+
+Workers report the following in their heartbeat:
+
+```json
+{
+  "worker_id": "worker-abc123",
+  "current_task_id": 456,
+  "current_task_ids": [456, 457, 458],
+  "active_task_count": 3,
+  "max_concurrent_tasks": 3,
+  "status": "active",
+  "resources": {
+    "memory_mb": 256,
+    "memory_total_mb": 512,
+    "memory_rss_mb": 320,
+    "memory_percent": 50,
+    "cpu_user_ms": 12500,
+    "cpu_system_ms": 3200,
+    "cpu_percent": 45,
+    "is_backing_off": false,
+    "backoff_reason": null
+  }
+}
+```
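+
+A sketch of how the `resources` fields could be derived from Node's `process.memoryUsage()` and `process.cpuUsage()`. The backoff flags are omitted. The sketch makes three assumptions. First, `memory_percent` is taken relative to `heapTotal`; the real worker might instead use V8's limit from `v8.getHeapStatistics().heap_size_limit`. Second, `cpu_user_ms` and `cpu_system_ms` are cumulative. Third, `cpu_percent` is computed over the interval since the previous sample.
+
+```typescript
+const MB = 1024 * 1024;
+
+interface CpuSample {
+  cpu: NodeJS.CpuUsage; // cumulative process.cpuUsage() at the previous sample
+  at: number;           // Date.now() at the previous sample
+}
+
+// Builds the heartbeat resource fields and the baseline for the next call.
+function sampleResources(prev: CpuSample) {
+  const mem = process.memoryUsage();
+  const now = Date.now();
+  const total = process.cpuUsage();         // cumulative since process start, in µs
+  const delta = process.cpuUsage(prev.cpu); // CPU used since prev sample, in µs
+  const elapsedMs = Math.max(1, now - prev.at);
+  return {
+    resources: {
+      memory_mb: Math.round(mem.heapUsed / MB),
+      memory_total_mb: Math.round(mem.heapTotal / MB),
+      memory_rss_mb: Math.round(mem.rss / MB),
+      memory_percent: Math.round((mem.heapUsed / mem.heapTotal) * 100),
+      cpu_user_ms: Math.round(total.user / 1000),
+      cpu_system_ms: Math.round(total.system / 1000),
+      // Can exceed 100 on multi-core hosts: cpuUsage() counts all process threads.
+      cpu_percent: Math.round(((delta.user + delta.system) / 1000 / elapsedMs) * 100),
+    },
+    next: { cpu: total, at: now },
+  };
+}
+```
+
+Usage: start with `let baseline = { cpu: process.cpuUsage(), at: Date.now() }`. On each heartbeat, call `sampleResources(baseline)`, send `resources`, then set `baseline = next`.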
+
+### Backoff Behavior
+
+When resources exceed thresholds:
+
+1. Worker logs the backoff reason:
+   ```
+   [TaskWorker] MyWorker backing off: Memory at 87.3% (threshold: 85%)
+   ```
+
+2. Worker stops claiming new tasks but continues existing tasks
+
+3. After `BACKOFF_DURATION_MS`, worker rechecks resources
+
+4. When resources return to normal:
+   ```
+   [TaskWorker] MyWorker resuming normal operation
+   ```
+
+### UI Display
+
+The Workers Dashboard shows:
+
+- **Tasks Column**: `2/3 tasks` (active/max concurrent)
+- **Resources Column**: Memory % and CPU % with color coding
+  - Green: < 50%
+  - Yellow: 50-74%
+  - Amber: 75-89%
+  - Red: 90%+
+- **Backing Off**: Orange warning badge when worker is in backoff state
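+
+The color bands read as half-open ranges on a 0-100 scale. A sketch of the mapping, assuming fractional values such as 74.6 fall into the lower band (`< 75`).
+
+```typescript
+type ResourceColor = "green" | "yellow" | "amber" | "red";
+
+// Maps a 0-100 percentage to the dashboard's color band.
+function resourceColor(percent: number): ResourceColor {
+  if (percent < 50) return "green";
+  if (percent < 75) return "yellow";
+  if (percent < 90) return "amber";
+  return "red";
+}
+```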
+
+### Task Count Badge Details
+
+```
+┌─────────────────────────────────────────────┐
+│ Worker: "MyWorker"                          │
+│ Tasks: 2/3 tasks  #456, #457                │
+│ Resources: 🧠 65%  💻 45%                    │
+│ Status: ● Active                            │
+└─────────────────────────────────────────────┘
+```
+
+### Best Practices
+
+1. **Start Conservative**: Use `MAX_CONCURRENT_TASKS=3` initially
+2. **Monitor Resources**: Watch for frequent backoffs in logs
+3. **Tune Per Workload**: I/O-bound tasks benefit from higher concurrency
+4. **Scale Horizontally**: Add more pods rather than cranking concurrency too high
+
+### Code References
+
+| File | Purpose |
+|------|---------|
+| `src/tasks/task-worker.ts:68-71` | Concurrency environment variables |
+| `src/tasks/task-worker.ts:104-111` | ResourceStats interface |
+| `src/tasks/task-worker.ts:149-179` | getResourceStats() method |
+| `src/tasks/task-worker.ts:184-196` | shouldBackOff() method |
+| `src/tasks/task-worker.ts:462-516` | mainLoop() with concurrent claiming |
+| `src/routes/worker-registry.ts:148-195` | Heartbeat endpoint handling |
+| `cannaiq/src/pages/WorkersDashboard.tsx:233-305` | UI components for resources |
+
+## Monitoring
+
+### Logs
+
+Workers log to stdout:
+```
+[TaskWorker] Starting worker worker-product_resync-a1b2c3d4 for role: product_resync
+[TaskWorker] Claimed task 123 (product_resync) for dispensary 456
+[TaskWorker] Task 123 completed successfully
+```
+
+### Health Check
+
+Check if workers are active:
+```sql
+SELECT worker_id, role, COUNT(*), MAX(last_heartbeat_at)
+FROM worker_tasks
+WHERE last_heartbeat_at > NOW() - INTERVAL '5 minutes'
+GROUP BY worker_id, role;
+```
+
+### Metrics
+
+```sql
+-- Tasks by status
+SELECT status, COUNT(*) FROM worker_tasks GROUP BY status;
+
+-- Tasks by role
+SELECT role, status, COUNT(*) FROM worker_tasks GROUP BY role, status;
+
+-- Average duration by role
+SELECT role, AVG(EXTRACT(EPOCH FROM (completed_at - started_at))) as avg_seconds
+FROM worker_tasks
+WHERE status = 'completed' AND completed_at > NOW() - INTERVAL '24 hours'
+GROUP BY role;
+```