Files
cannaiq/backend/docs/_archive/ANALYTICS_RUNBOOK.md
Kelly a35976b9e9 chore: Clean up deprecated code and docs
- Move deprecated directories to src/_deprecated/:
  - hydration/ (old pipeline approach)
  - scraper-v2/ (old Puppeteer scraper)
  - canonical-hydration/ (merged into tasks)
  - Unused services: availability, crawler-logger, geolocation, etc
  - Unused utils: age-gate-playwright, HomepageValidator, stealthBrowser

- Archive outdated docs to docs/_archive/:
  - ANALYTICS_RUNBOOK.md
  - ANALYTICS_V2_EXAMPLES.md
  - BRAND_INTELLIGENCE_API.md
  - CRAWL_PIPELINE.md
  - TASK_WORKFLOW_2024-12-10.md
  - WORKER_TASK_ARCHITECTURE.md
  - ORGANIC_SCRAPING_GUIDE.md

- Add docs/CODEBASE_MAP.md as single source of truth
- Add warning files to deprecated/archived directories
- Slim down CLAUDE.md to essential rules only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 22:17:40 -07:00

20 KiB
Raw Blame History

CannaiQ Analytics Runbook

Phase 3: Analytics Engine - Complete Implementation Guide

Overview

The CannaiQ Analytics Engine provides real-time insights into cannabis market data across price trends, brand penetration, category performance, store changes, and competitive positioning.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        API Layer                                 │
│  /api/az/analytics/*                                            │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Analytics Services                          │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐    │
│  │PriceTrend    │ │Penetration   │ │CategoryAnalytics     │    │
│  │Service       │ │Service       │ │Service               │    │
│  └──────────────┘ └──────────────┘ └──────────────────────┘    │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐    │
│  │StoreChange   │ │BrandOpportunity│ │AnalyticsCache       │    │
│  │Service       │ │Service        │ │(15-min TTL)          │    │
│  └──────────────┘ └──────────────┘ └──────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Canonical Tables                            │
│  store_products │ store_product_snapshots │ brands │ categories │
│  dispensaries   │ brand_snapshots         │ category_snapshots  │
└─────────────────────────────────────────────────────────────────┘

Services

1. PriceTrendService

Provides time-series price analytics.

Key Methods:

Method Description
getProductPriceTrend(productId, storeId?, days) Price history for a product
getBrandPriceTrend(brandName, filters) Average prices for a brand
getCategoryPriceTrend(category, filters) Category-level price trends
getPriceSummary(filters) 7d/30d/90d price averages
detectPriceCompression(category, state?) Price war detection
getGlobalPriceStats() Market-wide pricing overview

Filters:

interface PriceFilters {
  storeId?: number;
  brandName?: string;
  category?: string;
  state?: string;
  days?: number; // default: 30
}

Price Compression Detection:

  • Calculates standard deviation of prices within category
  • Returns compression score 0-100 (higher = more compressed)
  • Identifies brands converging toward mean price

2. PenetrationService

Tracks brand market presence across stores and states.

Key Methods:

Method Description
getBrandPenetration(brandName, filters) Store count, SKU count, coverage
getTopBrandsByPenetration(limit, filters) Leaderboard of dominant brands
getPenetrationTrend(brandName, days) Historical penetration growth
getShelfShareByCategory(brandName) % of shelf per category
getBrandPresenceByState(brandName) Multi-state presence map
getStoresCarryingBrand(brandName) List of stores carrying brand
getPenetrationHeatmap(brandName?) Geographic distribution

Penetration Calculation:

Penetration % = (Stores with Brand / Total Stores in Market) × 100

3. CategoryAnalyticsService

Analyzes category performance and trends.

Key Methods:

Method Description
getCategorySummary(category?, filters) SKU count, avg price, stores
getCategoryGrowth(days, filters) 7d/30d/90d growth rates
getCategoryGrowthTrend(category, days) Time-series category growth
getCategoryHeatmap(metric, periods) Visual heatmap data
getTopMovers(limit, days) Fastest growing/declining categories
getSubcategoryBreakdown(category) Drill-down into subcategories

Time Windows:

  • 7 days: Short-term volatility
  • 30 days: Monthly trends
  • 90 days: Seasonal patterns

4. StoreChangeService

Tracks product adds/drops, brand changes, and price movements per store.

Key Methods:

Method Description
getStoreChangeSummary(storeId) Overview of recent changes
getStoreChangeEvents(storeId, filters) Event log (add, drop, price, OOS)
getNewBrands(storeId, days) Brands added to store
getLostBrands(storeId, days) Brands dropped from store
getProductChanges(storeId, type, days) Filtered product changes
getCategoryLeaderboard(category, limit) Top stores for category
getMostActiveStores(days, limit) Stores with most changes
compareStores(store1, store2) Side-by-side store comparison

Event Types:

  • added - New product appeared
  • discontinued - Product removed
  • price_drop - Price decreased
  • price_increase - Price increased
  • restocked - OOS → In Stock
  • out_of_stock - In Stock → OOS

5. BrandOpportunityService

Competitive intelligence and opportunity identification.

Key Methods:

Method Description
getBrandOpportunity(brandName) Full opportunity analysis
getMarketPositionSummary(brandName) Market position vs competitors
getAlerts(filters) Analytics-generated alerts
markAlertsRead(alertIds) Mark alerts as read

Opportunity Analysis Includes:

  • White space stores (potential targets)
  • Competitive threats (brands gaining share)
  • Pricing opportunities (underpriced vs market)
  • Missing SKU recommendations

6. AnalyticsCache

In-memory caching with database fallback.

Configuration:

const cache = new AnalyticsCache(pool, {
  defaultTtlMinutes: 15,
});

Usage Pattern:

const data = await cache.getOrCompute(cacheKey, async () => {
  // Expensive query here
  return result;
});

Cache Management:

  • GET /api/az/analytics/cache/stats - View cache stats
  • POST /api/az/analytics/cache/clear?pattern=price* - Clear by pattern
  • Auto-cleanup of expired entries every 5 minutes

API Endpoints Reference

Price Endpoints

# Product price trend (last 30 days)
GET /api/az/analytics/price/product/12345?days=30

# Brand price trend with filters
GET /api/az/analytics/price/brand/Cookies?storeId=101&category=Flower&days=90

# Category median price
GET /api/az/analytics/price/category/Vaporizers?state=AZ

# Price summary (7d/30d/90d)
GET /api/az/analytics/price/summary?brand=Stiiizy&state=AZ

# Detect price wars
GET /api/az/analytics/price/compression/Flower?state=AZ

# Global stats
GET /api/az/analytics/price/global

Penetration Endpoints

# Brand penetration
GET /api/az/analytics/penetration/brand/Cookies

# Top brands leaderboard
GET /api/az/analytics/penetration/top?limit=20&state=AZ&category=Flower

# Penetration trend
GET /api/az/analytics/penetration/trend/Cookies?days=90

# Shelf share by category
GET /api/az/analytics/penetration/shelf-share/Cookies

# Multi-state presence
GET /api/az/analytics/penetration/by-state/Cookies

# Stores carrying brand
GET /api/az/analytics/penetration/stores/Cookies

# Heatmap data
GET /api/az/analytics/penetration/heatmap?brand=Cookies

Category Endpoints

# Category summary
GET /api/az/analytics/category/summary?category=Flower&state=AZ

# Category growth (7d/30d/90d)
GET /api/az/analytics/category/growth?days=30&state=AZ

# Category trend
GET /api/az/analytics/category/trend/Concentrates?days=90

# Heatmap
GET /api/az/analytics/category/heatmap?metric=growth&periods=12

# Top movers (growing/declining)
GET /api/az/analytics/category/top-movers?limit=5&days=30

# Subcategory breakdown
GET /api/az/analytics/category/Edibles/subcategories

Store Endpoints

# Store change summary
GET /api/az/analytics/store/101/summary

# Event log
GET /api/az/analytics/store/101/events?type=price_drop&days=7&limit=50

# New brands
GET /api/az/analytics/store/101/brands/new?days=30

# Lost brands
GET /api/az/analytics/store/101/brands/lost?days=30

# Product changes by type
GET /api/az/analytics/store/101/products/changes?type=added&days=7

# Category leaderboard
GET /api/az/analytics/store/leaderboard/Flower?limit=20

# Most active stores
GET /api/az/analytics/store/most-active?days=7&limit=10

# Compare two stores
GET /api/az/analytics/store/compare?store1=101&store2=102

Brand Opportunity Endpoints

# Full opportunity analysis
GET /api/az/analytics/brand/Cookies/opportunity

# Market position summary
GET /api/az/analytics/brand/Cookies/position

# Get alerts
GET /api/az/analytics/alerts?brand=Cookies&type=competitive&unreadOnly=true

# Mark alerts read
POST /api/az/analytics/alerts/mark-read
Body: { "alertIds": [1, 2, 3] }

Maintenance Endpoints

# Capture daily snapshots (run by scheduler)
POST /api/az/analytics/snapshots/capture

# Cache statistics
GET /api/az/analytics/cache/stats

# Clear cache (admin)
POST /api/az/analytics/cache/clear?pattern=price*

Incremental Computation

Analytics are designed for real-time queries without full recomputation:

Snapshot Strategy

  1. Raw Data: store_products (current state)
  2. Historical: store_product_snapshots (time-series)
  3. Aggregated: brand_snapshots, category_snapshots (daily rollups)

Window Calculations

-- 7-day window
WHERE crawled_at >= NOW() - INTERVAL '7 days'

-- 30-day window
WHERE crawled_at >= NOW() - INTERVAL '30 days'

-- 90-day window
WHERE crawled_at >= NOW() - INTERVAL '90 days'

Materialized Views (Optional)

For heavy queries, create materialized views:

CREATE MATERIALIZED VIEW mv_brand_daily_metrics AS
SELECT
  DATE(sps.captured_at) as date,
  sp.brand_id,
  COUNT(DISTINCT sp.dispensary_id) as store_count,
  COUNT(*) as sku_count,
  AVG(sp.price_rec) as avg_price
FROM store_product_snapshots sps
JOIN store_products sp ON sps.store_product_id = sp.id
WHERE sps.captured_at >= NOW() - INTERVAL '90 days'
GROUP BY DATE(sps.captured_at), sp.brand_id;

-- Refresh daily
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_brand_daily_metrics;

Scheduled Jobs

Daily Snapshot Capture

Trigger via cron or scheduler:

curl -X POST http://localhost:3010/api/az/analytics/snapshots/capture

This calls:

  • capture_brand_snapshots() - Captures brand metrics
  • capture_category_snapshots() - Captures category metrics

Cache Cleanup

Automatic cleanup every 5 minutes via in-memory timer.

For manual cleanup:

curl -X POST http://localhost:3010/api/az/analytics/cache/clear

Extending Analytics (Future Phases)

Phase 6: Intelligence Engine

  • Automated alert generation
  • Recommendation engine
  • Price prediction

Phase 7: Orders Integration

  • Sales velocity analytics
  • Reorder predictions
  • Inventory turnover

Phase 8: Advanced ML

  • Demand forecasting
  • Price elasticity modeling
  • Customer segmentation

Troubleshooting

Common Issues

1. Slow queries

  • Check cache stats: GET /api/az/analytics/cache/stats
  • Increase cache TTL if data doesn't need real-time freshness
  • Add indexes on frequently filtered columns

2. Empty results

  • Verify data exists in source tables
  • Check filter parameters (case-sensitive brand names)
  • Verify state codes are valid

3. Stale data

  • Run snapshot capture: POST /api/az/analytics/snapshots/capture
  • Clear cache: POST /api/az/analytics/cache/clear

Debugging

Enable query logging:

// In service constructor
this.debug = process.env.ANALYTICS_DEBUG === 'true';

Data Contracts

Price Trend Response

interface PriceTrend {
  productId?: number;
  storeId?: number;
  brandName?: string;
  category?: string;
  dataPoints: Array<{
    date: string;
    minPrice: number | null;
    maxPrice: number | null;
    avgPrice: number | null;
    wholesalePrice: number | null;
    sampleSize: number;
  }>;
  summary: {
    currentAvg: number | null;
    previousAvg: number | null;
    changePercent: number | null;
    trend: 'up' | 'down' | 'stable';
    volatilityScore: number | null;
  };
}

Brand Penetration Response

interface BrandPenetration {
  brandName: string;
  totalStores: number;
  storesWithBrand: number;
  penetrationPercent: number;
  skuCount: number;
  avgPrice: number | null;
  priceRange: { min: number; max: number } | null;
  topCategories: Array<{ category: string; count: number }>;
  stateBreakdown?: Array<{ state: string; storeCount: number }>;
}

Category Growth Response

interface CategoryGrowth {
  category: string;
  currentCount: number;
  previousCount: number;
  growthPercent: number;
  growthTrend: 'up' | 'down' | 'stable';
  avgPrice: number | null;
  priceChange: number | null;
  topBrands: Array<{ brandName: string; count: number }>;
}

Files Reference

File Purpose
src/dutchie-az/services/analytics/price-trends.ts Price analytics
src/dutchie-az/services/analytics/penetration.ts Brand penetration
src/dutchie-az/services/analytics/category-analytics.ts Category metrics
src/dutchie-az/services/analytics/store-changes.ts Store event tracking
src/dutchie-az/services/analytics/brand-opportunity.ts Competitive intel
src/dutchie-az/services/analytics/cache.ts Caching layer
src/dutchie-az/services/analytics/index.ts Module exports
src/dutchie-az/routes/analytics.ts API routes (680 LOC)
src/multi-state/state-query-service.ts Cross-state analytics


Analytics V2: Rec/Med State Segmentation

Phase 3 Enhancement: Enhanced analytics with recreational vs medical-only state analysis.

V2 API Endpoints

All V2 endpoints are prefixed with /api/analytics/v2

V2 Price Analytics

# Price trends for a specific product
GET /api/analytics/v2/price/product/12345?window=30d

# Price by category and state (with rec/med segmentation)
GET /api/analytics/v2/price/category/Flower?state=AZ

# Price by brand and state
GET /api/analytics/v2/price/brand/Cookies?state=AZ

# Most volatile products
GET /api/analytics/v2/price/volatile?window=30d&limit=50&state=AZ

# Rec vs Med price comparison by category
GET /api/analytics/v2/price/rec-vs-med?category=Flower

V2 Brand Penetration

# Brand penetration metrics with state breakdown
GET /api/analytics/v2/brand/Cookies/penetration?window=30d

# Brand market position within categories
GET /api/analytics/v2/brand/Cookies/market-position?category=Flower&state=AZ

# Brand presence in rec vs med-only states
GET /api/analytics/v2/brand/Cookies/rec-vs-med

# Top brands by penetration
GET /api/analytics/v2/brand/top?limit=25&state=AZ

# Brands expanding or contracting
GET /api/analytics/v2/brand/expansion-contraction?window=30d&limit=25

V2 Category Analytics

# Category growth metrics
GET /api/analytics/v2/category/Flower/growth?window=30d

# Category growth trend over time
GET /api/analytics/v2/category/Flower/trend?window=30d

# Top brands in category
GET /api/analytics/v2/category/Flower/top-brands?limit=25&state=AZ

# All categories with metrics
GET /api/analytics/v2/category/all?state=AZ&limit=50

# Rec vs Med category comparison
GET /api/analytics/v2/category/rec-vs-med?category=Flower

# Fastest growing categories
GET /api/analytics/v2/category/fastest-growing?window=30d&limit=25

V2 Store Analytics

# Store change summary
GET /api/analytics/v2/store/101/summary?window=30d

# Product change events
GET /api/analytics/v2/store/101/events?window=7d&limit=100

# Store inventory composition
GET /api/analytics/v2/store/101/inventory

# Store price positioning vs market
GET /api/analytics/v2/store/101/price-position

# Most active stores by changes
GET /api/analytics/v2/store/most-active?window=7d&limit=25&state=AZ

V2 State Analytics

# State market summary
GET /api/analytics/v2/state/AZ/summary

# All states with coverage metrics
GET /api/analytics/v2/state/all

# Legal state breakdown (rec, med-only, no program)
GET /api/analytics/v2/state/legal-breakdown

# Rec vs Med pricing by category
GET /api/analytics/v2/state/rec-vs-med-pricing?category=Flower

# States with coverage gaps
GET /api/analytics/v2/state/coverage-gaps

# Cross-state pricing comparison
GET /api/analytics/v2/state/price-comparison

V2 Services Architecture

src/services/analytics/
├── index.ts                    # Exports all V2 services
├── types.ts                    # Shared type definitions
├── PriceAnalyticsService.ts    # Price trends and volatility
├── BrandPenetrationService.ts  # Brand market presence
├── CategoryAnalyticsService.ts # Category growth analysis
├── StoreAnalyticsService.ts    # Store change tracking
└── StateAnalyticsService.ts    # State-level analytics

src/routes/analytics-v2.ts      # V2 API route handlers

Key V2 Features

  1. Rec/Med State Segmentation: All analytics can be filtered and compared by legal status
  2. State Coverage Gaps: Identify legal states with missing or stale data
  3. Cross-State Pricing: Compare prices across recreational and medical-only markets
  4. Brand Footprint Analysis: Track brand presence in rec vs med states
  5. Category Comparison: Compare category performance by legal status

V2 Migration Path

  1. Run migration 052 for state cannabis flags:

    psql "$DATABASE_URL" -f migrations/052_add_state_cannabis_flags.sql
    
  2. Run migration 053 for analytics indexes:

    psql "$DATABASE_URL" -f migrations/053_analytics_indexes.sql
    
  3. Restart backend to pick up new routes

V2 Response Examples

Rec vs Med Price Comparison:

{
  "category": "Flower",
  "recreational": {
    "state_count": 15,
    "product_count": 12500,
    "avg_price": 35.50,
    "median_price": 32.00
  },
  "medical_only": {
    "state_count": 8,
    "product_count": 5200,
    "avg_price": 42.00,
    "median_price": 40.00
  },
  "price_diff_percent": -15.48
}

Legal State Breakdown:

{
  "recreational_states": {
    "count": 24,
    "dispensary_count": 850,
    "product_count": 125000,
    "states": [
      { "code": "CA", "name": "California", "dispensary_count": 250 },
      { "code": "CO", "name": "Colorado", "dispensary_count": 150 }
    ]
  },
  "medical_only_states": {
    "count": 18,
    "dispensary_count": 320,
    "product_count": 45000,
    "states": [
      { "code": "FL", "name": "Florida", "dispensary_count": 120 }
    ]
  },
  "no_program_states": {
    "count": 9,
    "states": [
      { "code": "ID", "name": "Idaho" }
    ]
  }
}

Phase 3 Analytics Engine - Fully Implemented V2 Rec/Med State Analytics - Added December 2024