chore: Clean up deprecated code and docs

- Move deprecated directories to src/_deprecated/:
  - hydration/ (old pipeline approach)
  - scraper-v2/ (old Puppeteer scraper)
  - canonical-hydration/ (merged into tasks)
  - Unused services: availability, crawler-logger, geolocation, etc
  - Unused utils: age-gate-playwright, HomepageValidator, stealthBrowser

- Archive outdated docs to docs/_archive/:
  - ANALYTICS_RUNBOOK.md
  - ANALYTICS_V2_EXAMPLES.md
  - BRAND_INTELLIGENCE_API.md
  - CRAWL_PIPELINE.md
  - TASK_WORKFLOW_2024-12-10.md
  - WORKER_TASK_ARCHITECTURE.md
  - ORGANIC_SCRAPING_GUIDE.md

- Add docs/CODEBASE_MAP.md as single source of truth
- Add warning files to deprecated/archived directories
- Slim down CLAUDE.md to essential rules only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Kelly
2025-12-11 22:17:40 -07:00
parent f2864bd2ad
commit a35976b9e9
61 changed files with 856 additions and 1281 deletions

View File

@@ -0,0 +1,712 @@
# CannaiQ Analytics Runbook
Phase 3: Analytics Engine - Complete Implementation Guide
## Overview
The CannaiQ Analytics Engine provides real-time insights into cannabis market data across price trends, brand penetration, category performance, store changes, and competitive positioning.
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ API Layer │
│ /api/az/analytics/* │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Analytics Services │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │PriceTrend │ │Penetration │ │CategoryAnalytics │ │
│ │Service │ │Service │ │Service │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │StoreChange │ │BrandOpportunity│ │AnalyticsCache │ │
│ │Service │ │Service │ │(15-min TTL) │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Canonical Tables │
│ store_products │ store_product_snapshots │ brands │ categories │
│ dispensaries │ brand_snapshots │ category_snapshots │
└─────────────────────────────────────────────────────────────────┘
```
## Services
### 1. PriceTrendService
Provides time-series price analytics.
**Key Methods:**
| Method | Description |
|--------|-------------|
| `getProductPriceTrend(productId, storeId?, days)` | Price history for a product |
| `getBrandPriceTrend(brandName, filters)` | Average prices for a brand |
| `getCategoryPriceTrend(category, filters)` | Category-level price trends |
| `getPriceSummary(filters)` | 7d/30d/90d price averages |
| `detectPriceCompression(category, state?)` | Price war detection |
| `getGlobalPriceStats()` | Market-wide pricing overview |
**Filters:**
```typescript
interface PriceFilters {
storeId?: number;
brandName?: string;
category?: string;
state?: string;
days?: number; // default: 30
}
```
**Price Compression Detection:**
- Calculates standard deviation of prices within category
- Returns compression score 0-100 (higher = more compressed)
- Identifies brands converging toward mean price
---
### 2. PenetrationService
Tracks brand market presence across stores and states.
**Key Methods:**
| Method | Description |
|--------|-------------|
| `getBrandPenetration(brandName, filters)` | Store count, SKU count, coverage |
| `getTopBrandsByPenetration(limit, filters)` | Leaderboard of dominant brands |
| `getPenetrationTrend(brandName, days)` | Historical penetration growth |
| `getShelfShareByCategory(brandName)` | % of shelf per category |
| `getBrandPresenceByState(brandName)` | Multi-state presence map |
| `getStoresCarryingBrand(brandName)` | List of stores carrying brand |
| `getPenetrationHeatmap(brandName?)` | Geographic distribution |
**Penetration Calculation:**
```
Penetration % = (Stores with Brand / Total Stores in Market) × 100
```
---
### 3. CategoryAnalyticsService
Analyzes category performance and trends.
**Key Methods:**
| Method | Description |
|--------|-------------|
| `getCategorySummary(category?, filters)` | SKU count, avg price, stores |
| `getCategoryGrowth(days, filters)` | 7d/30d/90d growth rates |
| `getCategoryGrowthTrend(category, days)` | Time-series category growth |
| `getCategoryHeatmap(metric, periods)` | Visual heatmap data |
| `getTopMovers(limit, days)` | Fastest growing/declining categories |
| `getSubcategoryBreakdown(category)` | Drill-down into subcategories |
**Time Windows:**
- 7 days: Short-term volatility
- 30 days: Monthly trends
- 90 days: Seasonal patterns
---
### 4. StoreChangeService
Tracks product adds/drops, brand changes, and price movements per store.
**Key Methods:**
| Method | Description |
|--------|-------------|
| `getStoreChangeSummary(storeId)` | Overview of recent changes |
| `getStoreChangeEvents(storeId, filters)` | Event log (add, drop, price, OOS) |
| `getNewBrands(storeId, days)` | Brands added to store |
| `getLostBrands(storeId, days)` | Brands dropped from store |
| `getProductChanges(storeId, type, days)` | Filtered product changes |
| `getCategoryLeaderboard(category, limit)` | Top stores for category |
| `getMostActiveStores(days, limit)` | Stores with most changes |
| `compareStores(store1, store2)` | Side-by-side store comparison |
**Event Types:**
- `added` - New product appeared
- `discontinued` - Product removed
- `price_drop` - Price decreased
- `price_increase` - Price increased
- `restocked` - OOS → In Stock
- `out_of_stock` - In Stock → OOS
---
### 5. BrandOpportunityService
Competitive intelligence and opportunity identification.
**Key Methods:**
| Method | Description |
|--------|-------------|
| `getBrandOpportunity(brandName)` | Full opportunity analysis |
| `getMarketPositionSummary(brandName)` | Market position vs competitors |
| `getAlerts(filters)` | Analytics-generated alerts |
| `markAlertsRead(alertIds)` | Mark alerts as read |
**Opportunity Analysis Includes:**
- White space stores (potential targets)
- Competitive threats (brands gaining share)
- Pricing opportunities (underpriced vs market)
- Missing SKU recommendations
---
### 6. AnalyticsCache
In-memory caching with database fallback.
**Configuration:**
```typescript
const cache = new AnalyticsCache(pool, {
defaultTtlMinutes: 15,
});
```
**Usage Pattern:**
```typescript
const data = await cache.getOrCompute(cacheKey, async () => {
// Expensive query here
return result;
});
```
**Cache Management:**
- `GET /api/az/analytics/cache/stats` - View cache stats
- `POST /api/az/analytics/cache/clear?pattern=price*` - Clear by pattern
- Auto-cleanup of expired entries every 5 minutes
---
## API Endpoints Reference
### Price Endpoints
```bash
# Product price trend (last 30 days)
GET /api/az/analytics/price/product/12345?days=30
# Brand price trend with filters
GET /api/az/analytics/price/brand/Cookies?storeId=101&category=Flower&days=90
# Category median price
GET /api/az/analytics/price/category/Vaporizers?state=AZ
# Price summary (7d/30d/90d)
GET /api/az/analytics/price/summary?brand=Stiiizy&state=AZ
# Detect price wars
GET /api/az/analytics/price/compression/Flower?state=AZ
# Global stats
GET /api/az/analytics/price/global
```
### Penetration Endpoints
```bash
# Brand penetration
GET /api/az/analytics/penetration/brand/Cookies
# Top brands leaderboard
GET /api/az/analytics/penetration/top?limit=20&state=AZ&category=Flower
# Penetration trend
GET /api/az/analytics/penetration/trend/Cookies?days=90
# Shelf share by category
GET /api/az/analytics/penetration/shelf-share/Cookies
# Multi-state presence
GET /api/az/analytics/penetration/by-state/Cookies
# Stores carrying brand
GET /api/az/analytics/penetration/stores/Cookies
# Heatmap data
GET /api/az/analytics/penetration/heatmap?brand=Cookies
```
### Category Endpoints
```bash
# Category summary
GET /api/az/analytics/category/summary?category=Flower&state=AZ
# Category growth (7d/30d/90d)
GET /api/az/analytics/category/growth?days=30&state=AZ
# Category trend
GET /api/az/analytics/category/trend/Concentrates?days=90
# Heatmap
GET /api/az/analytics/category/heatmap?metric=growth&periods=12
# Top movers (growing/declining)
GET /api/az/analytics/category/top-movers?limit=5&days=30
# Subcategory breakdown
GET /api/az/analytics/category/Edibles/subcategories
```
### Store Endpoints
```bash
# Store change summary
GET /api/az/analytics/store/101/summary
# Event log
GET /api/az/analytics/store/101/events?type=price_drop&days=7&limit=50
# New brands
GET /api/az/analytics/store/101/brands/new?days=30
# Lost brands
GET /api/az/analytics/store/101/brands/lost?days=30
# Product changes by type
GET /api/az/analytics/store/101/products/changes?type=added&days=7
# Category leaderboard
GET /api/az/analytics/store/leaderboard/Flower?limit=20
# Most active stores
GET /api/az/analytics/store/most-active?days=7&limit=10
# Compare two stores
GET /api/az/analytics/store/compare?store1=101&store2=102
```
### Brand Opportunity Endpoints
```bash
# Full opportunity analysis
GET /api/az/analytics/brand/Cookies/opportunity
# Market position summary
GET /api/az/analytics/brand/Cookies/position
# Get alerts
GET /api/az/analytics/alerts?brand=Cookies&type=competitive&unreadOnly=true
# Mark alerts read
POST /api/az/analytics/alerts/mark-read
Body: { "alertIds": [1, 2, 3] }
```
### Maintenance Endpoints
```bash
# Capture daily snapshots (run by scheduler)
POST /api/az/analytics/snapshots/capture
# Cache statistics
GET /api/az/analytics/cache/stats
# Clear cache (admin)
POST /api/az/analytics/cache/clear?pattern=price*
```
---
## Incremental Computation
Analytics are designed for real-time queries without full recomputation:
### Snapshot Strategy
1. **Raw Data**: `store_products` (current state)
2. **Historical**: `store_product_snapshots` (time-series)
3. **Aggregated**: `brand_snapshots`, `category_snapshots` (daily rollups)
### Window Calculations
```sql
-- 7-day window
WHERE crawled_at >= NOW() - INTERVAL '7 days'
-- 30-day window
WHERE crawled_at >= NOW() - INTERVAL '30 days'
-- 90-day window
WHERE crawled_at >= NOW() - INTERVAL '90 days'
```
### Materialized Views (Optional)
For heavy queries, create materialized views:
```sql
CREATE MATERIALIZED VIEW mv_brand_daily_metrics AS
SELECT
DATE(sps.captured_at) as date,
sp.brand_id,
COUNT(DISTINCT sp.dispensary_id) as store_count,
COUNT(*) as sku_count,
AVG(sp.price_rec) as avg_price
FROM store_product_snapshots sps
JOIN store_products sp ON sps.store_product_id = sp.id
WHERE sps.captured_at >= NOW() - INTERVAL '90 days'
GROUP BY DATE(sps.captured_at), sp.brand_id;
-- Refresh daily
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_brand_daily_metrics;
```
---
## Scheduled Jobs
### Daily Snapshot Capture
Trigger via cron or scheduler:
```bash
curl -X POST http://localhost:3010/api/az/analytics/snapshots/capture
```
This calls:
- `capture_brand_snapshots()` - Captures brand metrics
- `capture_category_snapshots()` - Captures category metrics
### Cache Cleanup
Automatic cleanup every 5 minutes via in-memory timer.
For manual cleanup:
```bash
curl -X POST http://localhost:3010/api/az/analytics/cache/clear
```
---
## Extending Analytics (Future Phases)
### Phase 6: Intelligence Engine
- Automated alert generation
- Recommendation engine
- Price prediction
### Phase 7: Orders Integration
- Sales velocity analytics
- Reorder predictions
- Inventory turnover
### Phase 8: Advanced ML
- Demand forecasting
- Price elasticity modeling
- Customer segmentation
---
## Troubleshooting
### Common Issues
**1. Slow queries**
- Check cache stats: `GET /api/az/analytics/cache/stats`
- Increase cache TTL if data doesn't need real-time freshness
- Add indexes on frequently filtered columns
**2. Empty results**
- Verify data exists in source tables
- Check filter parameters (case-sensitive brand names)
- Verify state codes are valid
**3. Stale data**
- Run snapshot capture: `POST /api/az/analytics/snapshots/capture`
- Clear cache: `POST /api/az/analytics/cache/clear`
### Debugging
Enable query logging:
```typescript
// In service constructor
this.debug = process.env.ANALYTICS_DEBUG === 'true';
```
---
## Data Contracts
### Price Trend Response
```typescript
interface PriceTrend {
productId?: number;
storeId?: number;
brandName?: string;
category?: string;
dataPoints: Array<{
date: string;
minPrice: number | null;
maxPrice: number | null;
avgPrice: number | null;
wholesalePrice: number | null;
sampleSize: number;
}>;
summary: {
currentAvg: number | null;
previousAvg: number | null;
changePercent: number | null;
trend: 'up' | 'down' | 'stable';
volatilityScore: number | null;
};
}
```
### Brand Penetration Response
```typescript
interface BrandPenetration {
brandName: string;
totalStores: number;
storesWithBrand: number;
penetrationPercent: number;
skuCount: number;
avgPrice: number | null;
priceRange: { min: number; max: number } | null;
topCategories: Array<{ category: string; count: number }>;
stateBreakdown?: Array<{ state: string; storeCount: number }>;
}
```
### Category Growth Response
```typescript
interface CategoryGrowth {
category: string;
currentCount: number;
previousCount: number;
growthPercent: number;
growthTrend: 'up' | 'down' | 'stable';
avgPrice: number | null;
priceChange: number | null;
topBrands: Array<{ brandName: string; count: number }>;
}
```
---
## Files Reference
| File | Purpose |
|------|---------|
| `src/dutchie-az/services/analytics/price-trends.ts` | Price analytics |
| `src/dutchie-az/services/analytics/penetration.ts` | Brand penetration |
| `src/dutchie-az/services/analytics/category-analytics.ts` | Category metrics |
| `src/dutchie-az/services/analytics/store-changes.ts` | Store event tracking |
| `src/dutchie-az/services/analytics/brand-opportunity.ts` | Competitive intel |
| `src/dutchie-az/services/analytics/cache.ts` | Caching layer |
| `src/dutchie-az/services/analytics/index.ts` | Module exports |
| `src/dutchie-az/routes/analytics.ts` | API routes (680 LOC) |
| `src/multi-state/state-query-service.ts` | Cross-state analytics |
---
---
## Analytics V2: Rec/Med State Segmentation
Phase 3 Enhancement: Enhanced analytics with recreational vs medical-only state analysis.
### V2 API Endpoints
All V2 endpoints are prefixed with `/api/analytics/v2`
#### V2 Price Analytics
```bash
# Price trends for a specific product
GET /api/analytics/v2/price/product/12345?window=30d
# Price by category and state (with rec/med segmentation)
GET /api/analytics/v2/price/category/Flower?state=AZ
# Price by brand and state
GET /api/analytics/v2/price/brand/Cookies?state=AZ
# Most volatile products
GET /api/analytics/v2/price/volatile?window=30d&limit=50&state=AZ
# Rec vs Med price comparison by category
GET /api/analytics/v2/price/rec-vs-med?category=Flower
```
#### V2 Brand Penetration
```bash
# Brand penetration metrics with state breakdown
GET /api/analytics/v2/brand/Cookies/penetration?window=30d
# Brand market position within categories
GET /api/analytics/v2/brand/Cookies/market-position?category=Flower&state=AZ
# Brand presence in rec vs med-only states
GET /api/analytics/v2/brand/Cookies/rec-vs-med
# Top brands by penetration
GET /api/analytics/v2/brand/top?limit=25&state=AZ
# Brands expanding or contracting
GET /api/analytics/v2/brand/expansion-contraction?window=30d&limit=25
```
#### V2 Category Analytics
```bash
# Category growth metrics
GET /api/analytics/v2/category/Flower/growth?window=30d
# Category growth trend over time
GET /api/analytics/v2/category/Flower/trend?window=30d
# Top brands in category
GET /api/analytics/v2/category/Flower/top-brands?limit=25&state=AZ
# All categories with metrics
GET /api/analytics/v2/category/all?state=AZ&limit=50
# Rec vs Med category comparison
GET /api/analytics/v2/category/rec-vs-med?category=Flower
# Fastest growing categories
GET /api/analytics/v2/category/fastest-growing?window=30d&limit=25
```
#### V2 Store Analytics
```bash
# Store change summary
GET /api/analytics/v2/store/101/summary?window=30d
# Product change events
GET /api/analytics/v2/store/101/events?window=7d&limit=100
# Store inventory composition
GET /api/analytics/v2/store/101/inventory
# Store price positioning vs market
GET /api/analytics/v2/store/101/price-position
# Most active stores by changes
GET /api/analytics/v2/store/most-active?window=7d&limit=25&state=AZ
```
#### V2 State Analytics
```bash
# State market summary
GET /api/analytics/v2/state/AZ/summary
# All states with coverage metrics
GET /api/analytics/v2/state/all
# Legal state breakdown (rec, med-only, no program)
GET /api/analytics/v2/state/legal-breakdown
# Rec vs Med pricing by category
GET /api/analytics/v2/state/rec-vs-med-pricing?category=Flower
# States with coverage gaps
GET /api/analytics/v2/state/coverage-gaps
# Cross-state pricing comparison
GET /api/analytics/v2/state/price-comparison
```
### V2 Services Architecture
```
src/services/analytics/
├── index.ts # Exports all V2 services
├── types.ts # Shared type definitions
├── PriceAnalyticsService.ts # Price trends and volatility
├── BrandPenetrationService.ts # Brand market presence
├── CategoryAnalyticsService.ts # Category growth analysis
├── StoreAnalyticsService.ts # Store change tracking
└── StateAnalyticsService.ts # State-level analytics
src/routes/analytics-v2.ts # V2 API route handlers
```
### Key V2 Features
1. **Rec/Med State Segmentation**: All analytics can be filtered and compared by legal status
2. **State Coverage Gaps**: Identify legal states with missing or stale data
3. **Cross-State Pricing**: Compare prices across recreational and medical-only markets
4. **Brand Footprint Analysis**: Track brand presence in rec vs med states
5. **Category Comparison**: Compare category performance by legal status
### V2 Migration Path
1. Run migration 052 for state cannabis flags:
```bash
psql "$DATABASE_URL" -f migrations/052_add_state_cannabis_flags.sql
```
2. Run migration 053 for analytics indexes:
```bash
psql "$DATABASE_URL" -f migrations/053_analytics_indexes.sql
```
3. Restart backend to pick up new routes
### V2 Response Examples
**Rec vs Med Price Comparison:**
```json
{
"category": "Flower",
"recreational": {
"state_count": 15,
"product_count": 12500,
"avg_price": 35.50,
"median_price": 32.00
},
"medical_only": {
"state_count": 8,
"product_count": 5200,
"avg_price": 42.00,
"median_price": 40.00
},
"price_diff_percent": -15.48
}
```
**Legal State Breakdown:**
```json
{
"recreational_states": {
"count": 24,
"dispensary_count": 850,
"product_count": 125000,
"states": [
{ "code": "CA", "name": "California", "dispensary_count": 250 },
{ "code": "CO", "name": "Colorado", "dispensary_count": 150 }
]
},
"medical_only_states": {
"count": 18,
"dispensary_count": 320,
"product_count": 45000,
"states": [
{ "code": "FL", "name": "Florida", "dispensary_count": 120 }
]
},
"no_program_states": {
"count": 9,
"states": [
{ "code": "ID", "name": "Idaho" }
]
}
}
```
---
*Phase 3 Analytics Engine - Fully Implemented*
*V2 Rec/Med State Analytics - Added December 2024*