Compare commits
7 Commits: feature/wo...feature/ca

| Author | SHA1 | Date |
|--------|------|------|
| | b1ab45f662 | |
| | 20300edbb8 | |
| | b7cfec0770 | |
| | 948a732dd5 | |
| | bf4ceaf09e | |
| | fda688b11a | |
| | 414b97b3c0 | |

CLAUDE.md (59 lines changed)
@@ -193,6 +193,44 @@ CannaiQ has **TWO databases** with distinct purposes:

| `dutchie_menus` | **Canonical CannaiQ database** - All schema, migrations, and application data | READ/WRITE |
| `dutchie_legacy` | **Legacy read-only archive** - Historical data from old system | READ-ONLY |

### Store vs Dispensary Terminology

**"Store" and "Dispensary" are SYNONYMS in CannaiQ.**

| Term | Usage | DB Table |
|------|-------|----------|
| Store | API routes (`/api/stores`) | `dispensaries` |
| Dispensary | DB table, internal code | `dispensaries` |

- `/api/stores` and `/api/dispensaries` both query the `dispensaries` table
- There is NO `stores` table in use - it's a legacy empty table
- Use these terms interchangeably in code and documentation

### Canonical vs Legacy Tables

**CANONICAL TABLES (USE THESE):**

| Table | Purpose | Row Count |
|-------|---------|-----------|
| `dispensaries` | Store/dispensary records | ~188+ rows |
| `dutchie_products` | Product catalog | ~37,000+ rows |
| `dutchie_product_snapshots` | Price/stock history | ~millions |
| `store_products` | Canonical product schema | ~37,000+ rows |
| `store_product_snapshots` | Canonical snapshot schema | growing |

**LEGACY TABLES (EMPTY - DO NOT USE):**

| Table | Status | Action |
|-------|--------|--------|
| `stores` | EMPTY (0 rows) | Use `dispensaries` instead |
| `products` | EMPTY (0 rows) | Use `dutchie_products` or `store_products` |
| `categories` | EMPTY (0 rows) | Categories stored in product records |

**Code must NEVER:**

- Query the `stores` table (use `dispensaries`)
- Query the `products` table (use `dutchie_products` or `store_products`)
- Query the `categories` table (categories are in product records)

**CRITICAL RULES:**

- **Migrations ONLY run on `dutchie_menus`** - NEVER on `dutchie_legacy`
- **Application code connects ONLY to `dutchie_menus`**

@@ -615,15 +653,28 @@ export default defineConfig({

### Detailed Rules

1) **Dispensary vs Store**
   - Dutchie pipeline uses `dispensaries` (not legacy `stores`). For dutchie crawls, always work with dispensary ID.

1) **Dispensary = Store (SAME THING)**
   - "Dispensary" and "store" are synonyms in CannaiQ. Use interchangeably.
   - **API endpoint**: `/api/stores` (NOT `/api/dispensaries`)
   - **DB table**: `dispensaries`
   - When you need to create/query stores via API, use `/api/stores`
   - Use the record's `menu_url` and `platform_dispensary_id`.

2) **Menu detection and platform IDs**

2) **API Authentication**
   - **Trusted Origins (no auth needed)**:
     - IPs: `127.0.0.1`, `::1`, `::ffff:127.0.0.1`
     - Origins: `https://cannaiq.co`, `https://findadispo.com`, `https://findagram.co`
     - Also: `http://localhost:3010`, `http://localhost:8080`, `http://localhost:5173`
   - Requests from trusted IPs/origins get automatic admin access (`role: 'internal'`)
   - **Remote (non-trusted)**: Use Bearer token (JWT or API token). NO username/password auth.
   - Never try to login with username/password via API - use tokens only.
   - See `src/auth/middleware.ts` for `TRUSTED_ORIGINS` and `TRUSTED_IPS` lists.

3) **Menu detection and platform IDs**
   - Set `menu_type` from `menu_url` detection; resolve `platform_dispensary_id` for `menu_type='dutchie'`.
   - Admin should have "refresh detection" and "resolve ID" actions; schedule/crawl only when `menu_type='dutchie'` AND `platform_dispensary_id` is set.

3) **Queries and mapping**

4) **Queries and mapping**
   - The DB returns snake_case; code expects camelCase. Always alias/map:
     - `platform_dispensary_id AS "platformDispensaryId"`
   - Map via `mapDbRowToDispensary` when loading dispensaries (scheduler, crawler, admin crawl).
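The snake_case-to-camelCase rule above can be sketched as a plain mapping function. This is an illustrative sketch only: the real `mapDbRowToDispensary` in this repo likely carries many more fields, and the two interfaces here are assumptions drawn solely from the fields the rule names.

```typescript
// Hypothetical shapes - only the fields named in the rule above.
interface DispensaryRow {
  id: number;
  menu_url: string | null;
  platform_dispensary_id: string | null;
}

interface Dispensary {
  id: number;
  menuUrl: string | null;
  platformDispensaryId: string | null;
}

// Sketch of the aliasing the rule requires: every snake_case DB field
// is explicitly mapped to its camelCase counterpart, so forgetting a
// field shows up as a compile error rather than a silent undefined.
function mapDbRowToDispensary(row: DispensaryRow): Dispensary {
  return {
    id: row.id,
    menuUrl: row.menu_url,
    platformDispensaryId: row.platform_dispensary_id,
  };
}
```

The same effect can be had in SQL with `platform_dispensary_id AS "platformDispensaryId"`, but a single mapper keeps the convention in one place for the scheduler, crawler, and admin crawl paths.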
backend/.env (40 lines changed)

@@ -1,30 +1,52 @@
# CannaiQ Backend Environment Configuration
# Copy this file to .env and fill in the values

# Server
PORT=3010
NODE_ENV=development

# =============================================================================
# CannaiQ Database (dutchie_menus) - PRIMARY DATABASE
# CANNAIQ DATABASE (dutchie_menus) - PRIMARY DATABASE
# =============================================================================
# This is where all schema migrations run and where canonical tables live.
# All CANNAIQ_DB_* variables are REQUIRED - connection will fail if missing.
# This is where ALL schema migrations run and where canonical tables live.
# All CANNAIQ_DB_* variables are REQUIRED - no defaults.
# The application will fail to start if any are missing.

CANNAIQ_DB_HOST=localhost
CANNAIQ_DB_PORT=54320
CANNAIQ_DB_NAME=dutchie_menus
CANNAIQ_DB_NAME=dutchie_menus  # MUST be dutchie_menus - NOT dutchie_legacy
CANNAIQ_DB_USER=dutchie
CANNAIQ_DB_PASS=dutchie_local_pass

# Alternative: Use a full connection URL instead of individual vars
# If set, this takes priority over individual vars above
# CANNAIQ_DB_URL=postgresql://user:pass@host:port/dutchie_menus

# =============================================================================
# Legacy Database (dutchie_legacy) - READ-ONLY SOURCE
# LEGACY DATABASE (dutchie_legacy) - READ-ONLY FOR ETL
# =============================================================================
# Used ONLY by ETL scripts to read historical data.
# NEVER run migrations against this database.
# These are only needed when running 042_legacy_import.ts

LEGACY_DB_HOST=localhost
LEGACY_DB_PORT=54320
LEGACY_DB_NAME=dutchie_legacy
LEGACY_DB_NAME=dutchie_legacy  # READ-ONLY - never migrated
LEGACY_DB_USER=dutchie
LEGACY_DB_PASS=dutchie_local_pass
LEGACY_DB_PASS=

# Local image storage (no MinIO per CLAUDE.md)
# Alternative: Use a full connection URL instead of individual vars
# LEGACY_DB_URL=postgresql://user:pass@host:port/dutchie_legacy

# =============================================================================
# LOCAL STORAGE
# =============================================================================
# Local image storage path (no MinIO)
LOCAL_IMAGES_PATH=./public/images

# JWT
# =============================================================================
# AUTHENTICATION
# =============================================================================
JWT_SECRET=your-secret-key-change-in-production
ANTHROPIC_API_KEY=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
OPENAI_API_KEY=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
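The "REQUIRED - no defaults" rule in the .env above implies a fail-fast loader. A minimal sketch of that behavior, assuming only the variable names shown (the repo's real loader may differ), with `CANNAIQ_DB_URL` taking priority when set:

```typescript
// Variable names come from the .env above; the validation logic is a sketch.
const REQUIRED_VARS = [
  'CANNAIQ_DB_HOST',
  'CANNAIQ_DB_PORT',
  'CANNAIQ_DB_NAME',
  'CANNAIQ_DB_USER',
  'CANNAIQ_DB_PASS',
] as const;

function buildDbUrl(env: Record<string, string | undefined>): string {
  // A full connection URL takes priority over the individual vars
  if (env.CANNAIQ_DB_URL) return env.CANNAIQ_DB_URL;

  const missing = REQUIRED_VARS.filter((k) => !env[k]);
  if (missing.length > 0) {
    // No defaults: refuse to start rather than silently connect elsewhere
    throw new Error(`Missing required env vars: ${missing.join(', ')}`);
  }
  return (
    `postgresql://${env.CANNAIQ_DB_USER}:${env.CANNAIQ_DB_PASS}` +
    `@${env.CANNAIQ_DB_HOST}:${env.CANNAIQ_DB_PORT}/${env.CANNAIQ_DB_NAME}`
  );
}
```

Throwing at startup is the point: a missing `CANNAIQ_DB_NAME` must never fall back to some default database, which is exactly the "NOT dutchie_legacy" hazard the comments warn about.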
@@ -1,18 +1,18 @@
-- Add location columns to proxies table
ALTER TABLE proxies
ADD COLUMN city VARCHAR(100),
ADD COLUMN state VARCHAR(100),
ADD COLUMN country VARCHAR(100),
ADD COLUMN country_code VARCHAR(2),
ADD COLUMN location_updated_at TIMESTAMP;
ADD COLUMN IF NOT EXISTS city VARCHAR(100),
ADD COLUMN IF NOT EXISTS state VARCHAR(100),
ADD COLUMN IF NOT EXISTS country VARCHAR(100),
ADD COLUMN IF NOT EXISTS country_code VARCHAR(2),
ADD COLUMN IF NOT EXISTS location_updated_at TIMESTAMP;

-- Add index for location-based queries
CREATE INDEX idx_proxies_location ON proxies(country_code, state, city);
CREATE INDEX IF NOT EXISTS idx_proxies_location ON proxies(country_code, state, city);

-- Add the same to failed_proxies table
ALTER TABLE failed_proxies
ADD COLUMN city VARCHAR(100),
ADD COLUMN state VARCHAR(100),
ADD COLUMN country VARCHAR(100),
ADD COLUMN country_code VARCHAR(2),
ADD COLUMN location_updated_at TIMESTAMP;
ADD COLUMN IF NOT EXISTS city VARCHAR(100),
ADD COLUMN IF NOT EXISTS state VARCHAR(100),
ADD COLUMN IF NOT EXISTS country VARCHAR(100),
ADD COLUMN IF NOT EXISTS country_code VARCHAR(2),
ADD COLUMN IF NOT EXISTS location_updated_at TIMESTAMP;
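The change above swaps plain `ADD COLUMN` for `ADD COLUMN IF NOT EXISTS`, so a migration runner that replays every file is safe. A tiny illustrative sketch (not from this repo) of the property that guard buys, modeling a table's column set in TypeScript:

```typescript
// Models ADD COLUMN IF NOT EXISTS: returns true if the column was added,
// false if it already existed (a replay becomes a no-op instead of an error).
function addColumnIfNotExists(columns: Set<string>, name: string): boolean {
  if (columns.has(name)) return false; // IF NOT EXISTS: skip, don't fail
  columns.add(name);
  return true;
}
```

Without the guard, the second run of `ALTER TABLE proxies ADD COLUMN city ...` aborts with a duplicate-column error and leaves the rest of the migration file unapplied; with it, every statement in the file is individually re-runnable.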
@@ -1,6 +1,6 @@
-- Create dispensaries table as single source of truth
-- This consolidates azdhs_list (official data) + stores (menu data) into one table
CREATE TABLE dispensaries (
CREATE TABLE IF NOT EXISTS dispensaries (
    -- Primary key
    id SERIAL PRIMARY KEY,

@@ -43,11 +43,11 @@ CREATE TABLE dispensaries (
);

-- Create indexes for common queries
CREATE INDEX idx_dispensaries_city ON dispensaries(city);
CREATE INDEX idx_dispensaries_state ON dispensaries(state);
CREATE INDEX idx_dispensaries_slug ON dispensaries(slug);
CREATE INDEX idx_dispensaries_azdhs_id ON dispensaries(azdhs_id);
CREATE INDEX idx_dispensaries_menu_status ON dispensaries(menu_scrape_status);
CREATE INDEX IF NOT EXISTS idx_dispensaries_city ON dispensaries(city);
CREATE INDEX IF NOT EXISTS idx_dispensaries_state ON dispensaries(state);
CREATE INDEX IF NOT EXISTS idx_dispensaries_slug ON dispensaries(slug);
CREATE INDEX IF NOT EXISTS idx_dispensaries_azdhs_id ON dispensaries(azdhs_id);
CREATE INDEX IF NOT EXISTS idx_dispensaries_menu_status ON dispensaries(menu_scrape_status);

-- Create index for location-based queries
CREATE INDEX idx_dispensaries_location ON dispensaries(latitude, longitude) WHERE latitude IS NOT NULL AND longitude IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_dispensaries_location ON dispensaries(latitude, longitude) WHERE latitude IS NOT NULL AND longitude IS NOT NULL;
@@ -1,6 +1,6 @@
-- Create dispensary_changes table for change approval workflow
-- This protects against accidental data destruction by requiring manual review
CREATE TABLE dispensary_changes (
CREATE TABLE IF NOT EXISTS dispensary_changes (
    id SERIAL PRIMARY KEY,
    dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE,

@@ -26,10 +26,10 @@ CREATE TABLE dispensary_changes (
);

-- Create indexes for common queries
CREATE INDEX idx_dispensary_changes_status ON dispensary_changes(status);
CREATE INDEX idx_dispensary_changes_dispensary_status ON dispensary_changes(dispensary_id, status);
CREATE INDEX idx_dispensary_changes_created_at ON dispensary_changes(created_at DESC);
CREATE INDEX idx_dispensary_changes_requires_recrawl ON dispensary_changes(requires_recrawl) WHERE requires_recrawl = TRUE;
CREATE INDEX IF NOT EXISTS idx_dispensary_changes_status ON dispensary_changes(status);
CREATE INDEX IF NOT EXISTS idx_dispensary_changes_dispensary_status ON dispensary_changes(dispensary_id, status);
CREATE INDEX IF NOT EXISTS idx_dispensary_changes_created_at ON dispensary_changes(created_at DESC);
CREATE INDEX IF NOT EXISTS idx_dispensary_changes_requires_recrawl ON dispensary_changes(requires_recrawl) WHERE requires_recrawl = TRUE;

-- Create function to automatically set requires_recrawl for website/menu_url changes
CREATE OR REPLACE FUNCTION set_requires_recrawl()

@@ -42,7 +42,8 @@ BEGIN
END;
$$ LANGUAGE plpgsql;

-- Create trigger to call the function
-- Create trigger to call the function (drop first to make idempotent)
DROP TRIGGER IF EXISTS trigger_set_requires_recrawl ON dispensary_changes;
CREATE TRIGGER trigger_set_requires_recrawl
    BEFORE INSERT ON dispensary_changes
    FOR EACH ROW
@@ -1,6 +1,7 @@
-- Populate dispensaries table from azdhs_list
-- This migrates all 182 AZDHS records with their enriched Google Maps data
-- For multi-location dispensaries with duplicate slugs, append city name to make unique
-- IDEMPOTENT: Uses ON CONFLICT DO NOTHING to skip already-imported records

WITH ranked_dispensaries AS (
    SELECT

@@ -78,9 +79,10 @@ SELECT
    created_at,
    updated_at
FROM ranked_dispensaries
ORDER BY id;
ORDER BY id
ON CONFLICT (azdhs_id) DO NOTHING;

-- Verify the migration
-- Verify the migration (idempotent - just logs, doesn't fail)
DO $$
DECLARE
    source_count INTEGER;

@@ -89,9 +91,11 @@ BEGIN
    SELECT COUNT(*) INTO source_count FROM azdhs_list;
    SELECT COUNT(*) INTO dest_count FROM dispensaries;

    RAISE NOTICE 'Migration complete: % records from azdhs_list → % records in dispensaries', source_count, dest_count;
    RAISE NOTICE 'Migration status: % records in azdhs_list, % records in dispensaries', source_count, dest_count;

    IF source_count != dest_count THEN
        RAISE EXCEPTION 'Record count mismatch! Expected %, got %', source_count, dest_count;
    IF dest_count >= source_count THEN
        RAISE NOTICE 'OK: dispensaries table has expected records';
    ELSE
        RAISE WARNING 'dispensaries has fewer records than azdhs_list (% vs %)', dest_count, source_count;
    END IF;
END $$;
@@ -3,15 +3,15 @@

-- Add dispensary_id to products table
ALTER TABLE products
ADD COLUMN dispensary_id INTEGER REFERENCES dispensaries(id) ON DELETE CASCADE;
ADD COLUMN IF NOT EXISTS dispensary_id INTEGER REFERENCES dispensaries(id) ON DELETE CASCADE;

-- Add dispensary_id to categories table
ALTER TABLE categories
ADD COLUMN dispensary_id INTEGER REFERENCES dispensaries(id) ON DELETE CASCADE;
ADD COLUMN IF NOT EXISTS dispensary_id INTEGER REFERENCES dispensaries(id) ON DELETE CASCADE;

-- Create indexes for the new foreign keys
CREATE INDEX idx_products_dispensary_id ON products(dispensary_id);
CREATE INDEX idx_categories_dispensary_id ON categories(dispensary_id);
CREATE INDEX IF NOT EXISTS idx_products_dispensary_id ON products(dispensary_id);
CREATE INDEX IF NOT EXISTS idx_categories_dispensary_id ON categories(dispensary_id);

-- NOTE: We'll populate these FKs and migrate data from stores in a separate data migration
-- For now, new scrapers should use dispensary_id, but old store_id still works
@@ -0,0 +1,42 @@
-- Migration 057: Add crawl_enabled and dutchie_verified fields to dispensaries
--
-- Purpose:
-- 1. Add crawl_enabled to control which dispensaries get crawled
-- 2. Add dutchie_verified to track Dutchie source-of-truth verification
-- 3. Default existing records to crawl_enabled = TRUE to preserve behavior
--
-- After this migration, run the harmonization script to:
-- - Match dispensaries to Dutchie discoveries
-- - Update platform_dispensary_id from Dutchie
-- - Set dutchie_verified = TRUE for matches
-- - Set crawl_enabled = FALSE for unverified records

-- Add crawl_enabled column (defaults to true to not break existing crawls)
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS crawl_enabled BOOLEAN DEFAULT TRUE;

-- Add dutchie_verified column to track if record is verified against Dutchie
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS dutchie_verified BOOLEAN DEFAULT FALSE;

-- Add dutchie_verified_at timestamp
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS dutchie_verified_at TIMESTAMP WITH TIME ZONE;

-- Add dutchie_discovery_id to link back to the discovery record
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS dutchie_discovery_id BIGINT REFERENCES dutchie_discovery_locations(id);

-- Create index for crawl queries (only crawl enabled dispensaries)
CREATE INDEX IF NOT EXISTS idx_dispensaries_crawl_enabled
ON dispensaries(crawl_enabled, state)
WHERE crawl_enabled = TRUE;

-- Create index for dutchie verification status
CREATE INDEX IF NOT EXISTS idx_dispensaries_dutchie_verified
ON dispensaries(dutchie_verified, state);

COMMENT ON COLUMN dispensaries.crawl_enabled IS 'Whether this dispensary should be included in crawl jobs. Set to FALSE for unverified or problematic records.';
COMMENT ON COLUMN dispensaries.dutchie_verified IS 'Whether this dispensary has been verified against Dutchie source of truth (matched by slug or manually linked).';
COMMENT ON COLUMN dispensaries.dutchie_verified_at IS 'Timestamp when Dutchie verification was completed.';
COMMENT ON COLUMN dispensaries.dutchie_discovery_id IS 'Link to the dutchie_discovery_locations record this was matched/verified against.';
backend/migrations/065_slug_verification_tracking.sql (new file, 56 lines)

@@ -0,0 +1,56 @@
-- Migration 065: Slug verification and data source tracking
-- Adds columns to track when slug/menu data was verified and from what source

-- Add slug verification columns to dispensaries
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS slug_source VARCHAR(50),
ADD COLUMN IF NOT EXISTS slug_verified_at TIMESTAMPTZ,
ADD COLUMN IF NOT EXISTS slug_status VARCHAR(20) DEFAULT 'unverified',
ADD COLUMN IF NOT EXISTS menu_url_source VARCHAR(50),
ADD COLUMN IF NOT EXISTS menu_url_verified_at TIMESTAMPTZ,
ADD COLUMN IF NOT EXISTS platform_id_source VARCHAR(50),
ADD COLUMN IF NOT EXISTS platform_id_verified_at TIMESTAMPTZ,
ADD COLUMN IF NOT EXISTS country VARCHAR(2) DEFAULT 'US';

-- Add index for finding unverified stores
CREATE INDEX IF NOT EXISTS idx_dispensaries_slug_status
ON dispensaries(slug_status)
WHERE slug_status != 'verified';

-- Add index for country
CREATE INDEX IF NOT EXISTS idx_dispensaries_country
ON dispensaries(country);

-- Comment on columns
COMMENT ON COLUMN dispensaries.slug_source IS 'Source of slug data: dutchie_api, manual, azdhs, discovery, etc.';
COMMENT ON COLUMN dispensaries.slug_verified_at IS 'When the slug was last verified against the source';
COMMENT ON COLUMN dispensaries.slug_status IS 'Status: unverified, verified, invalid, changed';
COMMENT ON COLUMN dispensaries.menu_url_source IS 'Source of menu_url: dutchie_api, website_scrape, manual, etc.';
COMMENT ON COLUMN dispensaries.menu_url_verified_at IS 'When the menu_url was last verified';
COMMENT ON COLUMN dispensaries.platform_id_source IS 'Source of platform_dispensary_id: dutchie_api, graphql_resolution, etc.';
COMMENT ON COLUMN dispensaries.platform_id_verified_at IS 'When the platform_dispensary_id was last verified';
COMMENT ON COLUMN dispensaries.country IS 'ISO 2-letter country code: US, CA, etc.';

-- Update Green Pharms Mesa with verified Dutchie data
UPDATE dispensaries
SET
    slug = 'green-pharms-mesa',
    menu_url = 'https://dutchie.com/embedded-menu/green-pharms-mesa',
    menu_type = 'dutchie',
    platform_dispensary_id = '68dc47a2af90f2e653f8df30',
    slug_source = 'dutchie_api',
    slug_verified_at = NOW(),
    slug_status = 'verified',
    menu_url_source = 'dutchie_api',
    menu_url_verified_at = NOW(),
    platform_id_source = 'dutchie_api',
    platform_id_verified_at = NOW(),
    updated_at = NOW()
WHERE id = 232;

-- Mark all other AZ dispensaries as needing verification
UPDATE dispensaries
SET slug_status = 'unverified'
WHERE state = 'AZ'
  AND id != 232
  AND (slug_status IS NULL OR slug_status = 'unverified');
BIN backend/public/downloads/cannaiq-menus-1.5.3.zip (new file)
Binary file not shown.
@@ -1,3 +1,14 @@
/**
 * CannaiQ Authentication Middleware
 *
 * AUTH METHODS (in order of priority):
 * 1. IP-based: Localhost/trusted IPs get 'internal' role (full access, no token needed)
 * 2. Token-based: Bearer token (JWT or API token)
 *
 * NO username/password auth in API. Use tokens only.
 *
 * Localhost bypass: curl from 127.0.0.1 gets automatic admin access.
 */
import { Request, Response, NextFunction } from 'express';
import jwt from 'jsonwebtoken';
import bcrypt from 'bcrypt';

@@ -5,6 +16,61 @@ import { pool } from '../db/pool';

const JWT_SECRET = process.env.JWT_SECRET || 'change_this_in_production';

// Trusted origins that bypass auth for internal/same-origin requests
const TRUSTED_ORIGINS = [
  'https://cannaiq.co',
  'https://www.cannaiq.co',
  'https://findadispo.com',
  'https://www.findadispo.com',
  'https://findagram.co',
  'https://www.findagram.co',
  'http://localhost:3010',
  'http://localhost:8080',
  'http://localhost:5173',
];

// Trusted IPs for internal pod-to-pod communication
const TRUSTED_IPS = [
  '127.0.0.1',
  '::1',
  '::ffff:127.0.0.1',
];

/**
 * Check if request is from a trusted origin/IP
 */
function isTrustedRequest(req: Request): boolean {
  // Check origin header
  const origin = req.headers.origin;
  if (origin && TRUSTED_ORIGINS.includes(origin)) {
    return true;
  }

  // Check referer header (for same-origin requests without CORS)
  const referer = req.headers.referer;
  if (referer) {
    for (const trusted of TRUSTED_ORIGINS) {
      if (referer.startsWith(trusted)) {
        return true;
      }
    }
  }

  // Check IP for internal requests (pod-to-pod, localhost)
  const clientIp = req.ip || req.socket.remoteAddress || '';
  if (TRUSTED_IPS.includes(clientIp)) {
    return true;
  }

  // Check for Kubernetes internal header (set by ingress/service mesh)
  const internalHeader = req.headers['x-internal-request'];
  if (internalHeader === process.env.INTERNAL_REQUEST_SECRET) {
    return true;
  }

  return false;
}

export interface AuthUser {
  id: number;
  email: string;

@@ -61,6 +127,16 @@ export async function authenticateUser(email: string, password: string): Promise
}

export async function authMiddleware(req: AuthRequest, res: Response, next: NextFunction) {
  // Allow trusted origins/IPs to bypass auth (internal services, same-origin)
  if (isTrustedRequest(req)) {
    req.user = {
      id: 0,
      email: 'internal@system',
      role: 'internal'
    };
    return next();
  }

  const authHeader = req.headers.authorization;

  if (!authHeader || !authHeader.startsWith('Bearer ')) {

@@ -135,12 +211,23 @@ export async function authMiddleware(req: AuthRequest, res: Response, next: Next
  }
}

/**
 * Require specific role(s) to access endpoint.
 *
 * NOTE: 'internal' role (localhost/trusted IPs) bypasses all role checks.
 * This allows local development and internal services full access.
 */
export function requireRole(...roles: string[]) {
  return (req: AuthRequest, res: Response, next: NextFunction) => {
    if (!req.user) {
      return res.status(401).json({ error: 'Not authenticated' });
    }

    // Internal role (localhost) bypasses role checks
    if (req.user.role === 'internal') {
      return next();
    }

    if (!roles.includes(req.user.role)) {
      return res.status(403).json({ error: 'Insufficient permissions' });
    }
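The precedence in `requireRole` above (unauthenticated, then internal bypass, then role list) can be seen more easily stripped of the Express plumbing. A standalone sketch of that decision order; the role names other than `'internal'` are assumptions, not taken from this diff:

```typescript
// Hypothetical role names for illustration; only 'internal' appears in the diff.
type Role = 'internal' | 'admin' | 'viewer';

// Mirrors requireRole's checks as a pure function returning an HTTP status:
// 401 if not authenticated, 200 for 'internal' (trusted-origin bypass),
// otherwise 200/403 based on the allowed-role list.
function roleCheck(userRole: Role | null, allowed: Role[]): 200 | 401 | 403 {
  if (userRole === null) return 401;        // not authenticated
  if (userRole === 'internal') return 200;  // localhost/trusted IP bypass
  return allowed.includes(userRole) ? 200 : 403;
}
```

Note the order matters: the authentication check runs before the internal bypass, so even trusted requests must have passed through `authMiddleware` (which sets `req.user`) first.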
@@ -472,7 +472,8 @@ export class CanonicalHydrationService {
    }

    // Step 3: Create initial snapshots from current product state
    const snapshotsWritten = await this.createInitialSnapshots(dispensaryId, crawlRunId);
    // crawlRunId is guaranteed to be set at this point (either from existing run or insert)
    const snapshotsWritten = await this.createInitialSnapshots(dispensaryId, crawlRunId!);
    result.snapshotsWritten += snapshotsWritten;

    // Update crawl run with snapshot count
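The `crawlRunId!` added above uses TypeScript's non-null assertion operator. A small illustrative sketch (hypothetical function, not from this repo) of what `!` does and does not do:

```typescript
// `x!` only tells the compiler x is neither null nor undefined;
// it performs NO runtime check. It is safe only when surrounding logic
// guarantees assignment, as the comment in the diff above asserts.
function useRunId(crawlRunId: number | undefined): number {
  return crawlRunId!;
}
```

If the guarantee ever breaks, the `undefined` flows through silently at runtime, so an explicit `if (crawlRunId === undefined) throw ...` guard is the defensive alternative.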
@@ -1,6 +1,7 @@
#!/usr/bin/env node
/**
 * CLI Entrypoint for CannaIQ Backend
 * @module cli
 *
 * Usage:
 *   npx tsx src/cli.ts          # Start API server

@@ -50,18 +51,14 @@ async function main() {
    showHelp();
  }

  if (args.includes('--worker')) {
    console.log('[CLI] Starting worker process...');
    const { startWorker } = await import('./dutchie-az/services/worker');
    await startWorker();
  } else {
    // Default: start API server
    console.log('[CLI] Starting API server...');
    await import('./index');
  }
  // Default: start API server
  console.log('[CLI] Starting API server...');
  await import('./index');
}

main().catch((error) => {
  console.error('[CLI] Fatal error:', error);
  process.exit(1);
});

export {};
@@ -1,657 +0,0 @@
|
||||
/**
|
||||
* Base Dutchie Crawler Template
|
||||
*
|
||||
* This is the base template for all Dutchie store crawlers.
|
||||
* Per-store crawlers extend this by overriding specific methods.
|
||||
*
|
||||
* Exports:
|
||||
* - crawlProducts(dispensary, options) - Main crawl entry point
|
||||
* - detectStructure(page) - Detect page structure for sandbox mode
|
||||
* - extractProducts(document) - Extract product data
|
||||
* - extractImages(document) - Extract product images
|
||||
* - extractStock(document) - Extract stock status
|
||||
* - extractPagination(document) - Extract pagination info
|
||||
*/
|
||||
|
||||
import {
|
||||
crawlDispensaryProducts as baseCrawlDispensaryProducts,
|
||||
CrawlResult,
|
||||
} from '../../dutchie-az/services/product-crawler';
|
||||
import { Dispensary, CrawlerProfileOptions } from '../../dutchie-az/types';
|
||||
|
||||
// Re-export CrawlResult for convenience
|
||||
export { CrawlResult };
|
||||
|
||||
// ============================================================
|
||||
// TYPES
|
||||
// ============================================================
|
||||
|
||||
/**
|
||||
* Options passed to the per-store crawler
|
||||
*/
|
||||
export interface StoreCrawlOptions {
|
||||
pricingType?: 'rec' | 'med';
|
||||
useBothModes?: boolean;
|
||||
downloadImages?: boolean;
|
||||
trackStock?: boolean;
|
||||
timeoutMs?: number;
|
||||
config?: Record<string, any>;
|
||||
}
|
||||
|
||||
/**
|
||||
* Progress callback for reporting crawl progress
|
||||
*/
|
||||
export interface CrawlProgressCallback {
|
||||
phase: 'fetching' | 'processing' | 'saving' | 'images' | 'complete';
|
||||
current: number;
|
||||
total: number;
|
||||
message?: string;
|
||||
}
|
||||
|
||||
/**
|
||||
* Structure detection result for sandbox mode
|
||||
*/
|
||||
export interface StructureDetectionResult {
|
||||
success: boolean;
|
||||
menuType: 'dutchie' | 'treez' | 'jane' | 'unknown';
|
||||
iframeUrl?: string;
|
||||
graphqlEndpoint?: string;
|
||||
dispensaryId?: string;
|
||||
selectors: {
|
||||
productContainer?: string;
|
||||
productName?: string;
|
||||
productPrice?: string;
|
||||
productImage?: string;
|
||||
productCategory?: string;
|
||||
pagination?: string;
|
||||
loadMore?: string;
|
||||
};
|
||||
pagination: {
|
||||
type: 'scroll' | 'click' | 'graphql' | 'none';
|
||||
hasMore?: boolean;
|
||||
pageSize?: number;
|
||||
};
|
||||
errors: string[];
|
||||
metadata: Record<string, any>;
|
||||
}
|
||||
|
||||
/**
|
||||
* Product extraction result
|
||||
*/
|
||||
export interface ExtractedProduct {
|
||||
externalId: string;
|
||||
name: string;
|
||||
brand?: string;
|
||||
category?: string;
|
||||
subcategory?: string;
|
||||
price?: number;
|
||||
priceRec?: number;
|
||||
priceMed?: number;
|
||||
weight?: string;
|
||||
thcContent?: string;
|
||||
cbdContent?: string;
|
||||
description?: string;
|
||||
imageUrl?: string;
|
||||
stockStatus?: 'in_stock' | 'out_of_stock' | 'low_stock' | 'unknown';
|
||||
quantity?: number;
|
||||
raw?: Record<string, any>;
|
||||
}
|
||||
|
||||
/**
|
||||
* Image extraction result
|
||||
*/
|
||||
export interface ExtractedImage {
|
||||
productId: string;
|
||||
imageUrl: string;
|
||||
isPrimary: boolean;
|
||||
position: number;
|
||||
}
|
||||
|
||||
/**
|
||||
* Stock extraction result
|
||||
*/
|
||||
export interface ExtractedStock {
|
||||
productId: string;
|
||||
status: 'in_stock' | 'out_of_stock' | 'low_stock' | 'unknown';
|
||||
quantity?: number;
|
||||
lastChecked: Date;
|
||||
}
|
||||
|
||||
/**
|
||||
* Pagination extraction result
|
||||
*/
|
||||
export interface ExtractedPagination {
|
||||
hasNextPage: boolean;
|
||||
currentPage?: number;
|
||||
totalPages?: number;
|
||||
totalProducts?: number;
|
||||
nextCursor?: string;
|
||||
loadMoreSelector?: string;
|
||||
}
|
||||
|
||||
/**
|
||||
* Hook points that per-store crawlers can override
|
||||
*/
|
||||
export interface DutchieCrawlerHooks {
|
||||
/**
|
||||
* Called before fetching products
|
||||
* Can be used to set up custom headers, cookies, etc.
|
||||
*/
|
||||
beforeFetch?: (dispensary: Dispensary) => Promise<void>;
|
||||
|
||||
/**
|
||||
* Called after fetching products, before processing
|
||||
* Can be used to filter or transform raw products
|
||||
*/
|
||||
afterFetch?: (products: any[], dispensary: Dispensary) => Promise<any[]>;
|
||||
|
||||
/**
|
||||
* Called after all processing is complete
|
||||
* Can be used for cleanup or post-processing
|
||||
*/
|
||||
afterComplete?: (result: CrawlResult, dispensary: Dispensary) => Promise<void>;
|
||||
|
||||
/**
|
||||
* Custom selector resolver for iframe detection
|
||||
*/
|
||||
resolveIframe?: (page: any) => Promise<string | null>;
|
||||
|
||||
/**
|
||||
* Custom product container selector
|
||||
*/
|
||||
getProductContainerSelector?: () => string;
|
||||
|
||||
/**
|
||||
* Custom product extraction from container element
|
||||
*/
|
||||
extractProductFromElement?: (element: any) => Promise<ExtractedProduct | null>;
|
||||
}
/**
 * Selectors configuration for per-store overrides
 */
export interface DutchieSelectors {
  iframe?: string;
  productContainer?: string;
  productName?: string;
  productPrice?: string;
  productPriceRec?: string;
  productPriceMed?: string;
  productImage?: string;
  productCategory?: string;
  productBrand?: string;
  productWeight?: string;
  productThc?: string;
  productCbd?: string;
  productDescription?: string;
  productStock?: string;
  loadMore?: string;
  pagination?: string;
}

// ============================================================
// DEFAULT SELECTORS
// ============================================================

export const DEFAULT_DUTCHIE_SELECTORS: DutchieSelectors = {
  iframe: 'iframe[src*="dutchie.com"]',
  productContainer: '[data-testid="product-card"], .product-card, [class*="ProductCard"]',
  productName: '[data-testid="product-title"], .product-title, [class*="ProductTitle"]',
  productPrice: '[data-testid="product-price"], .product-price, [class*="ProductPrice"]',
  productImage: 'img[src*="dutchie"], img[src*="product"], .product-image img',
  productCategory: '[data-testid="category-name"], .category-name',
  productBrand: '[data-testid="brand-name"], .brand-name, [class*="BrandName"]',
  loadMore: 'button[data-testid="load-more"], .load-more-button',
  pagination: '.pagination, [class*="Pagination"]',
};
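Per-store selector overrides merge with these defaults via object spread (the same pattern the `BaseDutchieCrawler` constructor uses): later spreads win per key, so only the overridden selectors change. The override value below is illustrative, not taken from a real store file:

```typescript
// Defaults trimmed to two keys for brevity.
const defaults = {
  productContainer: '[data-testid="product-card"], .product-card',
  productName: '[data-testid="product-title"], .product-title',
};

// Hypothetical store-specific selector override.
const overrides = { productName: '.menu-item__title' };

// Per-key merge: overrides replace matching keys, everything else is kept.
const merged = { ...defaults, ...overrides };
```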
// ============================================================
// BASE CRAWLER CLASS
// ============================================================

/**
 * BaseDutchieCrawler - Base class for all Dutchie store crawlers
 *
 * Per-store crawlers extend this class and override methods as needed.
 * The default implementation delegates to the existing shared Dutchie logic.
 */
export class BaseDutchieCrawler {
  protected dispensary: Dispensary;
  protected options: StoreCrawlOptions;
  protected hooks: DutchieCrawlerHooks;
  protected selectors: DutchieSelectors;

  constructor(
    dispensary: Dispensary,
    options: StoreCrawlOptions = {},
    hooks: DutchieCrawlerHooks = {},
    selectors: DutchieSelectors = {}
  ) {
    this.dispensary = dispensary;
    this.options = {
      pricingType: 'rec',
      useBothModes: true,
      downloadImages: true,
      trackStock: true,
      timeoutMs: 30000,
      ...options,
    };
    this.hooks = hooks;
    this.selectors = { ...DEFAULT_DUTCHIE_SELECTORS, ...selectors };
  }

  /**
   * Main entry point - crawl products for this dispensary
   * Override this in per-store crawlers to customize behavior
   */
  async crawlProducts(): Promise<CrawlResult> {
    // Call beforeFetch hook if defined
    if (this.hooks.beforeFetch) {
      await this.hooks.beforeFetch(this.dispensary);
    }

    // Use the existing shared Dutchie crawl logic
    const result = await baseCrawlDispensaryProducts(
      this.dispensary,
      this.options.pricingType || 'rec',
      {
        useBothModes: this.options.useBothModes,
        downloadImages: this.options.downloadImages,
      }
    );

    // Call afterComplete hook if defined
    if (this.hooks.afterComplete) {
      await this.hooks.afterComplete(result, this.dispensary);
    }

    return result;
  }

  /**
   * Detect page structure for sandbox discovery mode
   * Override in per-store crawlers if needed
   *
   * @param page - Puppeteer page object or HTML string
   * @returns Structure detection result
   */
  async detectStructure(page: any): Promise<StructureDetectionResult> {
    const result: StructureDetectionResult = {
      success: false,
      menuType: 'unknown',
      selectors: {},
      pagination: { type: 'none' },
      errors: [],
      metadata: {},
    };

    try {
      // Default implementation: check for Dutchie iframe
      if (typeof page === 'string') {
        // HTML string mode
        if (page.includes('dutchie.com')) {
          result.menuType = 'dutchie';
          result.success = true;
        }
      } else if (page && typeof page.evaluate === 'function') {
        // Puppeteer page mode
        const detection = await page.evaluate((selectorConfig: DutchieSelectors) => {
          const iframe = document.querySelector(selectorConfig.iframe || '') as HTMLIFrameElement;
          const iframeUrl = iframe?.src || null;

          // Check for product containers
          const containers = document.querySelectorAll(selectorConfig.productContainer || '');

          return {
            hasIframe: !!iframe,
            iframeUrl,
            productCount: containers.length,
            isDutchie: !!iframeUrl?.includes('dutchie.com'),
          };
        }, this.selectors);

        if (detection.isDutchie) {
          result.menuType = 'dutchie';
          result.iframeUrl = detection.iframeUrl;
          result.success = true;
        }

        result.metadata = detection;
      }

      // Set default selectors for Dutchie
      if (result.menuType === 'dutchie') {
        result.selectors = {
          productContainer: this.selectors.productContainer,
          productName: this.selectors.productName,
          productPrice: this.selectors.productPrice,
          productImage: this.selectors.productImage,
          productCategory: this.selectors.productCategory,
        };
        result.pagination = { type: 'graphql' };
      }
    } catch (error: any) {
      result.errors.push(`Detection error: ${error.message}`);
    }

    return result;
  }

  /**
   * Extract products from page/document
   * Override in per-store crawlers for custom extraction
   *
   * @param document - DOM document, Puppeteer page, or raw products array
   * @returns Array of extracted products
   */
  async extractProducts(document: any): Promise<ExtractedProduct[]> {
    // Default implementation: assume document is already an array of products
    // from the GraphQL response
    if (Array.isArray(document)) {
      return document.map((product) => this.mapRawProduct(product));
    }

    // If document is a Puppeteer page, extract from DOM
    if (document && typeof document.evaluate === 'function') {
      return this.extractProductsFromPage(document);
    }

    return [];
  }

  /**
   * Extract products from Puppeteer page
   * Override for custom DOM extraction
   */
  protected async extractProductsFromPage(page: any): Promise<ExtractedProduct[]> {
    const products = await page.evaluate((selectors: DutchieSelectors) => {
      const containers = document.querySelectorAll(selectors.productContainer || '');
      return Array.from(containers).map((container) => {
        const nameEl = container.querySelector(selectors.productName || '');
        const priceEl = container.querySelector(selectors.productPrice || '');
        const imageEl = container.querySelector(selectors.productImage || '') as HTMLImageElement;
        const brandEl = container.querySelector(selectors.productBrand || '');

        return {
          name: nameEl?.textContent?.trim() || '',
          price: priceEl?.textContent?.trim() || '',
          imageUrl: imageEl?.src || '',
          brand: brandEl?.textContent?.trim() || '',
        };
      });
    }, this.selectors);

    return products.map((p: any, i: number) => ({
      externalId: `dom-product-${i}`,
      name: p.name,
      brand: p.brand,
      price: this.parsePrice(p.price),
      imageUrl: p.imageUrl,
      stockStatus: 'unknown' as const,
    }));
  }

  /**
   * Map raw product from GraphQL to ExtractedProduct
   * Override for custom mapping
   */
  protected mapRawProduct(raw: any): ExtractedProduct {
    return {
      externalId: raw.id || raw._id || raw.externalId,
      name: raw.name || raw.Name,
      brand: raw.brand?.name || raw.brandName || raw.brand,
      category: raw.type || raw.category || raw.Category,
      subcategory: raw.subcategory || raw.Subcategory,
      price: raw.recPrice || raw.price || raw.Price,
      priceRec: raw.recPrice || raw.Prices?.rec,
      priceMed: raw.medPrice || raw.Prices?.med,
      weight: raw.weight || raw.Weight,
      thcContent: raw.potencyThc?.formatted || raw.THCContent?.formatted,
      cbdContent: raw.potencyCbd?.formatted || raw.CBDContent?.formatted,
      description: raw.description || raw.Description,
      imageUrl: raw.image || raw.Image,
      stockStatus: this.mapStockStatus(raw),
      quantity: raw.quantity || raw.Quantity,
      raw,
    };
  }

  /**
   * Map raw stock status to standardized value
   */
  protected mapStockStatus(raw: any): 'in_stock' | 'out_of_stock' | 'low_stock' | 'unknown' {
    const status = raw.Status || raw.status || raw.stockStatus;
    if (status === 'Active' || status === 'active' || status === 'in_stock') {
      return 'in_stock';
    }
    if (status === 'Inactive' || status === 'inactive' || status === 'out_of_stock') {
      return 'out_of_stock';
    }
    if (status === 'low_stock') {
      return 'low_stock';
    }
    return 'unknown';
  }

  /**
   * Parse price string to number
   */
  protected parsePrice(priceStr: string): number | undefined {
    if (!priceStr) return undefined;
    const cleaned = priceStr.replace(/[^0-9.]/g, '');
    const num = parseFloat(cleaned);
    return isNaN(num) ? undefined : num;
  }
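Pulled out as standalone functions, the two normalization helpers above behave like this. One caveat worth knowing: `parsePrice` strips everything except digits and dots, so a string containing two numbers (e.g. `"$25 / 3.5g"`) concatenates them into `253.5` — it should only be fed a single price token:

```typescript
// Standalone sketches of BaseDutchieCrawler.mapStockStatus / parsePrice.
function mapStockStatus(raw: { Status?: string; status?: string; stockStatus?: string }):
  'in_stock' | 'out_of_stock' | 'low_stock' | 'unknown' {
  const status = raw.Status || raw.status || raw.stockStatus;
  if (status === 'Active' || status === 'active' || status === 'in_stock') return 'in_stock';
  if (status === 'Inactive' || status === 'inactive' || status === 'out_of_stock') return 'out_of_stock';
  if (status === 'low_stock') return 'low_stock';
  return 'unknown';
}

function parsePrice(priceStr: string): number | undefined {
  if (!priceStr) return undefined;
  // Keep only digits and dots, then parse.
  const cleaned = priceStr.replace(/[^0-9.]/g, '');
  const num = parseFloat(cleaned);
  return isNaN(num) ? undefined : num;
}
```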
  /**
   * Extract images from document
   * Override for custom image extraction
   *
   * @param document - DOM document, Puppeteer page, or products array
   * @returns Array of extracted images
   */
  async extractImages(document: any): Promise<ExtractedImage[]> {
    if (Array.isArray(document)) {
      return document
        .filter((p) => p.image || p.Image || p.imageUrl)
        .map((p, i) => ({
          productId: p.id || p._id || `product-${i}`,
          imageUrl: p.image || p.Image || p.imageUrl,
          isPrimary: true,
          position: 0,
        }));
    }

    // Puppeteer page extraction
    if (document && typeof document.evaluate === 'function') {
      return this.extractImagesFromPage(document);
    }

    return [];
  }

  /**
   * Extract images from Puppeteer page
   */
  protected async extractImagesFromPage(page: any): Promise<ExtractedImage[]> {
    const images = await page.evaluate((selector: string) => {
      const imgs = document.querySelectorAll(selector);
      return Array.from(imgs).map((img, i) => ({
        src: (img as HTMLImageElement).src,
        position: i,
      }));
    }, this.selectors.productImage || 'img');

    return images.map((img: any, i: number) => ({
      productId: `dom-product-${i}`,
      imageUrl: img.src,
      isPrimary: i === 0,
      position: img.position,
    }));
  }

  /**
   * Extract stock information from document
   * Override for custom stock extraction
   *
   * @param document - DOM document, Puppeteer page, or products array
   * @returns Array of extracted stock statuses
   */
  async extractStock(document: any): Promise<ExtractedStock[]> {
    if (Array.isArray(document)) {
      return document.map((p) => ({
        productId: p.id || p._id || p.externalId,
        status: this.mapStockStatus(p),
        quantity: p.quantity || p.Quantity,
        lastChecked: new Date(),
      }));
    }

    return [];
  }

  /**
   * Extract pagination information from document
   * Override for custom pagination handling
   *
   * @param document - DOM document, Puppeteer page, or GraphQL response
   * @returns Pagination info
   */
  async extractPagination(document: any): Promise<ExtractedPagination> {
    // Default: check for page info in GraphQL response
    if (document && document.pageInfo) {
      return {
        hasNextPage: document.pageInfo.hasNextPage || false,
        currentPage: document.pageInfo.currentPage,
        totalPages: document.pageInfo.totalPages,
        totalProducts: document.pageInfo.totalCount || document.totalCount,
        nextCursor: document.pageInfo.endCursor,
      };
    }

    // Default: no pagination
    return {
      hasNextPage: false,
    };
  }
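The default `pageInfo` mapping can be exercised standalone against a GraphQL-style response object. Field names mirror the method above; the sample values below are invented for illustration:

```typescript
interface PaginationInfo {
  hasNextPage: boolean;
  currentPage?: number;
  totalPages?: number;
  totalProducts?: number;
  nextCursor?: string;
}

// Standalone sketch of the pageInfo branch of extractPagination.
function paginationFromResponse(doc: any): PaginationInfo {
  if (doc && doc.pageInfo) {
    return {
      hasNextPage: doc.pageInfo.hasNextPage || false,
      currentPage: doc.pageInfo.currentPage,
      totalPages: doc.pageInfo.totalPages,
      totalProducts: doc.pageInfo.totalCount || doc.totalCount,
      nextCursor: doc.pageInfo.endCursor,
    };
  }
  // Anything without pageInfo is treated as unpaginated.
  return { hasNextPage: false };
}
```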
  /**
   * Get the cName (Dutchie slug) for this dispensary
   * Override to customize cName extraction
   */
  getCName(): string {
    if (this.dispensary.menuUrl) {
      try {
        const url = new URL(this.dispensary.menuUrl);
        const segments = url.pathname.split('/').filter(Boolean);
        if (segments.length >= 2) {
          return segments[segments.length - 1];
        }
      } catch {
        // Fall through to default
      }
    }
    return this.dispensary.slug || '';
  }
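As a standalone function, the slug extraction above takes the last path segment of a menu URL that has at least two segments, and otherwise falls back to a stored slug (the example URL is hypothetical):

```typescript
// Standalone sketch of BaseDutchieCrawler.getCName.
function getCName(menuUrl: string | undefined, fallbackSlug: string): string {
  if (menuUrl) {
    try {
      const url = new URL(menuUrl);
      // Drop empty segments from leading/trailing slashes.
      const segments = url.pathname.split('/').filter(Boolean);
      if (segments.length >= 2) {
        return segments[segments.length - 1];
      }
    } catch {
      // Malformed URL: fall through to the fallback slug.
    }
  }
  return fallbackSlug;
}
```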
  /**
   * Get custom headers for API requests
   * Override for store-specific headers
   */
  getCustomHeaders(): Record<string, string> {
    const cName = this.getCName();
    return {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
      Origin: 'https://dutchie.com',
      Referer: `https://dutchie.com/embedded-menu/${cName}`,
    };
  }
}

// ============================================================
// FACTORY FUNCTION
// ============================================================

/**
 * Create a base Dutchie crawler instance
 * This is the default export used when no per-store override exists
 */
export function createCrawler(
  dispensary: Dispensary,
  options: StoreCrawlOptions = {},
  hooks: DutchieCrawlerHooks = {},
  selectors: DutchieSelectors = {}
): BaseDutchieCrawler {
  return new BaseDutchieCrawler(dispensary, options, hooks, selectors);
}

// ============================================================
// STANDALONE FUNCTIONS (required exports for orchestrator)
// ============================================================

/**
 * Crawl products using the base Dutchie logic
 * Per-store files can call this or override it completely
 */
export async function crawlProducts(
  dispensary: Dispensary,
  options: StoreCrawlOptions = {}
): Promise<CrawlResult> {
  const crawler = createCrawler(dispensary, options);
  return crawler.crawlProducts();
}

/**
 * Detect structure using the base Dutchie logic
 */
export async function detectStructure(
  page: any,
  dispensary?: Dispensary
): Promise<StructureDetectionResult> {
  const crawler = createCrawler(dispensary || ({} as Dispensary));
  return crawler.detectStructure(page);
}

/**
 * Extract products using the base Dutchie logic
 */
export async function extractProducts(
  document: any,
  dispensary?: Dispensary
): Promise<ExtractedProduct[]> {
  const crawler = createCrawler(dispensary || ({} as Dispensary));
  return crawler.extractProducts(document);
}

/**
 * Extract images using the base Dutchie logic
 */
export async function extractImages(
  document: any,
  dispensary?: Dispensary
): Promise<ExtractedImage[]> {
  const crawler = createCrawler(dispensary || ({} as Dispensary));
  return crawler.extractImages(document);
}

/**
 * Extract stock using the base Dutchie logic
 */
export async function extractStock(
  document: any,
  dispensary?: Dispensary
): Promise<ExtractedStock[]> {
  const crawler = createCrawler(dispensary || ({} as Dispensary));
  return crawler.extractStock(document);
}

/**
 * Extract pagination using the base Dutchie logic
 */
export async function extractPagination(
  document: any,
  dispensary?: Dispensary
): Promise<ExtractedPagination> {
  const crawler = createCrawler(dispensary || ({} as Dispensary));
  return crawler.extractPagination(document);
}
@@ -1,330 +0,0 @@
/**
 * Base Jane Crawler Template (PLACEHOLDER)
 *
 * This is the base template for all Jane (iheartjane) store crawlers.
 * Per-store crawlers extend this by overriding specific methods.
 *
 * TODO: Implement Jane-specific crawling logic (Algolia-based)
 */

import { Dispensary } from '../../dutchie-az/types';
import {
  StoreCrawlOptions,
  CrawlResult,
  StructureDetectionResult,
  ExtractedProduct,
  ExtractedImage,
  ExtractedStock,
  ExtractedPagination,
} from './base-dutchie';

// Re-export types
export {
  StoreCrawlOptions,
  CrawlResult,
  StructureDetectionResult,
  ExtractedProduct,
  ExtractedImage,
  ExtractedStock,
  ExtractedPagination,
};

// ============================================================
// JANE-SPECIFIC TYPES
// ============================================================

export interface JaneConfig {
  algoliaAppId?: string;
  algoliaApiKey?: string;
  algoliaIndex?: string;
  storeId?: string;
}

export interface JaneSelectors {
  productContainer?: string;
  productName?: string;
  productPrice?: string;
  productImage?: string;
  productCategory?: string;
  productBrand?: string;
  pagination?: string;
  loadMore?: string;
}

export const DEFAULT_JANE_SELECTORS: JaneSelectors = {
  productContainer: '[data-testid="product-card"], .product-card',
  productName: '[data-testid="product-name"], .product-name',
  productPrice: '[data-testid="product-price"], .product-price',
  productImage: '.product-image img, [data-testid="product-image"] img',
  productCategory: '.product-category',
  productBrand: '.product-brand, [data-testid="brand-name"]',
  loadMore: '[data-testid="load-more"], .load-more-btn',
};

// ============================================================
// BASE JANE CRAWLER CLASS
// ============================================================

export class BaseJaneCrawler {
  protected dispensary: Dispensary;
  protected options: StoreCrawlOptions;
  protected selectors: JaneSelectors;
  protected janeConfig: JaneConfig;

  constructor(
    dispensary: Dispensary,
    options: StoreCrawlOptions = {},
    selectors: JaneSelectors = {},
    janeConfig: JaneConfig = {}
  ) {
    this.dispensary = dispensary;
    this.options = {
      pricingType: 'rec',
      useBothModes: false,
      downloadImages: true,
      trackStock: true,
      timeoutMs: 30000,
      ...options,
    };
    this.selectors = { ...DEFAULT_JANE_SELECTORS, ...selectors };
    this.janeConfig = janeConfig;
  }

  /**
   * Main entry point - crawl products for this dispensary
   * TODO: Implement Jane/Algolia-specific crawling
   */
  async crawlProducts(): Promise<CrawlResult> {
    const startTime = Date.now();
    console.warn(`[BaseJaneCrawler] Jane crawling not yet implemented for ${this.dispensary.name}`);
    return {
      success: false,
      dispensaryId: this.dispensary.id || 0,
      productsFound: 0,
      productsFetched: 0,
      productsUpserted: 0,
      snapshotsCreated: 0,
      imagesDownloaded: 0,
      errorMessage: 'Jane crawler not yet implemented',
      durationMs: Date.now() - startTime,
    };
  }

  /**
   * Detect page structure for sandbox discovery mode
   * Jane uses Algolia, so we look for Algolia config
   */
  async detectStructure(page: any): Promise<StructureDetectionResult> {
    const result: StructureDetectionResult = {
      success: false,
      menuType: 'unknown',
      selectors: {},
      pagination: { type: 'none' },
      errors: [],
      metadata: {},
    };

    try {
      if (page && typeof page.evaluate === 'function') {
        // Look for Jane/Algolia indicators
        const detection = await page.evaluate(() => {
          // Check for iheartjane in page
          const hasJane = document.documentElement.innerHTML.includes('iheartjane') ||
            document.documentElement.innerHTML.includes('jane-menu');

          // Look for Algolia config
          const scripts = Array.from(document.querySelectorAll('script'));
          let algoliaConfig: any = null;

          for (const script of scripts) {
            const content = script.textContent || '';
            if (content.includes('algolia') || content.includes('ALGOLIA')) {
              // Try to extract config
              const appIdMatch = content.match(/applicationId['":\s]+['"]([^'"]+)['"]/);
              const apiKeyMatch = content.match(/apiKey['":\s]+['"]([^'"]+)['"]/);
              if (appIdMatch && apiKeyMatch) {
                algoliaConfig = {
                  appId: appIdMatch[1],
                  apiKey: apiKeyMatch[1],
                };
              }
            }
          }

          return {
            hasJane,
            algoliaConfig,
          };
        });

        if (detection.hasJane) {
          result.menuType = 'jane';
          result.success = true;
          result.metadata = detection;

          if (detection.algoliaConfig) {
            result.metadata.algoliaAppId = detection.algoliaConfig.appId;
            result.metadata.algoliaApiKey = detection.algoliaConfig.apiKey;
          }
        }
      }
    } catch (error: any) {
      result.errors.push(`Detection error: ${error.message}`);
    }

    return result;
  }

  /**
   * Extract products from Algolia response or page
   */
  async extractProducts(document: any): Promise<ExtractedProduct[]> {
    // If document is Algolia hits array
    if (Array.isArray(document)) {
      return document.map((hit) => this.mapAlgoliaHit(hit));
    }

    console.warn('[BaseJaneCrawler] extractProducts not yet fully implemented');
    return [];
  }

  /**
   * Map Algolia hit to ExtractedProduct
   */
  protected mapAlgoliaHit(hit: any): ExtractedProduct {
    return {
      externalId: hit.objectID || hit.id || hit.product_id,
      name: hit.name || hit.product_name,
      brand: hit.brand || hit.brand_name,
      category: hit.category || hit.kind,
      subcategory: hit.subcategory,
      price: hit.price || hit.bucket_price,
      priceRec: hit.prices?.rec || hit.price_rec,
      priceMed: hit.prices?.med || hit.price_med,
      weight: hit.weight || hit.amount,
      thcContent: hit.percent_thc ? `${hit.percent_thc}%` : undefined,
      cbdContent: hit.percent_cbd ? `${hit.percent_cbd}%` : undefined,
      description: hit.description,
      imageUrl: hit.image_url || hit.product_image_url,
      stockStatus: hit.available ? 'in_stock' : 'out_of_stock',
      quantity: hit.quantity_available,
      raw: hit,
    };
  }
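With a fabricated Algolia hit, the mapping above produces a record like the one below. The snake_case field names are what Jane's index is assumed to use; the sample values are invented:

```typescript
// Trimmed standalone sketch of BaseJaneCrawler.mapAlgoliaHit.
function mapAlgoliaHit(hit: any) {
  return {
    externalId: hit.objectID || hit.id || hit.product_id,
    name: hit.name || hit.product_name,
    brand: hit.brand || hit.brand_name,
    price: hit.price || hit.bucket_price,
    thcContent: hit.percent_thc ? `${hit.percent_thc}%` : undefined,
    stockStatus: hit.available ? 'in_stock' : 'out_of_stock',
  };
}

// Fabricated sample hit for illustration.
const sampleHit = {
  objectID: 'j1',
  product_name: 'Example Gummies',
  brand_name: 'ExampleBrand',
  bucket_price: 18,
  percent_thc: 10,
  available: true,
};
const mapped = mapAlgoliaHit(sampleHit);
```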
  /**
   * Extract images from document
   */
  async extractImages(document: any): Promise<ExtractedImage[]> {
    if (Array.isArray(document)) {
      return document
        .filter((hit) => hit.image_url || hit.product_image_url)
        .map((hit, i) => ({
          productId: hit.objectID || hit.id || `jane-product-${i}`,
          imageUrl: hit.image_url || hit.product_image_url,
          isPrimary: true,
          position: 0,
        }));
    }

    return [];
  }

  /**
   * Extract stock information from document
   */
  async extractStock(document: any): Promise<ExtractedStock[]> {
    if (Array.isArray(document)) {
      return document.map((hit) => ({
        productId: hit.objectID || hit.id,
        status: hit.available ? 'in_stock' as const : 'out_of_stock' as const,
        quantity: hit.quantity_available,
        lastChecked: new Date(),
      }));
    }

    return [];
  }

  /**
   * Extract pagination information
   * Algolia uses page-based pagination (zero-based `page` / `nbPages`)
   */
  async extractPagination(document: any): Promise<ExtractedPagination> {
    if (document && typeof document === 'object' && !Array.isArray(document)) {
      return {
        hasNextPage: document.page < document.nbPages - 1,
        currentPage: document.page,
        totalPages: document.nbPages,
        totalProducts: document.nbHits,
      };
    }

    return { hasNextPage: false };
  }
}
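Algolia's zero-based `page`/`nbPages` pair makes the `hasNextPage` test easy to check in isolation (the sample response values are invented):

```typescript
// Standalone sketch of the Algolia branch of BaseJaneCrawler.extractPagination.
function algoliaPagination(res: { page: number; nbPages: number; nbHits: number }) {
  return {
    hasNextPage: res.page < res.nbPages - 1, // pages are zero-based
    currentPage: res.page,
    totalPages: res.nbPages,
    totalProducts: res.nbHits,
  };
}
```

On the last page (`page === nbPages - 1`) the comparison is false, so a crawler loop driven by `hasNextPage` terminates naturally.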

// ============================================================
// FACTORY FUNCTION
// ============================================================

export function createCrawler(
  dispensary: Dispensary,
  options: StoreCrawlOptions = {},
  selectors: JaneSelectors = {},
  janeConfig: JaneConfig = {}
): BaseJaneCrawler {
  return new BaseJaneCrawler(dispensary, options, selectors, janeConfig);
}

// ============================================================
// STANDALONE FUNCTIONS
// ============================================================

export async function crawlProducts(
  dispensary: Dispensary,
  options: StoreCrawlOptions = {}
): Promise<CrawlResult> {
  const crawler = createCrawler(dispensary, options);
  return crawler.crawlProducts();
}

export async function detectStructure(
  page: any,
  dispensary?: Dispensary
): Promise<StructureDetectionResult> {
  const crawler = createCrawler(dispensary || ({} as Dispensary));
  return crawler.detectStructure(page);
}

export async function extractProducts(
  document: any,
  dispensary?: Dispensary
): Promise<ExtractedProduct[]> {
  const crawler = createCrawler(dispensary || ({} as Dispensary));
  return crawler.extractProducts(document);
}

export async function extractImages(
  document: any,
  dispensary?: Dispensary
): Promise<ExtractedImage[]> {
  const crawler = createCrawler(dispensary || ({} as Dispensary));
  return crawler.extractImages(document);
}

export async function extractStock(
  document: any,
  dispensary?: Dispensary
): Promise<ExtractedStock[]> {
  const crawler = createCrawler(dispensary || ({} as Dispensary));
  return crawler.extractStock(document);
}

export async function extractPagination(
  document: any,
  dispensary?: Dispensary
): Promise<ExtractedPagination> {
  const crawler = createCrawler(dispensary || ({} as Dispensary));
  return crawler.extractPagination(document);
}
@@ -1,212 +0,0 @@
/**
 * Base Treez Crawler Template (PLACEHOLDER)
 *
 * This is the base template for all Treez store crawlers.
 * Per-store crawlers extend this by overriding specific methods.
 *
 * TODO: Implement Treez-specific crawling logic
 */

import { Dispensary } from '../../dutchie-az/types';
import {
  StoreCrawlOptions,
  CrawlResult,
  StructureDetectionResult,
  ExtractedProduct,
  ExtractedImage,
  ExtractedStock,
  ExtractedPagination,
} from './base-dutchie';

// Re-export types
export {
  StoreCrawlOptions,
  CrawlResult,
  StructureDetectionResult,
  ExtractedProduct,
  ExtractedImage,
  ExtractedStock,
  ExtractedPagination,
};

// ============================================================
// TREEZ-SPECIFIC TYPES
// ============================================================

export interface TreezSelectors {
  productContainer?: string;
  productName?: string;
  productPrice?: string;
  productImage?: string;
  productCategory?: string;
  productBrand?: string;
  addToCart?: string;
  pagination?: string;
}

export const DEFAULT_TREEZ_SELECTORS: TreezSelectors = {
  productContainer: '.product-tile, [class*="ProductCard"]',
  productName: '.product-name, [class*="ProductName"]',
  productPrice: '.product-price, [class*="ProductPrice"]',
  productImage: '.product-image img',
  productCategory: '.product-category',
  productBrand: '.product-brand',
  addToCart: '.add-to-cart-btn',
  pagination: '.pagination',
};

// ============================================================
// BASE TREEZ CRAWLER CLASS
// ============================================================

export class BaseTreezCrawler {
  protected dispensary: Dispensary;
  protected options: StoreCrawlOptions;
  protected selectors: TreezSelectors;

  constructor(
    dispensary: Dispensary,
    options: StoreCrawlOptions = {},
    selectors: TreezSelectors = {}
  ) {
    this.dispensary = dispensary;
    this.options = {
      pricingType: 'rec',
      useBothModes: false,
      downloadImages: true,
      trackStock: true,
      timeoutMs: 30000,
      ...options,
    };
    this.selectors = { ...DEFAULT_TREEZ_SELECTORS, ...selectors };
  }

  /**
   * Main entry point - crawl products for this dispensary
   * TODO: Implement Treez-specific crawling
   */
  async crawlProducts(): Promise<CrawlResult> {
    const startTime = Date.now();
    console.warn(`[BaseTreezCrawler] Treez crawling not yet implemented for ${this.dispensary.name}`);
    return {
      success: false,
      dispensaryId: this.dispensary.id || 0,
      productsFound: 0,
      productsFetched: 0,
      productsUpserted: 0,
      snapshotsCreated: 0,
      imagesDownloaded: 0,
      errorMessage: 'Treez crawler not yet implemented',
      durationMs: Date.now() - startTime,
    };
  }

  /**
   * Detect page structure for sandbox discovery mode
   */
  async detectStructure(page: any): Promise<StructureDetectionResult> {
    return {
      success: false,
      menuType: 'unknown',
      selectors: {},
      pagination: { type: 'none' },
      errors: ['Treez structure detection not yet implemented'],
      metadata: {},
    };
  }

  /**
   * Extract products from page/document
   */
  async extractProducts(document: any): Promise<ExtractedProduct[]> {
    console.warn('[BaseTreezCrawler] extractProducts not yet implemented');
    return [];
  }

  /**
   * Extract images from document
   */
  async extractImages(document: any): Promise<ExtractedImage[]> {
    console.warn('[BaseTreezCrawler] extractImages not yet implemented');
    return [];
  }

  /**
   * Extract stock information from document
   */
  async extractStock(document: any): Promise<ExtractedStock[]> {
    console.warn('[BaseTreezCrawler] extractStock not yet implemented');
    return [];
  }

  /**
   * Extract pagination information from document
   */
  async extractPagination(document: any): Promise<ExtractedPagination> {
    return { hasNextPage: false };
  }
}

// ============================================================
// FACTORY FUNCTION
// ============================================================

export function createCrawler(
  dispensary: Dispensary,
  options: StoreCrawlOptions = {},
  selectors: TreezSelectors = {}
): BaseTreezCrawler {
  return new BaseTreezCrawler(dispensary, options, selectors);
}

// ============================================================
// STANDALONE FUNCTIONS
// ============================================================

export async function crawlProducts(
  dispensary: Dispensary,
  options: StoreCrawlOptions = {}
): Promise<CrawlResult> {
  const crawler = createCrawler(dispensary, options);
  return crawler.crawlProducts();
}

export async function detectStructure(
  page: any,
  dispensary?: Dispensary
): Promise<StructureDetectionResult> {
  const crawler = createCrawler(dispensary || ({} as Dispensary));
  return crawler.detectStructure(page);
}

export async function extractProducts(
  document: any,
  dispensary?: Dispensary
): Promise<ExtractedProduct[]> {
  const crawler = createCrawler(dispensary || ({} as Dispensary));
  return crawler.extractProducts(document);
}

export async function extractImages(
  document: any,
  dispensary?: Dispensary
): Promise<ExtractedImage[]> {
  const crawler = createCrawler(dispensary || ({} as Dispensary));
  return crawler.extractImages(document);
}

export async function extractStock(
  document: any,
  dispensary?: Dispensary
): Promise<ExtractedStock[]> {
  const crawler = createCrawler(dispensary || ({} as Dispensary));
  return crawler.extractStock(document);
}

export async function extractPagination(
  document: any,
  dispensary?: Dispensary
): Promise<ExtractedPagination> {
|
||||
const crawler = createCrawler(dispensary || ({} as Dispensary));
|
||||
return crawler.extractPagination(document);
|
||||
}
|
||||
@@ -1,27 +0,0 @@
/**
 * Base Crawler Templates Index
 *
 * Exports all base crawler templates for easy importing.
 */

// Dutchie base (primary implementation)
export * from './base-dutchie';

// Treez base (placeholder)
export * as Treez from './base-treez';

// Jane base (placeholder)
export * as Jane from './base-jane';

// Re-export common types from dutchie for convenience
export type {
  StoreCrawlOptions,
  CrawlResult,
  StructureDetectionResult,
  ExtractedProduct,
  ExtractedImage,
  ExtractedStock,
  ExtractedPagination,
  DutchieCrawlerHooks,
  DutchieSelectors,
} from './base-dutchie';
@@ -1,9 +0,0 @@
/**
 * Base Dutchie Crawler Template (Re-export for backward compatibility)
 *
 * DEPRECATED: Import from '../base/base-dutchie' instead.
 * This file re-exports everything from the new location for existing code.
 */

// Re-export everything from the new base location
export * from '../base/base-dutchie';
@@ -1,118 +0,0 @@
/**
 * Trulieve Scottsdale - Per-Store Dutchie Crawler
 *
 * Store ID: 101
 * Profile Key: trulieve-scottsdale
 * Platform Dispensary ID: 5eaf489fa8a61801212577cc
 *
 * Phase 1: Identity implementation - no overrides, just uses base Dutchie logic.
 * Future: Add store-specific selectors, timing, or custom logic as needed.
 */

import {
  BaseDutchieCrawler,
  StoreCrawlOptions,
  CrawlResult,
  DutchieSelectors,
  crawlProducts as baseCrawlProducts,
} from '../../base/base-dutchie';
import { Dispensary } from '../../../dutchie-az/types';

// Re-export CrawlResult for the orchestrator
export { CrawlResult };

// ============================================================
// STORE CONFIGURATION
// ============================================================

/**
 * Store-specific configuration
 * These can be used to customize crawler behavior for this store
 */
export const STORE_CONFIG = {
  storeId: 101,
  profileKey: 'trulieve-scottsdale',
  name: 'Trulieve of Scottsdale Dispensary',
  platformDispensaryId: '5eaf489fa8a61801212577cc',

  // Store-specific overrides (none for Phase 1)
  customOptions: {
    // Example future overrides:
    // pricingType: 'rec',
    // useBothModes: true,
    // customHeaders: {},
    // maxRetries: 3,
  },
};

// ============================================================
// STORE CRAWLER CLASS
// ============================================================

/**
 * TrulieveScottsdaleCrawler - Per-store crawler for Trulieve Scottsdale
 *
 * Phase 1: Identity implementation - extends BaseDutchieCrawler with no overrides.
 * Future phases can override methods like:
 * - getCName() for custom slug handling
 * - crawlProducts() for completely custom logic
 * - Add hooks for pre/post processing
 */
export class TrulieveScottsdaleCrawler extends BaseDutchieCrawler {
  constructor(dispensary: Dispensary, options: StoreCrawlOptions = {}) {
    // Merge store-specific options with provided options
    const mergedOptions: StoreCrawlOptions = {
      ...STORE_CONFIG.customOptions,
      ...options,
    };

    super(dispensary, mergedOptions);
  }

  // Phase 1: No overrides - use base implementation
  // Future phases can add overrides here:
  //
  // async crawlProducts(): Promise<CrawlResult> {
  //   // Custom pre-processing
  //   // ...
  //   const result = await super.crawlProducts();
  //   // Custom post-processing
  //   // ...
  //   return result;
  // }
}

// ============================================================
// EXPORTED CRAWL FUNCTION
// ============================================================

/**
 * Main entry point for the orchestrator
 *
 * The orchestrator calls: mod.crawlProducts(dispensary, options)
 * This function creates a TrulieveScottsdaleCrawler and runs it.
 */
export async function crawlProducts(
  dispensary: Dispensary,
  options: StoreCrawlOptions = {}
): Promise<CrawlResult> {
  console.log(`[TrulieveScottsdale] Using per-store crawler for ${dispensary.name}`);

  const crawler = new TrulieveScottsdaleCrawler(dispensary, options);
  return crawler.crawlProducts();
}

// ============================================================
// FACTORY FUNCTION (alternative API)
// ============================================================

/**
 * Create a crawler instance without running it
 * Useful for testing or when you need to configure before running
 */
export function createCrawler(
  dispensary: Dispensary,
  options: StoreCrawlOptions = {}
): TrulieveScottsdaleCrawler {
  return new TrulieveScottsdaleCrawler(dispensary, options);
}
@@ -77,7 +77,9 @@ export function getPool(): Pool {
 * This is a getter that lazily initializes on first access.
 */
export const pool = {
  query: (...args: Parameters<Pool['query']>) => getPool().query(...args),
  query: (queryTextOrConfig: string | import('pg').QueryConfig, values?: any[]): Promise<import('pg').QueryResult<any>> => {
    return getPool().query(queryTextOrConfig as any, values);
  },
  connect: () => getPool().connect(),
  end: () => getPool().end(),
  on: (event: 'error' | 'connect' | 'acquire' | 'remove' | 'release', listener: (...args: any[]) => void) => getPool().on(event as any, listener),
@@ -26,13 +26,377 @@ import {
  mapLocationRowToLocation,
} from './types';
import { DiscoveryCity } from './types';
import {
  executeGraphQL,
  fetchPage,
  extractNextData,
  GRAPHQL_HASHES,
  setProxy,
} from '../platforms/dutchie/client';
import { getStateProxy, getRandomProxy } from '../utils/proxyManager';

puppeteer.use(StealthPlugin());

// ============================================================
// PROXY INITIALIZATION
// ============================================================
// Call initDiscoveryProxy() before any discovery operations to
// set up proxy if USE_PROXY=true environment variable is set.
// This is opt-in and does NOT break existing behavior.
// ============================================================

let proxyInitialized = false;

/**
 * Initialize proxy for discovery operations
 * Only runs if USE_PROXY=true is set in environment
 * Safe to call multiple times - only initializes once
 *
 * @param stateCode - Optional state code for state-specific proxy (e.g., 'AZ', 'CA')
 * @returns true if proxy was set, false if skipped or failed
 */
export async function initDiscoveryProxy(stateCode?: string): Promise<boolean> {
  // Skip if already initialized
  if (proxyInitialized) {
    return true;
  }

  // Skip if USE_PROXY is not enabled
  if (process.env.USE_PROXY !== 'true') {
    console.log('[LocationDiscovery] Proxy disabled (USE_PROXY != true)');
    return false;
  }

  try {
    // Get proxy - prefer state-specific if state code provided
    const proxyConfig = stateCode
      ? await getStateProxy(stateCode)
      : await getRandomProxy();

    if (!proxyConfig) {
      console.warn('[LocationDiscovery] No proxy available, proceeding without proxy');
      return false;
    }

    // Build proxy URL with auth if needed
    let proxyUrl = proxyConfig.server;
    if (proxyConfig.username && proxyConfig.password) {
      const url = new URL(proxyConfig.server);
      url.username = proxyConfig.username;
      url.password = proxyConfig.password;
      proxyUrl = url.toString();
    }

    // Set proxy on the Dutchie client
    setProxy(proxyUrl);
    proxyInitialized = true;

    console.log(`[LocationDiscovery] Proxy initialized for ${stateCode || 'general'} discovery`);
    return true;
  } catch (error: any) {
    console.error(`[LocationDiscovery] Failed to initialize proxy: ${error.message}`);
    return false;
  }
}

/**
 * Reset proxy initialization flag (for testing or re-initialization)
 */
export function resetProxyInit(): void {
  proxyInitialized = false;
  setProxy(null);
}

const PLATFORM = 'dutchie';

// ============================================================
// GRAPHQL / API FETCHING
// CITY-BASED DISCOVERY (CANONICAL SOURCE OF TRUTH)
// ============================================================
// GraphQL with city+state filter is the SOURCE OF TRUTH for database data.
//
// Method:
// 1. Get city list from statesWithDispensaries (in __NEXT_DATA__)
// 2. Query stores per city using city + state GraphQL filter
// 3. This gives us complete, accurate dispensary data
//
// Geo-coordinate queries (nearLat/nearLng) are ONLY for showing search
// results to users (e.g., "stores within 20 miles of me").
// They are NOT a source of truth for establishing database records.
// ============================================================

/**
 * State with dispensary cities from Dutchie's statesWithDispensaries data
 */
export interface StateWithCities {
  name: string;     // State code (e.g., "CA", "AZ")
  country: string;  // Country code (e.g., "US")
  cities: string[]; // Array of city names
}

/**
 * Fetch all states with their cities from Dutchie's __NEXT_DATA__
 *
 * This fetches a city page and extracts the statesWithDispensaries data
 * which contains all states and their cities where Dutchie has dispensaries.
 */
export async function fetchStatesWithDispensaries(
  options: { verbose?: boolean } = {}
): Promise<StateWithCities[]> {
  const { verbose = false } = options;

  // Initialize proxy if USE_PROXY=true
  await initDiscoveryProxy();

  console.log('[LocationDiscovery] Fetching statesWithDispensaries from Dutchie...');

  // Fetch any city page to get the __NEXT_DATA__ with statesWithDispensaries
  // Using a known city that's likely to exist
  const result = await fetchPage('/dispensaries/az/phoenix', { maxRetries: 3 });

  if (!result || result.status !== 200) {
    console.error('[LocationDiscovery] Failed to fetch city page');
    return [];
  }

  const nextData = extractNextData(result.html);
  if (!nextData) {
    console.error('[LocationDiscovery] No __NEXT_DATA__ found');
    return [];
  }

  // Extract statesWithDispensaries from Apollo state
  const apolloState = nextData.props?.pageProps?.initialApolloState;
  if (!apolloState) {
    console.error('[LocationDiscovery] No initialApolloState found');
    return [];
  }

  // Find ROOT_QUERY.statesWithDispensaries
  const rootQuery = apolloState['ROOT_QUERY'];
  if (!rootQuery) {
    console.error('[LocationDiscovery] No ROOT_QUERY found');
    return [];
  }

  // The statesWithDispensaries is at ROOT_QUERY.statesWithDispensaries
  const statesRefs = rootQuery.statesWithDispensaries;
  if (!Array.isArray(statesRefs)) {
    console.error('[LocationDiscovery] statesWithDispensaries not found or not an array');
    return [];
  }

  // Resolve the references to actual state data
  const states: StateWithCities[] = [];
  for (const ref of statesRefs) {
    // ref might be { __ref: "StateWithDispensaries:0" } or a direct object
    let stateData: any;

    if (ref && ref.__ref) {
      stateData = apolloState[ref.__ref];
    } else {
      stateData = ref;
    }

    if (stateData && stateData.name) {
      // Parse cities JSON array if it's a string
      let cities = stateData.cities;
      if (typeof cities === 'string') {
        try {
          cities = JSON.parse(cities);
        } catch {
          cities = [];
        }
      }

      states.push({
        name: stateData.name,
        country: stateData.country || 'US',
        cities: Array.isArray(cities) ? cities : [],
      });
    }
  }

  if (verbose) {
    console.log(`[LocationDiscovery] Found ${states.length} states`);
    for (const state of states) {
      console.log(`  ${state.name}: ${state.cities.length} cities`);
    }
  }

  console.log(`[LocationDiscovery] Loaded ${states.length} states with cities`);
  return states;
}

/**
 * Get cities for a specific state
 */
export async function getCitiesForState(
  stateCode: string,
  options: { verbose?: boolean } = {}
): Promise<string[]> {
  const states = await fetchStatesWithDispensaries(options);
  const state = states.find(s => s.name.toUpperCase() === stateCode.toUpperCase());

  if (!state) {
    console.warn(`[LocationDiscovery] No cities found for state: ${stateCode}`);
    return [];
  }

  console.log(`[LocationDiscovery] Found ${state.cities.length} cities for ${stateCode}`);
  return state.cities;
}

/**
 * Fetch dispensaries for a specific city+state using GraphQL
 *
 * This is the CORRECT method for establishing database data:
 * Uses city + state filter, NOT geo-coordinates.
 */
export async function fetchDispensariesByCityState(
  city: string,
  stateCode: string,
  options: { verbose?: boolean; perPage?: number; maxPages?: number } = {}
): Promise<DutchieLocationResponse[]> {
  const { verbose = false, perPage = 200, maxPages = 10 } = options;

  // Initialize proxy if USE_PROXY=true (state-specific proxy preferred)
  await initDiscoveryProxy(stateCode);

  console.log(`[LocationDiscovery] Fetching dispensaries for ${city}, ${stateCode}...`);

  const allDispensaries: any[] = [];
  let page = 0;
  let hasMore = true;

  while (hasMore && page < maxPages) {
    const variables = {
      dispensaryFilter: {
        activeOnly: true,
        city: city,
        state: stateCode,
      },
      page,
      perPage,
    };

    try {
      const result = await executeGraphQL(
        'ConsumerDispensaries',
        variables,
        GRAPHQL_HASHES.ConsumerDispensaries,
        { cName: `${city.toLowerCase().replace(/\s+/g, '-')}-${stateCode.toLowerCase()}`, maxRetries: 2, retryOn403: true }
      );

      const dispensaries = result?.data?.filteredDispensaries || [];

      if (verbose) {
        console.log(`[LocationDiscovery] Page ${page}: ${dispensaries.length} dispensaries`);
      }

      if (dispensaries.length === 0) {
        hasMore = false;
      } else {
        // Filter to ensure we only get dispensaries in the correct state
        const stateFiltered = dispensaries.filter((d: any) =>
          d.location?.state?.toUpperCase() === stateCode.toUpperCase()
        );
        allDispensaries.push(...stateFiltered);

        if (dispensaries.length < perPage) {
          hasMore = false;
        } else {
          page++;
        }
      }
    } catch (error: any) {
      console.error(`[LocationDiscovery] Error fetching page ${page}: ${error.message}`);
      hasMore = false;
    }
  }

  // Dedupe by ID
  const uniqueMap = new Map<string, any>();
  for (const d of allDispensaries) {
    const id = d.id || d._id;
    if (id && !uniqueMap.has(id)) {
      uniqueMap.set(id, d);
    }
  }

  const unique = Array.from(uniqueMap.values());
  console.log(`[LocationDiscovery] Found ${unique.length} unique dispensaries in ${city}, ${stateCode}`);

  return unique.map(d => normalizeLocationResponse(d));
}

/**
 * Fetch ALL dispensaries for a state by querying each city
 *
 * This is the canonical method for establishing state data:
 * 1. Get city list from statesWithDispensaries
 * 2. Query each city using city+state filter
 * 3. Dedupe and return all dispensaries
 */
export async function fetchAllDispensariesForState(
  stateCode: string,
  options: { verbose?: boolean; progressCallback?: (city: string, count: number, total: number) => void } = {}
): Promise<{ dispensaries: DutchieLocationResponse[]; citiesQueried: number; citiesWithResults: number }> {
  const { verbose = false, progressCallback } = options;

  console.log(`[LocationDiscovery] Fetching all dispensaries for ${stateCode}...`);

  // Step 1: Get city list
  const cities = await getCitiesForState(stateCode, { verbose });
  if (cities.length === 0) {
    console.warn(`[LocationDiscovery] No cities found for ${stateCode}`);
    return { dispensaries: [], citiesQueried: 0, citiesWithResults: 0 };
  }

  console.log(`[LocationDiscovery] Will query ${cities.length} cities for ${stateCode}`);

  // Step 2: Query each city
  const allDispensaries = new Map<string, DutchieLocationResponse>();
  let citiesWithResults = 0;

  for (let i = 0; i < cities.length; i++) {
    const city = cities[i];

    if (progressCallback) {
      progressCallback(city, i + 1, cities.length);
    }

    try {
      const dispensaries = await fetchDispensariesByCityState(city, stateCode, { verbose });

      if (dispensaries.length > 0) {
        citiesWithResults++;
        for (const d of dispensaries) {
          const id = d.id || d.slug;
          if (id && !allDispensaries.has(id)) {
            allDispensaries.set(id, d);
          }
        }
      }

      // Small delay between cities to avoid rate limiting
      await new Promise(r => setTimeout(r, 300));
    } catch (error: any) {
      console.error(`[LocationDiscovery] Error querying ${city}: ${error.message}`);
    }
  }

  const result = Array.from(allDispensaries.values());
  console.log(`[LocationDiscovery] Total: ${result.length} unique dispensaries across ${citiesWithResults}/${cities.length} cities`);

  return {
    dispensaries: result,
    citiesQueried: cities.length,
    citiesWithResults,
  };
}

// ============================================================
// GRAPHQL / API FETCHING (LEGACY - PUPPETEER-BASED)
// ============================================================

interface SessionCredentials {
@@ -91,57 +455,77 @@ async function closeSession(session: SessionCredentials): Promise<void> {
}

/**
 * Fetch locations for a city using Dutchie's internal search API.
 * Fetch locations for a city.
 *
 * PRIMARY METHOD: Uses city+state GraphQL filter (source of truth)
 * FALLBACK: Legacy Puppeteer-based methods for edge cases
 */
export async function fetchLocationsForCity(
  city: DiscoveryCity,
  options: {
    session?: SessionCredentials;
    verbose?: boolean;
    useLegacyMethods?: boolean;
  } = {}
): Promise<DutchieLocationResponse[]> {
  const { verbose = false } = options;
  let session = options.session;
  let shouldCloseSession = false;
  const { verbose = false, useLegacyMethods = false } = options;

  if (!session) {
    session = await createSession(city.citySlug);
    shouldCloseSession = true;
  }
  console.log(`[LocationDiscovery] Fetching locations for ${city.cityName}, ${city.stateCode}...`);

  try {
    console.log(`[LocationDiscovery] Fetching locations for ${city.cityName}, ${city.stateCode}...`);

    // Try multiple approaches to get location data

    // Approach 1: Extract from page __NEXT_DATA__ or similar
    const locations = await extractLocationsFromPage(session.page, verbose);
    if (locations.length > 0) {
      console.log(`[LocationDiscovery] Found ${locations.length} locations from page data`);
      return locations;
    }

    // Approach 2: Try the geo-based GraphQL query
    const geoLocations = await fetchLocationsViaGraphQL(session, city, verbose);
    if (geoLocations.length > 0) {
      console.log(`[LocationDiscovery] Found ${geoLocations.length} locations from GraphQL`);
      return geoLocations;
    }

    // Approach 3: Scrape visible location cards
    const scrapedLocations = await scrapeLocationCards(session.page, verbose);
    if (scrapedLocations.length > 0) {
      console.log(`[LocationDiscovery] Found ${scrapedLocations.length} locations from scraping`);
      return scrapedLocations;
    }

    console.log(`[LocationDiscovery] No locations found for ${city.cityName}`);
    return [];
  } finally {
    if (shouldCloseSession) {
      await closeSession(session);
  // PRIMARY METHOD: City+State GraphQL query (SOURCE OF TRUTH)
  if (city.cityName && city.stateCode) {
    try {
      const locations = await fetchDispensariesByCityState(city.cityName, city.stateCode, { verbose });
      if (locations.length > 0) {
        console.log(`[LocationDiscovery] Found ${locations.length} locations via GraphQL city+state`);
        return locations;
      }
    } catch (error: any) {
      console.warn(`[LocationDiscovery] GraphQL city+state failed: ${error.message}`);
    }
  }

  // FALLBACK: Legacy Puppeteer-based methods (only if explicitly enabled)
  if (useLegacyMethods) {
    let session = options.session;
    let shouldCloseSession = false;

    if (!session) {
      session = await createSession(city.citySlug);
      shouldCloseSession = true;
    }

    try {
      // Legacy Approach 1: Extract from page __NEXT_DATA__
      const locations = await extractLocationsFromPage(session.page, verbose);
      if (locations.length > 0) {
        console.log(`[LocationDiscovery] Found ${locations.length} locations from page data (legacy)`);
        return locations;
      }

      // Legacy Approach 2: Try the geo-based GraphQL query
      // NOTE: Geo queries are for SEARCH RESULTS only, not source of truth
      const geoLocations = await fetchLocationsViaGraphQL(session, city, verbose);
      if (geoLocations.length > 0) {
        console.log(`[LocationDiscovery] Found ${geoLocations.length} locations from geo GraphQL (legacy)`);
        return geoLocations;
      }

      // Legacy Approach 3: Scrape visible location cards
      const scrapedLocations = await scrapeLocationCards(session.page, verbose);
      if (scrapedLocations.length > 0) {
        console.log(`[LocationDiscovery] Found ${scrapedLocations.length} locations from scraping (legacy)`);
        return scrapedLocations;
      }
    } finally {
      if (shouldCloseSession) {
        await closeSession(session);
      }
    }
  }

  console.log(`[LocationDiscovery] No locations found for ${city.cityName}`);
  return [];
}

/**
@@ -202,33 +586,52 @@ async function extractLocationsFromPage(

/**
 * Fetch locations via GraphQL geo-based query.
 *
 * Uses ConsumerDispensaries with geo filtering:
 * - dispensaryFilter.nearLat/nearLng for center point
 * - dispensaryFilter.distance for radius in miles
 * - Response at data.filteredDispensaries
 */
async function fetchLocationsViaGraphQL(
  session: SessionCredentials,
  city: DiscoveryCity,
  verbose: boolean
): Promise<DutchieLocationResponse[]> {
  // Use a known center point for the city or default to a central US location
  const CITY_COORDS: Record<string, { lat: number; lng: number }> = {
    'phoenix': { lat: 33.4484, lng: -112.074 },
    'tucson': { lat: 32.2226, lng: -110.9747 },
    'scottsdale': { lat: 33.4942, lng: -111.9261 },
    'mesa': { lat: 33.4152, lng: -111.8315 },
    'tempe': { lat: 33.4255, lng: -111.94 },
    'flagstaff': { lat: 35.1983, lng: -111.6513 },
    // Add more as needed
  // City center coordinates with appropriate radius
  const CITY_COORDS: Record<string, { lat: number; lng: number; radius: number }> = {
    'phoenix': { lat: 33.4484, lng: -112.074, radius: 50 },
    'tucson': { lat: 32.2226, lng: -110.9747, radius: 50 },
    'scottsdale': { lat: 33.4942, lng: -111.9261, radius: 30 },
    'mesa': { lat: 33.4152, lng: -111.8315, radius: 30 },
    'tempe': { lat: 33.4255, lng: -111.94, radius: 30 },
    'flagstaff': { lat: 35.1983, lng: -111.6513, radius: 50 },
  };

  const coords = CITY_COORDS[city.citySlug] || { lat: 33.4484, lng: -112.074 };
  // State-wide coordinates for full coverage
  const STATE_COORDS: Record<string, { lat: number; lng: number; radius: number }> = {
    'AZ': { lat: 33.4484, lng: -112.074, radius: 200 },
    'CA': { lat: 36.7783, lng: -119.4179, radius: 400 },
    'CO': { lat: 39.5501, lng: -105.7821, radius: 200 },
    'FL': { lat: 27.6648, lng: -81.5158, radius: 400 },
    'MI': { lat: 44.3148, lng: -85.6024, radius: 250 },
    'NV': { lat: 36.1699, lng: -115.1398, radius: 200 },
  };

  // Try city-specific coords first, then state-wide, then default
  const coords = CITY_COORDS[city.citySlug]
    || (city.stateCode && STATE_COORDS[city.stateCode])
    || { lat: 33.4484, lng: -112.074, radius: 200 };

  // Correct GraphQL variables for ConsumerDispensaries
  const variables = {
    dispensariesFilter: {
      latitude: coords.lat,
      longitude: coords.lng,
      distance: 50, // miles
      state: city.stateCode,
      city: city.cityName,
    dispensaryFilter: {
      activeOnly: true,
      nearLat: coords.lat,
      nearLng: coords.lng,
      distance: coords.radius,
    },
    page: 0,
    perPage: 200,
  };

  const hash = '0a5bfa6ca1d64ae47bcccb7c8077c87147cbc4e6982c17ceec97a2a4948b311b';
@@ -263,8 +666,19 @@ async function fetchLocationsViaGraphQL(
    return [];
  }

  const dispensaries = response.data?.data?.consumerDispensaries || [];
  return dispensaries.map((d: any) => normalizeLocationResponse(d));
  // Response is at data.filteredDispensaries
  const dispensaries = response.data?.data?.filteredDispensaries || [];

  // Filter to specific state if needed (radius may include neighboring states)
  const filtered = city.stateCode
    ? dispensaries.filter((d: any) => d.location?.state === city.stateCode)
    : dispensaries;

  if (verbose) {
    console.log(`[LocationDiscovery] GraphQL returned ${dispensaries.length} total, ${filtered.length} in ${city.stateCode || 'all states'}`);
  }

  return filtered.map((d: any) => normalizeLocationResponse(d));
  } catch (error: any) {
    if (verbose) {
      console.log(`[LocationDiscovery] GraphQL error: ${error.message}`);
@@ -373,13 +787,20 @@ function normalizeLocationResponse(raw: any): DutchieLocationResponse {

/**
 * Upsert a location into dutchie_discovery_locations.
 * REQUIRES a valid platform ID (MongoDB ObjectId) - will skip records without one.
 */
export async function upsertLocation(
  pool: Pool,
  location: DutchieLocationResponse,
  cityId: number | null
): Promise<{ id: number; isNew: boolean }> {
  const platformLocationId = location.id || location.slug;
): Promise<{ id: number; isNew: boolean } | null> {
  // REQUIRE actual platform ID - NO fallback to slug
  const platformLocationId = location.id;
  if (!platformLocationId) {
    console.warn(`[LocationDiscovery] Skipping location without platform ID: ${location.name} (${location.slug})`);
    return null;
  }

  const menuUrl = location.menuUrl || `https://dutchie.com/dispensary/${location.slug}`;

  const result = await pool.query(
@@ -642,6 +1063,12 @@ export async function discoverLocationsForCity(

    const result = await upsertLocation(pool, location, city.id);

    // Skip locations without valid platform ID
    if (!result) {
      errors.push(`Location ${location.slug}: No valid platform ID - skipped`);
      continue;
    }

    if (result.isNew) {
      newCount++;
    } else {
@@ -1,199 +0,0 @@
# Dutchie AZ Pipeline

## Overview

The Dutchie AZ pipeline is the **only** authorized way to crawl Dutchie dispensary menus. It uses Dutchie's GraphQL API directly (no DOM scraping) and writes to an isolated database with a proper snapshot model.

## Key Principles

1. **GraphQL Only** - All Dutchie data is fetched via their FilteredProducts GraphQL API
2. **Isolated Database** - Data lives in `dutchie_az_*` tables, NOT the legacy `products` table
3. **Append-Only Snapshots** - Every crawl creates snapshots, never overwrites historical data
4. **Stock Status Tracking** - Derived from `POSMetaData.children` inventory data
5. **Missing Product Detection** - Products not in feed are marked with `isPresentInFeed=false`
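Principle 5 above amounts to a set difference: compare the product IDs seen in this crawl's feed against the IDs already known for the store, and emit missing-from-feed snapshots for the rest. A minimal sketch, assuming hypothetical names (`KnownProduct`, `findMissingProductIds`) that are not the pipeline's actual API:

```typescript
// Illustrative sketch of missing-product detection: any known product
// whose external ID is absent from the crawled feed would get a snapshot
// with isPresentInFeed=false.
interface KnownProduct {
  externalProductId: string;
}

function findMissingProductIds(
  known: KnownProduct[],
  feedIds: string[]
): string[] {
  const seen = new Set(feedIds);
  return known
    .map((p) => p.externalProductId)
    .filter((id) => !seen.has(id));
}

// Products "a" and "c" are known, but only "a" came back in the feed,
// so "c" is flagged as missing.
const missing = findMissingProductIds(
  [{ externalProductId: 'a' }, { externalProductId: 'c' }],
  ['a', 'b']
);
console.log(missing); // → [ 'c' ]
```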
## Directory Structure

```
src/dutchie-az/
├── db/
│   ├── connection.ts      # Database connection pool
│   └── schema.ts          # Table definitions and migrations
├── routes/
│   └── index.ts           # REST API endpoints
├── services/
│   ├── graphql-client.ts  # Direct GraphQL fetch (Mode A + Mode B)
│   ├── product-crawler.ts # Main crawler orchestration
│   └── scheduler.ts       # Jittered scheduling with wandering intervals
└── types/
    └── index.ts           # TypeScript interfaces
```

## Data Model

### Tables

- **dispensaries** - Arizona Dutchie stores with `platform_dispensary_id`
- **dutchie_products** - Canonical product identity (one row per product per store)
- **dutchie_product_snapshots** - Historical state per crawl (append-only)
- **job_schedules** - Scheduler configuration with jitter support
- **job_run_logs** - Execution history

### Stock Status

The `stock_status` field is derived from `POSMetaData.children`:

```typescript
function deriveStockStatus(children?: POSChild[]): StockStatus {
  if (!children || children.length === 0) return 'unknown';
  const totalAvailable = children.reduce((sum, c) =>
    sum + (c.quantityAvailable || 0), 0);
  return totalAvailable > 0 ? 'in_stock' : 'out_of_stock';
}
```
### Two-Mode Crawling

Mode A (UI Parity):
- `Status: null` - Returns what the UI shows
- Best for "current inventory" snapshot

Mode B (Max Coverage):
- `Status: 'Active'` - Returns all active products
- Catches items with `isBelowThreshold: true`

Both modes are merged to get maximum product coverage.
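The merge can be sketched as a dedupe by product ID. Assuming (this is an illustration, not the crawler's actual code) that `id` is the merge key and that the Mode A record should win when both modes return the same product:

```typescript
// Hypothetical sketch of the Mode A + Mode B merge: insert Mode B results
// first, then let Mode A entries overwrite duplicates, so UI-parity data
// takes precedence while Mode B's extra coverage is kept.
interface FeedProduct {
  id: string;
  name: string;
}

function mergeModes(modeA: FeedProduct[], modeB: FeedProduct[]): FeedProduct[] {
  const byId = new Map<string, FeedProduct>();
  for (const p of modeB) byId.set(p.id, p);
  for (const p of modeA) byId.set(p.id, p); // Mode A wins on conflict
  return [...byId.values()];
}

const merged = mergeModes(
  [{ id: '1', name: 'Gummies' }],
  [{ id: '1', name: 'Gummies' }, { id: '2', name: 'Preroll' }]
);
console.log(merged.length); // prints 2
```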
## API Endpoints

All endpoints are mounted at `/api/dutchie-az/`:

```
GET  /api/dutchie-az/dispensaries           - List all dispensaries
GET  /api/dutchie-az/dispensaries/:id       - Get dispensary details
GET  /api/dutchie-az/products               - List products (with filters)
GET  /api/dutchie-az/products/:id           - Get product with snapshots
GET  /api/dutchie-az/products/:id/snapshots - Get product snapshot history
POST /api/dutchie-az/crawl/:dispensaryId    - Trigger manual crawl
GET  /api/dutchie-az/schedule               - Get scheduler status
POST /api/dutchie-az/schedule/run           - Manually run scheduled jobs
GET  /api/dutchie-az/stats                  - Dashboard statistics
```

## Scheduler

The scheduler uses **jitter** to avoid detection patterns:

```typescript
// Each job has independent "wandering" timing
interface JobSchedule {
  base_interval_minutes: number; // e.g., 240 (4 hours)
  jitter_minutes: number;        // e.g., 30 (±30 min)
  next_run_at: Date;             // Calculated with jitter after each run
}
```

Jobs run when `next_run_at <= NOW()`. After completion, the next run is calculated:

```
next_run_at = NOW() + base_interval + random(-jitter, +jitter)
```

This prevents crawls from clustering at predictable times.
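The formula above can be sketched directly; the helper name here is illustrative (the pipeline persists these values on `job_schedules` rather than computing them in a free function):

```typescript
// Sketch of next_run_at = NOW() + base_interval + random(-jitter, +jitter).
// Math.random() * 2 - 1 gives a uniform value in [-1, 1), which scales to
// the ±jitter window described above.
function computeNextRun(
  now: Date,
  baseIntervalMinutes: number,
  jitterMinutes: number
): Date {
  const jitter = (Math.random() * 2 - 1) * jitterMinutes;
  return new Date(now.getTime() + (baseIntervalMinutes + jitter) * 60_000);
}

// With base 240 and jitter 30, the next run lands 210-270 minutes out.
const next = computeNextRun(new Date(), 240, 30);
console.log(next.toISOString());
```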
## Manual Testing

### Run a single dispensary crawl:

```bash
DATABASE_URL="..." npx tsx -e "
const { crawlDispensaryProducts } = require('./src/dutchie-az/services/product-crawler');
const { query } = require('./src/dutchie-az/db/connection');

async function test() {
  const { rows } = await query('SELECT * FROM dispensaries LIMIT 1');
  if (!rows[0]) return console.log('No dispensaries found');

  const result = await crawlDispensaryProducts(rows[0], 'rec', { useBothModes: true });
  console.log(JSON.stringify(result, null, 2));
}
test();
"
```

### Check stock status distribution:

```sql
SELECT stock_status, COUNT(*)
FROM dutchie_products
GROUP BY stock_status;
```

### View recent snapshots:

```sql
SELECT
  p.name,
  s.stock_status,
  s.is_present_in_feed,
  s.crawled_at
FROM dutchie_product_snapshots s
JOIN dutchie_products p ON p.id = s.dutchie_product_id
ORDER BY s.crawled_at DESC
LIMIT 20;
```

## Deprecated Code

The following files are **DEPRECATED** and will throw errors if called:

- `src/scrapers/dutchie-graphql.ts` - Wrote to legacy `products` table
- `src/scrapers/dutchie-graphql-direct.ts` - Wrote to legacy `products` table
- `src/scrapers/templates/dutchie.ts` - HTML/DOM scraper (unreliable)
- `src/scraper-v2/engine.ts` DutchieSpider - DOM-based extraction

If `store-crawl-orchestrator.ts` detects `provider='dutchie'` with `mode='production'`, it now routes to this dutchie-az pipeline automatically.

## Integration with Legacy System

The `store-crawl-orchestrator.ts` bridges the legacy stores system with dutchie-az:

1. When a store has `product_provider='dutchie'` and `product_crawler_mode='production'`
2. The orchestrator looks up the corresponding dispensary in `dutchie_az.dispensaries`
3. It calls `crawlDispensaryProducts()` from the dutchie-az pipeline
4. Results are logged but data stays in the dutchie_az tables
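The routing rule in step 1 reduces to a simple predicate. A minimal sketch, with the `Store` shape and the `'legacy'` label as illustrative assumptions rather than the orchestrator's actual types:

```typescript
// Sketch of the orchestrator's routing decision: Dutchie stores in
// production mode go to the dutchie-az pipeline, everything else stays
// on the legacy crawl path.
interface Store {
  product_provider: string;
  product_crawler_mode: string;
}

function selectPipeline(store: Store): 'dutchie-az' | 'legacy' {
  if (
    store.product_provider === 'dutchie' &&
    store.product_crawler_mode === 'production'
  ) {
    return 'dutchie-az';
  }
  return 'legacy';
}

console.log(
  selectPipeline({ product_provider: 'dutchie', product_crawler_mode: 'production' })
); // prints dutchie-az
```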
To use the dutchie-az pipeline independently:
- Navigate to `/dutchie-az-schedule` in the UI
- Use the REST API endpoints directly
- Run the scheduler service

## Environment Variables

```bash
# Database connection for dutchie-az (same DB, separate tables)
DATABASE_URL=postgresql://user:pass@host:port/database
```

## Troubleshooting

### "Dispensary not found in dutchie-az database"

The dispensary must exist in `dutchie_az.dispensaries` before crawling. Either:
1. Run discovery to populate dispensaries
2. Manually insert the dispensary with `platform_dispensary_id`

### GraphQL returns empty products

1. Check `platform_dispensary_id` is correct (the internal Dutchie ID, not the slug)
2. Verify the dispensary is online and has menu data
3. Try both `rec` and `med` pricing types

### Snapshots show `stock_status='unknown'`

The product likely has no `POSMetaData.children` array. This happens for:
- Products without inventory tracking
- Manually managed inventory

---

Last updated: December 2025
@@ -1,129 +0,0 @@
/**
 * Dutchie Configuration
 *
 * Centralized configuration for Dutchie GraphQL API interaction.
 * Update hashes here when Dutchie changes their persisted query system.
 */

export const dutchieConfig = {
  // ============================================================
  // GRAPHQL ENDPOINT
  // ============================================================

  /** GraphQL endpoint - must be the api-3 graphql endpoint (NOT api-gw.dutchie.com, which no longer exists) */
  graphqlEndpoint: 'https://dutchie.com/api-3/graphql',

  // ============================================================
  // GRAPHQL PERSISTED QUERY HASHES
  // ============================================================
  //
  // These hashes identify specific GraphQL operations.
  // If Dutchie changes their schema, you may need to capture
  // new hashes from live browser traffic (Network tab → graphql requests).

  /** FilteredProducts - main product listing query */
  filteredProductsHash: 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0',

  /** GetAddressBasedDispensaryData - resolve slug to internal ID */
  getDispensaryDataHash: '13461f73abf7268770dfd05fe7e10c523084b2bb916a929c08efe3d87531977b',

  /**
   * ConsumerDispensaries - geo-based discovery
   * NOTE: This is a placeholder guess. If discovery fails, either:
   * 1. Capture the real hash from live traffic
   * 2. Rely on known AZDHS slugs instead (set useDiscovery: false)
   */
  consumerDispensariesHash: '0a5bfa6ca1d64ae47bcccb7c8077c87147cbc4e6982c17ceec97a2a4948b311b',

  // ============================================================
  // BEHAVIOR FLAGS
  // ============================================================

  /** Enable geo-based discovery (false = use known AZDHS slugs only) */
  useDiscovery: true,

  /** Prefer GET requests (true) or POST (false). GET is the default. */
  preferGet: true,

  /**
   * Enable POST fallback when a GET fails with 405 or is blocked.
   * If true, failed GETs will be retried as POSTs.
   */
  enablePostFallback: true,

  // ============================================================
  // PAGINATION & RETRY
  // ============================================================

  /** Products per page for pagination */
  perPage: 100,

  /** Maximum pages to fetch (safety limit) */
  maxPages: 200,

  /** Number of retries for failed page fetches */
  maxRetries: 1,

  /** Delay between pages in ms */
  pageDelayMs: 500,

  /** Delay between modes in ms */
  modeDelayMs: 2000,

  // ============================================================
  // HTTP HEADERS
  // ============================================================

  /** Default headers to mimic browser requests */
  defaultHeaders: {
    'accept': 'application/json, text/plain, */*',
    'accept-language': 'en-US,en;q=0.9',
    'apollographql-client-name': 'Marketplace (production)',
  } as Record<string, string>,

  /** User agent string */
  userAgent:
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',

  // ============================================================
  // BROWSER LAUNCH OPTIONS
  // ============================================================

  browserArgs: [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage',
    '--disable-blink-features=AutomationControlled',
  ],

  /** Navigation timeout in ms */
  navigationTimeout: 60000,

  /** Initial page load delay in ms */
  pageLoadDelay: 2000,
};

/**
 * GraphQL hashes object, kept for backward compatibility
 */
export const GRAPHQL_HASHES = {
  FilteredProducts: dutchieConfig.filteredProductsHash,
  GetAddressBasedDispensaryData: dutchieConfig.getDispensaryDataHash,
  ConsumerDispensaries: dutchieConfig.consumerDispensariesHash,
};

/**
 * Arizona geo centerpoints for discovery scans
 */
export const ARIZONA_CENTERPOINTS = [
  { name: 'Phoenix', lat: 33.4484, lng: -112.074 },
  { name: 'Tucson', lat: 32.2226, lng: -110.9747 },
  { name: 'Flagstaff', lat: 35.1983, lng: -111.6513 },
  { name: 'Mesa', lat: 33.4152, lng: -111.8315 },
  { name: 'Scottsdale', lat: 33.4942, lng: -111.9261 },
  { name: 'Tempe', lat: 33.4255, lng: -111.94 },
  { name: 'Yuma', lat: 32.6927, lng: -114.6277 },
  { name: 'Prescott', lat: 34.54, lng: -112.4685 },
  { name: 'Lake Havasu', lat: 34.4839, lng: -114.3224 },
  { name: 'Sierra Vista', lat: 31.5455, lng: -110.2773 },
];
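The hashes in this config are consumed via the Apollo persisted-query convention: the operation name, variables, and a `persistedQuery` extension carrying the SHA-256 hash are serialized into a GET URL. A sketch of that URL construction, assuming (not confirmed by this config file) the standard `version: 1` extension shape; the actual request, retry, and POST-fallback behavior lives in the GraphQL client:

```typescript
// Builds a persisted-query GET URL for a given operation hash.
// This only constructs the URL; no network call is made here.
function buildPersistedQueryUrl(
  endpoint: string,
  operationName: string,
  variables: Record<string, unknown>,
  sha256Hash: string
): string {
  const extensions = {
    persistedQuery: { version: 1, sha256Hash },
  };
  const params = new URLSearchParams({
    operationName,
    variables: JSON.stringify(variables),
    extensions: JSON.stringify(extensions),
  });
  return `${endpoint}?${params.toString()}`;
}

const url = buildPersistedQueryUrl(
  'https://dutchie.com/api-3/graphql',
  'FilteredProducts',
  { page: 0, perPage: 100 },
  'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0'
);
console.log(url);
```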
@@ -1,131 +0,0 @@
/**
 * CannaiQ Database Connection
 *
 * All database access for the CannaiQ platform goes through this module.
 *
 * SINGLE DATABASE ARCHITECTURE:
 * - All services (auth, orchestrator, crawlers, admin) use this ONE database
 * - States are modeled via the states table + state_id on dispensaries (not separate DBs)
 *
 * CONFIGURATION (in priority order):
 * 1. CANNAIQ_DB_URL - Full connection string (preferred)
 * 2. Individual vars: CANNAIQ_DB_HOST, CANNAIQ_DB_PORT, CANNAIQ_DB_NAME, CANNAIQ_DB_USER, CANNAIQ_DB_PASS
 * 3. DATABASE_URL - Legacy fallback for K8s compatibility
 *
 * IMPORTANT:
 * - Do NOT create separate pools elsewhere
 * - All services should import from this module
 */

import { Pool, PoolClient } from 'pg';

/**
 * Get the database connection string from environment variables.
 * Supports multiple configuration methods with fallback for legacy compatibility.
 */
function getConnectionString(): string {
  // Priority 1: Full CANNAIQ connection URL
  if (process.env.CANNAIQ_DB_URL) {
    return process.env.CANNAIQ_DB_URL;
  }

  // Priority 2: Build from individual CANNAIQ env vars
  const host = process.env.CANNAIQ_DB_HOST;
  const port = process.env.CANNAIQ_DB_PORT;
  const name = process.env.CANNAIQ_DB_NAME;
  const user = process.env.CANNAIQ_DB_USER;
  const pass = process.env.CANNAIQ_DB_PASS;

  if (host && port && name && user && pass) {
    return `postgresql://${user}:${pass}@${host}:${port}/${name}`;
  }

  // Priority 3: Fallback to DATABASE_URL for legacy/K8s compatibility
  if (process.env.DATABASE_URL) {
    return process.env.DATABASE_URL;
  }

  // Report what's missing
  const required = ['CANNAIQ_DB_HOST', 'CANNAIQ_DB_PORT', 'CANNAIQ_DB_NAME', 'CANNAIQ_DB_USER', 'CANNAIQ_DB_PASS'];
  const missing = required.filter((key) => !process.env[key]);

  throw new Error(
    `[CannaiQ DB] Missing database configuration.\n` +
    `Set CANNAIQ_DB_URL, DATABASE_URL, or all of: ${missing.join(', ')}`
  );
}

let pool: Pool | null = null;

/**
 * Get the CannaiQ database pool (singleton)
 *
 * This is the canonical pool for all CannaiQ services.
 * Do NOT create separate pools elsewhere.
 */
export function getPool(): Pool {
  if (!pool) {
    pool = new Pool({
      connectionString: getConnectionString(),
      max: 10,
      idleTimeoutMillis: 30000,
      connectionTimeoutMillis: 5000,
    });

    pool.on('error', (err) => {
      console.error('[CannaiQ DB] Unexpected error on idle client:', err);
    });

    console.log('[CannaiQ DB] Pool initialized');
  }
  return pool;
}

/**
 * @deprecated Use getPool() instead
 */
export function getDutchieAZPool(): Pool {
  console.warn('[CannaiQ DB] getDutchieAZPool() is deprecated. Use getPool() instead.');
  return getPool();
}

/**
 * Execute a query on the CannaiQ database
 */
export async function query<T = any>(text: string, params?: any[]): Promise<{ rows: T[]; rowCount: number }> {
  const p = getPool();
  const result = await p.query(text, params);
  return { rows: result.rows as T[], rowCount: result.rowCount || 0 };
}

/**
 * Get a client from the pool for transaction use
 */
export async function getClient(): Promise<PoolClient> {
  const p = getPool();
  return p.connect();
}

/**
 * Close the pool connection
 */
export async function closePool(): Promise<void> {
  if (pool) {
    await pool.end();
    pool = null;
    console.log('[CannaiQ DB] Pool closed');
  }
}

/**
 * Check if the database is accessible
 */
export async function healthCheck(): Promise<boolean> {
  try {
    const result = await query('SELECT 1 as ok');
    return result.rows.length > 0 && result.rows[0].ok === 1;
  } catch (error) {
    console.error('[CannaiQ DB] Health check failed:', error);
    return false;
  }
}
@@ -1,137 +0,0 @@
/**
 * Dispensary Column Definitions
 *
 * Centralized column list for dispensaries table queries.
 * Handles optional columns that may not exist in all environments.
 *
 * USAGE:
 *   import { DISPENSARY_COLUMNS, DISPENSARY_COLUMNS_WITH_FAILED } from '../db/dispensary-columns';
 *   const result = await query(`SELECT ${DISPENSARY_COLUMNS} FROM dispensaries WHERE ...`);
 */

/**
 * Core dispensary columns that always exist.
 * These are guaranteed to be present in all environments.
 */
const CORE_COLUMNS = `
  id, name, slug, city, state, zip, address, latitude, longitude,
  menu_type, menu_url, platform_dispensary_id, website,
  created_at, updated_at
`;

/**
 * Optional columns with NULL fallback.
 *
 * provider_detection_data: Added in migration 044
 * active_crawler_profile_id: Added in migration 041
 *
 * Selecting the real column directly would fail in environments where
 * migration 044 has not been applied yet. For pre-migration compatibility,
 * we select NULL::jsonb, which always works. After migration 044 is applied,
 * this can be changed to the real column.
 */

// TEMPORARY: Use NULL fallback until migration 044 is applied
// After running 044, change this to: provider_detection_data
const PROVIDER_DETECTION_COLUMN = `NULL::jsonb AS provider_detection_data`;

// After migration 044 is applied, uncomment this line and remove the above:
// const PROVIDER_DETECTION_COLUMN = `provider_detection_data`;

/**
 * Standard dispensary columns for most queries.
 * Includes provider_detection_data with NULL fallback for pre-migration compatibility.
 */
export const DISPENSARY_COLUMNS = `${CORE_COLUMNS.trim()},
  ${PROVIDER_DETECTION_COLUMN}`;

/**
 * Dispensary columns including active_crawler_profile_id.
 * Used by routes that need profile information.
 */
export const DISPENSARY_COLUMNS_WITH_PROFILE = `${CORE_COLUMNS.trim()},
  ${PROVIDER_DETECTION_COLUMN},
  active_crawler_profile_id`;

/**
 * Dispensary columns including failed_at.
 * Used by the worker for compatibility checks.
 */
export const DISPENSARY_COLUMNS_WITH_FAILED = `${CORE_COLUMNS.trim()},
  ${PROVIDER_DETECTION_COLUMN},
  failed_at`;

/**
 * NOTE: After migration 044 is applied, update PROVIDER_DETECTION_COLUMN above
 * to use the real column instead of the NULL fallback.
 *
 * To verify migration status:
 *   SELECT column_name FROM information_schema.columns
 *   WHERE table_name = 'dispensaries' AND column_name = 'provider_detection_data';
 */

// Cache for column existence check
let _providerDetectionColumnExists: boolean | null = null;

/**
 * Check if the provider_detection_data column exists in the dispensaries table.
 * The result is cached after the first check.
 */
export async function hasProviderDetectionColumn(pool: { query: (sql: string) => Promise<{ rows: any[] }> }): Promise<boolean> {
  if (_providerDetectionColumnExists !== null) {
    return _providerDetectionColumnExists;
  }

  try {
    const result = await pool.query(`
      SELECT 1 FROM information_schema.columns
      WHERE table_name = 'dispensaries' AND column_name = 'provider_detection_data'
    `);
    _providerDetectionColumnExists = result.rows.length > 0;
  } catch {
    _providerDetectionColumnExists = false;
  }

  return _providerDetectionColumnExists;
}

/**
 * Safely update the provider_detection_data column.
 * If the column doesn't exist, logs a warning but doesn't crash.
 *
 * @param pool - Database pool with query method
 * @param dispensaryId - ID of the dispensary to update
 * @param data - JSONB data to merge into provider_detection_data
 * @returns true if the update succeeded, false if the column doesn't exist
 */
export async function safeUpdateProviderDetectionData(
  pool: { query: (sql: string, params?: any[]) => Promise<any> },
  dispensaryId: number,
  data: Record<string, any>
): Promise<boolean> {
  const hasColumn = await hasProviderDetectionColumn(pool);

  if (!hasColumn) {
    console.warn(`[DispensaryColumns] provider_detection_data column not found. Run migration 044 to add it.`);
    return false;
  }

  try {
    await pool.query(
      `UPDATE dispensaries
       SET provider_detection_data = COALESCE(provider_detection_data, '{}'::jsonb) || $1::jsonb,
           updated_at = NOW()
       WHERE id = $2`,
      [JSON.stringify(data), dispensaryId]
    );
    return true;
  } catch (error: any) {
    if (error.message?.includes('provider_detection_data')) {
      console.warn(`[DispensaryColumns] Failed to update provider_detection_data: ${error.message}`);
      return false;
    }
    throw error;
  }
}
@@ -1,29 +0,0 @@
/**
 * Dutchie AZ Schema Bootstrap
 *
 * Run this to create/update the dutchie_az tables (dutchie_products, dutchie_product_snapshots, etc.)
 * in the AZ pipeline database. This is separate from the legacy schema.
 *
 * Usage:
 *   TS_NODE_TRANSPILE_ONLY=1 npx ts-node src/dutchie-az/db/migrate.ts
 *   or (after build)
 *   node dist/dutchie-az/db/migrate.js
 */

import { createSchema } from './schema';
import { closePool } from './connection';

async function main() {
  try {
    console.log('[DutchieAZ] Running schema migration...');
    await createSchema();
    console.log('[DutchieAZ] Schema migration complete.');
  } catch (err: any) {
    console.error('[DutchieAZ] Schema migration failed:', err.message);
    process.exitCode = 1;
  } finally {
    await closePool();
  }
}

main();
@@ -1,408 +0,0 @@
|
||||
/**
|
||||
* Dutchie AZ Database Schema
|
||||
*
|
||||
* Creates all tables for the isolated Dutchie Arizona data pipeline.
|
||||
* Run this to initialize the dutchie_az database.
|
||||
*/
|
||||
|
||||
import { query, getClient } from './connection';
|
||||
|
||||
/**
|
||||
* SQL statements to create all tables
|
||||
*/
|
||||
const SCHEMA_SQL = `
|
||||
-- ============================================================
|
||||
-- DISPENSARIES TABLE
|
||||
-- Stores discovered Dutchie dispensaries in Arizona
|
||||
-- ============================================================
|
||||
CREATE TABLE IF NOT EXISTS dispensaries (
|
||||
id SERIAL PRIMARY KEY,
|
||||
platform VARCHAR(20) NOT NULL DEFAULT 'dutchie',
|
||||
name VARCHAR(255) NOT NULL,
|
||||
slug VARCHAR(255) NOT NULL,
|
||||
city VARCHAR(100) NOT NULL,
|
||||
state VARCHAR(10) NOT NULL DEFAULT 'AZ',
|
||||
postal_code VARCHAR(20),
|
||||
address TEXT,
|
||||
latitude DECIMAL(10, 7),
|
||||
longitude DECIMAL(10, 7),
|
||||
platform_dispensary_id VARCHAR(100),
|
||||
is_delivery BOOLEAN DEFAULT false,
|
||||
is_pickup BOOLEAN DEFAULT true,
|
||||
raw_metadata JSONB,
|
||||
last_crawled_at TIMESTAMPTZ,
|
||||
product_count INTEGER DEFAULT 0,
|
||||
created_at TIMESTAMPTZ DEFAULT NOW(),
|
||||
updated_at TIMESTAMPTZ DEFAULT NOW(),
|
||||
|
||||
CONSTRAINT uk_dispensaries_platform_slug UNIQUE (platform, slug, city, state)
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_dispensaries_platform ON dispensaries(platform);
|
||||
CREATE INDEX IF NOT EXISTS idx_dispensaries_platform_id ON dispensaries(platform_dispensary_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_dispensaries_state ON dispensaries(state);
|
||||
CREATE INDEX IF NOT EXISTS idx_dispensaries_city ON dispensaries(city);
|
||||
|
||||
-- ============================================================
|
||||
-- DUTCHIE_PRODUCTS TABLE
|
||||
-- Canonical product identity per store
|
||||
-- ============================================================
|
||||
CREATE TABLE IF NOT EXISTS dutchie_products (
|
||||
id SERIAL PRIMARY KEY,
|
||||
dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE,
|
||||
platform VARCHAR(20) NOT NULL DEFAULT 'dutchie',
|
||||
|
||||
external_product_id VARCHAR(100) NOT NULL,
|
||||
platform_dispensary_id VARCHAR(100) NOT NULL,
|
||||
c_name VARCHAR(500),
|
||||
name VARCHAR(500) NOT NULL,
|
||||
|
||||
-- Brand
|
||||
brand_name VARCHAR(255),
|
||||
brand_id VARCHAR(100),
|
||||
brand_logo_url TEXT,
|
||||
|
||||
-- Classification
|
||||
type VARCHAR(100),
|
||||
subcategory VARCHAR(100),
|
||||
strain_type VARCHAR(50),
|
||||
provider VARCHAR(100),
|
||||
|
||||
-- Potency
|
||||
thc DECIMAL(10, 4),
|
||||
thc_content DECIMAL(10, 4),
|
||||
cbd DECIMAL(10, 4),
|
||||
cbd_content DECIMAL(10, 4),
|
||||
cannabinoids_v2 JSONB,
|
||||
effects JSONB,
|
||||
|
||||
-- Status / flags
|
||||
status VARCHAR(50),
|
||||
medical_only BOOLEAN DEFAULT false,
|
||||
rec_only BOOLEAN DEFAULT false,
|
||||
featured BOOLEAN DEFAULT false,
|
||||
coming_soon BOOLEAN DEFAULT false,
|
||||
certificate_of_analysis_enabled BOOLEAN DEFAULT false,
|
||||
|
||||
is_below_threshold BOOLEAN DEFAULT false,
|
||||
is_below_kiosk_threshold BOOLEAN DEFAULT false,
|
||||
options_below_threshold BOOLEAN DEFAULT false,
|
||||
options_below_kiosk_threshold BOOLEAN DEFAULT false,
|
||||
|
||||
-- Derived stock status: 'in_stock', 'out_of_stock', 'unknown'
|
||||
stock_status VARCHAR(20) DEFAULT 'unknown',
|
||||
total_quantity_available INTEGER DEFAULT 0,
|
||||
|
||||
-- Images
|
||||
primary_image_url TEXT,
|
||||
images JSONB,
|
||||
|
||||
-- Misc
|
||||
measurements JSONB,
|
||||
weight VARCHAR(50),
|
||||
past_c_names TEXT[],
|
||||
|
||||
created_at_dutchie TIMESTAMPTZ,
|
||||
updated_at_dutchie TIMESTAMPTZ,
|
||||
|
||||
latest_raw_payload JSONB,
|
||||
|
||||
created_at TIMESTAMPTZ DEFAULT NOW(),
|
||||
updated_at TIMESTAMPTZ DEFAULT NOW(),
|
||||
|
||||
CONSTRAINT uk_dutchie_products UNIQUE (dispensary_id, external_product_id)
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_dutchie_products_dispensary ON dutchie_products(dispensary_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_dutchie_products_external_id ON dutchie_products(external_product_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_dutchie_products_platform_disp ON dutchie_products(platform_dispensary_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_dutchie_products_brand ON dutchie_products(brand_name);
|
||||
CREATE INDEX IF NOT EXISTS idx_dutchie_products_type ON dutchie_products(type);
|
||||
CREATE INDEX IF NOT EXISTS idx_dutchie_products_subcategory ON dutchie_products(subcategory);
|
||||
CREATE INDEX IF NOT EXISTS idx_dutchie_products_status ON dutchie_products(status);
|
||||
CREATE INDEX IF NOT EXISTS idx_dutchie_products_strain ON dutchie_products(strain_type);
|
||||
CREATE INDEX IF NOT EXISTS idx_dutchie_products_stock_status ON dutchie_products(stock_status);
|
||||
|
||||
-- ============================================================
|
||||
-- DUTCHIE_PRODUCT_SNAPSHOTS TABLE
|
||||
-- Historical state per crawl, includes options[]
|
||||
-- ============================================================
|
||||
CREATE TABLE IF NOT EXISTS dutchie_product_snapshots (
|
||||
id SERIAL PRIMARY KEY,
|
||||
dutchie_product_id INTEGER NOT NULL REFERENCES dutchie_products(id) ON DELETE CASCADE,
|
||||
dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE,
|
||||
platform_dispensary_id VARCHAR(100) NOT NULL,
|
||||
external_product_id VARCHAR(100) NOT NULL,
|
||||
pricing_type VARCHAR(20) DEFAULT 'unknown',
|
||||
crawl_mode VARCHAR(20) DEFAULT 'mode_a', -- 'mode_a' (UI parity) or 'mode_b' (max coverage)
|
||||
|
||||
status VARCHAR(50),
|
||||
featured BOOLEAN DEFAULT false,
|
||||
special BOOLEAN DEFAULT false,
|
||||
medical_only BOOLEAN DEFAULT false,
|
||||
rec_only BOOLEAN DEFAULT false,
|
||||
|
||||
-- Flag indicating if product was present in feed (false = missing_from_feed snapshot)
|
||||
is_present_in_feed BOOLEAN DEFAULT true,
|
||||
|
||||
-- Derived stock status
|
||||
stock_status VARCHAR(20) DEFAULT 'unknown',
|
||||
|
||||
-- Price summary (in cents)
|
||||
rec_min_price_cents INTEGER,
|
||||
rec_max_price_cents INTEGER,
|
||||
rec_min_special_price_cents INTEGER,
|
||||
med_min_price_cents INTEGER,
|
||||
med_max_price_cents INTEGER,
|
||||
med_min_special_price_cents INTEGER,
|
||||
wholesale_min_price_cents INTEGER,
|
||||
|
||||
-- Inventory summary
|
||||
total_quantity_available INTEGER,
|
||||
total_kiosk_quantity_available INTEGER,
|
||||
manual_inventory BOOLEAN DEFAULT false,
|
||||
is_below_threshold BOOLEAN DEFAULT false,
|
||||
    is_below_kiosk_threshold BOOLEAN DEFAULT false,

    -- Option-level data (from POSMetaData.children)
    options JSONB,

    -- Full raw product node
    raw_payload JSONB NOT NULL,

    crawled_at TIMESTAMPTZ NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_snapshots_product ON dutchie_product_snapshots(dutchie_product_id);
CREATE INDEX IF NOT EXISTS idx_snapshots_dispensary ON dutchie_product_snapshots(dispensary_id);
CREATE INDEX IF NOT EXISTS idx_snapshots_crawled_at ON dutchie_product_snapshots(crawled_at);
CREATE INDEX IF NOT EXISTS idx_snapshots_platform_disp ON dutchie_product_snapshots(platform_dispensary_id);
CREATE INDEX IF NOT EXISTS idx_snapshots_external_id ON dutchie_product_snapshots(external_product_id);
CREATE INDEX IF NOT EXISTS idx_snapshots_special ON dutchie_product_snapshots(special) WHERE special = true;
CREATE INDEX IF NOT EXISTS idx_snapshots_stock_status ON dutchie_product_snapshots(stock_status);
CREATE INDEX IF NOT EXISTS idx_snapshots_crawl_mode ON dutchie_product_snapshots(crawl_mode);

-- ============================================================
-- CRAWL_JOBS TABLE
-- Tracks crawl execution status
-- ============================================================
CREATE TABLE IF NOT EXISTS crawl_jobs (
    id SERIAL PRIMARY KEY,
    job_type VARCHAR(50) NOT NULL,
    dispensary_id INTEGER REFERENCES dispensaries(id) ON DELETE SET NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'pending',
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    error_message TEXT,
    products_found INTEGER,
    snapshots_created INTEGER,
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_crawl_jobs_type ON crawl_jobs(job_type);
CREATE INDEX IF NOT EXISTS idx_crawl_jobs_status ON crawl_jobs(status);
CREATE INDEX IF NOT EXISTS idx_crawl_jobs_dispensary ON crawl_jobs(dispensary_id);
CREATE INDEX IF NOT EXISTS idx_crawl_jobs_created ON crawl_jobs(created_at);

-- ============================================================
-- JOB_SCHEDULES TABLE
-- Stores schedule configuration for recurring jobs with jitter support
-- Each job has independent timing that "wanders" over time
-- ============================================================
CREATE TABLE IF NOT EXISTS job_schedules (
    id SERIAL PRIMARY KEY,
    job_name VARCHAR(100) NOT NULL UNIQUE,
    description TEXT,
    enabled BOOLEAN DEFAULT true,

    -- Timing configuration (jitter makes times "wander")
    base_interval_minutes INTEGER NOT NULL DEFAULT 240, -- e.g., 4 hours
    jitter_minutes INTEGER NOT NULL DEFAULT 30,         -- e.g., ±30 min

    -- Last run tracking
    last_run_at TIMESTAMPTZ,
    last_status VARCHAR(20), -- 'success', 'error', 'partial', 'running'
    last_error_message TEXT,
    last_duration_ms INTEGER,

    -- Next run (calculated with jitter after each run)
    next_run_at TIMESTAMPTZ,

    -- Additional config
    job_config JSONB, -- e.g., { pricingType: 'rec', useBothModes: true }

    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_job_schedules_enabled ON job_schedules(enabled);
CREATE INDEX IF NOT EXISTS idx_job_schedules_next_run ON job_schedules(next_run_at);
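The `base_interval_minutes` / `jitter_minutes` pair implies a next-run calculation of roughly the form below. This is a hedged sketch only: the scheduler's actual helper is not part of this diff, and the name `computeNextRun` plus the injectable random source are illustrative.

```typescript
// Sketch: compute the next run time for a job_schedules row.
// base interval ± a uniformly distributed jitter, so run times "wander".
function computeNextRun(
  lastRun: Date,
  baseIntervalMinutes: number,
  jitterMinutes: number,
  rand: () => number = Math.random // injectable for deterministic tests
): Date {
  // rand() in [0, 1) maps to a jitter in [-jitterMinutes, +jitterMinutes)
  const jitter = (rand() * 2 - 1) * jitterMinutes;
  const deltaMs = (baseIntervalMinutes + jitter) * 60_000;
  return new Date(lastRun.getTime() + deltaMs);
}
```

Passing a fixed `rand` makes the jitter reproducible, which is how a helper like this can be unit-tested.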

-- ============================================================
-- JOB_RUN_LOGS TABLE
-- Stores history of job runs for monitoring
-- ============================================================
CREATE TABLE IF NOT EXISTS job_run_logs (
    id SERIAL PRIMARY KEY,
    schedule_id INTEGER NOT NULL REFERENCES job_schedules(id) ON DELETE CASCADE,
    job_name VARCHAR(100) NOT NULL,
    status VARCHAR(20) NOT NULL, -- 'pending', 'running', 'success', 'error', 'partial'
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    duration_ms INTEGER,
    error_message TEXT,

    -- Results summary
    items_processed INTEGER,
    items_succeeded INTEGER,
    items_failed INTEGER,

    metadata JSONB, -- Additional run details

    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_job_run_logs_schedule ON job_run_logs(schedule_id);
CREATE INDEX IF NOT EXISTS idx_job_run_logs_job_name ON job_run_logs(job_name);
CREATE INDEX IF NOT EXISTS idx_job_run_logs_status ON job_run_logs(status);
CREATE INDEX IF NOT EXISTS idx_job_run_logs_created ON job_run_logs(created_at);

-- ============================================================
-- VIEWS FOR EASY QUERYING
-- ============================================================

-- Categories derived from products
CREATE OR REPLACE VIEW v_categories AS
SELECT
    type,
    subcategory,
    COUNT(DISTINCT id) as product_count,
    COUNT(DISTINCT dispensary_id) as dispensary_count,
    AVG(thc) as avg_thc,
    MIN(thc) as min_thc,
    MAX(thc) as max_thc
FROM dutchie_products
WHERE type IS NOT NULL
GROUP BY type, subcategory
ORDER BY type, subcategory;

-- Brands derived from products
CREATE OR REPLACE VIEW v_brands AS
SELECT
    brand_name,
    brand_id,
    MAX(brand_logo_url) as brand_logo_url,
    COUNT(DISTINCT id) as product_count,
    COUNT(DISTINCT dispensary_id) as dispensary_count,
    ARRAY_AGG(DISTINCT type) FILTER (WHERE type IS NOT NULL) as product_types
FROM dutchie_products
WHERE brand_name IS NOT NULL
GROUP BY brand_name, brand_id
ORDER BY product_count DESC;

-- Latest snapshot per product (most recent crawl data)
CREATE OR REPLACE VIEW v_latest_snapshots AS
SELECT DISTINCT ON (dutchie_product_id)
    s.*
FROM dutchie_product_snapshots s
ORDER BY dutchie_product_id, crawled_at DESC;
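`DISTINCT ON` keeps the first row per `dutchie_product_id` under that `ORDER BY`, i.e. the most recent snapshot. The same "latest per key" reduction, expressed application-side, looks like this (a hypothetical helper, not part of the diff; the row shape is a simplified stand-in):

```typescript
// Simplified snapshot row for illustration.
interface SnapshotRow {
  dutchie_product_id: number;
  crawled_at: Date;
  price?: number;
}

// Keep only the most recent snapshot per product id —
// the same semantics as the DISTINCT ON view above.
function latestPerProduct(rows: SnapshotRow[]): Map<number, SnapshotRow> {
  const latest = new Map<number, SnapshotRow>();
  for (const row of rows) {
    const prev = latest.get(row.dutchie_product_id);
    if (!prev || row.crawled_at > prev.crawled_at) {
      latest.set(row.dutchie_product_id, row);
    }
  }
  return latest;
}
```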

-- Dashboard stats
CREATE OR REPLACE VIEW v_dashboard_stats AS
SELECT
    (SELECT COUNT(*) FROM dispensaries WHERE state = 'AZ') as dispensary_count,
    (SELECT COUNT(*) FROM dutchie_products) as product_count,
    (SELECT COUNT(*) FROM dutchie_product_snapshots WHERE crawled_at > NOW() - INTERVAL '24 hours') as snapshots_24h,
    (SELECT MAX(crawled_at) FROM dutchie_product_snapshots) as last_crawl_time,
    (SELECT COUNT(*) FROM crawl_jobs WHERE status = 'failed' AND created_at > NOW() - INTERVAL '24 hours') as failed_jobs_24h,
    (SELECT COUNT(DISTINCT brand_name) FROM dutchie_products WHERE brand_name IS NOT NULL) as brand_count,
    (SELECT COUNT(DISTINCT (type, subcategory)) FROM dutchie_products WHERE type IS NOT NULL) as category_count;
`;

/**
 * Run the schema migration
 */
export async function createSchema(): Promise<void> {
  console.log('[DutchieAZ Schema] Creating database schema...');

  const client = await getClient();

  try {
    await client.query('BEGIN');

    // Strip comment lines first, then split into individual statements.
    // (Splitting first would leave chunks that begin with '--', and a
    // startsWith('--') filter would then drop the statement that follows a
    // comment block. A plain ';' split is only safe because SCHEMA_SQL
    // contains no semicolons inside string literals.)
    const statements = SCHEMA_SQL
      .split('\n')
      .filter(line => !line.trim().startsWith('--'))
      .join('\n')
      .split(';')
      .map(s => s.trim())
      .filter(s => s.length > 0);

    for (const statement of statements) {
      await client.query(statement + ';');
    }

    await client.query('COMMIT');
    console.log('[DutchieAZ Schema] Schema created successfully');
  } catch (error) {
    await client.query('ROLLBACK');
    console.error('[DutchieAZ Schema] Failed to create schema:', error);
    throw error;
  } finally {
    client.release();
  }
}
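The split-and-filter step in `createSchema` can be isolated as a pure helper, which makes its assumptions testable. This is a sketch (the name `splitSqlStatements` is not in the source) of a robust variant that strips comment lines before splitting, and it still assumes no semicolons appear inside string literals or dollar-quoted bodies:

```typescript
// Naive SQL statement splitter for a migration script.
// Comment lines are removed first so a statement preceded by a
// '--' comment block is not discarded along with the comment.
// Assumes no ';' inside string literals or dollar-quoted bodies.
function splitSqlStatements(sql: string): string[] {
  return sql
    .split('\n')
    .filter(line => !line.trim().startsWith('--'))
    .join('\n')
    .split(';')
    .map(s => s.trim())
    .filter(s => s.length > 0);
}
```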

/**
 * Drop all tables (for development/testing)
 */
export async function dropSchema(): Promise<void> {
  console.log('[DutchieAZ Schema] Dropping all tables...');

  await query(`
    DROP VIEW IF EXISTS v_dashboard_stats CASCADE;
    DROP VIEW IF EXISTS v_latest_snapshots CASCADE;
    DROP VIEW IF EXISTS v_brands CASCADE;
    DROP VIEW IF EXISTS v_categories CASCADE;
    DROP TABLE IF EXISTS job_run_logs CASCADE;
    DROP TABLE IF EXISTS job_schedules CASCADE;
    DROP TABLE IF EXISTS crawl_schedule CASCADE;
    DROP TABLE IF EXISTS crawl_jobs CASCADE;
    DROP TABLE IF EXISTS dutchie_product_snapshots CASCADE;
    DROP TABLE IF EXISTS dutchie_products CASCADE;
    DROP TABLE IF EXISTS dispensaries CASCADE;
  `);

  console.log('[DutchieAZ Schema] All tables dropped');
}

/**
 * Check if schema exists
 */
export async function schemaExists(): Promise<boolean> {
  try {
    const result = await query(`
      SELECT EXISTS (
        SELECT FROM information_schema.tables
        WHERE table_name = 'dispensaries'
      ) as exists
    `);
    return result.rows[0]?.exists === true;
  } catch (error) {
    return false;
  }
}

/**
 * Initialize schema if it doesn't exist
 */
export async function ensureSchema(): Promise<void> {
  const exists = await schemaExists();
  if (!exists) {
    await createSchema();
  } else {
    console.log('[DutchieAZ Schema] Schema already exists');
  }
}
@@ -1,403 +0,0 @@

/**
 * DtCityDiscoveryService
 *
 * Core service for Dutchie city discovery.
 * Contains shared logic used by multiple entrypoints.
 *
 * Responsibilities:
 * - Browser/API-based city fetching
 * - Manual city seeding
 * - City upsert operations
 */

import { Pool } from 'pg';
import axios from 'axios';
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

puppeteer.use(StealthPlugin());

// ============================================================
// TYPES
// ============================================================

export interface DutchieCity {
  name: string;
  slug: string;
  stateCode: string | null;
  countryCode: string;
  url?: string;
}

export interface CityDiscoveryResult {
  citiesFound: number;
  citiesInserted: number;
  citiesUpdated: number;
  errors: string[];
  durationMs: number;
}

export interface ManualSeedResult {
  city: DutchieCity;
  id: number;
  wasInserted: boolean;
}

// ============================================================
// US STATE CODE MAPPING
// ============================================================

export const US_STATE_MAP: Record<string, string> = {
  'alabama': 'AL', 'alaska': 'AK', 'arizona': 'AZ', 'arkansas': 'AR',
  'california': 'CA', 'colorado': 'CO', 'connecticut': 'CT', 'delaware': 'DE',
  'florida': 'FL', 'georgia': 'GA', 'hawaii': 'HI', 'idaho': 'ID',
  'illinois': 'IL', 'indiana': 'IN', 'iowa': 'IA', 'kansas': 'KS',
  'kentucky': 'KY', 'louisiana': 'LA', 'maine': 'ME', 'maryland': 'MD',
  'massachusetts': 'MA', 'michigan': 'MI', 'minnesota': 'MN', 'mississippi': 'MS',
  'missouri': 'MO', 'montana': 'MT', 'nebraska': 'NE', 'nevada': 'NV',
  'new-hampshire': 'NH', 'new-jersey': 'NJ', 'new-mexico': 'NM', 'new-york': 'NY',
  'north-carolina': 'NC', 'north-dakota': 'ND', 'ohio': 'OH', 'oklahoma': 'OK',
  'oregon': 'OR', 'pennsylvania': 'PA', 'rhode-island': 'RI', 'south-carolina': 'SC',
  'south-dakota': 'SD', 'tennessee': 'TN', 'texas': 'TX', 'utah': 'UT',
  'vermont': 'VT', 'virginia': 'VA', 'washington': 'WA', 'west-virginia': 'WV',
  'wisconsin': 'WI', 'wyoming': 'WY', 'district-of-columbia': 'DC',
};

// Canadian province mapping
export const CA_PROVINCE_MAP: Record<string, string> = {
  'alberta': 'AB', 'british-columbia': 'BC', 'manitoba': 'MB',
  'new-brunswick': 'NB', 'newfoundland-and-labrador': 'NL',
  'northwest-territories': 'NT', 'nova-scotia': 'NS', 'nunavut': 'NU',
  'ontario': 'ON', 'prince-edward-island': 'PE', 'quebec': 'QC',
  'saskatchewan': 'SK', 'yukon': 'YT',
};

// ============================================================
// CITY FETCHING (AUTO DISCOVERY)
// ============================================================

/**
 * Fetch cities from Dutchie's /cities page using Puppeteer.
 */
export async function fetchCitiesFromBrowser(): Promise<DutchieCity[]> {
  console.log('[DtCityDiscoveryService] Launching browser to fetch cities...');

  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'],
  });

  try {
    const page = await browser.newPage();
    await page.setUserAgent(
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    );

    console.log('[DtCityDiscoveryService] Navigating to https://dutchie.com/cities...');
    await page.goto('https://dutchie.com/cities', {
      waitUntil: 'networkidle2',
      timeout: 60000,
    });

    await new Promise((r) => setTimeout(r, 3000));

    const cities = await page.evaluate(() => {
      const cityLinks: Array<{
        name: string;
        slug: string;
        url: string;
        stateSlug: string | null;
      }> = [];

      // City URLs follow the pattern /city/{state}/{city}
      const links = document.querySelectorAll('a[href*="/city/"]');
      links.forEach((link) => {
        const href = (link as HTMLAnchorElement).href;
        const text = (link as HTMLElement).innerText?.trim();

        const match = href.match(/\/city\/([^/]+)\/([^/?]+)/);
        if (match && text) {
          cityLinks.push({
            name: text,
            slug: match[2],
            url: href,
            stateSlug: match[1],
          });
        }
      });

      return cityLinks;
    });

    console.log(`[DtCityDiscoveryService] Extracted ${cities.length} city links from page`);

    return cities.map((city) => {
      let countryCode = 'US';
      let stateCode: string | null = null;

      if (city.stateSlug) {
        if (US_STATE_MAP[city.stateSlug]) {
          stateCode = US_STATE_MAP[city.stateSlug];
          countryCode = 'US';
        } else if (CA_PROVINCE_MAP[city.stateSlug]) {
          stateCode = CA_PROVINCE_MAP[city.stateSlug];
          countryCode = 'CA';
        } else if (city.stateSlug.length === 2) {
          stateCode = city.stateSlug.toUpperCase();
          if (Object.values(CA_PROVINCE_MAP).includes(stateCode)) {
            countryCode = 'CA';
          }
        }
      }

      return {
        name: city.name,
        slug: city.slug,
        stateCode,
        countryCode,
        url: city.url,
      };
    });
  } finally {
    await browser.close();
  }
}
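The slug-to-region branching inside `fetchCitiesFromBrowser` is a candidate for extraction into a pure helper. The sketch below is illustrative only (`resolveRegion` does not exist in the source, and small sample maps stand in for the full `US_STATE_MAP` / `CA_PROVINCE_MAP`):

```typescript
// Illustrative: resolve the {state} segment of /city/{state}/{city}
// into a 2-letter state code plus country, mirroring the branching
// in fetchCitiesFromBrowser. Sample maps only.
const US_STATES: Record<string, string> = { 'arizona': 'AZ', 'new-york': 'NY' };
const CA_PROVINCES: Record<string, string> = { 'ontario': 'ON', 'quebec': 'QC' };

function resolveRegion(
  stateSlug: string | null
): { stateCode: string | null; countryCode: string } {
  let countryCode = 'US';
  let stateCode: string | null = null;

  if (stateSlug) {
    if (US_STATES[stateSlug]) {
      stateCode = US_STATES[stateSlug];
    } else if (CA_PROVINCES[stateSlug]) {
      stateCode = CA_PROVINCES[stateSlug];
      countryCode = 'CA';
    } else if (stateSlug.length === 2) {
      // Already a 2-letter code; decide country by province membership.
      stateCode = stateSlug.toUpperCase();
      if (Object.values(CA_PROVINCES).includes(stateCode)) {
        countryCode = 'CA';
      }
    }
  }

  return { stateCode, countryCode };
}
```

A pure helper like this makes the US-default fallback and the 2-letter-code branch directly unit-testable, which the inline version is not.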

/**
 * Fetch cities via API endpoints (fallback).
 */
export async function fetchCitiesFromAPI(): Promise<DutchieCity[]> {
  console.log('[DtCityDiscoveryService] Attempting API-based city discovery...');

  const apiEndpoints = [
    'https://dutchie.com/api/cities',
    'https://api.dutchie.com/v1/cities',
  ];

  for (const endpoint of apiEndpoints) {
    try {
      const response = await axios.get(endpoint, {
        headers: {
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0',
          Accept: 'application/json',
        },
        timeout: 15000,
      });

      if (response.data && Array.isArray(response.data)) {
        console.log(`[DtCityDiscoveryService] API returned ${response.data.length} cities`);
        return response.data.map((c: any) => ({
          name: c.name || c.city,
          slug: c.slug || c.citySlug,
          stateCode: c.stateCode || c.state,
          countryCode: c.countryCode || c.country || 'US',
        }));
      }
    } catch (error: any) {
      console.log(`[DtCityDiscoveryService] API ${endpoint} failed: ${error.message}`);
    }
  }

  return [];
}

// ============================================================
// DATABASE OPERATIONS
// ============================================================

/**
 * Upsert a city into dutchie_discovery_cities
 */
export async function upsertCity(
  pool: Pool,
  city: DutchieCity
): Promise<{ id: number; inserted: boolean; updated: boolean }> {
  const result = await pool.query(
    `
    INSERT INTO dutchie_discovery_cities (
      platform,
      city_name,
      city_slug,
      state_code,
      country_code,
      crawl_enabled,
      created_at,
      updated_at
    ) VALUES (
      'dutchie', $1, $2, $3, $4, TRUE, NOW(), NOW()
    )
    ON CONFLICT (platform, country_code, state_code, city_slug)
    DO UPDATE SET
      city_name = EXCLUDED.city_name,
      crawl_enabled = TRUE,
      updated_at = NOW()
    RETURNING id, (xmax = 0) AS inserted
    `,
    [city.name, city.slug, city.stateCode, city.countryCode]
  );

  const inserted = result.rows[0]?.inserted === true;
  return {
    id: result.rows[0]?.id,
    inserted,
    updated: !inserted,
  };
}
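The `RETURNING (xmax = 0) AS inserted` clause is a common Postgres upsert idiom: `xmax` is 0 on a freshly inserted row version and non-zero when the `ON CONFLICT` branch updated an existing row. The caller-side interpretation of that flag can be isolated like this (`classifyUpsert` is a hypothetical name, not in the source):

```typescript
// Interpret rows returned by
//   INSERT ... ON CONFLICT ... RETURNING id, (xmax = 0) AS inserted
// In Postgres, xmax = 0 means a brand-new row version was written;
// a non-zero xmax means the conflict branch updated an existing row.
interface UpsertRow {
  id: number;
  inserted: boolean;
}

function classifyUpsert(
  rows: UpsertRow[]
): { id: number | undefined; inserted: boolean; updated: boolean } {
  const inserted = rows[0]?.inserted === true;
  return { id: rows[0]?.id, inserted, updated: !inserted };
}
```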

// ============================================================
// MAIN SERVICE CLASS
// ============================================================

export class DtCityDiscoveryService {
  constructor(private pool: Pool) {}

  /**
   * Run auto-discovery (browser + API fallback)
   */
  async runAutoDiscovery(): Promise<CityDiscoveryResult> {
    const startTime = Date.now();
    const errors: string[] = [];
    let citiesFound = 0;
    let citiesInserted = 0;
    let citiesUpdated = 0;

    console.log('[DtCityDiscoveryService] Starting auto city discovery...');

    try {
      let cities = await fetchCitiesFromBrowser();

      if (cities.length === 0) {
        console.log('[DtCityDiscoveryService] Browser returned 0 cities, trying API...');
        cities = await fetchCitiesFromAPI();
      }

      citiesFound = cities.length;
      console.log(`[DtCityDiscoveryService] Found ${citiesFound} cities`);

      for (const city of cities) {
        try {
          const result = await upsertCity(this.pool, city);
          if (result.inserted) citiesInserted++;
          else if (result.updated) citiesUpdated++;
        } catch (error: any) {
          const msg = `Failed to upsert city ${city.slug}: ${error.message}`;
          console.error(`[DtCityDiscoveryService] ${msg}`);
          errors.push(msg);
        }
      }
    } catch (error: any) {
      const msg = `Auto discovery failed: ${error.message}`;
      console.error(`[DtCityDiscoveryService] ${msg}`);
      errors.push(msg);
    }

    const durationMs = Date.now() - startTime;

    return {
      citiesFound,
      citiesInserted,
      citiesUpdated,
      errors,
      durationMs,
    };
  }

  /**
   * Seed a single city manually
   */
  async seedCity(city: DutchieCity): Promise<ManualSeedResult> {
    console.log(`[DtCityDiscoveryService] Seeding city: ${city.name} (${city.slug}), ${city.stateCode}, ${city.countryCode}`);

    const result = await upsertCity(this.pool, city);

    return {
      city,
      id: result.id,
      wasInserted: result.inserted,
    };
  }

  /**
   * Seed multiple cities from a list
   */
  async seedCities(cities: DutchieCity[]): Promise<{
    results: ManualSeedResult[];
    errors: string[];
  }> {
    const results: ManualSeedResult[] = [];
    const errors: string[] = [];

    for (const city of cities) {
      try {
        const result = await this.seedCity(city);
        results.push(result);
      } catch (error: any) {
        errors.push(`${city.slug}: ${error.message}`);
      }
    }

    return { results, errors };
  }

  /**
   * Get statistics about discovered cities
   */
  async getStats(): Promise<{
    total: number;
    byCountry: Array<{ countryCode: string; count: number }>;
    byState: Array<{ stateCode: string; countryCode: string; count: number }>;
    crawlEnabled: number;
    neverCrawled: number;
  }> {
    const [totalRes, byCountryRes, byStateRes, enabledRes, neverRes] = await Promise.all([
      this.pool.query("SELECT COUNT(*) as cnt FROM dutchie_discovery_cities WHERE platform = 'dutchie'"),
      this.pool.query(`
        SELECT country_code, COUNT(*) as cnt
        FROM dutchie_discovery_cities
        WHERE platform = 'dutchie'
        GROUP BY country_code
        ORDER BY cnt DESC
      `),
      this.pool.query(`
        SELECT state_code, country_code, COUNT(*) as cnt
        FROM dutchie_discovery_cities
        WHERE platform = 'dutchie' AND state_code IS NOT NULL
        GROUP BY state_code, country_code
        ORDER BY cnt DESC
      `),
      this.pool.query(`
        SELECT COUNT(*) as cnt
        FROM dutchie_discovery_cities
        WHERE platform = 'dutchie' AND crawl_enabled = TRUE
      `),
      this.pool.query(`
        SELECT COUNT(*) as cnt
        FROM dutchie_discovery_cities
        WHERE platform = 'dutchie' AND last_crawled_at IS NULL
      `),
    ]);

    return {
      total: parseInt(totalRes.rows[0]?.cnt || '0', 10),
      byCountry: byCountryRes.rows.map((r) => ({
        countryCode: r.country_code,
        count: parseInt(r.cnt, 10),
      })),
      byState: byStateRes.rows.map((r) => ({
        stateCode: r.state_code,
        countryCode: r.country_code,
        count: parseInt(r.cnt, 10),
      })),
      crawlEnabled: parseInt(enabledRes.rows[0]?.cnt || '0', 10),
      neverCrawled: parseInt(neverRes.rows[0]?.cnt || '0', 10),
    };
  }
}

export default DtCityDiscoveryService;
File diff suppressed because it is too large
@@ -1,390 +0,0 @@
|
||||
/**
|
||||
* DutchieCityDiscovery
|
||||
*
|
||||
* Discovers cities from Dutchie's /cities page and upserts to dutchie_discovery_cities.
|
||||
*
|
||||
* Responsibilities:
|
||||
* - Fetch all cities available on Dutchie
|
||||
* - For each city derive: city_name, city_slug, state_code, country_code
|
||||
* - Upsert into dutchie_discovery_cities
|
||||
*/
|
||||
|
||||
import { Pool } from 'pg';
|
||||
import axios from 'axios';
|
||||
import puppeteer from 'puppeteer-extra';
|
||||
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
|
||||
import type { Browser, Page } from 'puppeteer';
|
||||
|
||||
puppeteer.use(StealthPlugin());
|
||||
|
||||
// ============================================================
|
||||
// TYPES
|
||||
// ============================================================
|
||||
|
||||
export interface DutchieCity {
|
||||
name: string;
|
||||
slug: string;
|
||||
stateCode: string | null;
|
||||
countryCode: string;
|
||||
url?: string;
|
||||
}
|
||||
|
||||
export interface CityDiscoveryResult {
|
||||
citiesFound: number;
|
||||
citiesInserted: number;
|
||||
citiesUpdated: number;
|
||||
errors: string[];
|
||||
durationMs: number;
|
||||
}
|
||||
|
||||
// ============================================================
|
||||
// US STATE CODE MAPPING
|
||||
// ============================================================
|
||||
|
||||
const US_STATE_MAP: Record<string, string> = {
|
||||
'alabama': 'AL', 'alaska': 'AK', 'arizona': 'AZ', 'arkansas': 'AR',
|
||||
'california': 'CA', 'colorado': 'CO', 'connecticut': 'CT', 'delaware': 'DE',
|
||||
'florida': 'FL', 'georgia': 'GA', 'hawaii': 'HI', 'idaho': 'ID',
|
||||
'illinois': 'IL', 'indiana': 'IN', 'iowa': 'IA', 'kansas': 'KS',
|
||||
'kentucky': 'KY', 'louisiana': 'LA', 'maine': 'ME', 'maryland': 'MD',
|
||||
'massachusetts': 'MA', 'michigan': 'MI', 'minnesota': 'MN', 'mississippi': 'MS',
|
||||
'missouri': 'MO', 'montana': 'MT', 'nebraska': 'NE', 'nevada': 'NV',
|
||||
'new-hampshire': 'NH', 'new-jersey': 'NJ', 'new-mexico': 'NM', 'new-york': 'NY',
|
||||
'north-carolina': 'NC', 'north-dakota': 'ND', 'ohio': 'OH', 'oklahoma': 'OK',
|
||||
'oregon': 'OR', 'pennsylvania': 'PA', 'rhode-island': 'RI', 'south-carolina': 'SC',
|
||||
'south-dakota': 'SD', 'tennessee': 'TN', 'texas': 'TX', 'utah': 'UT',
|
||||
'vermont': 'VT', 'virginia': 'VA', 'washington': 'WA', 'west-virginia': 'WV',
|
||||
'wisconsin': 'WI', 'wyoming': 'WY', 'district-of-columbia': 'DC',
|
||||
};
|
||||
|
||||
// Canadian province mapping
|
||||
const CA_PROVINCE_MAP: Record<string, string> = {
|
||||
'alberta': 'AB', 'british-columbia': 'BC', 'manitoba': 'MB',
|
||||
'new-brunswick': 'NB', 'newfoundland-and-labrador': 'NL',
|
||||
'northwest-territories': 'NT', 'nova-scotia': 'NS', 'nunavut': 'NU',
|
||||
'ontario': 'ON', 'prince-edward-island': 'PE', 'quebec': 'QC',
|
||||
'saskatchewan': 'SK', 'yukon': 'YT',
|
||||
};
|
||||
|
||||
// ============================================================
|
||||
// CITY FETCHING
|
||||
// ============================================================
|
||||
|
||||
/**
|
||||
* Fetch cities from Dutchie's /cities page using Puppeteer to extract data.
|
||||
*/
|
||||
async function fetchCitiesFromDutchie(): Promise<DutchieCity[]> {
|
||||
console.log('[DutchieCityDiscovery] Launching browser to fetch cities...');
|
||||
|
||||
const browser = await puppeteer.launch({
|
||||
headless: 'new',
|
||||
args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'],
|
||||
});
|
||||
|
||||
try {
|
||||
const page = await browser.newPage();
|
||||
await page.setUserAgent(
|
||||
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
|
||||
);
|
||||
|
||||
// Navigate to cities page
|
||||
console.log('[DutchieCityDiscovery] Navigating to https://dutchie.com/cities...');
|
||||
await page.goto('https://dutchie.com/cities', {
|
||||
waitUntil: 'networkidle2',
|
||||
timeout: 60000,
|
||||
});
|
||||
|
||||
// Wait for content to load
|
||||
await new Promise((r) => setTimeout(r, 3000));
|
||||
|
||||
// Extract city links from the page
|
||||
const cities = await page.evaluate(() => {
|
||||
const cityLinks: Array<{
|
||||
name: string;
|
||||
slug: string;
|
||||
url: string;
|
||||
stateSlug: string | null;
|
||||
}> = [];
|
||||
|
||||
// Find all city links - they typically follow pattern /city/{state}/{city}
|
||||
const links = document.querySelectorAll('a[href*="/city/"]');
|
||||
links.forEach((link) => {
|
||||
const href = (link as HTMLAnchorElement).href;
|
||||
const text = (link as HTMLElement).innerText?.trim();
|
||||
|
||||
// Parse URL: https://dutchie.com/city/{state}/{city}
|
||||
const match = href.match(/\/city\/([^/]+)\/([^/?]+)/);
|
||||
if (match && text) {
|
||||
cityLinks.push({
|
||||
name: text,
|
||||
slug: match[2],
|
||||
url: href,
|
||||
stateSlug: match[1],
|
||||
});
|
||||
}
|
||||
});
|
||||
|
||||
return cityLinks;
|
||||
});
|
||||
|
||||
console.log(`[DutchieCityDiscovery] Extracted ${cities.length} city links from page`);
|
||||
|
||||
// Convert to DutchieCity format
|
||||
const result: DutchieCity[] = [];
|
||||
|
||||
for (const city of cities) {
|
||||
// Determine country and state code
|
||||
let countryCode = 'US';
|
||||
let stateCode: string | null = null;
|
||||
|
||||
if (city.stateSlug) {
|
||||
// Check if it's a US state
|
||||
if (US_STATE_MAP[city.stateSlug]) {
|
||||
stateCode = US_STATE_MAP[city.stateSlug];
|
||||
countryCode = 'US';
|
||||
}
|
||||
// Check if it's a Canadian province
|
||||
else if (CA_PROVINCE_MAP[city.stateSlug]) {
|
||||
stateCode = CA_PROVINCE_MAP[city.stateSlug];
|
||||
countryCode = 'CA';
|
||||
}
|
||||
// Check if it's already a 2-letter code
|
||||
else if (city.stateSlug.length === 2) {
|
||||
stateCode = city.stateSlug.toUpperCase();
|
||||
// Determine country based on state code
|
||||
if (Object.values(CA_PROVINCE_MAP).includes(stateCode)) {
|
||||
countryCode = 'CA';
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
result.push({
|
||||
name: city.name,
|
||||
slug: city.slug,
|
||||
stateCode,
|
||||
countryCode,
|
||||
url: city.url,
|
||||
});
|
||||
}
|
||||
|
||||
return result;
|
||||
} finally {
|
||||
await browser.close();
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Alternative: Fetch cities by making API/GraphQL requests.
|
||||
* Falls back to this if scraping fails.
|
||||
*/
|
||||
async function fetchCitiesFromAPI(): Promise<DutchieCity[]> {
|
||||
console.log('[DutchieCityDiscovery] Attempting API-based city discovery...');
|
||||
|
||||
// Dutchie may have an API endpoint for cities
|
||||
// Try common patterns
|
||||
const apiEndpoints = [
|
||||
'https://dutchie.com/api/cities',
|
||||
'https://api.dutchie.com/v1/cities',
|
||||
];
|
||||
|
||||
for (const endpoint of apiEndpoints) {
|
||||
try {
|
||||
const response = await axios.get(endpoint, {
|
||||
headers: {
|
||||
'User-Agent':
|
||||
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0',
|
||||
Accept: 'application/json',
|
||||
},
|
||||
timeout: 15000,
|
||||
});
|
||||
|
||||
if (response.data && Array.isArray(response.data)) {
|
||||
console.log(`[DutchieCityDiscovery] API returned ${response.data.length} cities`);
|
||||
return response.data.map((c: any) => ({
|
||||
name: c.name || c.city,
|
||||
slug: c.slug || c.citySlug,
|
||||
stateCode: c.stateCode || c.state,
|
||||
countryCode: c.countryCode || c.country || 'US',
|
||||
}));
|
||||
}
|
||||
} catch (error: any) {
|
||||
console.log(`[DutchieCityDiscovery] API ${endpoint} failed: ${error.message}`);
|
||||
}
|
||||
}
|
||||
|
||||
return [];
|
||||
}
|
||||
|
||||
// ============================================================
|
||||
// DATABASE OPERATIONS
|
||||
// ============================================================
|
||||
|
||||
/**
|
||||
* Upsert a city into dutchie_discovery_cities
|
||||
*/
|
||||
async function upsertCity(
|
||||
pool: Pool,
|
||||
city: DutchieCity
|
||||
): Promise<{ inserted: boolean; updated: boolean }> {
|
||||
const result = await pool.query(
|
||||
`
|
||||
INSERT INTO dutchie_discovery_cities (
|
||||
platform,
|
||||
city_name,
|
||||
city_slug,
|
||||
state_code,
|
||||
country_code,
|
||||
last_crawled_at,
|
||||
updated_at
|
||||
) VALUES (
|
||||
'dutchie',
|
||||
$1,
|
||||
$2,
|
||||
$3,
|
||||
$4,
|
||||
NOW(),
|
||||
NOW()
|
||||
)
|
||||
ON CONFLICT (platform, country_code, state_code, city_slug)
|
||||
DO UPDATE SET
|
||||
city_name = EXCLUDED.city_name,
|
||||
last_crawled_at = NOW(),
|
||||
updated_at = NOW()
|
||||
RETURNING (xmax = 0) AS inserted
|
||||
`,
|
||||
[city.name, city.slug, city.stateCode, city.countryCode]
|
||||
);
|
||||
|
||||
const inserted = result.rows[0]?.inserted === true;
|
||||
return { inserted, updated: !inserted };
|
||||
}
|
||||
|
||||
// ============================================================
|
||||
// MAIN DISCOVERY FUNCTION
|
||||
// ============================================================
|
||||
|
||||
export class DutchieCityDiscovery {
|
||||
  private pool: Pool;

  constructor(pool: Pool) {
    this.pool = pool;
  }

  /**
   * Run the city discovery process
   */
  async run(): Promise<CityDiscoveryResult> {
    const startTime = Date.now();
    const errors: string[] = [];
    let citiesFound = 0;
    let citiesInserted = 0;
    let citiesUpdated = 0;

    console.log('[DutchieCityDiscovery] Starting city discovery...');

    try {
      // Try scraping first, fall back to API
      let cities = await fetchCitiesFromDutchie();

      if (cities.length === 0) {
        console.log('[DutchieCityDiscovery] Scraping returned 0 cities, trying API...');
        cities = await fetchCitiesFromAPI();
      }

      citiesFound = cities.length;
      console.log(`[DutchieCityDiscovery] Found ${citiesFound} cities`);

      // Upsert each city
      for (const city of cities) {
        try {
          const result = await upsertCity(this.pool, city);
          if (result.inserted) {
            citiesInserted++;
          } else if (result.updated) {
            citiesUpdated++;
          }
        } catch (error: any) {
          const msg = `Failed to upsert city ${city.slug}: ${error.message}`;
          console.error(`[DutchieCityDiscovery] ${msg}`);
          errors.push(msg);
        }
      }
    } catch (error: any) {
      const msg = `City discovery failed: ${error.message}`;
      console.error(`[DutchieCityDiscovery] ${msg}`);
      errors.push(msg);
    }

    const durationMs = Date.now() - startTime;

    console.log('[DutchieCityDiscovery] Discovery complete:');
    console.log(`  Cities found: ${citiesFound}`);
    console.log(`  Inserted: ${citiesInserted}`);
    console.log(`  Updated: ${citiesUpdated}`);
    console.log(`  Errors: ${errors.length}`);
    console.log(`  Duration: ${(durationMs / 1000).toFixed(1)}s`);

    return {
      citiesFound,
      citiesInserted,
      citiesUpdated,
      errors,
      durationMs,
    };
  }

  /**
   * Get statistics about discovered cities
   */
  async getStats(): Promise<{
    total: number;
    byCountry: Array<{ countryCode: string; count: number }>;
    byState: Array<{ stateCode: string; countryCode: string; count: number }>;
    crawlEnabled: number;
    neverCrawled: number;
  }> {
    const [totalRes, byCountryRes, byStateRes, enabledRes, neverRes] = await Promise.all([
      this.pool.query('SELECT COUNT(*) as cnt FROM dutchie_discovery_cities'),
      this.pool.query(`
        SELECT country_code, COUNT(*) as cnt
        FROM dutchie_discovery_cities
        GROUP BY country_code
        ORDER BY cnt DESC
      `),
      this.pool.query(`
        SELECT state_code, country_code, COUNT(*) as cnt
        FROM dutchie_discovery_cities
        WHERE state_code IS NOT NULL
        GROUP BY state_code, country_code
        ORDER BY cnt DESC
      `),
      this.pool.query(`
        SELECT COUNT(*) as cnt
        FROM dutchie_discovery_cities
        WHERE crawl_enabled = TRUE
      `),
      this.pool.query(`
        SELECT COUNT(*) as cnt
        FROM dutchie_discovery_cities
        WHERE last_crawled_at IS NULL
      `),
    ]);

    return {
      total: parseInt(totalRes.rows[0]?.cnt || '0', 10),
      byCountry: byCountryRes.rows.map((r) => ({
        countryCode: r.country_code,
        count: parseInt(r.cnt, 10),
      })),
      byState: byStateRes.rows.map((r) => ({
        stateCode: r.state_code,
        countryCode: r.country_code,
        count: parseInt(r.cnt, 10),
      })),
      crawlEnabled: parseInt(enabledRes.rows[0]?.cnt || '0', 10),
      neverCrawled: parseInt(neverRes.rows[0]?.cnt || '0', 10),
    };
  }
}

export default DutchieCityDiscovery;
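One subtle point in `getStats()` above: node-postgres returns `COUNT(*)` as a string (PostgreSQL bigint), so every count is run through `parseInt(..., 10)` with a `'0'` fallback for empty result sets. A minimal standalone sketch of that conversion (the `CountRow` type and `toCount` name are illustrative, not from the source):

```typescript
// pg returns COUNT(*) as a string; convert defensively, treating an
// empty rows array as zero - the same pattern getStats() uses.
type CountRow = { cnt: string };

function toCount(rows: CountRow[]): number {
  return parseInt(rows[0]?.cnt || '0', 10);
}

console.log(toCount([{ cnt: '188' }])); // 188
console.log(toCount([])); // 0 when the query returns no rows
```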
@@ -1,639 +0,0 @@
/**
 * DutchieLocationDiscovery
 *
 * Discovers store locations for each city from Dutchie and upserts to dutchie_discovery_locations.
 *
 * Responsibilities:
 * - Given a dutchie_discovery_cities row, call Dutchie's location/search endpoint
 * - For each store: extract platform_location_id, platform_slug, platform_menu_url, name, address, coords
 * - Upsert into dutchie_discovery_locations
 * - DO NOT overwrite status if already verified/merged/rejected
 * - DO NOT overwrite dispensary_id if already set
 */

import { Pool } from 'pg';
import axios from 'axios';
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

puppeteer.use(StealthPlugin());

// ============================================================
// TYPES
// ============================================================

export interface DiscoveryCity {
  id: number;
  platform: string;
  cityName: string;
  citySlug: string;
  stateCode: string | null;
  countryCode: string;
  crawlEnabled: boolean;
}

export interface DutchieLocation {
  platformLocationId: string;
  platformSlug: string;
  platformMenuUrl: string;
  name: string;
  rawAddress: string | null;
  addressLine1: string | null;
  addressLine2: string | null;
  city: string | null;
  stateCode: string | null;
  postalCode: string | null;
  countryCode: string | null;
  latitude: number | null;
  longitude: number | null;
  timezone: string | null;
  offersDelivery: boolean | null;
  offersPickup: boolean | null;
  isRecreational: boolean | null;
  isMedical: boolean | null;
  metadata: Record<string, any>;
}

export interface LocationDiscoveryResult {
  cityId: number;
  citySlug: string;
  locationsFound: number;
  locationsInserted: number;
  locationsUpdated: number;
  locationsSkipped: number;
  errors: string[];
  durationMs: number;
}

// ============================================================
// LOCATION FETCHING
// ============================================================

/**
 * Fetch locations for a city using Puppeteer to scrape the city page
 */
async function fetchLocationsForCity(city: DiscoveryCity): Promise<DutchieLocation[]> {
  console.log(`[DutchieLocationDiscovery] Fetching locations for ${city.cityName}, ${city.stateCode}...`);

  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'],
  });

  try {
    const page = await browser.newPage();
    await page.setUserAgent(
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    );

    // Navigate to city page - use /us/dispensaries/{city_slug} pattern
    const cityUrl = `https://dutchie.com/us/dispensaries/${city.citySlug}`;
    console.log(`[DutchieLocationDiscovery] Navigating to ${cityUrl}...`);

    await page.goto(cityUrl, {
      waitUntil: 'networkidle2',
      timeout: 60000,
    });

    // Wait for content
    await new Promise((r) => setTimeout(r, 3000));

    // Try to extract __NEXT_DATA__ which often contains store data
    const nextData = await page.evaluate(() => {
      const script = document.querySelector('script#__NEXT_DATA__');
      if (script) {
        try {
          return JSON.parse(script.textContent || '{}');
        } catch {
          return null;
        }
      }
      return null;
    });

    let locations: DutchieLocation[] = [];

    if (nextData?.props?.pageProps?.dispensaries) {
      // Extract from Next.js data
      const dispensaries = nextData.props.pageProps.dispensaries;
      console.log(`[DutchieLocationDiscovery] Found ${dispensaries.length} dispensaries in __NEXT_DATA__`);

      locations = dispensaries.map((d: any) => parseDispensaryData(d, city));
    } else {
      // Fall back to DOM scraping
      console.log('[DutchieLocationDiscovery] No __NEXT_DATA__, trying DOM scraping...');

      const scrapedData = await page.evaluate(() => {
        const stores: Array<{
          name: string;
          href: string;
          address: string | null;
        }> = [];

        // Look for dispensary cards/links
        const cards = document.querySelectorAll('[data-testid="dispensary-card"], .dispensary-card, a[href*="/dispensary/"]');
        cards.forEach((card) => {
          const link = card.querySelector('a[href*="/dispensary/"]') || (card as HTMLAnchorElement);
          const href = (link as HTMLAnchorElement).href || '';
          const name =
            card.querySelector('[data-testid="dispensary-name"]')?.textContent ||
            card.querySelector('h2, h3, .name')?.textContent ||
            link.textContent ||
            '';
          const address = card.querySelector('[data-testid="dispensary-address"], .address')?.textContent || null;

          if (href && name) {
            stores.push({
              name: name.trim(),
              href,
              address: address?.trim() || null,
            });
          }
        });

        return stores;
      });

      console.log(`[DutchieLocationDiscovery] DOM scraping found ${scrapedData.length} stores`);

      locations = scrapedData.map((s) => {
        // Parse slug from URL
        const match = s.href.match(/\/dispensary\/([^/?]+)/);
        const slug = match ? match[1] : s.name.toLowerCase().replace(/\s+/g, '-');

        return {
          platformLocationId: slug, // Will be resolved later
          platformSlug: slug,
          platformMenuUrl: `https://dutchie.com/dispensary/${slug}`,
          name: s.name,
          rawAddress: s.address,
          addressLine1: null,
          addressLine2: null,
          city: city.cityName,
          stateCode: city.stateCode,
          postalCode: null,
          countryCode: city.countryCode,
          latitude: null,
          longitude: null,
          timezone: null,
          offersDelivery: null,
          offersPickup: null,
          isRecreational: null,
          isMedical: null,
          metadata: { source: 'dom_scrape', originalUrl: s.href },
        };
      });
    }

    return locations;
  } finally {
    await browser.close();
  }
}

/**
 * Parse dispensary data from Dutchie's API/JSON response
 */
function parseDispensaryData(d: any, city: DiscoveryCity): DutchieLocation {
  const id = d.id || d._id || d.dispensaryId || '';
  const slug = d.slug || d.cName || d.name?.toLowerCase().replace(/\s+/g, '-') || '';

  // Build menu URL
  let menuUrl = `https://dutchie.com/dispensary/${slug}`;
  if (d.menuUrl) {
    menuUrl = d.menuUrl;
  } else if (d.embeddedMenuUrl) {
    menuUrl = d.embeddedMenuUrl;
  }

  // Parse address
  const address = d.address || d.location?.address || {};
  const rawAddress = [
    address.line1 || address.street1 || d.address1,
    address.line2 || address.street2 || d.address2,
    [
      address.city || d.city,
      address.state || address.stateCode || d.state,
      address.zip || address.zipCode || address.postalCode || d.zip,
    ]
      .filter(Boolean)
      .join(' '),
  ]
    .filter(Boolean)
    .join(', ');

  return {
    platformLocationId: id,
    platformSlug: slug,
    platformMenuUrl: menuUrl,
    name: d.name || d.dispensaryName || '',
    rawAddress: rawAddress || null,
    addressLine1: address.line1 || address.street1 || d.address1 || null,
    addressLine2: address.line2 || address.street2 || d.address2 || null,
    city: address.city || d.city || city.cityName,
    stateCode: address.state || address.stateCode || d.state || city.stateCode,
    postalCode: address.zip || address.zipCode || address.postalCode || d.zip || null,
    countryCode: address.country || address.countryCode || d.country || city.countryCode,
    latitude: d.latitude ?? d.location?.latitude ?? d.location?.lat ?? null,
    longitude: d.longitude ?? d.location?.longitude ?? d.location?.lng ?? null,
    timezone: d.timezone || d.timeZone || null,
    offersDelivery: d.offerDelivery ?? d.offersDelivery ?? d.delivery ?? null,
    offersPickup: d.offerPickup ?? d.offersPickup ?? d.pickup ?? null,
    isRecreational: d.isRecreational ?? d.recreational ?? (d.retailType === 'recreational' || d.retailType === 'both'),
    isMedical: d.isMedical ?? d.medical ?? (d.retailType === 'medical' || d.retailType === 'both'),
    metadata: {
      source: 'next_data',
      retailType: d.retailType,
      brand: d.brand,
      logo: d.logo || d.logoUrl,
      raw: d,
    },
  };
}

/**
 * Alternative: Use GraphQL to discover locations
 */
async function fetchLocationsViaGraphQL(city: DiscoveryCity): Promise<DutchieLocation[]> {
  console.log(`[DutchieLocationDiscovery] Trying GraphQL for ${city.cityName}...`);

  // Try geo-based search
  // This would require knowing the city's coordinates
  // For now, return empty and rely on page scraping
  return [];
}

// ============================================================
// DATABASE OPERATIONS
// ============================================================

/**
 * Upsert a location into dutchie_discovery_locations
 * Does NOT overwrite status if already verified/merged/rejected
 * Does NOT overwrite dispensary_id if already set
 */
async function upsertLocation(
  pool: Pool,
  location: DutchieLocation,
  cityId: number
): Promise<{ inserted: boolean; updated: boolean; skipped: boolean }> {
  // First check if this location exists and has a protected status
  const existing = await pool.query(
    `
    SELECT id, status, dispensary_id
    FROM dutchie_discovery_locations
    WHERE platform = 'dutchie' AND platform_location_id = $1
    `,
    [location.platformLocationId]
  );

  if (existing.rows.length > 0) {
    const row = existing.rows[0];
    const protectedStatuses = ['verified', 'merged', 'rejected'];

    if (protectedStatuses.includes(row.status)) {
      // Only update last_seen_at for protected statuses
      await pool.query(
        `
        UPDATE dutchie_discovery_locations
        SET last_seen_at = NOW(), updated_at = NOW()
        WHERE id = $1
        `,
        [row.id]
      );
      return { inserted: false, updated: false, skipped: true };
    }

    // Update existing discovered location (but preserve dispensary_id if set)
    await pool.query(
      `
      UPDATE dutchie_discovery_locations
      SET
        platform_slug = $2,
        platform_menu_url = $3,
        name = $4,
        raw_address = COALESCE($5, raw_address),
        address_line1 = COALESCE($6, address_line1),
        address_line2 = COALESCE($7, address_line2),
        city = COALESCE($8, city),
        state_code = COALESCE($9, state_code),
        postal_code = COALESCE($10, postal_code),
        country_code = COALESCE($11, country_code),
        latitude = COALESCE($12, latitude),
        longitude = COALESCE($13, longitude),
        timezone = COALESCE($14, timezone),
        offers_delivery = COALESCE($15, offers_delivery),
        offers_pickup = COALESCE($16, offers_pickup),
        is_recreational = COALESCE($17, is_recreational),
        is_medical = COALESCE($18, is_medical),
        metadata = COALESCE($19, metadata),
        discovery_city_id = $20,
        last_seen_at = NOW(),
        updated_at = NOW()
      WHERE id = $1
      `,
      [
        row.id,
        location.platformSlug,
        location.platformMenuUrl,
        location.name,
        location.rawAddress,
        location.addressLine1,
        location.addressLine2,
        location.city,
        location.stateCode,
        location.postalCode,
        location.countryCode,
        location.latitude,
        location.longitude,
        location.timezone,
        location.offersDelivery,
        location.offersPickup,
        location.isRecreational,
        location.isMedical,
        JSON.stringify(location.metadata),
        cityId,
      ]
    );
    return { inserted: false, updated: true, skipped: false };
  }

  // Insert new location
  await pool.query(
    `
    INSERT INTO dutchie_discovery_locations (
      platform,
      platform_location_id,
      platform_slug,
      platform_menu_url,
      name,
      raw_address,
      address_line1,
      address_line2,
      city,
      state_code,
      postal_code,
      country_code,
      latitude,
      longitude,
      timezone,
      status,
      offers_delivery,
      offers_pickup,
      is_recreational,
      is_medical,
      metadata,
      discovery_city_id,
      first_seen_at,
      last_seen_at,
      active,
      created_at,
      updated_at
    ) VALUES (
      'dutchie',
      $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14,
      'discovered',
      $15, $16, $17, $18, $19, $20,
      NOW(), NOW(), TRUE, NOW(), NOW()
    )
    `,
    [
      location.platformLocationId,
      location.platformSlug,
      location.platformMenuUrl,
      location.name,
      location.rawAddress,
      location.addressLine1,
      location.addressLine2,
      location.city,
      location.stateCode,
      location.postalCode,
      location.countryCode,
      location.latitude,
      location.longitude,
      location.timezone,
      location.offersDelivery,
      location.offersPickup,
      location.isRecreational,
      location.isMedical,
      JSON.stringify(location.metadata),
      cityId,
    ]
  );

  return { inserted: true, updated: false, skipped: false };
}

// ============================================================
// MAIN DISCOVERY CLASS
// ============================================================

export class DutchieLocationDiscovery {
  private pool: Pool;

  constructor(pool: Pool) {
    this.pool = pool;
  }

  /**
   * Get a city by slug
   */
  async getCityBySlug(citySlug: string): Promise<DiscoveryCity | null> {
    const { rows } = await this.pool.query(
      `
      SELECT id, platform, city_name, city_slug, state_code, country_code, crawl_enabled
      FROM dutchie_discovery_cities
      WHERE platform = 'dutchie' AND city_slug = $1
      LIMIT 1
      `,
      [citySlug]
    );

    if (rows.length === 0) return null;

    const r = rows[0];
    return {
      id: r.id,
      platform: r.platform,
      cityName: r.city_name,
      citySlug: r.city_slug,
      stateCode: r.state_code,
      countryCode: r.country_code,
      crawlEnabled: r.crawl_enabled,
    };
  }

  /**
   * Get all crawl-enabled cities
   */
  async getEnabledCities(limit?: number): Promise<DiscoveryCity[]> {
    const { rows } = await this.pool.query(
      `
      SELECT id, platform, city_name, city_slug, state_code, country_code, crawl_enabled
      FROM dutchie_discovery_cities
      WHERE platform = 'dutchie' AND crawl_enabled = TRUE
      ORDER BY last_crawled_at ASC NULLS FIRST, city_name ASC
      ${limit ? `LIMIT ${limit}` : ''}
      `
    );

    return rows.map((r) => ({
      id: r.id,
      platform: r.platform,
      cityName: r.city_name,
      citySlug: r.city_slug,
      stateCode: r.state_code,
      countryCode: r.country_code,
      crawlEnabled: r.crawl_enabled,
    }));
  }

  /**
   * Discover locations for a single city
   */
  async discoverForCity(city: DiscoveryCity): Promise<LocationDiscoveryResult> {
    const startTime = Date.now();
    const errors: string[] = [];
    let locationsFound = 0;
    let locationsInserted = 0;
    let locationsUpdated = 0;
    let locationsSkipped = 0;

    console.log(`[DutchieLocationDiscovery] Discovering locations for ${city.cityName}, ${city.stateCode}...`);

    try {
      // Fetch locations
      let locations = await fetchLocationsForCity(city);

      // If scraping fails, try GraphQL
      if (locations.length === 0) {
        locations = await fetchLocationsViaGraphQL(city);
      }

      locationsFound = locations.length;
      console.log(`[DutchieLocationDiscovery] Found ${locationsFound} locations`);

      // Upsert each location
      for (const location of locations) {
        try {
          const result = await upsertLocation(this.pool, location, city.id);
          if (result.inserted) locationsInserted++;
          else if (result.updated) locationsUpdated++;
          else if (result.skipped) locationsSkipped++;
        } catch (error: any) {
          const msg = `Failed to upsert location ${location.platformSlug}: ${error.message}`;
          console.error(`[DutchieLocationDiscovery] ${msg}`);
          errors.push(msg);
        }
      }

      // Update city's last_crawled_at and location_count
      await this.pool.query(
        `
        UPDATE dutchie_discovery_cities
        SET last_crawled_at = NOW(),
            location_count = $1,
            updated_at = NOW()
        WHERE id = $2
        `,
        [locationsFound, city.id]
      );
    } catch (error: any) {
      const msg = `Location discovery failed for ${city.citySlug}: ${error.message}`;
      console.error(`[DutchieLocationDiscovery] ${msg}`);
      errors.push(msg);
    }

    const durationMs = Date.now() - startTime;

    console.log(`[DutchieLocationDiscovery] City ${city.citySlug} complete:`);
    console.log(`  Locations found: ${locationsFound}`);
    console.log(`  Inserted: ${locationsInserted}`);
    console.log(`  Updated: ${locationsUpdated}`);
    console.log(`  Skipped (protected): ${locationsSkipped}`);
    console.log(`  Errors: ${errors.length}`);
    console.log(`  Duration: ${(durationMs / 1000).toFixed(1)}s`);

    return {
      cityId: city.id,
      citySlug: city.citySlug,
      locationsFound,
      locationsInserted,
      locationsUpdated,
      locationsSkipped,
      errors,
      durationMs,
    };
  }

  /**
   * Discover locations for all enabled cities
   */
  async discoverAllEnabled(options: {
    limit?: number;
    delayMs?: number;
  } = {}): Promise<{
    totalCities: number;
    totalLocationsFound: number;
    totalInserted: number;
    totalUpdated: number;
    totalSkipped: number;
    errors: string[];
    durationMs: number;
  }> {
    const { limit, delayMs = 2000 } = options;
    const startTime = Date.now();
    let totalLocationsFound = 0;
    let totalInserted = 0;
    let totalUpdated = 0;
    let totalSkipped = 0;
    const allErrors: string[] = [];

    const cities = await this.getEnabledCities(limit);
    console.log(`[DutchieLocationDiscovery] Discovering locations for ${cities.length} cities...`);

    for (let i = 0; i < cities.length; i++) {
      const city = cities[i];
      console.log(`\n[DutchieLocationDiscovery] City ${i + 1}/${cities.length}: ${city.cityName}, ${city.stateCode}`);

      try {
        const result = await this.discoverForCity(city);
        totalLocationsFound += result.locationsFound;
        totalInserted += result.locationsInserted;
        totalUpdated += result.locationsUpdated;
        totalSkipped += result.locationsSkipped;
        allErrors.push(...result.errors);
      } catch (error: any) {
        allErrors.push(`City ${city.citySlug} failed: ${error.message}`);
      }

      // Delay between cities
      if (i < cities.length - 1 && delayMs > 0) {
        await new Promise((r) => setTimeout(r, delayMs));
      }
    }

    const durationMs = Date.now() - startTime;

    console.log('\n[DutchieLocationDiscovery] All cities complete:');
    console.log(`  Total cities: ${cities.length}`);
    console.log(`  Total locations found: ${totalLocationsFound}`);
    console.log(`  Total inserted: ${totalInserted}`);
    console.log(`  Total updated: ${totalUpdated}`);
    console.log(`  Total skipped: ${totalSkipped}`);
    console.log(`  Total errors: ${allErrors.length}`);
    console.log(`  Duration: ${(durationMs / 1000).toFixed(1)}s`);

    return {
      totalCities: cities.length,
      totalLocationsFound,
      totalInserted,
      totalUpdated,
      totalSkipped,
      errors: allErrors,
      durationMs,
    };
  }
}

export default DutchieLocationDiscovery;
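The DOM-scraping fallback in `fetchLocationsForCity` above derives a platform slug from each store link with `s.href.match(/\/dispensary\/([^/?]+)/)`, falling back to a slugified store name when the URL does not match. A self-contained sketch of that logic (`extractSlug` is a hypothetical helper name, not in the source):

```typescript
// Pull the platform slug out of a dispensary menu URL; if the URL does not
// contain /dispensary/<slug>, slugify the store name instead - the same
// pattern the DOM-scrape fallback uses.
function extractSlug(href: string, name: string): string {
  const match = href.match(/\/dispensary\/([^/?]+)/);
  return match ? match[1] : name.toLowerCase().replace(/\s+/g, '-');
}

// The [^/?]+ character class stops the slug at a '?' or trailing path segment.
console.log(extractSlug('https://dutchie.com/dispensary/green-leaf?menuType=rec', 'Green Leaf')); // green-leaf
// Non-matching URL: falls back to the slugified store name.
console.log(extractSlug('https://dutchie.com/brands/acme', 'Acme Dispensary')); // acme-dispensary
```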
@@ -1,73 +0,0 @@
#!/usr/bin/env npx tsx
/**
 * Discovery Entrypoint: Dutchie Cities (Auto)
 *
 * Attempts browser/API-based /cities discovery.
 * Even if currently blocked (403), this runner preserves the auto-discovery path.
 *
 * Usage:
 *   npm run discovery:dt:cities:auto
 *   DATABASE_URL="..." npx tsx src/dutchie-az/discovery/discovery-dt-cities-auto.ts
 */

import { Pool } from 'pg';
import { DtCityDiscoveryService } from './DtCityDiscoveryService';

const DB_URL = process.env.DATABASE_URL || process.env.CANNAIQ_DB_URL ||
  'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus';

async function main() {
  console.log('╔══════════════════════════════════════════════════╗');
  console.log('║ Dutchie City Discovery (AUTO) ║');
  console.log('║ Browser + API fallback ║');
  console.log('╚══════════════════════════════════════════════════╝');
  console.log(`\nDatabase: ${DB_URL.replace(/:[^:@]+@/, ':****@')}`);

  const pool = new Pool({ connectionString: DB_URL });

  try {
    const { rows } = await pool.query('SELECT NOW() as time');
    console.log(`Connected at: ${rows[0].time}\n`);

    const service = new DtCityDiscoveryService(pool);
    const result = await service.runAutoDiscovery();

    console.log('\n' + '═'.repeat(50));
    console.log('SUMMARY');
    console.log('═'.repeat(50));
    console.log(`Cities found: ${result.citiesFound}`);
    console.log(`Cities inserted: ${result.citiesInserted}`);
    console.log(`Cities updated: ${result.citiesUpdated}`);
    console.log(`Errors: ${result.errors.length}`);
    console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`);

    if (result.errors.length > 0) {
      console.log('\nErrors:');
      result.errors.forEach((e, i) => console.log(` ${i + 1}. ${e}`));
    }

    const stats = await service.getStats();
    console.log('\nCurrent Database Stats:');
    console.log(` Total cities: ${stats.total}`);
    console.log(` Crawl enabled: ${stats.crawlEnabled}`);
    console.log(` Never crawled: ${stats.neverCrawled}`);

    if (result.citiesFound === 0) {
      console.log('\n⚠️ No cities found via auto-discovery.');
      console.log(' This may be due to Dutchie blocking scraping/API access.');
      console.log(' Use manual seeding instead:');
      console.log(' npm run discovery:dt:cities:manual -- --city-slug=ny-hudson --city-name=Hudson --state-code=NY');
      process.exit(1);
    }

    console.log('\n✅ Auto city discovery completed');
    process.exit(0);
  } catch (error: any) {
    console.error('\n❌ Auto city discovery failed:', error.message);
    process.exit(1);
  } finally {
    await pool.end();
  }
}

main();
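Each runner above prints the connection string with the password masked via `DB_URL.replace(/:[^:@]+@/, ':****@')`. Pulled out as a standalone function (`maskDbUrl` is an illustrative name, not from the source):

```typescript
// Mask the password segment of a postgres:// URL before logging it.
// The regex matches the ':password@' run; '[^:@]+' cannot cross the
// 'user:' boundary, so only the credential between ':' and '@' is replaced.
function maskDbUrl(url: string): string {
  return url.replace(/:[^:@]+@/, ':****@');
}

console.log(maskDbUrl('postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'));
// postgresql://dutchie:****@localhost:54320/dutchie_menus
```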
@@ -1,137 +0,0 @@
#!/usr/bin/env npx tsx
/**
 * Discovery Entrypoint: Dutchie Cities (Manual Seed)
 *
 * Manually seeds cities into dutchie_discovery_cities via CLI args.
 * Use this when auto-discovery is blocked (403).
 *
 * Usage:
 *   npm run discovery:dt:cities:manual -- --city-slug=ny-hudson --city-name=Hudson --state-code=NY
 *   npm run discovery:dt:cities:manual -- --city-slug=ma-boston --city-name=Boston --state-code=MA --country-code=US
 *
 * Options:
 *   --city-slug      Required. URL slug (e.g., "ny-hudson")
 *   --city-name      Required. Display name (e.g., "Hudson")
 *   --state-code     Required. State/province code (e.g., "NY", "CA", "ON")
 *   --country-code   Optional. Country code (default: "US")
 *
 * After seeding, run location discovery:
 *   npm run discovery:dt:locations
 */

import { Pool } from 'pg';
import { DtCityDiscoveryService, DutchieCity } from './DtCityDiscoveryService';

const DB_URL = process.env.DATABASE_URL || process.env.CANNAIQ_DB_URL ||
  'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus';

interface Args {
  citySlug?: string;
  cityName?: string;
  stateCode?: string;
  countryCode: string;
}

function parseArgs(): Args {
  const args: Args = { countryCode: 'US' };

  for (const arg of process.argv.slice(2)) {
    const citySlugMatch = arg.match(/--city-slug=(.+)/);
    if (citySlugMatch) args.citySlug = citySlugMatch[1];

    const cityNameMatch = arg.match(/--city-name=(.+)/);
    if (cityNameMatch) args.cityName = cityNameMatch[1];

    const stateCodeMatch = arg.match(/--state-code=(.+)/);
    if (stateCodeMatch) args.stateCode = stateCodeMatch[1].toUpperCase();

    const countryCodeMatch = arg.match(/--country-code=(.+)/);
    if (countryCodeMatch) args.countryCode = countryCodeMatch[1].toUpperCase();
  }

  return args;
}

function printUsage() {
  console.log(`
Usage:
  npm run discovery:dt:cities:manual -- --city-slug=<slug> --city-name=<name> --state-code=<state>

Required arguments:
  --city-slug      URL slug for the city (e.g., "ny-hudson", "ma-boston")
  --city-name      Display name (e.g., "Hudson", "Boston")
  --state-code     State/province code (e.g., "NY", "CA", "ON")

Optional arguments:
  --country-code   Country code (default: "US")

Examples:
  npm run discovery:dt:cities:manual -- --city-slug=ny-hudson --city-name=Hudson --state-code=NY
  npm run discovery:dt:cities:manual -- --city-slug=ca-los-angeles --city-name="Los Angeles" --state-code=CA
  npm run discovery:dt:cities:manual -- --city-slug=on-toronto --city-name=Toronto --state-code=ON --country-code=CA

After seeding, run location discovery:
  npm run discovery:dt:locations
`);
}

async function main() {
  const args = parseArgs();

  console.log('╔══════════════════════════════════════════════════╗');
  console.log('║ Dutchie City Discovery (MANUAL SEED) ║');
  console.log('╚══════════════════════════════════════════════════╝');

  if (!args.citySlug || !args.cityName || !args.stateCode) {
    console.error('\n❌ Error: Missing required arguments\n');
    printUsage();
    process.exit(1);
  }

  console.log(`\nCity Slug: ${args.citySlug}`);
  console.log(`City Name: ${args.cityName}`);
  console.log(`State Code: ${args.stateCode}`);
  console.log(`Country Code: ${args.countryCode}`);
  console.log(`Database: ${DB_URL.replace(/:[^:@]+@/, ':****@')}`);

  const pool = new Pool({ connectionString: DB_URL });

  try {
    const { rows } = await pool.query('SELECT NOW() as time');
    console.log(`\nConnected at: ${rows[0].time}`);

    const service = new DtCityDiscoveryService(pool);

    const city: DutchieCity = {
      slug: args.citySlug,
      name: args.cityName,
      stateCode: args.stateCode,
      countryCode: args.countryCode,
    };

    const result = await service.seedCity(city);

    const action = result.wasInserted ? 'INSERTED' : 'UPDATED';
    console.log(`\n✅ City ${action}:`);
    console.log(` ID: ${result.id}`);
    console.log(` City Slug: ${result.city.slug}`);
    console.log(` City Name: ${result.city.name}`);
    console.log(` State Code: ${result.city.stateCode}`);
    console.log(` Country Code: ${result.city.countryCode}`);

    const stats = await service.getStats();
    console.log(`\nTotal Dutchie cities: ${stats.total} (${stats.crawlEnabled} enabled)`);

    console.log('\n📍 Next step: Run location discovery');
    console.log(' npm run discovery:dt:locations');

    process.exit(0);
  } catch (error: any) {
    console.error('\n❌ Failed to seed city:', error.message);
    process.exit(1);
  } finally {
    await pool.end();
  }
}

main();
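`parseArgs()` above recognizes `--name=value` flags with one regex per flag. The same matching can be generalized into a single helper (`getFlag` is a hypothetical generalization, not in the source):

```typescript
// Find the value of a --name=value flag in an argv array, or undefined
// if the flag is absent - the same regex shape parseArgs() uses per flag.
function getFlag(argv: string[], name: string): string | undefined {
  for (const arg of argv) {
    const m = arg.match(new RegExp(`--${name}=(.+)`));
    if (m) return m[1];
  }
  return undefined;
}

console.log(getFlag(['--city-slug=ny-hudson', '--state-code=NY'], 'city-slug')); // ny-hudson
console.log(getFlag(['--state-code=NY'], 'city-slug')); // undefined
```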
@@ -1,73 +0,0 @@
#!/usr/bin/env npx tsx
/**
 * Discovery Runner: Dutchie Cities
 *
 * Discovers cities from Dutchie's /cities page and upserts to dutchie_discovery_cities.
 *
 * Usage:
 *   npm run discovery:platforms:dt:cities
 *   DATABASE_URL="..." npx tsx src/dutchie-az/discovery/discovery-dt-cities.ts
 */

import { Pool } from 'pg';
import { DutchieCityDiscovery } from './DutchieCityDiscovery';

const DB_URL = process.env.DATABASE_URL || process.env.CANNAIQ_DB_URL ||
  'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus';

async function main() {
  console.log('╔══════════════════════════════════════════════════╗');
  console.log('║ Dutchie City Discovery Runner ║');
  console.log('╚══════════════════════════════════════════════════╝');
  console.log(`\nDatabase: ${DB_URL.replace(/:[^:@]+@/, ':****@')}`);

  const pool = new Pool({ connectionString: DB_URL });

  try {
    // Test DB connection
    const { rows } = await pool.query('SELECT NOW() as time');
    console.log(`Connected at: ${rows[0].time}\n`);

    // Run city discovery
    const discovery = new DutchieCityDiscovery(pool);
    const result = await discovery.run();

    // Print summary
    console.log('\n' + '═'.repeat(50));
    console.log('SUMMARY');
    console.log('═'.repeat(50));
    console.log(`Cities found: ${result.citiesFound}`);
    console.log(`Cities inserted: ${result.citiesInserted}`);
    console.log(`Cities updated: ${result.citiesUpdated}`);
    console.log(`Errors: ${result.errors.length}`);
    console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`);

    if (result.errors.length > 0) {
      console.log('\nErrors:');
      result.errors.forEach((e, i) => console.log(` ${i + 1}. ${e}`));
    }

    // Get final stats
    const stats = await discovery.getStats();
    console.log('\nCurrent Database Stats:');
    console.log(` Total cities: ${stats.total}`);
    console.log(` Crawl enabled: ${stats.crawlEnabled}`);
    console.log(` Never crawled: ${stats.neverCrawled}`);
    console.log(` By country: ${stats.byCountry.map(c => `${c.countryCode}=${c.count}`).join(', ')}`);

    if (result.errors.length > 0) {
      console.log('\n⚠️ Completed with errors');
      process.exit(1);
    }

    console.log('\n✅ City discovery completed successfully');
|
||||
process.exit(0);
|
||||
} catch (error: any) {
|
||||
console.error('\n❌ City discovery failed:', error.message);
|
||||
process.exit(1);
|
||||
} finally {
|
||||
await pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
main();
|
||||
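The runners above log their connection string through `DB_URL.replace(/:[^:@]+@/, ':****@')` to avoid leaking the password. A minimal stand-alone sketch of that masking step (the helper name `maskDbUrl` is illustrative; the scripts inline the regex):

```typescript
// Mask the password portion of a postgres:// connection string before logging.
// The regex matches ":<password>@" (the first ":…@" run with no ":" or "@"
// inside it) and replaces the password with "****".
function maskDbUrl(url: string): string {
  return url.replace(/:[^:@]+@/, ':****@');
}

console.log(maskDbUrl('postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'));
```

Note the port separator (`:54320`) is untouched because `String.replace` with a non-global regex only rewrites the first match.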
@@ -1,113 +0,0 @@
#!/usr/bin/env npx tsx
/**
 * Discovery Entrypoint: Dutchie Locations (From Cities)
 *
 * Reads from dutchie_discovery_cities (crawl_enabled = true)
 * and discovers store locations for each city.
 *
 * Geo coordinates are captured when available from Dutchie's payloads.
 *
 * Usage:
 *   npm run discovery:dt:locations
 *   npm run discovery:dt:locations -- --limit=10
 *   npm run discovery:dt:locations -- --delay=3000
 *   DATABASE_URL="..." npx tsx src/dutchie-az/discovery/discovery-dt-locations-from-cities.ts
 *
 * Options:
 *   --limit=N   Only process N cities (default: all)
 *   --delay=N   Delay between cities in ms (default: 2000)
 */

import { Pool } from 'pg';
import { DtLocationDiscoveryService } from './DtLocationDiscoveryService';

const DB_URL = process.env.DATABASE_URL || process.env.CANNAIQ_DB_URL ||
  'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus';

function parseArgs(): { limit?: number; delay?: number } {
  const args: { limit?: number; delay?: number } = {};

  for (const arg of process.argv.slice(2)) {
    const limitMatch = arg.match(/--limit=(\d+)/);
    if (limitMatch) args.limit = parseInt(limitMatch[1], 10);

    const delayMatch = arg.match(/--delay=(\d+)/);
    if (delayMatch) args.delay = parseInt(delayMatch[1], 10);
  }

  return args;
}

async function main() {
  const args = parseArgs();

  console.log('╔══════════════════════════════════════════════════╗');
  console.log('║    Dutchie Location Discovery (From Cities)      ║');
  console.log('║   Reads crawl_enabled cities, discovers stores   ║');
  console.log('╚══════════════════════════════════════════════════╝');
  console.log(`\nDatabase: ${DB_URL.replace(/:[^:@]+@/, ':****@')}`);
  if (args.limit) console.log(`City limit: ${args.limit}`);
  if (args.delay) console.log(`Delay: ${args.delay}ms`);

  const pool = new Pool({ connectionString: DB_URL });

  try {
    const { rows } = await pool.query('SELECT NOW() as time');
    console.log(`Connected at: ${rows[0].time}\n`);

    const service = new DtLocationDiscoveryService(pool);
    const result = await service.discoverAllEnabled({
      limit: args.limit,
      delayMs: args.delay ?? 2000,
    });

    console.log('\n' + '═'.repeat(50));
    console.log('SUMMARY');
    console.log('═'.repeat(50));
    console.log(`Cities processed:   ${result.totalCities}`);
    console.log(`Locations found:    ${result.totalLocationsFound}`);
    console.log(`Locations inserted: ${result.totalInserted}`);
    console.log(`Locations updated:  ${result.totalUpdated}`);
    console.log(`Locations skipped:  ${result.totalSkipped} (protected status)`);
    console.log(`Errors:             ${result.errors.length}`);
    console.log(`Duration:           ${(result.durationMs / 1000).toFixed(1)}s`);

    if (result.errors.length > 0) {
      console.log('\nErrors (first 10):');
      result.errors.slice(0, 10).forEach((e, i) => console.log(`  ${i + 1}. ${e}`));
      if (result.errors.length > 10) {
        console.log(`  ... and ${result.errors.length - 10} more`);
      }
    }

    // Get location stats including coordinates
    const stats = await service.getStats();
    console.log('\nCurrent Database Stats:');
    console.log(`  Total locations:  ${stats.total}`);
    console.log(`  With coordinates: ${stats.withCoordinates}`);
    console.log(`  By status:`);
    stats.byStatus.forEach(s => console.log(`    ${s.status}: ${s.count}`));

    if (result.totalCities === 0) {
      console.log('\n⚠️ No crawl-enabled cities found.');
      console.log('  Seed cities first:');
      console.log('  npm run discovery:dt:cities:manual -- --city-slug=ny-hudson --city-name=Hudson --state-code=NY');
      process.exit(1);
    }

    if (result.errors.length > 0) {
      console.log('\n⚠️ Completed with errors');
      process.exit(1);
    }

    console.log('\n✅ Location discovery completed successfully');
    process.exit(0);
  } catch (error: any) {
    console.error('\n❌ Location discovery failed:', error.message);
    process.exit(1);
  } finally {
    await pool.end();
  }
}

main();
@@ -1,117 +0,0 @@
#!/usr/bin/env npx tsx
/**
 * Discovery Runner: Dutchie Locations
 *
 * Discovers store locations for all crawl-enabled cities and upserts to dutchie_discovery_locations.
 *
 * Usage:
 *   npm run discovery:platforms:dt:locations
 *   npm run discovery:platforms:dt:locations -- --limit=10
 *   DATABASE_URL="..." npx tsx src/dutchie-az/discovery/discovery-dt-locations.ts
 *
 * Options (via args):
 *   --limit=N   Only process N cities (default: all)
 *   --delay=N   Delay between cities in ms (default: 2000)
 */

import { Pool } from 'pg';
import { DutchieLocationDiscovery } from './DutchieLocationDiscovery';

const DB_URL = process.env.DATABASE_URL || process.env.CANNAIQ_DB_URL ||
  'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus';

// Parse CLI args
function parseArgs(): { limit?: number; delay?: number } {
  const args: { limit?: number; delay?: number } = {};

  for (const arg of process.argv.slice(2)) {
    const limitMatch = arg.match(/--limit=(\d+)/);
    if (limitMatch) args.limit = parseInt(limitMatch[1], 10);

    const delayMatch = arg.match(/--delay=(\d+)/);
    if (delayMatch) args.delay = parseInt(delayMatch[1], 10);
  }

  return args;
}

async function main() {
  const args = parseArgs();

  console.log('╔══════════════════════════════════════════════════╗');
  console.log('║        Dutchie Location Discovery Runner         ║');
  console.log('╚══════════════════════════════════════════════════╝');
  console.log(`\nDatabase: ${DB_URL.replace(/:[^:@]+@/, ':****@')}`);
  if (args.limit) console.log(`City limit: ${args.limit}`);
  if (args.delay) console.log(`Delay: ${args.delay}ms`);

  const pool = new Pool({ connectionString: DB_URL });

  try {
    // Test DB connection
    const { rows } = await pool.query('SELECT NOW() as time');
    console.log(`Connected at: ${rows[0].time}\n`);

    // Run location discovery
    const discovery = new DutchieLocationDiscovery(pool);
    const result = await discovery.discoverAllEnabled({
      limit: args.limit,
      delayMs: args.delay ?? 2000,
    });

    // Print summary
    console.log('\n' + '═'.repeat(50));
    console.log('SUMMARY');
    console.log('═'.repeat(50));
    console.log(`Cities processed:   ${result.totalCities}`);
    console.log(`Locations found:    ${result.totalLocationsFound}`);
    console.log(`Locations inserted: ${result.totalInserted}`);
    console.log(`Locations updated:  ${result.totalUpdated}`);
    console.log(`Locations skipped:  ${result.totalSkipped} (protected status)`);
    console.log(`Errors:             ${result.errors.length}`);
    console.log(`Duration:           ${(result.durationMs / 1000).toFixed(1)}s`);

    if (result.errors.length > 0) {
      console.log('\nErrors (first 10):');
      result.errors.slice(0, 10).forEach((e, i) => console.log(`  ${i + 1}. ${e}`));
      if (result.errors.length > 10) {
        console.log(`  ... and ${result.errors.length - 10} more`);
      }
    }

    // Get DB counts
    const { rows: countRows } = await pool.query(`
      SELECT
        COUNT(*) as total,
        COUNT(*) FILTER (WHERE status = 'discovered') as discovered,
        COUNT(*) FILTER (WHERE status = 'verified') as verified,
        COUNT(*) FILTER (WHERE status = 'merged') as merged,
        COUNT(*) FILTER (WHERE status = 'rejected') as rejected
      FROM dutchie_discovery_locations
      WHERE platform = 'dutchie' AND active = TRUE
    `);

    const counts = countRows[0];
    console.log('\nCurrent Database Stats:');
    console.log(`  Total locations:   ${counts.total}`);
    console.log(`  Status discovered: ${counts.discovered}`);
    console.log(`  Status verified:   ${counts.verified}`);
    console.log(`  Status merged:     ${counts.merged}`);
    console.log(`  Status rejected:   ${counts.rejected}`);

    if (result.errors.length > 0) {
      console.log('\n⚠️ Completed with errors');
      process.exit(1);
    }

    console.log('\n✅ Location discovery completed successfully');
    process.exit(0);
  } catch (error: any) {
    console.error('\n❌ Location discovery failed:', error.message);
    process.exit(1);
  } finally {
    await pool.end();
  }
}

main();
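Both location runners share the same `--limit=N` / `--delay=N` parsing pattern. A minimal stand-alone version, with the argv slice passed in as a parameter for testability (the scripts above read `process.argv.slice(2)` directly):

```typescript
// Stand-alone sketch of the runners' parseArgs: each flag is matched with a
// regex capture group and parsed as a base-10 integer; unrecognized args are
// silently ignored, and omitted flags leave the field undefined.
function parseArgs(argv: string[]): { limit?: number; delay?: number } {
  const args: { limit?: number; delay?: number } = {};
  for (const arg of argv) {
    const limitMatch = arg.match(/--limit=(\d+)/);
    if (limitMatch) args.limit = parseInt(limitMatch[1], 10);

    const delayMatch = arg.match(/--delay=(\d+)/);
    if (delayMatch) args.delay = parseInt(delayMatch[1], 10);
  }
  return args;
}

console.log(parseArgs(['--limit=10', '--delay=3000']));
```

Because the fields stay `undefined` when a flag is absent, callers can apply defaults with `??`, as in `delayMs: args.delay ?? 2000`.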
@@ -1,10 +0,0 @@
/**
 * Dutchie Discovery Module
 *
 * Store discovery pipeline for Dutchie platform.
 */

export { DutchieCityDiscovery } from './DutchieCityDiscovery';
export { DutchieLocationDiscovery } from './DutchieLocationDiscovery';
export { createDutchieDiscoveryRoutes } from './routes';
export { promoteDiscoveryLocation } from './promoteDiscoveryLocation';
@@ -1,248 +0,0 @@
/**
 * Promote Discovery Location to Crawlable Dispensary
 *
 * When a discovery location is verified or merged:
 * 1. Ensure a crawl profile exists for the dispensary
 * 2. Seed/update crawl schedule
 * 3. Create initial crawl job
 */

import { Pool } from 'pg';

export interface PromotionResult {
  success: boolean;
  discoveryId: number;
  dispensaryId: number;
  crawlProfileId?: number;
  scheduleUpdated?: boolean;
  crawlJobCreated?: boolean;
  error?: string;
}

/**
 * Promote a verified/merged discovery location to a crawlable dispensary.
 *
 * This function:
 * 1. Verifies the discovery location is verified/merged and has a dispensary_id
 * 2. Ensures the dispensary has platform info (menu_type, platform_dispensary_id)
 * 3. Creates/updates a crawler profile if the profile table exists
 * 4. Queues an initial crawl job
 */
export async function promoteDiscoveryLocation(
  pool: Pool,
  discoveryLocationId: number
): Promise<PromotionResult> {
  console.log(`[Promote] Starting promotion for discovery location ${discoveryLocationId}...`);

  // Get the discovery location
  const { rows: locRows } = await pool.query(
    `
    SELECT
      dl.*,
      d.id as disp_id,
      d.name as disp_name,
      d.menu_type as disp_menu_type,
      d.platform_dispensary_id as disp_platform_id
    FROM dutchie_discovery_locations dl
    JOIN dispensaries d ON dl.dispensary_id = d.id
    WHERE dl.id = $1
    `,
    [discoveryLocationId]
  );

  if (locRows.length === 0) {
    return {
      success: false,
      discoveryId: discoveryLocationId,
      dispensaryId: 0,
      error: 'Discovery location not found or not linked to a dispensary',
    };
  }

  const location = locRows[0];

  // Verify status
  if (!['verified', 'merged'].includes(location.status)) {
    return {
      success: false,
      discoveryId: discoveryLocationId,
      dispensaryId: location.dispensary_id || 0,
      error: `Cannot promote: location status is '${location.status}', must be 'verified' or 'merged'`,
    };
  }

  const dispensaryId = location.dispensary_id;
  console.log(`[Promote] Location ${discoveryLocationId} -> Dispensary ${dispensaryId} (${location.disp_name})`);

  // Ensure dispensary has platform info
  if (!location.disp_platform_id) {
    console.log(`[Promote] Updating dispensary with platform info...`);
    await pool.query(
      `
      UPDATE dispensaries
      SET platform_dispensary_id = COALESCE(platform_dispensary_id, $1),
          menu_url = COALESCE(menu_url, $2),
          menu_type = COALESCE(menu_type, 'dutchie'),
          updated_at = NOW()
      WHERE id = $3
      `,
      [location.platform_location_id, location.platform_menu_url, dispensaryId]
    );
  }

  let crawlProfileId: number | undefined;
  let scheduleUpdated = false;
  let crawlJobCreated = false;

  // Check if dispensary_crawler_profiles table exists
  const { rows: tableCheck } = await pool.query(`
    SELECT EXISTS (
      SELECT FROM information_schema.tables
      WHERE table_name = 'dispensary_crawler_profiles'
    ) as exists
  `);

  if (tableCheck[0]?.exists) {
    // Create or get crawler profile
    console.log(`[Promote] Checking crawler profile...`);

    const { rows: profileRows } = await pool.query(
      `
      SELECT id FROM dispensary_crawler_profiles
      WHERE dispensary_id = $1 AND platform = 'dutchie'
      `,
      [dispensaryId]
    );

    if (profileRows.length > 0) {
      crawlProfileId = profileRows[0].id;
      console.log(`[Promote] Using existing profile ${crawlProfileId}`);
    } else {
      // Create new profile
      const profileKey = `dutchie-${location.platform_slug}`;
      const { rows: newProfile } = await pool.query(
        `
        INSERT INTO dispensary_crawler_profiles (
          dispensary_id,
          profile_key,
          profile_name,
          platform,
          config,
          status,
          enabled,
          created_at,
          updated_at
        ) VALUES (
          $1, $2, $3, 'dutchie', $4, 'sandbox', TRUE, NOW(), NOW()
        )
        ON CONFLICT (dispensary_id, platform) DO UPDATE SET
          enabled = TRUE,
          updated_at = NOW()
        RETURNING id
        `,
        [
          dispensaryId,
          profileKey,
          `${location.name} (Dutchie)`,
          JSON.stringify({
            platformDispensaryId: location.platform_location_id,
            platformSlug: location.platform_slug,
            menuUrl: location.platform_menu_url,
            pricingType: 'rec',
            useBothModes: true,
          }),
        ]
      );

      crawlProfileId = newProfile[0]?.id;
      console.log(`[Promote] Created new profile ${crawlProfileId}`);
    }

    // Link profile to dispensary if not already linked
    await pool.query(
      `
      UPDATE dispensaries
      SET active_crawler_profile_id = COALESCE(active_crawler_profile_id, $1),
          updated_at = NOW()
      WHERE id = $2
      `,
      [crawlProfileId, dispensaryId]
    );
  }

  // Check if crawl_jobs table exists and create initial job
  const { rows: jobsTableCheck } = await pool.query(`
    SELECT EXISTS (
      SELECT FROM information_schema.tables
      WHERE table_name = 'crawl_jobs'
    ) as exists
  `);

  if (jobsTableCheck[0]?.exists) {
    // Check if there's already a pending job
    const { rows: existingJobs } = await pool.query(
      `
      SELECT id FROM crawl_jobs
      WHERE dispensary_id = $1 AND status IN ('pending', 'running')
      LIMIT 1
      `,
      [dispensaryId]
    );

    if (existingJobs.length === 0) {
      // Create initial crawl job
      console.log(`[Promote] Creating initial crawl job...`);
      await pool.query(
        `
        INSERT INTO crawl_jobs (
          dispensary_id,
          job_type,
          status,
          priority,
          config,
          created_at,
          updated_at
        ) VALUES (
          $1, 'dutchie_product_crawl', 'pending', 1, $2, NOW(), NOW()
        )
        `,
        [
          dispensaryId,
          JSON.stringify({
            source: 'discovery_promotion',
            discoveryLocationId,
            pricingType: 'rec',
            useBothModes: true,
          }),
        ]
      );
      crawlJobCreated = true;
    } else {
      console.log(`[Promote] Crawl job already exists for dispensary`);
    }
  }

  // Update discovery location notes
  await pool.query(
    `
    UPDATE dutchie_discovery_locations
    SET notes = COALESCE(notes || E'\n', '') || $1,
        updated_at = NOW()
    WHERE id = $2
    `,
    [`Promoted to crawlable at ${new Date().toISOString()}`, discoveryLocationId]
  );

  console.log(`[Promote] Promotion complete for discovery location ${discoveryLocationId}`);

  return {
    success: true,
    discoveryId: discoveryLocationId,
    dispensaryId,
    crawlProfileId,
    scheduleUpdated,
    crawlJobCreated,
  };
}

export default promoteDiscoveryLocation;
@@ -1,973 +0,0 @@
|
||||
/**
|
||||
* Platform Discovery API Routes (DT = Dutchie)
|
||||
*
|
||||
* Routes for the platform-specific store discovery pipeline.
|
||||
* Mount at /api/discovery/platforms/dt
|
||||
*
|
||||
* Platform Slug Mapping (for trademark-safe URLs):
|
||||
* dt = Dutchie
|
||||
* jn = Jane (future)
|
||||
* wm = Weedmaps (future)
|
||||
* lf = Leafly (future)
|
||||
* tz = Treez (future)
|
||||
*
|
||||
* Note: The actual platform value stored in the DB remains 'dutchie'.
|
||||
* Only the URL paths use neutral slugs.
|
||||
*/
|
||||
|
||||
import { Router, Request, Response } from 'express';
|
||||
import { Pool } from 'pg';
|
||||
import { DutchieCityDiscovery } from './DutchieCityDiscovery';
|
||||
import { DutchieLocationDiscovery } from './DutchieLocationDiscovery';
|
||||
import { DiscoveryGeoService } from '../../services/DiscoveryGeoService';
|
||||
import { GeoValidationService } from '../../services/GeoValidationService';
|
||||
|
||||
export function createDutchieDiscoveryRoutes(pool: Pool): Router {
|
||||
const router = Router();
|
||||
|
||||
// ============================================================
|
||||
// LOCATIONS
|
||||
// ============================================================
|
||||
|
||||
/**
|
||||
* GET /api/discovery/platforms/dt/locations
|
||||
*
|
||||
* List discovered locations with filtering.
|
||||
*
|
||||
* Query params:
|
||||
* - status: 'discovered' | 'verified' | 'rejected' | 'merged'
|
||||
* - state_code: e.g., 'AZ', 'CA'
|
||||
* - country_code: 'US' | 'CA'
|
||||
* - unlinked_only: 'true' to show only locations without dispensary_id
|
||||
* - search: search by name
|
||||
* - limit: number (default 50)
|
||||
* - offset: number (default 0)
|
||||
*/
|
||||
router.get('/locations', async (req: Request, res: Response) => {
|
||||
try {
|
||||
const {
|
||||
status,
|
||||
state_code,
|
||||
country_code,
|
||||
unlinked_only,
|
||||
search,
|
||||
limit = '50',
|
||||
offset = '0',
|
||||
} = req.query;
|
||||
|
||||
let whereClause = "WHERE platform = 'dutchie' AND active = TRUE";
|
||||
const params: any[] = [];
|
||||
let paramIndex = 1;
|
||||
|
||||
if (status) {
|
||||
whereClause += ` AND status = $${paramIndex}`;
|
||||
params.push(status);
|
||||
paramIndex++;
|
||||
}
|
||||
|
||||
if (state_code) {
|
||||
whereClause += ` AND state_code = $${paramIndex}`;
|
||||
params.push(state_code);
|
||||
paramIndex++;
|
||||
}
|
||||
|
||||
if (country_code) {
|
||||
whereClause += ` AND country_code = $${paramIndex}`;
|
||||
params.push(country_code);
|
||||
paramIndex++;
|
||||
}
|
||||
|
||||
if (unlinked_only === 'true') {
|
||||
whereClause += ' AND dispensary_id IS NULL';
|
||||
}
|
||||
|
||||
if (search) {
|
||||
whereClause += ` AND (name ILIKE $${paramIndex} OR platform_slug ILIKE $${paramIndex})`;
|
||||
params.push(`%${search}%`);
|
||||
paramIndex++;
|
||||
}
|
||||
|
||||
const limitVal = parseInt(limit as string, 10);
|
||||
const offsetVal = parseInt(offset as string, 10);
|
||||
params.push(limitVal, offsetVal);
|
||||
|
||||
const { rows } = await pool.query(
|
||||
`
|
||||
SELECT
|
||||
dl.id,
|
||||
dl.platform,
|
||||
dl.platform_location_id,
|
||||
dl.platform_slug,
|
||||
dl.platform_menu_url,
|
||||
dl.name,
|
||||
dl.raw_address,
|
||||
dl.address_line1,
|
||||
dl.city,
|
||||
dl.state_code,
|
||||
dl.postal_code,
|
||||
dl.country_code,
|
||||
dl.latitude,
|
||||
dl.longitude,
|
||||
dl.status,
|
||||
dl.dispensary_id,
|
||||
dl.offers_delivery,
|
||||
dl.offers_pickup,
|
||||
dl.is_recreational,
|
||||
dl.is_medical,
|
||||
dl.first_seen_at,
|
||||
dl.last_seen_at,
|
||||
dl.verified_at,
|
||||
dl.verified_by,
|
||||
dl.notes,
|
||||
d.name as dispensary_name
|
||||
FROM dutchie_discovery_locations dl
|
||||
LEFT JOIN dispensaries d ON dl.dispensary_id = d.id
|
||||
${whereClause}
|
||||
ORDER BY dl.first_seen_at DESC
|
||||
LIMIT $${paramIndex} OFFSET $${paramIndex + 1}
|
||||
`,
|
||||
params
|
||||
);
|
||||
|
||||
// Get total count
|
||||
const countParams = params.slice(0, -2);
|
||||
const { rows: countRows } = await pool.query(
|
||||
`SELECT COUNT(*) as total FROM dutchie_discovery_locations dl ${whereClause}`,
|
||||
countParams
|
||||
);
|
||||
|
||||
res.json({
|
||||
success: true,
|
||||
locations: rows.map((r) => ({
|
||||
id: r.id,
|
||||
platform: r.platform,
|
||||
platformLocationId: r.platform_location_id,
|
||||
platformSlug: r.platform_slug,
|
||||
platformMenuUrl: r.platform_menu_url,
|
||||
name: r.name,
|
||||
rawAddress: r.raw_address,
|
||||
addressLine1: r.address_line1,
|
||||
city: r.city,
|
||||
stateCode: r.state_code,
|
||||
postalCode: r.postal_code,
|
||||
countryCode: r.country_code,
|
||||
latitude: r.latitude,
|
||||
longitude: r.longitude,
|
||||
status: r.status,
|
||||
dispensaryId: r.dispensary_id,
|
||||
dispensaryName: r.dispensary_name,
|
||||
offersDelivery: r.offers_delivery,
|
||||
offersPickup: r.offers_pickup,
|
||||
isRecreational: r.is_recreational,
|
||||
isMedical: r.is_medical,
|
||||
firstSeenAt: r.first_seen_at,
|
||||
lastSeenAt: r.last_seen_at,
|
||||
verifiedAt: r.verified_at,
|
||||
verifiedBy: r.verified_by,
|
||||
notes: r.notes,
|
||||
})),
|
||||
total: parseInt(countRows[0]?.total || '0', 10),
|
||||
limit: limitVal,
|
||||
offset: offsetVal,
|
||||
});
|
||||
} catch (error: any) {
|
||||
console.error('[Discovery Routes] Error fetching locations:', error);
|
||||
res.status(500).json({ success: false, error: error.message });
|
||||
}
|
||||
});
|
||||
|
||||
/**
|
||||
* GET /api/discovery/platforms/dt/locations/:id
|
||||
*
|
||||
* Get a single location by ID.
|
||||
*/
|
||||
router.get('/locations/:id', async (req: Request, res: Response) => {
|
||||
try {
|
||||
const { id } = req.params;
|
||||
|
||||
const { rows } = await pool.query(
|
||||
`
|
||||
SELECT
|
||||
dl.*,
|
||||
d.name as dispensary_name,
|
||||
d.menu_url as dispensary_menu_url
|
||||
FROM dutchie_discovery_locations dl
|
||||
LEFT JOIN dispensaries d ON dl.dispensary_id = d.id
|
||||
WHERE dl.id = $1
|
||||
`,
|
||||
[parseInt(id, 10)]
|
||||
);
|
||||
|
||||
if (rows.length === 0) {
|
||||
return res.status(404).json({ success: false, error: 'Location not found' });
|
||||
}
|
||||
|
||||
const r = rows[0];
|
||||
res.json({
|
||||
success: true,
|
||||
location: {
|
||||
id: r.id,
|
||||
platform: r.platform,
|
||||
platformLocationId: r.platform_location_id,
|
||||
platformSlug: r.platform_slug,
|
||||
platformMenuUrl: r.platform_menu_url,
|
||||
name: r.name,
|
||||
rawAddress: r.raw_address,
|
||||
addressLine1: r.address_line1,
|
||||
addressLine2: r.address_line2,
|
||||
city: r.city,
|
||||
stateCode: r.state_code,
|
||||
postalCode: r.postal_code,
|
||||
countryCode: r.country_code,
|
||||
latitude: r.latitude,
|
||||
longitude: r.longitude,
|
||||
timezone: r.timezone,
|
||||
status: r.status,
|
||||
dispensaryId: r.dispensary_id,
|
||||
dispensaryName: r.dispensary_name,
|
||||
dispensaryMenuUrl: r.dispensary_menu_url,
|
||||
offersDelivery: r.offers_delivery,
|
||||
offersPickup: r.offers_pickup,
|
||||
isRecreational: r.is_recreational,
|
||||
isMedical: r.is_medical,
|
||||
firstSeenAt: r.first_seen_at,
|
||||
lastSeenAt: r.last_seen_at,
|
||||
verifiedAt: r.verified_at,
|
||||
verifiedBy: r.verified_by,
|
||||
notes: r.notes,
|
||||
metadata: r.metadata,
|
||||
},
|
||||
});
|
||||
} catch (error: any) {
|
||||
console.error('[Discovery Routes] Error fetching location:', error);
|
||||
res.status(500).json({ success: false, error: error.message });
|
||||
}
|
||||
});
|
||||
|
||||
// ============================================================
|
||||
// VERIFICATION ACTIONS
|
||||
// ============================================================
|
||||
|
||||
/**
|
||||
* POST /api/discovery/platforms/dt/locations/:id/verify-create
|
||||
*
|
||||
* Verify a discovered location and create a new canonical dispensary.
|
||||
*/
|
||||
router.post('/locations/:id/verify-create', async (req: Request, res: Response) => {
|
||||
const client = await pool.connect();
|
||||
try {
|
||||
const { id } = req.params;
|
||||
const { verifiedBy = 'admin' } = req.body;
|
||||
|
||||
await client.query('BEGIN');
|
||||
|
||||
// Get the discovery location
|
||||
const { rows: locRows } = await client.query(
|
||||
`SELECT * FROM dutchie_discovery_locations WHERE id = $1 FOR UPDATE`,
|
||||
[parseInt(id, 10)]
|
||||
);
|
||||
|
||||
if (locRows.length === 0) {
|
||||
await client.query('ROLLBACK');
|
||||
return res.status(404).json({ success: false, error: 'Location not found' });
|
||||
}
|
||||
|
||||
const location = locRows[0];
|
||||
|
||||
if (location.status !== 'discovered') {
|
||||
await client.query('ROLLBACK');
|
||||
return res.status(400).json({
|
||||
success: false,
|
||||
error: `Cannot verify: location status is '${location.status}'`,
|
||||
});
|
||||
}
|
||||
|
||||
// Look up state_id if we have a state_code
|
||||
let stateId: number | null = null;
|
||||
if (location.state_code) {
|
||||
const { rows: stateRows } = await client.query(
|
||||
`SELECT id FROM states WHERE code = $1`,
|
||||
[location.state_code]
|
||||
);
|
||||
if (stateRows.length > 0) {
|
||||
stateId = stateRows[0].id;
|
||||
}
|
||||
}
|
||||
|
||||
// Create the canonical dispensary
|
||||
const { rows: dispRows } = await client.query(
|
||||
`
|
||||
INSERT INTO dispensaries (
|
||||
name,
|
||||
slug,
|
||||
address,
|
||||
city,
|
||||
state,
|
||||
zip,
|
||||
latitude,
|
||||
longitude,
|
||||
timezone,
|
||||
menu_type,
|
||||
menu_url,
|
||||
platform_dispensary_id,
|
||||
state_id,
|
||||
active,
|
||||
created_at,
|
||||
updated_at
|
||||
) VALUES (
|
||||
$1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, TRUE, NOW(), NOW()
|
||||
)
|
||||
RETURNING id
|
||||
`,
|
||||
[
|
||||
location.name,
|
||||
location.platform_slug,
|
||||
location.address_line1,
|
||||
location.city,
|
||||
location.state_code,
|
||||
location.postal_code,
|
||||
location.latitude,
|
||||
location.longitude,
|
||||
location.timezone,
|
||||
'dutchie',
|
||||
location.platform_menu_url,
|
||||
location.platform_location_id,
|
||||
stateId,
|
||||
]
|
||||
);
|
||||
|
||||
const dispensaryId = dispRows[0].id;
|
||||
|
||||
// Update the discovery location
|
||||
await client.query(
|
||||
`
|
||||
UPDATE dutchie_discovery_locations
|
||||
SET status = 'verified',
|
||||
dispensary_id = $1,
|
||||
verified_at = NOW(),
|
||||
verified_by = $2,
|
||||
updated_at = NOW()
|
||||
WHERE id = $3
|
||||
`,
|
||||
[dispensaryId, verifiedBy, id]
|
||||
);
|
||||
|
||||
await client.query('COMMIT');
|
||||
|
||||
res.json({
|
||||
success: true,
|
||||
        action: 'created',
        discoveryId: parseInt(id, 10),
        dispensaryId,
        message: `Created new dispensary (ID: ${dispensaryId})`,
      });
    } catch (error: any) {
      await client.query('ROLLBACK');
      console.error('[Discovery Routes] Error in verify-create:', error);
      res.status(500).json({ success: false, error: error.message });
    } finally {
      client.release();
    }
  });

  /**
   * POST /api/discovery/platforms/dt/locations/:id/verify-link
   *
   * Link a discovered location to an existing dispensary.
   *
   * Body:
   * - dispensaryId: number (required)
   * - verifiedBy: string (optional)
   */
  router.post('/locations/:id/verify-link', async (req: Request, res: Response) => {
    const client = await pool.connect();
    try {
      const { id } = req.params;
      const { dispensaryId, verifiedBy = 'admin' } = req.body;

      if (!dispensaryId) {
        return res.status(400).json({ success: false, error: 'dispensaryId is required' });
      }

      await client.query('BEGIN');

      // Verify dispensary exists
      const { rows: dispRows } = await client.query(
        `SELECT id, name FROM dispensaries WHERE id = $1`,
        [dispensaryId]
      );

      if (dispRows.length === 0) {
        await client.query('ROLLBACK');
        return res.status(404).json({ success: false, error: 'Dispensary not found' });
      }

      // Get the discovery location
      const { rows: locRows } = await client.query(
        `SELECT * FROM dutchie_discovery_locations WHERE id = $1 FOR UPDATE`,
        [parseInt(id, 10)]
      );

      if (locRows.length === 0) {
        await client.query('ROLLBACK');
        return res.status(404).json({ success: false, error: 'Location not found' });
      }

      const location = locRows[0];

      if (location.status !== 'discovered') {
        await client.query('ROLLBACK');
        return res.status(400).json({
          success: false,
          error: `Cannot link: location status is '${location.status}'`,
        });
      }

      // Update dispensary with platform info if missing
      await client.query(
        `
        UPDATE dispensaries
        SET platform_dispensary_id = COALESCE(platform_dispensary_id, $1),
            menu_url = COALESCE(menu_url, $2),
            menu_type = COALESCE(menu_type, 'dutchie'),
            updated_at = NOW()
        WHERE id = $3
        `,
        [location.platform_location_id, location.platform_menu_url, dispensaryId]
      );

      // Update the discovery location
      await client.query(
        `
        UPDATE dutchie_discovery_locations
        SET status = 'merged',
            dispensary_id = $1,
            verified_at = NOW(),
            verified_by = $2,
            updated_at = NOW()
        WHERE id = $3
        `,
        [dispensaryId, verifiedBy, id]
      );

      await client.query('COMMIT');

      res.json({
        success: true,
        action: 'linked',
        discoveryId: parseInt(id, 10),
        dispensaryId,
        dispensaryName: dispRows[0].name,
        message: `Linked to existing dispensary: ${dispRows[0].name}`,
      });
    } catch (error: any) {
      await client.query('ROLLBACK');
      console.error('[Discovery Routes] Error in verify-link:', error);
      res.status(500).json({ success: false, error: error.message });
    } finally {
      client.release();
    }
  });

  /**
   * POST /api/discovery/platforms/dt/locations/:id/reject
   *
   * Reject a discovered location.
   *
   * Body:
   * - reason: string (optional)
   * - verifiedBy: string (optional)
   */
  router.post('/locations/:id/reject', async (req: Request, res: Response) => {
    try {
      const { id } = req.params;
      const { reason, verifiedBy = 'admin' } = req.body;

      // Get current status
      const { rows } = await pool.query(
        `SELECT status FROM dutchie_discovery_locations WHERE id = $1`,
        [parseInt(id, 10)]
      );

      if (rows.length === 0) {
        return res.status(404).json({ success: false, error: 'Location not found' });
      }

      if (rows[0].status !== 'discovered') {
        return res.status(400).json({
          success: false,
          error: `Cannot reject: location status is '${rows[0].status}'`,
        });
      }

      await pool.query(
        `
        UPDATE dutchie_discovery_locations
        SET status = 'rejected',
            verified_at = NOW(),
            verified_by = $1,
            notes = COALESCE($2, notes),
            updated_at = NOW()
        WHERE id = $3
        `,
        [verifiedBy, reason, id]
      );

      res.json({
        success: true,
        action: 'rejected',
        discoveryId: parseInt(id, 10),
        message: 'Location rejected',
      });
    } catch (error: any) {
      console.error('[Discovery Routes] Error in reject:', error);
      res.status(500).json({ success: false, error: error.message });
    }
  });

  /**
   * POST /api/discovery/platforms/dt/locations/:id/unreject
   *
   * Restore a rejected location to discovered status.
   */
  router.post('/locations/:id/unreject', async (req: Request, res: Response) => {
    try {
      const { id } = req.params;

      // Get current status
      const { rows } = await pool.query(
        `SELECT status FROM dutchie_discovery_locations WHERE id = $1`,
        [parseInt(id, 10)]
      );

      if (rows.length === 0) {
        return res.status(404).json({ success: false, error: 'Location not found' });
      }

      if (rows[0].status !== 'rejected') {
        return res.status(400).json({
          success: false,
          error: `Cannot unreject: location status is '${rows[0].status}'`,
        });
      }

      await pool.query(
        `
        UPDATE dutchie_discovery_locations
        SET status = 'discovered',
            verified_at = NULL,
            verified_by = NULL,
            updated_at = NOW()
        WHERE id = $1
        `,
        [id]
      );

      res.json({
        success: true,
        action: 'unrejected',
        discoveryId: parseInt(id, 10),
        message: 'Location restored to discovered status',
      });
    } catch (error: any) {
      console.error('[Discovery Routes] Error in unreject:', error);
      res.status(500).json({ success: false, error: error.message });
    }
  });

  // ============================================================
  // SUMMARY / REPORTING
  // ============================================================

  /**
   * GET /api/discovery/platforms/dt/summary
   *
   * Get discovery summary statistics.
   */
  router.get('/summary', async (_req: Request, res: Response) => {
    try {
      // Total counts by status
      const { rows: statusRows } = await pool.query(`
        SELECT status, COUNT(*) as cnt
        FROM dutchie_discovery_locations
        WHERE platform = 'dutchie' AND active = TRUE
        GROUP BY status
      `);

      const statusCounts: Record<string, number> = {};
      let totalLocations = 0;
      for (const row of statusRows) {
        statusCounts[row.status] = parseInt(row.cnt, 10);
        totalLocations += parseInt(row.cnt, 10);
      }

      // By state
      const { rows: stateRows } = await pool.query(`
        SELECT
          state_code,
          COUNT(*) as total,
          COUNT(*) FILTER (WHERE status = 'verified') as verified,
          COUNT(*) FILTER (WHERE dispensary_id IS NULL AND status = 'discovered') as unlinked
        FROM dutchie_discovery_locations
        WHERE platform = 'dutchie' AND active = TRUE AND state_code IS NOT NULL
        GROUP BY state_code
        ORDER BY total DESC
      `);

      res.json({
        success: true,
        summary: {
          total_locations: totalLocations,
          discovered: statusCounts['discovered'] || 0,
          verified: statusCounts['verified'] || 0,
          merged: statusCounts['merged'] || 0,
          rejected: statusCounts['rejected'] || 0,
        },
        by_state: stateRows.map((r) => ({
          state_code: r.state_code,
          total: parseInt(r.total, 10),
          verified: parseInt(r.verified, 10),
          unlinked: parseInt(r.unlinked, 10),
        })),
      });
    } catch (error: any) {
      console.error('[Discovery Routes] Error in summary:', error);
      res.status(500).json({ success: false, error: error.message });
    }
  });

  // ============================================================
  // CITIES
  // ============================================================

  /**
   * GET /api/discovery/platforms/dt/cities
   *
   * List discovery cities.
   */
  router.get('/cities', async (req: Request, res: Response) => {
    try {
      const { state_code, country_code, crawl_enabled, limit = '100', offset = '0' } = req.query;

      let whereClause = "WHERE platform = 'dutchie'";
      const params: any[] = [];
      let paramIndex = 1;

      if (state_code) {
        whereClause += ` AND state_code = $${paramIndex}`;
        params.push(state_code);
        paramIndex++;
      }

      if (country_code) {
        whereClause += ` AND country_code = $${paramIndex}`;
        params.push(country_code);
        paramIndex++;
      }

      if (crawl_enabled === 'true') {
        whereClause += ' AND crawl_enabled = TRUE';
      } else if (crawl_enabled === 'false') {
        whereClause += ' AND crawl_enabled = FALSE';
      }

      params.push(parseInt(limit as string, 10), parseInt(offset as string, 10));

      const { rows } = await pool.query(
        `
        SELECT
          id,
          platform,
          city_name,
          city_slug,
          state_code,
          country_code,
          last_crawled_at,
          crawl_enabled,
          location_count
        FROM dutchie_discovery_cities
        ${whereClause}
        ORDER BY country_code, state_code, city_name
        LIMIT $${paramIndex} OFFSET $${paramIndex + 1}
        `,
        params
      );

      const { rows: countRows } = await pool.query(
        `SELECT COUNT(*) as total FROM dutchie_discovery_cities ${whereClause}`,
        params.slice(0, -2)
      );

      res.json({
        success: true,
        cities: rows.map((r) => ({
          id: r.id,
          platform: r.platform,
          cityName: r.city_name,
          citySlug: r.city_slug,
          stateCode: r.state_code,
          countryCode: r.country_code,
          lastCrawledAt: r.last_crawled_at,
          crawlEnabled: r.crawl_enabled,
          locationCount: r.location_count,
        })),
        total: parseInt(countRows[0]?.total || '0', 10),
      });
    } catch (error: any) {
      console.error('[Discovery Routes] Error fetching cities:', error);
      res.status(500).json({ success: false, error: error.message });
    }
  });

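The `/cities` handler above builds its `WHERE` clause by hand, bumping `paramIndex` for each optional filter. The same bookkeeping can be factored into a generic helper; `buildWhere` below is an illustrative sketch, not part of this codebase — the placeholder index is simply the parameter array's length after each push.

```typescript
// Hypothetical helper for the $1/$2/... placeholder bookkeeping the
// /cities handler does manually with `paramIndex`.
function buildWhere(
  base: string,
  filters: Record<string, string | number | undefined>
): { clause: string; params: (string | number)[] } {
  const params: (string | number)[] = [];
  let clause = base;
  for (const [column, value] of Object.entries(filters)) {
    if (value === undefined) continue; // skip absent filters
    params.push(value); // placeholder number = current array length
    clause += ` AND ${column} = $${params.length}`;
  }
  return { clause, params };
}
```

With `buildWhere("WHERE platform = 'dutchie'", { state_code: 'AZ' })` the result is the same clause/params pair the handler assembles inline.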
  // ============================================================
  // MATCH CANDIDATES
  // ============================================================

  /**
   * GET /api/discovery/platforms/dt/locations/:id/match-candidates
   *
   * Find potential dispensary matches for a discovery location.
   */
  router.get('/locations/:id/match-candidates', async (req: Request, res: Response) => {
    try {
      const { id } = req.params;

      // Get the discovery location
      const { rows: locRows } = await pool.query(
        `SELECT * FROM dutchie_discovery_locations WHERE id = $1`,
        [parseInt(id, 10)]
      );

      if (locRows.length === 0) {
        return res.status(404).json({ success: false, error: 'Location not found' });
      }

      const location = locRows[0];

      // Find potential matches
      const { rows: candidates } = await pool.query(
        `
        SELECT
          d.id,
          d.name,
          d.city,
          d.state,
          d.address,
          d.menu_type,
          d.platform_dispensary_id,
          d.menu_url,
          d.latitude,
          d.longitude,
          CASE
            WHEN d.name ILIKE $1 THEN 'exact_name'
            WHEN d.name ILIKE $2 THEN 'partial_name'
            WHEN d.city ILIKE $3 AND d.state = $4 THEN 'same_city'
            ELSE 'location_match'
          END as match_type,
          CASE
            WHEN d.latitude IS NOT NULL AND d.longitude IS NOT NULL
              AND $5::float IS NOT NULL AND $6::float IS NOT NULL
            THEN (3959 * acos(
              LEAST(1.0, GREATEST(-1.0,
                cos(radians($5::float)) * cos(radians(d.latitude)) *
                cos(radians(d.longitude) - radians($6::float)) +
                sin(radians($5::float)) * sin(radians(d.latitude))
              ))
            ))
            ELSE NULL
          END as distance_miles
        FROM dispensaries d
        WHERE d.state = $4
          AND (
            d.name ILIKE $1
            OR d.name ILIKE $2
            OR d.city ILIKE $3
            OR (
              d.latitude IS NOT NULL
              AND d.longitude IS NOT NULL
              AND $5::float IS NOT NULL
              AND $6::float IS NOT NULL
            )
          )
        ORDER BY
          CASE
            WHEN d.name ILIKE $1 THEN 1
            WHEN d.name ILIKE $2 THEN 2
            ELSE 3
          END,
          distance_miles NULLS LAST
        LIMIT 10
        `,
        [
          location.name,
          `%${location.name.split(' ')[0]}%`,
          location.city,
          location.state_code,
          location.latitude,
          location.longitude,
        ]
      );

      res.json({
        success: true,
        location: {
          id: location.id,
          name: location.name,
          city: location.city,
          stateCode: location.state_code,
        },
        candidates: candidates.map((c) => ({
          id: c.id,
          name: c.name,
          city: c.city,
          state: c.state,
          address: c.address,
          menuType: c.menu_type,
          platformDispensaryId: c.platform_dispensary_id,
          menuUrl: c.menu_url,
          matchType: c.match_type,
          distanceMiles: c.distance_miles ? Math.round(c.distance_miles * 10) / 10 : null,
        })),
      });
    } catch (error: any) {
      console.error('[Discovery Routes] Error fetching match candidates:', error);
      res.status(500).json({ success: false, error: error.message });
    }
  });

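The match-candidates query computes a spherical-law-of-cosines great-circle distance in SQL, with 3959 (Earth's radius in miles) and a `LEAST`/`GREATEST` clamp guarding `acos` against floating-point drift. A client-side TypeScript sketch of the same formula — useful for sanity-checking `distance_miles` values, and purely illustrative — looks like:

```typescript
// Mirrors the SQL expression: 3959 * acos(clamp(cos·cos·cos + sin·sin)).
// 3959 is the Earth's radius in miles, matching the query's constant.
function distanceMiles(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const rad = (deg: number) => (deg * Math.PI) / 180;
  const cosArc =
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.cos(rad(lon2) - rad(lon1)) +
    Math.sin(rad(lat1)) * Math.sin(rad(lat2));
  // Clamp to [-1, 1], like the SQL's LEAST(1.0, GREATEST(-1.0, ...)) guard.
  return 3959 * Math.acos(Math.min(1, Math.max(-1, cosArc)));
}
```

For example, Phoenix (33.4484, -112.0740) to Tucson (32.2226, -110.9747) comes out around 106 miles.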
  // ============================================================
  // GEO / NEARBY (Admin/Debug Only)
  // ============================================================

  /**
   * GET /api/discovery/platforms/dt/nearby
   *
   * Find discovery locations near a given coordinate.
   * This is an internal/debug endpoint for admin use.
   *
   * Query params:
   * - lat: number (required)
   * - lon: number (required)
   * - radiusKm: number (optional, default 50)
   * - limit: number (optional, default 20)
   * - status: string (optional, filter by status)
   */
  router.get('/nearby', async (req: Request, res: Response) => {
    try {
      const { lat, lon, radiusKm = '50', limit = '20', status } = req.query;

      // Validate required params
      if (!lat || !lon) {
        return res.status(400).json({
          success: false,
          error: 'lat and lon are required query parameters',
        });
      }

      const latNum = parseFloat(lat as string);
      const lonNum = parseFloat(lon as string);
      const radiusNum = parseFloat(radiusKm as string);
      const limitNum = parseInt(limit as string, 10);

      if (isNaN(latNum) || isNaN(lonNum)) {
        return res.status(400).json({
          success: false,
          error: 'lat and lon must be valid numbers',
        });
      }

      const geoService = new DiscoveryGeoService(pool);

      const locations = await geoService.findNearbyDiscoveryLocations(latNum, lonNum, {
        radiusKm: radiusNum,
        limit: limitNum,
        platform: 'dutchie',
        status: status as string | undefined,
      });

      res.json({
        success: true,
        center: { lat: latNum, lon: lonNum },
        radiusKm: radiusNum,
        count: locations.length,
        locations,
      });
    } catch (error: any) {
      console.error('[Discovery Routes] Error in nearby:', error);
      res.status(500).json({ success: false, error: error.message });
    }
  });

  /**
   * GET /api/discovery/platforms/dt/geo-stats
   *
   * Get coordinate coverage statistics for discovery locations.
   * This is an internal/debug endpoint for admin use.
   */
  router.get('/geo-stats', async (_req: Request, res: Response) => {
    try {
      const geoService = new DiscoveryGeoService(pool);
      const stats = await geoService.getCoordinateCoverageStats();

      res.json({
        success: true,
        stats,
      });
    } catch (error: any) {
      console.error('[Discovery Routes] Error in geo-stats:', error);
      res.status(500).json({ success: false, error: error.message });
    }
  });

  /**
   * GET /api/discovery/platforms/dt/locations/:id/validate-geo
   *
   * Validate the geographic data for a discovery location.
   * This is an internal/debug endpoint for admin use.
   */
  router.get('/locations/:id/validate-geo', async (req: Request, res: Response) => {
    try {
      const { id } = req.params;

      // Get the location
      const { rows } = await pool.query(
        `SELECT latitude, longitude, state_code, country_code, name
         FROM dutchie_discovery_locations WHERE id = $1`,
        [parseInt(id, 10)]
      );

      if (rows.length === 0) {
        return res.status(404).json({ success: false, error: 'Location not found' });
      }

      const location = rows[0];
      const geoValidation = new GeoValidationService();
      const result = geoValidation.validateLocationState({
        latitude: location.latitude,
        longitude: location.longitude,
        state_code: location.state_code,
        country_code: location.country_code,
      });

      res.json({
        success: true,
        location: {
          id: parseInt(id, 10),
          name: location.name,
          latitude: location.latitude,
          longitude: location.longitude,
          stateCode: location.state_code,
          countryCode: location.country_code,
        },
        validation: result,
      });
    } catch (error: any) {
      console.error('[Discovery Routes] Error in validate-geo:', error);
      res.status(500).json({ success: false, error: error.message });
    }
  });

  return router;
}

export default createDutchieDiscoveryRoutes;
@@ -1,92 +0,0 @@
/**
 * Dutchie AZ Data Pipeline
 *
 * Isolated data pipeline for crawling and storing Dutchie Arizona dispensary data.
 * This module is completely separate from the main application database.
 *
 * Features:
 * - Two-mode crawling (Mode A: UI parity, Mode B: MAX COVERAGE)
 * - Derived stockStatus field (in_stock, out_of_stock, unknown)
 * - Full raw payload storage for 100% data preservation
 * - AZDHS dispensary list as canonical source
 */

// Types
export * from './types';

// Database
export {
  getDutchieAZPool,
  query,
  getClient,
  closePool,
  healthCheck,
} from './db/connection';

export {
  createSchema,
  dropSchema,
  schemaExists,
  ensureSchema,
} from './db/schema';

// Services - GraphQL Client
export {
  GRAPHQL_HASHES,
  ARIZONA_CENTERPOINTS,
  resolveDispensaryId,
  fetchAllProducts,
  fetchAllProductsBothModes,
  discoverArizonaDispensaries,
  // Alias for backward compatibility
  discoverArizonaDispensaries as discoverDispensaries,
} from './services/graphql-client';

// Services - Discovery
export {
  importFromExistingDispensaries,
  discoverDispensaries as discoverAndSaveDispensaries,
  resolvePlatformDispensaryIds,
  getAllDispensaries,
  getDispensaryById,
  getDispensariesWithPlatformIds,
} from './services/discovery';

// Services - Product Crawler
export {
  normalizeProduct,
  normalizeSnapshot,
  crawlDispensaryProducts,
  crawlAllArizonaDispensaries,
} from './services/product-crawler';

export type { CrawlResult } from './services/product-crawler';

// Services - Scheduler
export {
  startScheduler,
  stopScheduler,
  triggerImmediateCrawl,
  getSchedulerStatus,
  crawlSingleDispensary,
  // Schedule config CRUD
  getAllSchedules,
  getScheduleById,
  createSchedule,
  updateSchedule,
  deleteSchedule,
  triggerScheduleNow,
  initializeDefaultSchedules,
  // Run logs
  getRunLogs,
} from './services/scheduler';

// Services - AZDHS Import
export {
  importAZDHSDispensaries,
  importFromJSON,
  getImportStats,
} from './services/azdhs-import';

// Routes
export { default as dutchieAZRouter } from './routes';
@@ -1,682 +0,0 @@
/**
 * Analytics API Routes
 *
 * Provides REST API endpoints for all analytics services.
 * All routes are prefixed with /api/analytics
 *
 * Phase 3: Analytics Dashboards
 */

import { Router, Request, Response } from 'express';
import { Pool } from 'pg';
import {
  AnalyticsCache,
  PriceTrendService,
  PenetrationService,
  CategoryAnalyticsService,
  StoreChangeService,
  BrandOpportunityService,
} from '../services/analytics';

export function createAnalyticsRouter(pool: Pool): Router {
  const router = Router();

  // Initialize services
  const cache = new AnalyticsCache(pool, { defaultTtlMinutes: 15 });
  const priceService = new PriceTrendService(pool, cache);
  const penetrationService = new PenetrationService(pool, cache);
  const categoryService = new CategoryAnalyticsService(pool, cache);
  const storeService = new StoreChangeService(pool, cache);
  const brandOpportunityService = new BrandOpportunityService(pool, cache);

  // ============================================================
  // PRICE ANALYTICS
  // ============================================================

  /**
   * GET /api/analytics/price/product/:id
   * Get price trend for a specific product
   */
  router.get('/price/product/:id', async (req: Request, res: Response) => {
    try {
      const productId = parseInt(req.params.id);
      const storeId = req.query.storeId ? parseInt(req.query.storeId as string) : undefined;
      const days = req.query.days ? parseInt(req.query.days as string) : 30;

      const result = await priceService.getProductPriceTrend(productId, storeId, days);
      res.json(result);
    } catch (error) {
      console.error('[Analytics] Price product error:', error);
      res.status(500).json({ error: 'Failed to fetch product price trend' });
    }
  });

  /**
   * GET /api/analytics/price/brand/:name
   * Get price trend for a brand
   */
  router.get('/price/brand/:name', async (req: Request, res: Response) => {
    try {
      const brandName = decodeURIComponent(req.params.name);
      const filters = {
        storeId: req.query.storeId ? parseInt(req.query.storeId as string) : undefined,
        category: req.query.category as string | undefined,
        state: req.query.state as string | undefined,
        days: req.query.days ? parseInt(req.query.days as string) : 30,
      };

      const result = await priceService.getBrandPriceTrend(brandName, filters);
      res.json(result);
    } catch (error) {
      console.error('[Analytics] Price brand error:', error);
      res.status(500).json({ error: 'Failed to fetch brand price trend' });
    }
  });

  /**
   * GET /api/analytics/price/category/:name
   * Get price trend for a category
   */
  router.get('/price/category/:name', async (req: Request, res: Response) => {
    try {
      const category = decodeURIComponent(req.params.name);
      const filters = {
        storeId: req.query.storeId ? parseInt(req.query.storeId as string) : undefined,
        brandName: req.query.brand as string | undefined,
        state: req.query.state as string | undefined,
        days: req.query.days ? parseInt(req.query.days as string) : 30,
      };

      const result = await priceService.getCategoryPriceTrend(category, filters);
      res.json(result);
    } catch (error) {
      console.error('[Analytics] Price category error:', error);
      res.status(500).json({ error: 'Failed to fetch category price trend' });
    }
  });

  /**
   * GET /api/analytics/price/summary
   * Get price summary statistics
   */
  router.get('/price/summary', async (req: Request, res: Response) => {
    try {
      const filters = {
        storeId: req.query.storeId ? parseInt(req.query.storeId as string) : undefined,
        brandName: req.query.brand as string | undefined,
        category: req.query.category as string | undefined,
        state: req.query.state as string | undefined,
      };

      const result = await priceService.getPriceSummary(filters);
      res.json(result);
    } catch (error) {
      console.error('[Analytics] Price summary error:', error);
      res.status(500).json({ error: 'Failed to fetch price summary' });
    }
  });

  /**
   * GET /api/analytics/price/compression/:category
   * Get price compression analysis for a category
   */
  router.get('/price/compression/:category', async (req: Request, res: Response) => {
    try {
      const category = decodeURIComponent(req.params.category);
      const state = req.query.state as string | undefined;

      const result = await priceService.detectPriceCompression(category, state);
      res.json(result);
    } catch (error) {
      console.error('[Analytics] Price compression error:', error);
      res.status(500).json({ error: 'Failed to analyze price compression' });
    }
  });

  /**
   * GET /api/analytics/price/global
   * Get global price statistics
   */
  router.get('/price/global', async (_req: Request, res: Response) => {
    try {
      const result = await priceService.getGlobalPriceStats();
      res.json(result);
    } catch (error) {
      console.error('[Analytics] Global price error:', error);
      res.status(500).json({ error: 'Failed to fetch global price stats' });
    }
  });

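These price handlers repeat the `req.query.x ? parseInt(req.query.x as string) : default` idiom for every numeric query parameter. A small helper could centralize that parsing and also swallow non-numeric input instead of letting `NaN` through; `intParam` below is a hypothetical sketch, not a function in this codebase.

```typescript
// Hypothetical helper for the repeated query-param parsing pattern above.
// Returns the fallback for absent, empty, or non-numeric values.
function intParam(value: unknown, fallback: number): number {
  if (typeof value !== 'string' || value.trim() === '') return fallback;
  const n = parseInt(value, 10);
  return Number.isNaN(n) ? fallback : n;
}
```

Usage would read `const days = intParam(req.query.days, 30);`, matching the defaults in the handlers.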
  // ============================================================
  // PENETRATION ANALYTICS
  // ============================================================

  /**
   * GET /api/analytics/penetration/brand/:name
   * Get penetration data for a brand
   */
  router.get('/penetration/brand/:name', async (req: Request, res: Response) => {
    try {
      const brandName = decodeURIComponent(req.params.name);
      const filters = {
        state: req.query.state as string | undefined,
        category: req.query.category as string | undefined,
      };

      const result = await penetrationService.getBrandPenetration(brandName, filters);
      res.json(result);
    } catch (error) {
      console.error('[Analytics] Brand penetration error:', error);
      res.status(500).json({ error: 'Failed to fetch brand penetration' });
    }
  });

  /**
   * GET /api/analytics/penetration/top
   * Get top brands by penetration
   */
  router.get('/penetration/top', async (req: Request, res: Response) => {
    try {
      const limit = req.query.limit ? parseInt(req.query.limit as string) : 20;
      const filters = {
        state: req.query.state as string | undefined,
        category: req.query.category as string | undefined,
        minStores: req.query.minStores ? parseInt(req.query.minStores as string) : 2,
        minSkus: req.query.minSkus ? parseInt(req.query.minSkus as string) : 5,
      };

      const result = await penetrationService.getTopBrandsByPenetration(limit, filters);
      res.json(result);
    } catch (error) {
      console.error('[Analytics] Top penetration error:', error);
      res.status(500).json({ error: 'Failed to fetch top brands' });
    }
  });

  /**
   * GET /api/analytics/penetration/trend/:brand
   * Get penetration trend for a brand
   */
  router.get('/penetration/trend/:brand', async (req: Request, res: Response) => {
    try {
      const brandName = decodeURIComponent(req.params.brand);
      const days = req.query.days ? parseInt(req.query.days as string) : 30;

      const result = await penetrationService.getPenetrationTrend(brandName, days);
      res.json(result);
    } catch (error) {
      console.error('[Analytics] Penetration trend error:', error);
      res.status(500).json({ error: 'Failed to fetch penetration trend' });
    }
  });

  /**
   * GET /api/analytics/penetration/shelf-share/:brand
   * Get shelf share by category for a brand
   */
  router.get('/penetration/shelf-share/:brand', async (req: Request, res: Response) => {
    try {
      const brandName = decodeURIComponent(req.params.brand);
      const result = await penetrationService.getShelfShareByCategory(brandName);
      res.json(result);
    } catch (error) {
      console.error('[Analytics] Shelf share error:', error);
      res.status(500).json({ error: 'Failed to fetch shelf share' });
    }
  });

  /**
   * GET /api/analytics/penetration/by-state/:brand
   * Get brand presence by state
   */
  router.get('/penetration/by-state/:brand', async (req: Request, res: Response) => {
    try {
      const brandName = decodeURIComponent(req.params.brand);
      const result = await penetrationService.getBrandPresenceByState(brandName);
      res.json(result);
    } catch (error) {
      console.error('[Analytics] Brand by state error:', error);
      res.status(500).json({ error: 'Failed to fetch brand presence by state' });
    }
  });

  /**
   * GET /api/analytics/penetration/stores/:brand
   * Get stores carrying a brand
   */
  router.get('/penetration/stores/:brand', async (req: Request, res: Response) => {
    try {
      const brandName = decodeURIComponent(req.params.brand);
      const result = await penetrationService.getStoresCarryingBrand(brandName);
      res.json(result);
    } catch (error) {
      console.error('[Analytics] Stores carrying brand error:', error);
      res.status(500).json({ error: 'Failed to fetch stores' });
    }
  });

  /**
   * GET /api/analytics/penetration/heatmap
   * Get penetration heatmap data
   */
  router.get('/penetration/heatmap', async (req: Request, res: Response) => {
    try {
      const brandName = req.query.brand as string | undefined;
      const result = await penetrationService.getPenetrationHeatmap(brandName);
      res.json(result);
    } catch (error) {
      console.error('[Analytics] Heatmap error:', error);
      res.status(500).json({ error: 'Failed to fetch heatmap data' });
    }
  });

// ============================================================
|
||||
// CATEGORY ANALYTICS
|
||||
// ============================================================
|
||||
|
||||
/**
|
||||
* GET /api/analytics/category/summary
|
||||
* Get category summary
|
||||
*/
|
||||
router.get('/category/summary', async (req: Request, res: Response) => {
|
||||
try {
|
||||
const category = req.query.category as string | undefined;
|
||||
const filters = {
|
||||
state: req.query.state as string | undefined,
|
||||
storeId: req.query.storeId ? parseInt(req.query.storeId as string) : undefined,
|
||||
};
|
||||
|
||||
const result = await categoryService.getCategorySummary(category, filters);
|
||||
res.json(result);
|
||||
} catch (error) {
|
||||
console.error('[Analytics] Category summary error:', error);
|
||||
res.status(500).json({ error: 'Failed to fetch category summary' });
|
||||
}
|
||||
});
|
||||
|
||||
/**
 * GET /api/analytics/category/growth
 * Get category growth data
 */
router.get('/category/growth', async (req: Request, res: Response) => {
  try {
    const days = req.query.days ? parseInt(req.query.days as string) : 7;
    const filters = {
      state: req.query.state as string | undefined,
      storeId: req.query.storeId ? parseInt(req.query.storeId as string) : undefined,
      minSkus: req.query.minSkus ? parseInt(req.query.minSkus as string) : 10,
    };

    const result = await categoryService.getCategoryGrowth(days, filters);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] Category growth error:', error);
    res.status(500).json({ error: 'Failed to fetch category growth' });
  }
});

/**
 * GET /api/analytics/category/trend/:category
 * Get category growth trend over time
 */
router.get('/category/trend/:category', async (req: Request, res: Response) => {
  try {
    const category = decodeURIComponent(req.params.category);
    const days = req.query.days ? parseInt(req.query.days as string) : 90;

    const result = await categoryService.getCategoryGrowthTrend(category, days);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] Category trend error:', error);
    res.status(500).json({ error: 'Failed to fetch category trend' });
  }
});
/**
 * GET /api/analytics/category/heatmap
 * Get category heatmap data
 */
router.get('/category/heatmap', async (req: Request, res: Response) => {
  try {
    const metric = (req.query.metric as 'skus' | 'growth' | 'price') || 'skus';
    const periods = req.query.periods ? parseInt(req.query.periods as string) : 12;

    const result = await categoryService.getCategoryHeatmap(metric, periods);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] Category heatmap error:', error);
    res.status(500).json({ error: 'Failed to fetch heatmap' });
  }
});

/**
 * GET /api/analytics/category/top-movers
 * Get top growing and declining categories
 */
router.get('/category/top-movers', async (req: Request, res: Response) => {
  try {
    const limit = req.query.limit ? parseInt(req.query.limit as string) : 5;
    const days = req.query.days ? parseInt(req.query.days as string) : 30;

    const result = await categoryService.getTopMovers(limit, days);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] Top movers error:', error);
    res.status(500).json({ error: 'Failed to fetch top movers' });
  }
});

/**
 * GET /api/analytics/category/:category/subcategories
 * Get subcategory breakdown
 */
router.get('/category/:category/subcategories', async (req: Request, res: Response) => {
  try {
    const category = decodeURIComponent(req.params.category);
    const result = await categoryService.getSubcategoryBreakdown(category);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] Subcategory error:', error);
    res.status(500).json({ error: 'Failed to fetch subcategories' });
  }
});

// ============================================================
// STORE CHANGE TRACKING
// ============================================================
/**
 * GET /api/analytics/store/:id/summary
 * Get change summary for a store
 */
router.get('/store/:id/summary', async (req: Request, res: Response) => {
  try {
    const storeId = parseInt(req.params.id);
    const result = await storeService.getStoreChangeSummary(storeId);

    if (!result) {
      return res.status(404).json({ error: 'Store not found' });
    }

    res.json(result);
  } catch (error) {
    console.error('[Analytics] Store summary error:', error);
    res.status(500).json({ error: 'Failed to fetch store summary' });
  }
});

/**
 * GET /api/analytics/store/:id/events
 * Get recent change events for a store
 */
router.get('/store/:id/events', async (req: Request, res: Response) => {
  try {
    const storeId = parseInt(req.params.id);
    const filters = {
      eventType: req.query.type as string | undefined,
      days: req.query.days ? parseInt(req.query.days as string) : 30,
      limit: req.query.limit ? parseInt(req.query.limit as string) : 100,
    };

    const result = await storeService.getStoreChangeEvents(storeId, filters);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] Store events error:', error);
    res.status(500).json({ error: 'Failed to fetch store events' });
  }
});

/**
 * GET /api/analytics/store/:id/brands/new
 * Get new brands added to a store
 */
router.get('/store/:id/brands/new', async (req: Request, res: Response) => {
  try {
    const storeId = parseInt(req.params.id);
    const days = req.query.days ? parseInt(req.query.days as string) : 30;

    const result = await storeService.getNewBrands(storeId, days);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] New brands error:', error);
    res.status(500).json({ error: 'Failed to fetch new brands' });
  }
});

/**
 * GET /api/analytics/store/:id/brands/lost
 * Get brands lost from a store
 */
router.get('/store/:id/brands/lost', async (req: Request, res: Response) => {
  try {
    const storeId = parseInt(req.params.id);
    const days = req.query.days ? parseInt(req.query.days as string) : 30;

    const result = await storeService.getLostBrands(storeId, days);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] Lost brands error:', error);
    res.status(500).json({ error: 'Failed to fetch lost brands' });
  }
});
/**
 * GET /api/analytics/store/:id/products/changes
 * Get product changes for a store
 */
router.get('/store/:id/products/changes', async (req: Request, res: Response) => {
  try {
    const storeId = parseInt(req.params.id);
    const changeType = req.query.type as 'added' | 'discontinued' | 'price_drop' | 'price_increase' | 'restocked' | 'out_of_stock' | undefined;
    const days = req.query.days ? parseInt(req.query.days as string) : 7;

    const result = await storeService.getProductChanges(storeId, changeType, days);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] Product changes error:', error);
    res.status(500).json({ error: 'Failed to fetch product changes' });
  }
});

/**
 * GET /api/analytics/store/leaderboard/:category
 * Get category leaderboard across stores
 */
router.get('/store/leaderboard/:category', async (req: Request, res: Response) => {
  try {
    const category = decodeURIComponent(req.params.category);
    const limit = req.query.limit ? parseInt(req.query.limit as string) : 20;

    const result = await storeService.getCategoryLeaderboard(category, limit);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] Leaderboard error:', error);
    res.status(500).json({ error: 'Failed to fetch leaderboard' });
  }
});

/**
 * GET /api/analytics/store/most-active
 * Get most active stores (by changes)
 */
router.get('/store/most-active', async (req: Request, res: Response) => {
  try {
    const days = req.query.days ? parseInt(req.query.days as string) : 7;
    const limit = req.query.limit ? parseInt(req.query.limit as string) : 10;

    const result = await storeService.getMostActiveStores(days, limit);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] Most active error:', error);
    res.status(500).json({ error: 'Failed to fetch active stores' });
  }
});

/**
 * GET /api/analytics/store/compare
 * Compare two stores
 */
router.get('/store/compare', async (req: Request, res: Response) => {
  try {
    const store1 = parseInt(req.query.store1 as string);
    const store2 = parseInt(req.query.store2 as string);

    if (!store1 || !store2) {
      return res.status(400).json({ error: 'Both store1 and store2 are required' });
    }

    const result = await storeService.compareStores(store1, store2);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] Compare stores error:', error);
    res.status(500).json({ error: 'Failed to compare stores' });
  }
});
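The `!store1 || !store2` guard in the compare route above rejects missing and non-numeric input in one check, because `parseInt` yields `NaN` for absent or garbage strings and `NaN` is falsy (note it would also reject a literal id of `0`). A quick demonstration of that behavior:

```typescript
// Why the falsy check in /store/compare works: parseInt of a missing or
// non-numeric query param is NaN, and NaN is falsy in a boolean context.
const missing = parseInt(undefined as unknown as string); // NaN
const garbage = parseInt('abc'); // NaN
const validId = parseInt('42'); // 42, truthy
```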
// ============================================================
// BRAND OPPORTUNITY / RISK
// ============================================================

/**
 * GET /api/analytics/brand/:name/opportunity
 * Get full opportunity analysis for a brand
 */
router.get('/brand/:name/opportunity', async (req: Request, res: Response) => {
  try {
    const brandName = decodeURIComponent(req.params.name);
    const result = await brandOpportunityService.getBrandOpportunity(brandName);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] Brand opportunity error:', error);
    res.status(500).json({ error: 'Failed to fetch brand opportunity' });
  }
});

/**
 * GET /api/analytics/brand/:name/position
 * Get market position summary for a brand
 */
router.get('/brand/:name/position', async (req: Request, res: Response) => {
  try {
    const brandName = decodeURIComponent(req.params.name);
    const result = await brandOpportunityService.getMarketPositionSummary(brandName);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] Brand position error:', error);
    res.status(500).json({ error: 'Failed to fetch brand position' });
  }
});

// ============================================================
// ALERTS
// ============================================================
/**
 * GET /api/analytics/alerts
 * Get analytics alerts
 */
router.get('/alerts', async (req: Request, res: Response) => {
  try {
    const filters = {
      brandName: req.query.brand as string | undefined,
      storeId: req.query.storeId ? parseInt(req.query.storeId as string) : undefined,
      alertType: req.query.type as string | undefined,
      unreadOnly: req.query.unreadOnly === 'true',
      limit: req.query.limit ? parseInt(req.query.limit as string) : 50,
    };

    const result = await brandOpportunityService.getAlerts(filters);
    res.json(result);
  } catch (error) {
    console.error('[Analytics] Alerts error:', error);
    res.status(500).json({ error: 'Failed to fetch alerts' });
  }
});

/**
 * POST /api/analytics/alerts/mark-read
 * Mark alerts as read
 */
router.post('/alerts/mark-read', async (req: Request, res: Response) => {
  try {
    const { alertIds } = req.body;

    if (!Array.isArray(alertIds)) {
      return res.status(400).json({ error: 'alertIds must be an array' });
    }

    await brandOpportunityService.markAlertsRead(alertIds);
    res.json({ success: true });
  } catch (error) {
    console.error('[Analytics] Mark read error:', error);
    res.status(500).json({ error: 'Failed to mark alerts as read' });
  }
});
// ============================================================
// CACHE MANAGEMENT
// ============================================================

/**
 * GET /api/analytics/cache/stats
 * Get cache statistics
 */
router.get('/cache/stats', async (_req: Request, res: Response) => {
  try {
    const stats = await cache.getStats();
    res.json(stats);
  } catch (error) {
    console.error('[Analytics] Cache stats error:', error);
    res.status(500).json({ error: 'Failed to get cache stats' });
  }
});

/**
 * POST /api/analytics/cache/clear
 * Clear cache (admin only)
 */
router.post('/cache/clear', async (req: Request, res: Response) => {
  try {
    const pattern = req.query.pattern as string | undefined;

    if (pattern) {
      const cleared = await cache.invalidatePattern(pattern);
      res.json({ success: true, clearedCount: cleared });
    } else {
      await cache.cleanExpired();
      res.json({ success: true, message: 'Expired entries cleaned' });
    }
  } catch (error) {
    console.error('[Analytics] Cache clear error:', error);
    res.status(500).json({ error: 'Failed to clear cache' });
  }
});
// ============================================================
// SNAPSHOT CAPTURE (for cron/scheduled jobs)
// ============================================================

/**
 * POST /api/analytics/snapshots/capture
 * Capture daily snapshots (run by scheduler)
 */
router.post('/snapshots/capture', async (_req: Request, res: Response) => {
  try {
    const [brandResult, categoryResult] = await Promise.all([
      pool.query('SELECT capture_brand_snapshots() as count'),
      pool.query('SELECT capture_category_snapshots() as count'),
    ]);

    res.json({
      success: true,
      brandSnapshots: parseInt(brandResult.rows[0]?.count || '0'),
      categorySnapshots: parseInt(categoryResult.rows[0]?.count || '0'),
    });
  } catch (error) {
    console.error('[Analytics] Snapshot capture error:', error);
    res.status(500).json({ error: 'Failed to capture snapshots' });
  }
});

return router;
}
File diff suppressed because it is too large
@@ -1,486 +0,0 @@
#!/usr/bin/env npx tsx
/**
 * Crawler Reliability Stress Test
 *
 * Simulates various failure scenarios to test:
 * - Retry logic with exponential backoff
 * - Error taxonomy classification
 * - Self-healing (proxy/UA rotation)
 * - Status transitions (active -> degraded -> failed)
 * - Minimum crawl gap enforcement
 *
 * Phase 1: Crawler Reliability & Stabilization
 *
 * Usage:
 *   DATABASE_URL="postgresql://..." npx tsx src/dutchie-az/scripts/stress-test.ts [test-name]
 *
 * Available tests:
 *   error      - Test error taxonomy classification
 *   retry      - Test retry manager with various error types
 *   backoff    - Test exponential backoff calculation
 *   status     - Test status transitions
 *   validation - Test store config validation
 *   rotation   - Test proxy/UA rotation
 *   withRetry  - Test the withRetry helper
 *   gap        - Test minimum crawl gap enforcement
 *   metadata   - Test error metadata lookup
 *   all        - Run all tests (default)
 */
import {
  CrawlErrorCode,
  classifyError,
  isRetryable,
  shouldRotateProxy,
  shouldRotateUserAgent,
  getBackoffMultiplier,
  getErrorMetadata,
} from '../services/error-taxonomy';

import {
  RetryManager,
  withRetry,
  calculateNextCrawlDelay,
  calculateNextCrawlAt,
  determineCrawlStatus,
  shouldAttemptRecovery,
  sleep,
} from '../services/retry-manager';

import {
  UserAgentRotator,
  USER_AGENTS,
} from '../services/proxy-rotator';

import {
  validateStoreConfig,
  isCrawlable,
  DEFAULT_CONFIG,
  RawStoreConfig,
} from '../services/store-validator';

// ============================================================
// TEST UTILITIES
// ============================================================
let testsPassed = 0;
let testsFailed = 0;

function assert(condition: boolean, message: string): void {
  if (condition) {
    console.log(`  ✓ ${message}`);
    testsPassed++;
  } else {
    console.log(`  ✗ ${message}`);
    testsFailed++;
  }
}

function section(name: string): void {
  console.log(`\n${'='.repeat(60)}`);
  console.log(`TEST: ${name}`);
  console.log('='.repeat(60));
}

// ============================================================
// TEST: Error Classification
// ============================================================
function testErrorClassification(): void {
  section('Error Classification');

  // HTTP status codes
  assert(classifyError(null, 429) === CrawlErrorCode.RATE_LIMITED, '429 -> RATE_LIMITED');
  assert(classifyError(null, 407) === CrawlErrorCode.BLOCKED_PROXY, '407 -> BLOCKED_PROXY');
  assert(classifyError(null, 401) === CrawlErrorCode.AUTH_FAILED, '401 -> AUTH_FAILED');
  assert(classifyError(null, 403) === CrawlErrorCode.AUTH_FAILED, '403 -> AUTH_FAILED');
  assert(classifyError(null, 503) === CrawlErrorCode.SERVICE_UNAVAILABLE, '503 -> SERVICE_UNAVAILABLE');
  assert(classifyError(null, 500) === CrawlErrorCode.SERVER_ERROR, '500 -> SERVER_ERROR');

  // Error messages
  assert(classifyError('rate limit exceeded') === CrawlErrorCode.RATE_LIMITED, 'rate limit message -> RATE_LIMITED');
  assert(classifyError('request timed out') === CrawlErrorCode.TIMEOUT, 'timeout message -> TIMEOUT');
  assert(classifyError('proxy blocked') === CrawlErrorCode.BLOCKED_PROXY, 'proxy blocked -> BLOCKED_PROXY');
  assert(classifyError('ECONNREFUSED') === CrawlErrorCode.NETWORK_ERROR, 'ECONNREFUSED -> NETWORK_ERROR');
  assert(classifyError('ENOTFOUND') === CrawlErrorCode.DNS_ERROR, 'ENOTFOUND -> DNS_ERROR');
  assert(classifyError('selector not found') === CrawlErrorCode.HTML_CHANGED, 'selector error -> HTML_CHANGED');
  assert(classifyError('JSON parse error') === CrawlErrorCode.PARSE_ERROR, 'parse error -> PARSE_ERROR');
  assert(classifyError('0 products found') === CrawlErrorCode.NO_PRODUCTS, 'no products -> NO_PRODUCTS');

  // Retryability
  assert(isRetryable(CrawlErrorCode.RATE_LIMITED) === true, 'RATE_LIMITED is retryable');
  assert(isRetryable(CrawlErrorCode.TIMEOUT) === true, 'TIMEOUT is retryable');
  assert(isRetryable(CrawlErrorCode.HTML_CHANGED) === false, 'HTML_CHANGED is NOT retryable');
  assert(isRetryable(CrawlErrorCode.INVALID_CONFIG) === false, 'INVALID_CONFIG is NOT retryable');

  // Rotation decisions
  assert(shouldRotateProxy(CrawlErrorCode.BLOCKED_PROXY) === true, 'BLOCKED_PROXY -> rotate proxy');
  assert(shouldRotateProxy(CrawlErrorCode.RATE_LIMITED) === true, 'RATE_LIMITED -> rotate proxy');
  assert(shouldRotateUserAgent(CrawlErrorCode.AUTH_FAILED) === true, 'AUTH_FAILED -> rotate UA');
}
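The status-code branch that these assertions exercise can be sketched as a plain mapping. This is an illustrative re-implementation consistent with the assertions above, not the actual `error-taxonomy` source (which also inspects error message text).

```typescript
// Illustrative status-code classifier matching the assertions above.
// The real classifyError also classifies by error message.
function classifyStatus(status: number): string {
  if (status === 429) return 'RATE_LIMITED';
  if (status === 407) return 'BLOCKED_PROXY';
  if (status === 401 || status === 403) return 'AUTH_FAILED';
  if (status === 503) return 'SERVICE_UNAVAILABLE'; // checked before the generic 5xx case
  if (status >= 500) return 'SERVER_ERROR';
  return 'UNKNOWN_ERROR';
}
```

Note the ordering: 503 must be tested before the generic `>= 500` branch, or every 503 would collapse into `SERVER_ERROR`.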
// ============================================================
// TEST: Retry Manager
// ============================================================

function testRetryManager(): void {
  section('Retry Manager');

  const manager = new RetryManager({ maxRetries: 3, baseBackoffMs: 100 });

  // Initial state
  assert(manager.shouldAttempt() === true, 'Should attempt initially');
  assert(manager.getAttemptNumber() === 1, 'Attempt number starts at 1');

  // First attempt
  manager.recordAttempt();
  assert(manager.getAttemptNumber() === 2, 'Attempt number increments');

  // Evaluate retryable error
  const decision1 = manager.evaluateError(new Error('rate limit exceeded'), 429);
  assert(decision1.shouldRetry === true, 'Should retry on rate limit');
  assert(decision1.errorCode === CrawlErrorCode.RATE_LIMITED, 'Error code is RATE_LIMITED');
  assert(decision1.rotateProxy === true, 'Should rotate proxy');
  assert(decision1.backoffMs > 0, 'Backoff is positive');

  // More attempts
  manager.recordAttempt();
  manager.recordAttempt();

  // Now at max retries
  const decision2 = manager.evaluateError(new Error('timeout'), 504);
  assert(decision2.shouldRetry === true, 'Should still retry (at limit but not exceeded)');

  manager.recordAttempt();
  const decision3 = manager.evaluateError(new Error('timeout'));
  assert(decision3.shouldRetry === false, 'Should NOT retry after max');
  assert(decision3.reason.includes('exhausted'), 'Reason mentions exhausted');

  // Reset
  manager.reset();
  assert(manager.shouldAttempt() === true, 'Should attempt after reset');
  assert(manager.getAttemptNumber() === 1, 'Attempt number resets');

  // Non-retryable error
  const manager2 = new RetryManager({ maxRetries: 3 });
  manager2.recordAttempt();
  const nonRetryable = manager2.evaluateError(new Error('HTML structure changed'));
  assert(nonRetryable.shouldRetry === false, 'Non-retryable error stops immediately');
  assert(nonRetryable.errorCode === CrawlErrorCode.HTML_CHANGED, 'Error code is HTML_CHANGED');
}
// ============================================================
// TEST: Exponential Backoff
// ============================================================

function testExponentialBackoff(): void {
  section('Exponential Backoff');

  // Calculate next crawl delay
  const delay0 = calculateNextCrawlDelay(0, 240); // No failures
  const delay1 = calculateNextCrawlDelay(1, 240); // 1 failure
  const delay2 = calculateNextCrawlDelay(2, 240); // 2 failures
  const delay3 = calculateNextCrawlDelay(3, 240); // 3 failures
  const delay5 = calculateNextCrawlDelay(5, 240); // 5 failures (should cap)

  console.log(`  Delay with 0 failures: ${delay0} minutes`);
  console.log(`  Delay with 1 failure:  ${delay1} minutes`);
  console.log(`  Delay with 2 failures: ${delay2} minutes`);
  console.log(`  Delay with 3 failures: ${delay3} minutes`);
  console.log(`  Delay with 5 failures: ${delay5} minutes`);

  assert(delay1 > delay0, 'Delay increases with failures');
  assert(delay2 > delay1, 'Delay keeps increasing');
  assert(delay3 > delay2, 'More delay with more failures');
  // With jitter, exact values vary but the ratio should be close to 2x
  assert(delay5 <= 240 * 4 * 1.2, 'Delay is capped at max multiplier');

  // Next crawl time calculation
  const now = new Date();
  const nextAt = calculateNextCrawlAt(2, 240);
  assert(nextAt > now, 'Next crawl is in future');
  assert(nextAt.getTime() - now.getTime() > 240 * 60 * 1000, 'Includes backoff');
}
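The general shape these assertions describe (delay roughly doubles per failure, capped near 4x the base frequency, with jitter on top) can be sketched as follows. The exact cap point and jitter behavior of `calculateNextCrawlDelay` are assumptions here; this is an illustration of the curve, not the actual implementation.

```typescript
// Sketch of capped exponential backoff: base * 2^failures, capped at a maximum
// multiplier. Assumption: cap of 4x; the real implementation also adds jitter.
function backoffMinutes(failures: number, baseMinutes: number, maxMultiplier = 4): number {
  return baseMinutes * Math.min(2 ** failures, maxMultiplier);
}
```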
// ============================================================
// TEST: Status Transitions
// ============================================================

function testStatusTransitions(): void {
  section('Status Transitions');

  // Active status
  assert(determineCrawlStatus(0) === 'active', '0 failures -> active');
  assert(determineCrawlStatus(1) === 'active', '1 failure -> active');
  assert(determineCrawlStatus(2) === 'active', '2 failures -> active');

  // Degraded status
  assert(determineCrawlStatus(3) === 'degraded', '3 failures -> degraded');
  assert(determineCrawlStatus(5) === 'degraded', '5 failures -> degraded');
  assert(determineCrawlStatus(9) === 'degraded', '9 failures -> degraded');

  // Failed status
  assert(determineCrawlStatus(10) === 'failed', '10 failures -> failed');
  assert(determineCrawlStatus(15) === 'failed', '15 failures -> failed');

  // Custom thresholds
  const customStatus = determineCrawlStatus(5, { degraded: 5, failed: 8 });
  assert(customStatus === 'degraded', 'Custom threshold: 5 -> degraded');

  // Recovery check
  const recentFailure = new Date(Date.now() - 1 * 60 * 60 * 1000); // 1 hour ago
  const oldFailure = new Date(Date.now() - 48 * 60 * 60 * 1000); // 48 hours ago

  assert(shouldAttemptRecovery(recentFailure, 1) === false, 'No recovery for recent failure');
  assert(shouldAttemptRecovery(oldFailure, 1) === true, 'Recovery allowed for old failure');
  assert(shouldAttemptRecovery(null, 0) === true, 'Recovery allowed if no previous failure');
}
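The threshold logic these assertions pin down is simple enough to restate directly. This is a re-derivation from the test expectations above (defaults of 3 and 10, overridable), not the actual `determineCrawlStatus` source.

```typescript
// Threshold logic implied by the assertions above: below the degraded
// threshold -> 'active', below the failed threshold -> 'degraded', else 'failed'.
type CrawlStatus = 'active' | 'degraded' | 'failed';

function statusFor(
  failures: number,
  thresholds: { degraded: number; failed: number } = { degraded: 3, failed: 10 },
): CrawlStatus {
  if (failures >= thresholds.failed) return 'failed';
  if (failures >= thresholds.degraded) return 'degraded';
  return 'active';
}
```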
// ============================================================
// TEST: Store Validation
// ============================================================

function testStoreValidation(): void {
  section('Store Validation');

  // Valid config
  const validConfig: RawStoreConfig = {
    id: 1,
    name: 'Test Store',
    platformDispensaryId: '123abc',
    menuType: 'dutchie',
  };
  const validResult = validateStoreConfig(validConfig);
  assert(validResult.isValid === true, 'Valid config passes');
  assert(validResult.config !== null, 'Valid config returns config');
  assert(validResult.config?.slug === 'test-store', 'Slug is generated');

  // Missing required fields
  const missingId: RawStoreConfig = {
    id: 0,
    name: 'Test',
    platformDispensaryId: '123',
    menuType: 'dutchie',
  };
  const missingIdResult = validateStoreConfig(missingId);
  assert(missingIdResult.isValid === false, 'Missing ID fails');

  // Missing platform ID
  const missingPlatform: RawStoreConfig = {
    id: 1,
    name: 'Test',
    menuType: 'dutchie',
  };
  const missingPlatformResult = validateStoreConfig(missingPlatform);
  assert(missingPlatformResult.isValid === false, 'Missing platform ID fails');

  // Unknown menu type
  const unknownMenu: RawStoreConfig = {
    id: 1,
    name: 'Test',
    platformDispensaryId: '123',
    menuType: 'unknown',
  };
  const unknownMenuResult = validateStoreConfig(unknownMenu);
  assert(unknownMenuResult.isValid === false, 'Unknown menu type fails');

  // Crawlable check
  assert(isCrawlable(validConfig) === true, 'Valid config is crawlable');
  assert(isCrawlable(missingPlatform) === false, 'Missing platform not crawlable');
  assert(isCrawlable({ ...validConfig, crawlStatus: 'failed' }) === false, 'Failed status not crawlable');
  assert(isCrawlable({ ...validConfig, crawlStatus: 'paused' }) === false, 'Paused status not crawlable');
}
// ============================================================
// TEST: User Agent Rotation
// ============================================================

function testUserAgentRotation(): void {
  section('User Agent Rotation');

  const rotator = new UserAgentRotator();

  const first = rotator.getCurrent();
  const second = rotator.getNext();
  const third = rotator.getNext();

  assert(first !== second, 'User agents rotate');
  assert(second !== third, 'User agents keep rotating');
  assert(USER_AGENTS.includes(first), 'Returns valid UA');
  assert(USER_AGENTS.includes(second), 'Returns valid UA');

  // Random UA
  const random = rotator.getRandom();
  assert(USER_AGENTS.includes(random), 'Random returns valid UA');

  // Count
  assert(rotator.getCount() === USER_AGENTS.length, 'Reports correct count');
}
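The behavior exercised above is a classic round-robin over a fixed pool. A minimal generic sketch (the real `UserAgentRotator` wraps the `USER_AGENTS` pool and also offers `getRandom`; the pool contents below are placeholders):

```typescript
// Minimal round-robin rotator sketch. Cycles through a fixed pool and wraps
// around at the end, mirroring getCurrent/getNext/getCount above.
class RoundRobinRotator<T> {
  private index = 0;
  constructor(private readonly pool: readonly T[]) {}
  getCurrent(): T {
    return this.pool[this.index];
  }
  getNext(): T {
    this.index = (this.index + 1) % this.pool.length;
    return this.pool[this.index];
  }
  getCount(): number {
    return this.pool.length;
  }
}
```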
// ============================================================
// TEST: WithRetry Helper
// ============================================================

async function testWithRetryHelper(): Promise<void> {
  section('WithRetry Helper');

  // Successful on first try
  let attempts = 0;
  const successResult = await withRetry(async () => {
    attempts++;
    return 'success';
  }, { maxRetries: 3 });
  assert(attempts === 1, 'Succeeds on first try');
  assert(successResult.result === 'success', 'Returns result');

  // Fails then succeeds
  let failThenSucceedAttempts = 0;
  const failThenSuccessResult = await withRetry(async () => {
    failThenSucceedAttempts++;
    if (failThenSucceedAttempts < 3) {
      throw new Error('temporary error');
    }
    return 'finally succeeded';
  }, { maxRetries: 5, baseBackoffMs: 10 });
  assert(failThenSucceedAttempts === 3, 'Retries until success');
  assert(failThenSuccessResult.result === 'finally succeeded', 'Returns final result');
  assert(failThenSuccessResult.summary.attemptsMade === 3, 'Summary tracks attempts');

  // Exhausts retries
  let alwaysFailAttempts = 0;
  try {
    await withRetry(async () => {
      alwaysFailAttempts++;
      throw new Error('always fails');
    }, { maxRetries: 2, baseBackoffMs: 10 });
    assert(false, 'Should have thrown');
  } catch (error: any) {
    assert(alwaysFailAttempts === 3, 'Attempts all retries'); // 1 initial + 2 retries
    assert(error.name === 'RetryExhaustedError', 'Throws RetryExhaustedError');
  }

  // Non-retryable error stops immediately
  let nonRetryableAttempts = 0;
  try {
    await withRetry(async () => {
      nonRetryableAttempts++;
      const err = new Error('HTML structure changed - selector not found');
      throw err;
    }, { maxRetries: 3, baseBackoffMs: 10 });
    assert(false, 'Should have thrown');
  } catch {
    assert(nonRetryableAttempts === 1, 'Non-retryable stops immediately');
  }
}
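The attempt-counting contract these tests rely on ("1 initial + N retries, then throw the last error") can be shown with a stripped-down synchronous sketch. The real `withRetry` is async and additionally applies exponential backoff and a non-retryable short-circuit via the error taxonomy; none of that is modeled here.

```typescript
// Synchronous sketch of the retry loop shape: one initial attempt plus
// maxRetries retries, rethrowing the last error once attempts are exhausted.
function retrySync<T>(fn: () => T, maxRetries: number): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    try {
      return fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```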
// ============================================================
// TEST: Minimum Crawl Gap
// ============================================================

function testMinimumCrawlGap(): void {
  section('Minimum Crawl Gap');

  // Default config
  assert(DEFAULT_CONFIG.minCrawlGapMinutes === 2, 'Default gap is 2 minutes');
  assert(DEFAULT_CONFIG.crawlFrequencyMinutes === 240, 'Default frequency is 4 hours');

  // Gap calculation
  const gapMs = DEFAULT_CONFIG.minCrawlGapMinutes * 60 * 1000;
  assert(gapMs === 120000, 'Gap is 2 minutes in ms');

  console.log('  Note: Gap enforcement is tested at DB level (trigger) and application level');
}
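The note above says gap enforcement lives at the DB level (a trigger) and at the application level. A hypothetical application-level check, shown only to illustrate the gap arithmetic (`gapHasElapsed` is not part of this codebase):

```typescript
// Hypothetical application-level gap check: allow a crawl only if at least
// minGapMinutes have passed since the last one. `now` is injectable for testing.
function gapHasElapsed(
  lastCrawlAt: Date | null,
  minGapMinutes: number,
  now: Date = new Date(),
): boolean {
  if (!lastCrawlAt) return true; // never crawled -> always allowed
  return now.getTime() - lastCrawlAt.getTime() >= minGapMinutes * 60 * 1000;
}
```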
// ============================================================
// TEST: Error Metadata
// ============================================================

function testErrorMetadata(): void {
  section('Error Metadata');

  // RATE_LIMITED
  const rateLimited = getErrorMetadata(CrawlErrorCode.RATE_LIMITED);
  assert(rateLimited.retryable === true, 'RATE_LIMITED is retryable');
  assert(rateLimited.rotateProxy === true, 'RATE_LIMITED rotates proxy');
  assert(rateLimited.backoffMultiplier === 2.0, 'RATE_LIMITED has 2x backoff');
  assert(rateLimited.severity === 'medium', 'RATE_LIMITED is medium severity');

  // HTML_CHANGED
  const htmlChanged = getErrorMetadata(CrawlErrorCode.HTML_CHANGED);
  assert(htmlChanged.retryable === false, 'HTML_CHANGED is NOT retryable');
  assert(htmlChanged.severity === 'high', 'HTML_CHANGED is high severity');

  // INVALID_CONFIG
  const invalidConfig = getErrorMetadata(CrawlErrorCode.INVALID_CONFIG);
  assert(invalidConfig.retryable === false, 'INVALID_CONFIG is NOT retryable');
  assert(invalidConfig.severity === 'critical', 'INVALID_CONFIG is critical');
}
// ============================================================
// MAIN
// ============================================================

async function runTests(testName?: string): Promise<void> {
  console.log('\n');
  console.log('╔══════════════════════════════════════════════════════════╗');
  console.log('║        CRAWLER RELIABILITY STRESS TEST - PHASE 1         ║');
  console.log('╚══════════════════════════════════════════════════════════╝');

  const allTests = !testName || testName === 'all';

  if (allTests || testName === 'error' || testName === 'classification') {
    testErrorClassification();
  }

  if (allTests || testName === 'retry') {
    testRetryManager();
  }

  if (allTests || testName === 'backoff') {
    testExponentialBackoff();
  }

  if (allTests || testName === 'status') {
    testStatusTransitions();
  }

  if (allTests || testName === 'validation' || testName === 'store') {
    testStoreValidation();
  }

  if (allTests || testName === 'rotation' || testName === 'ua') {
    testUserAgentRotation();
  }

  if (allTests || testName === 'withRetry' || testName === 'helper') {
    await testWithRetryHelper();
  }

  if (allTests || testName === 'gap') {
    testMinimumCrawlGap();
  }

  if (allTests || testName === 'metadata') {
    testErrorMetadata();
  }

  // Summary
  console.log('\n');
  console.log('═'.repeat(60));
  console.log('SUMMARY');
  console.log('═'.repeat(60));
  console.log(`  Passed: ${testsPassed}`);
  console.log(`  Failed: ${testsFailed}`);
  console.log(`  Total:  ${testsPassed + testsFailed}`);

  if (testsFailed > 0) {
    console.log('\n❌ SOME TESTS FAILED\n');
    process.exit(1);
  } else {
    console.log('\n✅ ALL TESTS PASSED\n');
    process.exit(0);
  }
}

// Run tests
const testName = process.argv[2];
runTests(testName).catch((error) => {
  console.error('Fatal error:', error);
  process.exit(1);
});
@@ -1,659 +0,0 @@
/**
 * Brand Opportunity / Risk Analytics Service
 *
 * Provides brand-level opportunity and risk analysis including:
 * - Under/overpriced vs market
 * - Missing SKU opportunities
 * - Stores with declining/growing shelf share
 * - Competitor intrusion alerts
 *
 * Phase 3: Analytics Dashboards
 */

import { Pool } from 'pg';
import { AnalyticsCache, cacheKey } from './cache';

export interface BrandOpportunity {
  brandName: string;
  underpricedVsMarket: PricePosition[];
  overpricedVsMarket: PricePosition[];
  missingSkuOpportunities: MissingSkuOpportunity[];
  storesWithDecliningShelfShare: StoreShelfShareChange[];
  storesWithGrowingShelfShare: StoreShelfShareChange[];
  competitorIntrusionAlerts: CompetitorAlert[];
  overallScore: number; // 0-100, higher = more opportunity
  riskScore: number; // 0-100, higher = more risk
}

export interface PricePosition {
  category: string;
  brandAvgPrice: number;
  marketAvgPrice: number;
  priceDifferencePercent: number;
  skuCount: number;
  suggestion: string;
}

export interface MissingSkuOpportunity {
  category: string;
  subcategory: string | null;
  marketSkuCount: number;
  brandSkuCount: number;
  gapPercent: number;
  topCompetitors: string[];
  opportunityScore: number; // 0-100
}

export interface StoreShelfShareChange {
  storeId: number;
  storeName: string;
  city: string;
  state: string;
  currentShelfShare: number;
  previousShelfShare: number;
  changePercent: number;
  currentSkus: number;
  competitors: string[];
}

export interface CompetitorAlert {
  competitorBrand: string;
  storeId: number;
  storeName: string;
  alertType: 'new_entry' | 'expanding' | 'price_undercut';
  details: string;
  severity: 'low' | 'medium' | 'high';
  date: string;
}

export interface MarketPositionSummary {
  brandName: string;
  marketSharePercent: number;
  avgPriceVsMarket: number; // -X% to +X%
  categoryStrengths: Array<{ category: string; shelfSharePercent: number }>;
  categoryWeaknesses: Array<{ category: string; shelfSharePercent: number; marketLeader: string }>;
  growthTrend: 'growing' | 'stable' | 'declining';
  competitorThreats: string[];
}

export class BrandOpportunityService {
  private pool: Pool;
  private cache: AnalyticsCache;

  constructor(pool: Pool, cache: AnalyticsCache) {
    this.pool = pool;
    this.cache = cache;
  }

  /**
   * Get full opportunity analysis for a brand
   */
  async getBrandOpportunity(brandName: string): Promise<BrandOpportunity> {
    const key = cacheKey('brand_opportunity', { brandName });

    return (await this.cache.getOrCompute(key, async () => {
      const [
        underpriced,
        overpriced,
        missingSkus,
        decliningStores,
        growingStores,
        alerts,
      ] = await Promise.all([
        this.getUnderpricedPositions(brandName),
        this.getOverpricedPositions(brandName),
        this.getMissingSkuOpportunities(brandName),
        this.getStoresWithDecliningShare(brandName),
        this.getStoresWithGrowingShare(brandName),
        this.getCompetitorAlerts(brandName),
      ]);

      // Calculate opportunity score (higher = more opportunity)
      const opportunityFactors = [
        missingSkus.length > 0 ? 20 : 0,
        underpriced.length > 0 ? 15 : 0,
        growingStores.length > 5 ? 20 : growingStores.length * 3,
        missingSkus.reduce((sum, m) => sum + m.opportunityScore, 0) / Math.max(1, missingSkus.length) * 0.3,
      ];
      const opportunityScore = Math.min(100, opportunityFactors.reduce((a, b) => a + b, 0));

      // Calculate risk score (higher = more risk)
      const riskFactors = [
        decliningStores.length > 5 ? 30 : decliningStores.length * 5,
        alerts.filter(a => a.severity === 'high').length * 15,
        alerts.filter(a => a.severity === 'medium').length * 8,
        overpriced.length > 3 ? 15 : overpriced.length * 3,
      ];
      const riskScore = Math.min(100, riskFactors.reduce((a, b) => a + b, 0));

      return {
        brandName,
        underpricedVsMarket: underpriced,
        overpricedVsMarket: overpriced,
        missingSkuOpportunities: missingSkus,
        storesWithDecliningShelfShare: decliningStores,
        storesWithGrowingShelfShare: growingStores,
        competitorIntrusionAlerts: alerts,
        overallScore: Math.round(opportunityScore),
        riskScore: Math.round(riskScore),
      };
    }, 30)).data;
  }

  /**
   * Get categories where brand is underpriced vs market
   */
  async getUnderpricedPositions(brandName: string): Promise<PricePosition[]> {
    const result = await this.pool.query(`
      WITH brand_prices AS (
        SELECT
          type as category,
          AVG(extract_min_price(latest_raw_payload)) as brand_avg,
          COUNT(*) as sku_count
        FROM dutchie_products
        WHERE brand_name = $1 AND type IS NOT NULL
        GROUP BY type
        HAVING COUNT(*) >= 3
      ),
      market_prices AS (
        SELECT
          type as category,
          AVG(extract_min_price(latest_raw_payload)) as market_avg
        FROM dutchie_products
        WHERE type IS NOT NULL AND brand_name != $1
        GROUP BY type
      )
      SELECT
        bp.category,
        bp.brand_avg,
        mp.market_avg,
        bp.sku_count,
        ((bp.brand_avg - mp.market_avg) / NULLIF(mp.market_avg, 0)) * 100 as diff_pct
      FROM brand_prices bp
      JOIN market_prices mp ON bp.category = mp.category
      WHERE bp.brand_avg < mp.market_avg * 0.9 -- 10% or more below market
        AND bp.brand_avg IS NOT NULL
        AND mp.market_avg IS NOT NULL
      ORDER BY diff_pct
    `, [brandName]);

    return result.rows.map(row => ({
      category: row.category,
      brandAvgPrice: Math.round(parseFloat(row.brand_avg) * 100) / 100,
      marketAvgPrice: Math.round(parseFloat(row.market_avg) * 100) / 100,
      priceDifferencePercent: Math.round(parseFloat(row.diff_pct) * 10) / 10,
      skuCount: parseInt(row.sku_count) || 0,
      suggestion: `Consider price increase - ${Math.abs(Math.round(parseFloat(row.diff_pct)))}% below market average`,
    }));
  }

  /**
   * Get categories where brand is overpriced vs market
   */
  async getOverpricedPositions(brandName: string): Promise<PricePosition[]> {
    const result = await this.pool.query(`
      WITH brand_prices AS (
        SELECT
          type as category,
          AVG(extract_min_price(latest_raw_payload)) as brand_avg,
          COUNT(*) as sku_count
        FROM dutchie_products
        WHERE brand_name = $1 AND type IS NOT NULL
        GROUP BY type
        HAVING COUNT(*) >= 3
      ),
      market_prices AS (
        SELECT
          type as category,
          AVG(extract_min_price(latest_raw_payload)) as market_avg
        FROM dutchie_products
        WHERE type IS NOT NULL AND brand_name != $1
        GROUP BY type
      )
      SELECT
        bp.category,
        bp.brand_avg,
        mp.market_avg,
        bp.sku_count,
        ((bp.brand_avg - mp.market_avg) / NULLIF(mp.market_avg, 0)) * 100 as diff_pct
      FROM brand_prices bp
      JOIN market_prices mp ON bp.category = mp.category
      WHERE bp.brand_avg > mp.market_avg * 1.15 -- 15% or more above market
        AND bp.brand_avg IS NOT NULL
        AND mp.market_avg IS NOT NULL
      ORDER BY diff_pct DESC
    `, [brandName]);

    return result.rows.map(row => ({
      category: row.category,
      brandAvgPrice: Math.round(parseFloat(row.brand_avg) * 100) / 100,
      marketAvgPrice: Math.round(parseFloat(row.market_avg) * 100) / 100,
      priceDifferencePercent: Math.round(parseFloat(row.diff_pct) * 10) / 10,
      skuCount: parseInt(row.sku_count) || 0,
      suggestion: `Price sensitivity risk - ${Math.round(parseFloat(row.diff_pct))}% above market average`,
    }));
  }

  /**
   * Get missing SKU opportunities (category gaps)
   */
  async getMissingSkuOpportunities(brandName: string): Promise<MissingSkuOpportunity[]> {
    const result = await this.pool.query(`
      WITH market_categories AS (
        SELECT
          type as category,
          subcategory,
          COUNT(*) as market_skus,
          ARRAY_AGG(DISTINCT brand_name ORDER BY brand_name) FILTER (WHERE brand_name IS NOT NULL) as top_brands
        FROM dutchie_products
        WHERE type IS NOT NULL
        GROUP BY type, subcategory
        HAVING COUNT(*) >= 20
      ),
      brand_presence AS (
        SELECT
          type as category,
          subcategory,
          COUNT(*) as brand_skus
        FROM dutchie_products
        WHERE brand_name = $1 AND type IS NOT NULL
        GROUP BY type, subcategory
      )
      SELECT
        mc.category,
        mc.subcategory,
        mc.market_skus,
        COALESCE(bp.brand_skus, 0) as brand_skus,
        mc.top_brands[1:5] as competitors
      FROM market_categories mc
      LEFT JOIN brand_presence bp ON mc.category = bp.category
        AND (mc.subcategory = bp.subcategory OR (mc.subcategory IS NULL AND bp.subcategory IS NULL))
      WHERE COALESCE(bp.brand_skus, 0) < mc.market_skus * 0.05 -- Brand has <5% of market presence
      ORDER BY mc.market_skus DESC
      LIMIT 10
    `, [brandName]);

    return result.rows.map(row => {
      const marketSkus = parseInt(row.market_skus) || 0;
      const brandSkus = parseInt(row.brand_skus) || 0;
      const gapPercent = marketSkus > 0 ? ((marketSkus - brandSkus) / marketSkus) * 100 : 100;
      const opportunityScore = Math.min(100, Math.round((marketSkus / 100) * (gapPercent / 100) * 100));

      return {
        category: row.category,
        subcategory: row.subcategory,
        marketSkuCount: marketSkus,
        brandSkuCount: brandSkus,
        gapPercent: Math.round(gapPercent),
        topCompetitors: (row.competitors || []).filter((c: string) => c !== brandName).slice(0, 5),
        opportunityScore,
      };
    });
  }

  /**
   * Get stores where brand's shelf share is declining
   */
  async getStoresWithDecliningShare(brandName: string): Promise<StoreShelfShareChange[]> {
    // Use brand_snapshots for historical comparison
    const result = await this.pool.query(`
      WITH current_share AS (
        SELECT
          dp.dispensary_id as store_id,
          d.name as store_name,
          d.city,
          d.state,
          COUNT(*) FILTER (WHERE dp.brand_name = $1) as brand_skus,
          COUNT(*) as total_skus,
          ARRAY_AGG(DISTINCT dp.brand_name) FILTER (WHERE dp.brand_name != $1 AND dp.brand_name IS NOT NULL) as competitors
        FROM dutchie_products dp
        JOIN dispensaries d ON dp.dispensary_id = d.id
        GROUP BY dp.dispensary_id, d.name, d.city, d.state
        HAVING COUNT(*) FILTER (WHERE dp.brand_name = $1) > 0
      )
      SELECT
        cs.store_id,
        cs.store_name,
        cs.city,
        cs.state,
        cs.brand_skus as current_skus,
        cs.total_skus,
        ROUND((cs.brand_skus::NUMERIC / cs.total_skus) * 100, 2) as current_share,
        cs.competitors[1:5] as top_competitors
      FROM current_share cs
      WHERE cs.brand_skus < 10 -- Low presence
      ORDER BY cs.brand_skus
      LIMIT 10
    `, [brandName]);

    return result.rows.map(row => ({
      storeId: row.store_id,
      storeName: row.store_name,
      city: row.city,
      state: row.state,
      currentShelfShare: parseFloat(row.current_share) || 0,
      previousShelfShare: parseFloat(row.current_share) || 0, // Would need historical data
      changePercent: 0,
      currentSkus: parseInt(row.current_skus) || 0,
      competitors: row.top_competitors || [],
    }));
  }

  /**
   * Get stores where brand's shelf share is growing
   */
  async getStoresWithGrowingShare(brandName: string): Promise<StoreShelfShareChange[]> {
    const result = await this.pool.query(`
      WITH store_share AS (
        SELECT
          dp.dispensary_id as store_id,
          d.name as store_name,
          d.city,
          d.state,
          COUNT(*) FILTER (WHERE dp.brand_name = $1) as brand_skus,
          COUNT(*) as total_skus,
          ARRAY_AGG(DISTINCT dp.brand_name) FILTER (WHERE dp.brand_name != $1 AND dp.brand_name IS NOT NULL) as competitors
        FROM dutchie_products dp
        JOIN dispensaries d ON dp.dispensary_id = d.id
        GROUP BY dp.dispensary_id, d.name, d.city, d.state
        HAVING COUNT(*) FILTER (WHERE dp.brand_name = $1) > 0
      )
      SELECT
        ss.store_id,
        ss.store_name,
        ss.city,
        ss.state,
        ss.brand_skus as current_skus,
        ss.total_skus,
        ROUND((ss.brand_skus::NUMERIC / ss.total_skus) * 100, 2) as current_share,
        ss.competitors[1:5] as top_competitors
      FROM store_share ss
      ORDER BY current_share DESC
      LIMIT 10
    `, [brandName]);

    return result.rows.map(row => ({
      storeId: row.store_id,
      storeName: row.store_name,
      city: row.city,
      state: row.state,
      currentShelfShare: parseFloat(row.current_share) || 0,
      previousShelfShare: parseFloat(row.current_share) || 0,
      changePercent: 0,
      currentSkus: parseInt(row.current_skus) || 0,
      competitors: row.top_competitors || [],
    }));
  }

  /**
   * Get competitor intrusion alerts
   */
  async getCompetitorAlerts(brandName: string): Promise<CompetitorAlert[]> {
    // Check for competitor entries in stores where this brand has presence
    const result = await this.pool.query(`
      WITH brand_stores AS (
        SELECT DISTINCT dispensary_id
        FROM dutchie_products
        WHERE brand_name = $1
      ),
      competitor_presence AS (
        SELECT
          dp.brand_name as competitor,
          dp.dispensary_id as store_id,
          d.name as store_name,
          COUNT(*) as sku_count,
          MAX(dp.created_at) as latest_add
        FROM dutchie_products dp
        JOIN dispensaries d ON dp.dispensary_id = d.id
        WHERE dp.dispensary_id IN (SELECT dispensary_id FROM brand_stores)
          AND dp.brand_name != $1
          AND dp.brand_name IS NOT NULL
          AND dp.created_at >= NOW() - INTERVAL '30 days'
        GROUP BY dp.brand_name, dp.dispensary_id, d.name
        HAVING COUNT(*) >= 5
      )
      SELECT
        competitor,
        store_id,
        store_name,
        sku_count,
        latest_add
      FROM competitor_presence
      ORDER BY sku_count DESC
      LIMIT 10
    `, [brandName]);

    return result.rows.map(row => {
      const skuCount = parseInt(row.sku_count) || 0;
      let severity: 'low' | 'medium' | 'high' = 'low';
      if (skuCount >= 20) severity = 'high';
      else if (skuCount >= 10) severity = 'medium';

      return {
        competitorBrand: row.competitor,
        storeId: row.store_id,
        storeName: row.store_name,
        alertType: 'expanding' as const,
        details: `${row.competitor} has ${skuCount} SKUs in ${row.store_name}`,
        severity,
        date: new Date(row.latest_add).toISOString().split('T')[0],
      };
    });
  }

  /**
   * Get market position summary for a brand
   */
  async getMarketPositionSummary(brandName: string): Promise<MarketPositionSummary> {
    const key = cacheKey('market_position', { brandName });

    return (await this.cache.getOrCompute(key, async () => {
      const [shareResult, priceResult, categoryResult, threatResult] = await Promise.all([
        // Market share
        this.pool.query(`
          SELECT
            (SELECT COUNT(*) FROM dutchie_products WHERE brand_name = $1) as brand_count,
            (SELECT COUNT(*) FROM dutchie_products) as total_count
        `, [brandName]),

        // Price vs market
        this.pool.query(`
          SELECT
            (SELECT AVG(extract_min_price(latest_raw_payload)) FROM dutchie_products WHERE brand_name = $1) as brand_avg,
            (SELECT AVG(extract_min_price(latest_raw_payload)) FROM dutchie_products WHERE brand_name != $1) as market_avg
        `, [brandName]),

        // Category strengths/weaknesses
        this.pool.query(`
          WITH brand_by_cat AS (
            SELECT type as category, COUNT(*) as brand_count
            FROM dutchie_products
            WHERE brand_name = $1 AND type IS NOT NULL
            GROUP BY type
          ),
          market_by_cat AS (
            SELECT type as category, COUNT(*) as total_count
            FROM dutchie_products WHERE type IS NOT NULL
            GROUP BY type
          ),
          leaders AS (
            SELECT type as category, brand_name, COUNT(*) as cnt,
              RANK() OVER (PARTITION BY type ORDER BY COUNT(*) DESC) as rnk
            FROM dutchie_products WHERE type IS NOT NULL AND brand_name IS NOT NULL
            GROUP BY type, brand_name
          )
          SELECT
            mc.category,
            COALESCE(bc.brand_count, 0) as brand_count,
            mc.total_count,
            ROUND((COALESCE(bc.brand_count, 0)::NUMERIC / mc.total_count) * 100, 2) as share_pct,
            (SELECT brand_name FROM leaders WHERE category = mc.category AND rnk = 1) as leader
          FROM market_by_cat mc
          LEFT JOIN brand_by_cat bc ON mc.category = bc.category
          ORDER BY share_pct DESC
        `, [brandName]),

        // Top competitors
        this.pool.query(`
          SELECT brand_name, COUNT(*) as cnt
          FROM dutchie_products
          WHERE brand_name IS NOT NULL AND brand_name != $1
          GROUP BY brand_name
          ORDER BY cnt DESC
          LIMIT 5
        `, [brandName]),
      ]);

      const brandCount = parseInt(shareResult.rows[0]?.brand_count) || 0;
      const totalCount = parseInt(shareResult.rows[0]?.total_count) || 1;
      const marketSharePercent = Math.round((brandCount / totalCount) * 1000) / 10;

      const brandAvg = parseFloat(priceResult.rows[0]?.brand_avg) || 0;
      const marketAvg = parseFloat(priceResult.rows[0]?.market_avg) || 1;
      const avgPriceVsMarket = Math.round(((brandAvg - marketAvg) / marketAvg) * 1000) / 10;

      const categories = categoryResult.rows;
      const strengths = categories
        .filter(c => parseFloat(c.share_pct) > 5)
        .map(c => ({ category: c.category, shelfSharePercent: parseFloat(c.share_pct) }));

      const weaknesses = categories
        .filter(c => parseFloat(c.share_pct) < 2 && c.leader !== brandName)
        .map(c => ({
          category: c.category,
          shelfSharePercent: parseFloat(c.share_pct),
          marketLeader: c.leader || 'Unknown',
        }));

      return {
        brandName,
        marketSharePercent,
        avgPriceVsMarket,
        categoryStrengths: strengths.slice(0, 5),
        categoryWeaknesses: weaknesses.slice(0, 5),
        growthTrend: 'stable' as const, // Would need historical data
        competitorThreats: threatResult.rows.map(r => r.brand_name),
      };
    }, 30)).data;
  }

  /**
   * Create an analytics alert
   */
  async createAlert(alert: {
    alertType: string;
    severity: 'info' | 'warning' | 'critical';
    title: string;
    description?: string;
    storeId?: number;
    brandName?: string;
    productId?: number;
    category?: string;
    metadata?: Record<string, unknown>;
  }): Promise<void> {
    await this.pool.query(`
      INSERT INTO analytics_alerts
        (alert_type, severity, title, description, store_id, brand_name, product_id, category, metadata)
      VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
    `, [
      alert.alertType,
      alert.severity,
      alert.title,
      alert.description || null,
      alert.storeId || null,
      alert.brandName || null,
      alert.productId || null,
      alert.category || null,
      alert.metadata ? JSON.stringify(alert.metadata) : null,
    ]);
  }

  /**
   * Get recent alerts
   */
  async getAlerts(filters: {
    brandName?: string;
    storeId?: number;
    alertType?: string;
    unreadOnly?: boolean;
    limit?: number;
  } = {}): Promise<Array<{
    id: number;
    alertType: string;
    severity: string;
    title: string;
    description: string | null;
    storeName: string | null;
    brandName: string | null;
    createdAt: string;
    isRead: boolean;
  }>> {
    const { brandName, storeId, alertType, unreadOnly = false, limit = 50 } = filters;
    const params: (string | number | boolean)[] = [limit];
    const conditions: string[] = [];
    let paramIndex = 2;

    if (brandName) {
      conditions.push(`a.brand_name = $${paramIndex++}`);
      params.push(brandName);
    }
    if (storeId) {
      conditions.push(`a.store_id = $${paramIndex++}`);
      params.push(storeId);
    }
    if (alertType) {
      conditions.push(`a.alert_type = $${paramIndex++}`);
      params.push(alertType);
    }
    if (unreadOnly) {
      conditions.push('a.is_read = false');
    }

    const whereClause = conditions.length > 0
      ? 'WHERE ' + conditions.join(' AND ')
      : '';

    const result = await this.pool.query(`
      SELECT
        a.id,
        a.alert_type,
        a.severity,
        a.title,
        a.description,
        d.name as store_name,
        a.brand_name,
        a.created_at,
        a.is_read
      FROM analytics_alerts a
      LEFT JOIN dispensaries d ON a.store_id = d.id
      ${whereClause}
      ORDER BY a.created_at DESC
      LIMIT $1
    `, params);

    return result.rows.map(row => ({
      id: row.id,
      alertType: row.alert_type,
      severity: row.severity,
      title: row.title,
      description: row.description,
      storeName: row.store_name,
      brandName: row.brand_name,
      createdAt: row.created_at.toISOString(),
      isRead: row.is_read,
    }));
  }

  /**
   * Mark alerts as read
   */
  async markAlertsRead(alertIds: number[]): Promise<void> {
    if (alertIds.length === 0) return;

    await this.pool.query(`
      UPDATE analytics_alerts
      SET is_read = true
      WHERE id = ANY($1)
    `, [alertIds]);
  }
}
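The opportunity and risk scores in `getBrandOpportunity` are capped sums: each factor contributes a bounded number of points and the total is clamped to 100. A standalone sketch of that heuristic (the factor values mirror the service code; the function name is an assumption for illustration):

```typescript
// Capped-sum scoring: sum the factor points, clamp the total to 100.
function cappedScore(factors: number[]): number {
  return Math.min(100, factors.reduce((a, b) => a + b, 0));
}

// Example: a brand with category gaps, one underpriced category, and
// 3 stores with growing shelf share (3 * 3 = 9 points; capped at 20
// once more than 5 stores are growing).
const growingStores = 3;
const score = cappedScore([
  20,                                  // missing-SKU gaps exist
  15,                                  // at least one underpriced category
  growingStores > 5 ? 20 : growingStores * 3,
]);
// score === 44
```

The cap keeps any single factor (e.g. many high-severity alerts at 15 points each on the risk side) from pushing the score past the 0-100 range.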
@@ -1,227 +0,0 @@
/**
 * Analytics Cache Service
 *
 * Provides caching layer for expensive analytics queries.
 * Uses PostgreSQL for persistence with configurable TTLs.
 *
 * Phase 3: Analytics Dashboards
 */

import { Pool } from 'pg';

export interface CacheEntry<T = unknown> {
  key: string;
  data: T;
  computedAt: Date;
  expiresAt: Date;
  queryTimeMs?: number;
}

export interface CacheConfig {
  defaultTtlMinutes: number;
}

const DEFAULT_CONFIG: CacheConfig = {
  defaultTtlMinutes: 15,
};

export class AnalyticsCache {
  private pool: Pool;
  private config: CacheConfig;
  private memoryCache: Map<string, CacheEntry> = new Map();

  constructor(pool: Pool, config: Partial<CacheConfig> = {}) {
    this.pool = pool;
    this.config = { ...DEFAULT_CONFIG, ...config };
  }

  /**
   * Get cached data or compute and cache it
   */
  async getOrCompute<T>(
    key: string,
    computeFn: () => Promise<T>,
    ttlMinutes?: number
  ): Promise<{ data: T; fromCache: boolean; queryTimeMs: number }> {
    const ttl = ttlMinutes ?? this.config.defaultTtlMinutes;

    // Check memory cache first
    const memEntry = this.memoryCache.get(key);
    if (memEntry && new Date() < memEntry.expiresAt) {
      return { data: memEntry.data as T, fromCache: true, queryTimeMs: memEntry.queryTimeMs || 0 };
    }

    // Check database cache
    const dbEntry = await this.getFromDb<T>(key);
    if (dbEntry && new Date() < dbEntry.expiresAt) {
      this.memoryCache.set(key, dbEntry);
      return { data: dbEntry.data, fromCache: true, queryTimeMs: dbEntry.queryTimeMs || 0 };
    }

    // Compute fresh data
    const startTime = Date.now();
    const data = await computeFn();
    const queryTimeMs = Date.now() - startTime;

    // Cache result
    const entry: CacheEntry<T> = {
      key,
      data,
      computedAt: new Date(),
      expiresAt: new Date(Date.now() + ttl * 60 * 1000),
      queryTimeMs,
    };

    await this.saveToDb(entry);
    this.memoryCache.set(key, entry);

    return { data, fromCache: false, queryTimeMs };
  }

  /**
   * Get from database cache
   */
  private async getFromDb<T>(key: string): Promise<CacheEntry<T> | null> {
    try {
      const result = await this.pool.query(`
        SELECT cache_data, computed_at, expires_at, query_time_ms
        FROM analytics_cache
        WHERE cache_key = $1
          AND expires_at > NOW()
      `, [key]);

      if (result.rows.length === 0) return null;

      const row = result.rows[0];
      return {
        key,
        data: row.cache_data as T,
        computedAt: row.computed_at,
        expiresAt: row.expires_at,
        queryTimeMs: row.query_time_ms,
      };
    } catch (error) {
      console.warn(`[AnalyticsCache] Failed to get from DB: ${error}`);
      return null;
    }
  }

  /**
   * Save to database cache
   */
  private async saveToDb<T>(entry: CacheEntry<T>): Promise<void> {
    try {
      await this.pool.query(`
        INSERT INTO analytics_cache (cache_key, cache_data, computed_at, expires_at, query_time_ms)
        VALUES ($1, $2, $3, $4, $5)
        ON CONFLICT (cache_key)
        DO UPDATE SET
          cache_data = EXCLUDED.cache_data,
          computed_at = EXCLUDED.computed_at,
          expires_at = EXCLUDED.expires_at,
          query_time_ms = EXCLUDED.query_time_ms
      `, [entry.key, JSON.stringify(entry.data), entry.computedAt, entry.expiresAt, entry.queryTimeMs]);
    } catch (error) {
      console.warn(`[AnalyticsCache] Failed to save to DB: ${error}`);
    }
  }

  /**
   * Invalidate a cache entry
   */
  async invalidate(key: string): Promise<void> {
    this.memoryCache.delete(key);
    try {
      await this.pool.query('DELETE FROM analytics_cache WHERE cache_key = $1', [key]);
    } catch (error) {
      console.warn(`[AnalyticsCache] Failed to invalidate: ${error}`);
    }
  }

  /**
   * Invalidate all entries matching a pattern
   */
  async invalidatePattern(pattern: string): Promise<number> {
    // Clear memory cache
    for (const key of this.memoryCache.keys()) {
      if (key.includes(pattern)) {
        this.memoryCache.delete(key);
      }
    }

    try {
      const result = await this.pool.query(
        'DELETE FROM analytics_cache WHERE cache_key LIKE $1',
        [`%${pattern}%`]
      );
      return result.rowCount || 0;
    } catch (error) {
      console.warn(`[AnalyticsCache] Failed to invalidate pattern: ${error}`);
      return 0;
    }
  }

  /**
   * Clean expired entries
   */
  async cleanExpired(): Promise<number> {
    // Clean memory cache
    const now = new Date();
    for (const [key, entry] of this.memoryCache.entries()) {
      if (now >= entry.expiresAt) {
        this.memoryCache.delete(key);
      }
    }

    try {
      const result = await this.pool.query('DELETE FROM analytics_cache WHERE expires_at < NOW()');
      return result.rowCount || 0;
    } catch (error) {
      console.warn(`[AnalyticsCache] Failed to clean expired: ${error}`);
      return 0;
    }
  }

  /**
   * Get cache statistics
   */
  async getStats(): Promise<{
    memoryCacheSize: number;
    dbCacheSize: number;
    expiredCount: number;
  }> {
    try {
      const result = await this.pool.query(`
        SELECT
          COUNT(*) FILTER (WHERE expires_at > NOW()) as active,
          COUNT(*) FILTER (WHERE expires_at <= NOW()) as expired
        FROM analytics_cache
      `);

      return {
        memoryCacheSize: this.memoryCache.size,
        dbCacheSize: parseInt(result.rows[0]?.active || '0'),
        expiredCount: parseInt(result.rows[0]?.expired || '0'),
      };
    } catch (error) {
      return {
        memoryCacheSize: this.memoryCache.size,
        dbCacheSize: 0,
        expiredCount: 0,
      };
    }
  }
}

/**
 * Generate cache key with parameters
 */
export function cacheKey(prefix: string, params: Record<string, unknown> = {}): string {
  const sortedParams = Object.keys(params)
    .sort()
    .filter(k => params[k] !== undefined && params[k] !== null)
    .map(k => `${k}=${params[k]}`)
    .join('&');

  return sortedParams ? `${prefix}:${sortedParams}` : prefix;
}
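Because `cacheKey` sorts parameter names and drops `null`/`undefined` values, the same logical query always maps to the same cache key regardless of argument order. A self-contained sketch of that behavior (the function body is restated here so the example runs on its own):

```typescript
// Deterministic cache keys: sorted params, null/undefined dropped.
function cacheKey(prefix: string, params: Record<string, unknown> = {}): string {
  const sortedParams = Object.keys(params)
    .sort()
    .filter(k => params[k] !== undefined && params[k] !== null)
    .map(k => `${k}=${params[k]}`)
    .join('&');
  return sortedParams ? `${prefix}:${sortedParams}` : prefix;
}

console.log(cacheKey('brand_opportunity', { brandName: 'Acme' }));
// brand_opportunity:brandName=Acme

// Parameter order does not matter, so both call sites hit the same entry.
console.log(cacheKey('x', { b: 1, a: 2 }) === cacheKey('x', { a: 2, b: 1 }));
// true
```

This also means `invalidatePattern('brand_opportunity')` can clear every brand's entry, since all of them share the prefix.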
@@ -1,530 +0,0 @@
/**
 * Category Growth Analytics Service
 *
 * Provides category-level analytics including:
 * - SKU count growth
 * - Price growth trends
 * - New product additions
 * - Category shrinkage
 * - Seasonality patterns
 *
 * Phase 3: Analytics Dashboards
 */

import { Pool } from 'pg';
import { AnalyticsCache, cacheKey } from './cache';

export interface CategoryGrowth {
  category: string;
  currentSkuCount: number;
  previousSkuCount: number;
  skuGrowthPercent: number;
  currentBrandCount: number;
  previousBrandCount: number;
  brandGrowthPercent: number;
  currentAvgPrice: number | null;
  previousAvgPrice: number | null;
  priceChangePercent: number | null;
  newProducts: number;
  discontinuedProducts: number;
  trend: 'growing' | 'declining' | 'stable';
}

export interface CategorySummary {
  category: string;
  totalSkus: number;
  brandCount: number;
  storeCount: number;
  avgPrice: number | null;
  minPrice: number | null;
  maxPrice: number | null;
  inStockSkus: number;
  outOfStockSkus: number;
  stockHealthPercent: number;
}

export interface CategoryGrowthTrend {
  category: string;
  dataPoints: Array<{
    date: string;
    skuCount: number;
    brandCount: number;
    avgPrice: number | null;
    storeCount: number;
  }>;
  growth7d: number | null;
  growth30d: number | null;
  growth90d: number | null;
}

export interface CategoryHeatmapData {
  categories: string[];
  periods: string[];
  data: Array<{
    category: string;
    period: string;
    value: number; // SKU count, growth %, or price
    changeFromPrevious: number | null;
  }>;
}

export interface SeasonalityPattern {
  category: string;
  monthlyPattern: Array<{
    month: number;
    monthName: string;
    avgSkuCount: number;
    avgPrice: number | null;
    seasonalityIndex: number; // 100 = average, >100 = above, <100 = below
  }>;
  peakMonth: number;
  troughMonth: number;
}

export interface CategoryFilters {
  state?: string;
  storeId?: number;
  minSkus?: number;
}

export class CategoryAnalyticsService {
  private pool: Pool;
  private cache: AnalyticsCache;

  constructor(pool: Pool, cache: AnalyticsCache) {
    this.pool = pool;
    this.cache = cache;
  }

  /**
   * Get current category summary
   */
  async getCategorySummary(
    category?: string,
    filters: CategoryFilters = {}
  ): Promise<CategorySummary[]> {
    const { state, storeId } = filters;
    const key = cacheKey('category_summary', { category, state, storeId });

    return (await this.cache.getOrCompute(key, async () => {
      const params: (string | number)[] = [];
      const conditions: string[] = [];
      let paramIndex = 1;

      if (category) {
        conditions.push(`dp.type = $${paramIndex++}`);
        params.push(category);
      }
      if (state) {
        conditions.push(`d.state = $${paramIndex++}`);
        params.push(state);
      }
      if (storeId) {
        conditions.push(`dp.dispensary_id = $${paramIndex++}`);
        params.push(storeId);
      }

      const whereClause = conditions.length > 0
        ? 'WHERE dp.type IS NOT NULL AND ' + conditions.join(' AND ')
        : 'WHERE dp.type IS NOT NULL';

      const result = await this.pool.query(`
        SELECT
          dp.type as category,
          COUNT(*) as total_skus,
          COUNT(DISTINCT dp.brand_name) as brand_count,
          COUNT(DISTINCT dp.dispensary_id) as store_count,
          AVG(extract_min_price(dp.latest_raw_payload)) as avg_price,
          MIN(extract_min_price(dp.latest_raw_payload)) as min_price,
          MAX(extract_max_price(dp.latest_raw_payload)) as max_price,
          SUM(CASE WHEN dp.stock_status = 'in_stock' THEN 1 ELSE 0 END) as in_stock,
          SUM(CASE WHEN dp.stock_status != 'in_stock' OR dp.stock_status IS NULL THEN 1 ELSE 0 END) as out_of_stock
        FROM dutchie_products dp
        JOIN dispensaries d ON dp.dispensary_id = d.id
        ${whereClause}
        GROUP BY dp.type
        ORDER BY total_skus DESC
      `, params);

      return result.rows.map(row => {
        const totalSkus = parseInt(row.total_skus) || 0;
        const inStock = parseInt(row.in_stock) || 0;

        return {
          category: row.category,
          totalSkus,
          brandCount: parseInt(row.brand_count) || 0,
          storeCount: parseInt(row.store_count) || 0,
          avgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null,
          minPrice: row.min_price ? Math.round(parseFloat(row.min_price) * 100) / 100 : null,
          maxPrice: row.max_price ? Math.round(parseFloat(row.max_price) * 100) / 100 : null,
          inStockSkus: inStock,
          outOfStockSkus: parseInt(row.out_of_stock) || 0,
          stockHealthPercent: totalSkus > 0
            ? Math.round((inStock / totalSkus) * 100)
            : 0,
        };
      });
    }, 15)).data;
  }

  /**
   * Get category growth (comparing periods)
   */
  async getCategoryGrowth(
    days: number = 7,
    filters: CategoryFilters = {}
  ): Promise<CategoryGrowth[]> {
    const { state, storeId, minSkus = 10 } = filters;
    const key = cacheKey('category_growth', { days, state, storeId, minSkus });

    return (await this.cache.getOrCompute(key, async () => {
      // Use category_snapshots for historical comparison
      const result = await this.pool.query(`
        WITH current_data AS (
          SELECT
            category,
            total_skus,
            brand_count,
            avg_price,
            store_count
          FROM category_snapshots
          WHERE snapshot_date = (SELECT MAX(snapshot_date) FROM category_snapshots)
        ),
        previous_data AS (
          SELECT
            category,
            total_skus,
            brand_count,
            avg_price,
            store_count
          FROM category_snapshots
          WHERE snapshot_date = (
            SELECT MAX(snapshot_date)
            FROM category_snapshots
            WHERE snapshot_date < (SELECT MAX(snapshot_date) FROM category_snapshots) - ($1 || ' days')::INTERVAL
          )
        )
        SELECT
          c.category,
          c.total_skus as current_skus,
          COALESCE(p.total_skus, c.total_skus) as previous_skus,
          c.brand_count as current_brands,
          COALESCE(p.brand_count, c.brand_count) as previous_brands,
          c.avg_price as current_price,
          p.avg_price as previous_price
        FROM current_data c
        LEFT JOIN previous_data p ON c.category = p.category
        WHERE c.total_skus >= $2
        ORDER BY c.total_skus DESC
      `, [days, minSkus]);

      // If no snapshots exist, use current data
      if (result.rows.length === 0) {
        const fallbackResult = await this.pool.query(`
          SELECT
            type as category,
            COUNT(*) as total_skus,
            COUNT(DISTINCT brand_name) as brand_count,
            AVG(extract_min_price(latest_raw_payload)) as avg_price
          FROM dutchie_products
          WHERE type IS NOT NULL
          GROUP BY type
          HAVING COUNT(*) >= $1
          ORDER BY total_skus DESC
        `, [minSkus]);

        return fallbackResult.rows.map(row => ({
          category: row.category,
          currentSkuCount: parseInt(row.total_skus) || 0,
          previousSkuCount: parseInt(row.total_skus) || 0,
          skuGrowthPercent: 0,
          currentBrandCount: parseInt(row.brand_count) || 0,
          previousBrandCount: parseInt(row.brand_count) || 0,
          brandGrowthPercent: 0,
          currentAvgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null,
          previousAvgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null,
          priceChangePercent: null,
          newProducts: 0,
          discontinuedProducts: 0,
          trend: 'stable' as const,
        }));
      }

      return result.rows.map(row => {
        const currentSkus = parseInt(row.current_skus) || 0;
        const previousSkus = parseInt(row.previous_skus) || currentSkus;
        const currentBrands = parseInt(row.current_brands) || 0;
        const previousBrands = parseInt(row.previous_brands) || currentBrands;
        const currentPrice = row.current_price ? parseFloat(row.current_price) : null;
        const previousPrice = row.previous_price ? parseFloat(row.previous_price) : null;

        const skuGrowth = previousSkus > 0
          ? ((currentSkus - previousSkus) / previousSkus) * 100
          : 0;
        const brandGrowth = previousBrands > 0
          ? ((currentBrands - previousBrands) / previousBrands) * 100
          : 0;
        const priceChange = previousPrice && currentPrice
          ? ((currentPrice - previousPrice) / previousPrice) * 100
          : null;

        let trend: 'growing' | 'declining' | 'stable' = 'stable';
        if (skuGrowth > 5) trend = 'growing';
        else if (skuGrowth < -5) trend = 'declining';

        return {
          category: row.category,
          currentSkuCount: currentSkus,
          previousSkuCount: previousSkus,
          skuGrowthPercent: Math.round(skuGrowth * 10) / 10,
          currentBrandCount: currentBrands,
          previousBrandCount: previousBrands,
          brandGrowthPercent: Math.round(brandGrowth * 10) / 10,
          currentAvgPrice: currentPrice ? Math.round(currentPrice * 100) / 100 : null,
          previousAvgPrice: previousPrice ? Math.round(previousPrice * 100) / 100 : null,
          priceChangePercent: priceChange !== null ? Math.round(priceChange * 10) / 10 : null,
          newProducts: Math.max(0, currentSkus - previousSkus),
          discontinuedProducts: Math.max(0, previousSkus - currentSkus),
          trend,
        };
      });
    }, 15)).data;
  }

  /**
   * Get category growth trend over time
   */
  async getCategoryGrowthTrend(
    category: string,
    days: number = 90
  ): Promise<CategoryGrowthTrend> {
    const key = cacheKey('category_growth_trend', { category, days });

    return (await this.cache.getOrCompute(key, async () => {
      const result = await this.pool.query(`
        SELECT
          snapshot_date as date,
          total_skus as sku_count,
          brand_count,
          avg_price,
          store_count
        FROM category_snapshots
        WHERE category = $1
          AND snapshot_date >= CURRENT_DATE - ($2 || ' days')::INTERVAL
        ORDER BY snapshot_date
      `, [category, days]);

      const dataPoints = result.rows.map(row => ({
        date: row.date.toISOString().split('T')[0],
        skuCount: parseInt(row.sku_count) || 0,
        brandCount: parseInt(row.brand_count) || 0,
        avgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null,
        storeCount: parseInt(row.store_count) || 0,
      }));

      // Calculate growth rates
      const calculateGrowth = (daysBack: number): number | null => {
        if (dataPoints.length < 2) return null;
        const targetDate = new Date();
        targetDate.setDate(targetDate.getDate() - daysBack);
        const targetDateStr = targetDate.toISOString().split('T')[0];

        const recent = dataPoints[dataPoints.length - 1];
        // dataPoints are sorted ascending, so scan from the end to find the
        // latest snapshot at or before the target date (a forward find on an
        // ascending array would always return the oldest point).
        const older = [...dataPoints].reverse().find(d => d.date <= targetDateStr) || dataPoints[0];

        if (older.skuCount === 0) return null;
        return Math.round(((recent.skuCount - older.skuCount) / older.skuCount) * 1000) / 10;
      };

      return {
        category,
        dataPoints,
        growth7d: calculateGrowth(7),
        growth30d: calculateGrowth(30),
        growth90d: calculateGrowth(90),
      };
    }, 15)).data;
  }

  /**
   * Get category heatmap data
   */
  async getCategoryHeatmap(
    metric: 'skus' | 'growth' | 'price' = 'skus',
    periods: number = 12 // weeks
  ): Promise<CategoryHeatmapData> {
    const key = cacheKey('category_heatmap', { metric, periods });

    return (await this.cache.getOrCompute(key, async () => {
      const result = await this.pool.query(`
        SELECT
          category,
          snapshot_date,
          total_skus,
          avg_price
        FROM category_snapshots
        WHERE snapshot_date >= CURRENT_DATE - ($1 * 7 || ' days')::INTERVAL
        ORDER BY category, snapshot_date
      `, [periods]);

      // Get unique categories and generate weekly periods
      const categoriesSet = new Set<string>();
      const periodsSet = new Set<string>();

      result.rows.forEach(row => {
        categoriesSet.add(row.category);
        // Group by week
        const date = new Date(row.snapshot_date);
        const weekStart = new Date(date);
        weekStart.setDate(date.getDate() - date.getDay());
        periodsSet.add(weekStart.toISOString().split('T')[0]);
      });

      const categories = Array.from(categoriesSet).sort();
      const periodsList = Array.from(periodsSet).sort();

      // Aggregate data by category and week
      const dataMap = new Map<string, Map<string, { skus: number; price: number | null }>>();

      result.rows.forEach(row => {
        const date = new Date(row.snapshot_date);
        const weekStart = new Date(date);
        weekStart.setDate(date.getDate() - date.getDay());
        const period = weekStart.toISOString().split('T')[0];

        if (!dataMap.has(row.category)) {
          dataMap.set(row.category, new Map());
        }
        const categoryData = dataMap.get(row.category)!;

        if (!categoryData.has(period)) {
          categoryData.set(period, { skus: 0, price: null });
        }
        const existing = categoryData.get(period)!;
        existing.skus = Math.max(existing.skus, parseInt(row.total_skus) || 0);
        if (row.avg_price) {
          existing.price = parseFloat(row.avg_price);
        }
      });

      // Build heatmap data
      const data: CategoryHeatmapData['data'] = [];

      categories.forEach(category => {
        let previousValue: number | null = null;

        periodsList.forEach(period => {
          const categoryData = dataMap.get(category)?.get(period);
          let value = 0;

          if (categoryData) {
            switch (metric) {
              case 'skus':
                value = categoryData.skus;
                break;
              case 'price':
                value = categoryData.price || 0;
                break;
              case 'growth':
                value = previousValue !== null && previousValue > 0
                  ? ((categoryData.skus - previousValue) / previousValue) * 100
                  : 0;
                break;
            }
          }

          const changeFromPrevious = previousValue !== null && previousValue > 0
            ? ((value - previousValue) / previousValue) * 100
            : null;

          data.push({
            category,
            period,
            value: Math.round(value * 100) / 100,
            changeFromPrevious: changeFromPrevious !== null
              ? Math.round(changeFromPrevious * 10) / 10
              : null,
          });

          if (metric !== 'growth') {
            previousValue = value;
          } else if (categoryData) {
            previousValue = categoryData.skus;
          }
        });
      });

      return {
        categories,
        periods: periodsList,
        data,
      };
    }, 30)).data;
  }

  /**
   * Get top growing/declining categories
   */
  async getTopMovers(
    limit: number = 5,
    days: number = 30
  ): Promise<{
    growing: CategoryGrowth[];
    declining: CategoryGrowth[];
  }> {
    const key = cacheKey('top_movers', { limit, days });

    return (await this.cache.getOrCompute(key, async () => {
      const allGrowth = await this.getCategoryGrowth(days);

      const sorted = [...allGrowth].sort((a, b) => b.skuGrowthPercent - a.skuGrowthPercent);

      return {
        growing: sorted.filter(c => c.skuGrowthPercent > 0).slice(0, limit),
        declining: sorted.filter(c => c.skuGrowthPercent < 0).slice(-limit).reverse(),
      };
    }, 15)).data;
  }

  /**
   * Get category subcategory breakdown
   */
  async getSubcategoryBreakdown(category: string): Promise<Array<{
    subcategory: string;
    skuCount: number;
    brandCount: number;
    avgPrice: number | null;
    percentOfCategory: number;
  }>> {
    const key = cacheKey('subcategory_breakdown', { category });

    return (await this.cache.getOrCompute(key, async () => {
      const result = await this.pool.query(`
        WITH category_total AS (
          SELECT COUNT(*) as total FROM dutchie_products WHERE type = $1
        )
        SELECT
          COALESCE(dp.subcategory, 'Other') as subcategory,
          COUNT(*) as sku_count,
          COUNT(DISTINCT dp.brand_name) as brand_count,
          AVG(extract_min_price(dp.latest_raw_payload)) as avg_price,
          ct.total as category_total
        FROM dutchie_products dp, category_total ct
        WHERE dp.type = $1
        GROUP BY dp.subcategory, ct.total
        ORDER BY sku_count DESC
      `, [category]);

      return result.rows.map(row => ({
        subcategory: row.subcategory,
        skuCount: parseInt(row.sku_count) || 0,
        brandCount: parseInt(row.brand_count) || 0,
        avgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null,
        percentOfCategory: parseInt(row.category_total) > 0
          ? Math.round((parseInt(row.sku_count) / parseInt(row.category_total)) * 1000) / 10
          : 0,
      }));
    }, 15)).data;
  }
}
@@ -1,57 +0,0 @@
/**
 * Analytics Module Index
 *
 * Exports all analytics services for CannaiQ dashboards.
 *
 * Phase 3: Analytics Dashboards
 */

export { AnalyticsCache, cacheKey, type CacheEntry, type CacheConfig } from './cache';

export {
  PriceTrendService,
  type PricePoint,
  type PriceTrend,
  type PriceSummary,
  type PriceCompressionResult,
  type PriceFilters,
} from './price-trends';

export {
  PenetrationService,
  type BrandPenetration,
  type PenetrationTrend,
  type ShelfShare,
  type BrandPresenceByState,
  type PenetrationFilters,
} from './penetration';

export {
  CategoryAnalyticsService,
  type CategoryGrowth,
  type CategorySummary,
  type CategoryGrowthTrend,
  type CategoryHeatmapData,
  type SeasonalityPattern,
  type CategoryFilters,
} from './category-analytics';

export {
  StoreChangeService,
  type StoreChangeSummary,
  type StoreChangeEvent,
  type BrandChange,
  type ProductChange,
  type CategoryLeaderboard,
  type StoreFilters,
} from './store-changes';

export {
  BrandOpportunityService,
  type BrandOpportunity,
  type PricePosition,
  type MissingSkuOpportunity,
  type StoreShelfShareChange,
  type CompetitorAlert,
  type MarketPositionSummary,
} from './brand-opportunity';
@@ -1,556 +0,0 @@
/**
 * Brand Penetration Analytics Service
 *
 * Provides analytics for brand market penetration including:
 * - Stores carrying brand
 * - SKU counts per brand
 * - Percentage of stores carrying
 * - Shelf share calculations
 * - Penetration trends and momentum
 *
 * Phase 3: Analytics Dashboards
 */

import { Pool } from 'pg';
import { AnalyticsCache, cacheKey } from './cache';

export interface BrandPenetration {
  brandName: string;
  brandId: string | null;
  totalStores: number;
  storesCarrying: number;
  penetrationPercent: number;
  totalSkus: number;
  avgSkusPerStore: number;
  shelfSharePercent: number;
  categories: string[];
  avgPrice: number | null;
  inStockSkus: number;
}

export interface PenetrationTrend {
  brandName: string;
  dataPoints: Array<{
    date: string;
    storeCount: number;
    skuCount: number;
    penetrationPercent: number;
  }>;
  momentumScore: number; // -100 to +100
  riskScore: number; // 0 to 100, higher = more risk
  trend: 'growing' | 'declining' | 'stable';
}

export interface ShelfShare {
  brandName: string;
  category: string;
  skuCount: number;
  categoryTotalSkus: number;
  shelfSharePercent: number;
  rank: number;
}

export interface BrandPresenceByState {
  state: string;
  storeCount: number;
  skuCount: number;
  avgPrice: number | null;
}

export interface PenetrationFilters {
  state?: string;
  category?: string;
  minStores?: number;
  minSkus?: number;
}

export class PenetrationService {
  private pool: Pool;
  private cache: AnalyticsCache;

  constructor(pool: Pool, cache: AnalyticsCache) {
    this.pool = pool;
    this.cache = cache;
  }

  /**
   * Get penetration data for a specific brand
   */
  async getBrandPenetration(
    brandName: string,
    filters: PenetrationFilters = {}
  ): Promise<BrandPenetration> {
    const { state, category } = filters;
    const key = cacheKey('brand_penetration', { brandName, state, category });

    return (await this.cache.getOrCompute(key, async () => {
      // Build parameterized conditions ($1 is always brandName)
      const params: (string | number)[] = [brandName];
      let paramIndex = 2;

      let stateCondition = '';
      let categoryCondition = '';

      if (state) {
        stateCondition = `AND d.state = $${paramIndex++}`;
        params.push(state);
      }
      if (category) {
        categoryCondition = `AND dp.type = $${paramIndex++}`;
        params.push(category);
      }

      const result = await this.pool.query(`
        WITH total_stores AS (
          SELECT COUNT(DISTINCT id) as total
          FROM dispensaries
          WHERE 1=1 ${state ? `AND state = $2` : ''}
        ),
        brand_data AS (
          SELECT
            dp.brand_name,
            dp.brand_id,
            COUNT(DISTINCT dp.dispensary_id) as stores_carrying,
            COUNT(*) as total_skus,
            AVG(extract_min_price(dp.latest_raw_payload)) as avg_price,
            SUM(CASE WHEN dp.stock_status = 'in_stock' THEN 1 ELSE 0 END) as in_stock,
            ARRAY_AGG(DISTINCT dp.type) FILTER (WHERE dp.type IS NOT NULL) as categories
          FROM dutchie_products dp
          JOIN dispensaries d ON dp.dispensary_id = d.id
          WHERE dp.brand_name = $1
            ${stateCondition}
            ${categoryCondition}
          GROUP BY dp.brand_name, dp.brand_id
        ),
        total_skus AS (
          SELECT COUNT(*) as total
          FROM dutchie_products dp
          JOIN dispensaries d ON dp.dispensary_id = d.id
          WHERE 1=1 ${stateCondition} ${categoryCondition}
        )
        SELECT
          bd.brand_name,
          bd.brand_id,
          ts.total as total_stores,
          bd.stores_carrying,
          bd.total_skus,
          bd.avg_price,
          bd.in_stock,
          bd.categories,
          tsk.total as market_total_skus
        FROM brand_data bd, total_stores ts, total_skus tsk
      `, params);

      if (result.rows.length === 0) {
        return {
          brandName,
          brandId: null,
          totalStores: 0,
          storesCarrying: 0,
          penetrationPercent: 0,
          totalSkus: 0,
          avgSkusPerStore: 0,
          shelfSharePercent: 0,
          categories: [],
          avgPrice: null,
          inStockSkus: 0,
        };
      }

      const row = result.rows[0];
      const totalStores = parseInt(row.total_stores) || 1;
      const storesCarrying = parseInt(row.stores_carrying) || 0;
      const totalSkus = parseInt(row.total_skus) || 0;
      const marketTotalSkus = parseInt(row.market_total_skus) || 1;

      return {
        brandName: row.brand_name,
        brandId: row.brand_id,
        totalStores,
        storesCarrying,
        penetrationPercent: Math.round((storesCarrying / totalStores) * 1000) / 10,
        totalSkus,
        avgSkusPerStore: storesCarrying > 0
          ? Math.round((totalSkus / storesCarrying) * 10) / 10
          : 0,
        shelfSharePercent: Math.round((totalSkus / marketTotalSkus) * 1000) / 10,
        categories: row.categories || [],
        avgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null,
        inStockSkus: parseInt(row.in_stock) || 0,
      };
    }, 15)).data;
  }

  /**
   * Get top brands by penetration
   */
  async getTopBrandsByPenetration(
    limit: number = 20,
    filters: PenetrationFilters = {}
  ): Promise<BrandPenetration[]> {
    const { state, category, minStores = 2, minSkus = 5 } = filters;
    const key = cacheKey('top_brands_penetration', { limit, state, category, minStores, minSkus });

    return (await this.cache.getOrCompute(key, async () => {
      const params: (string | number)[] = [limit, minStores, minSkus];
      let paramIndex = 4;

      let stateCondition = '';
      let categoryCondition = '';

      if (state) {
        stateCondition = `AND d.state = $${paramIndex++}`;
        params.push(state);
      }
      if (category) {
        categoryCondition = `AND dp.type = $${paramIndex++}`;
        params.push(category);
      }

      const result = await this.pool.query(`
        WITH total_stores AS (
          SELECT COUNT(DISTINCT id) as total
          FROM dispensaries
          WHERE 1=1 ${state ? `AND state = $${params.indexOf(state) + 1}` : ''}
        ),
        total_skus AS (
          SELECT COUNT(*) as total
          FROM dutchie_products dp
          JOIN dispensaries d ON dp.dispensary_id = d.id
          WHERE 1=1 ${stateCondition} ${categoryCondition}
        ),
        brand_data AS (
          SELECT
            dp.brand_name,
            dp.brand_id,
            COUNT(DISTINCT dp.dispensary_id) as stores_carrying,
            COUNT(*) as total_skus,
            AVG(extract_min_price(dp.latest_raw_payload)) as avg_price,
            SUM(CASE WHEN dp.stock_status = 'in_stock' THEN 1 ELSE 0 END) as in_stock,
            ARRAY_AGG(DISTINCT dp.type) FILTER (WHERE dp.type IS NOT NULL) as categories
          FROM dutchie_products dp
          JOIN dispensaries d ON dp.dispensary_id = d.id
          WHERE dp.brand_name IS NOT NULL
            ${stateCondition}
            ${categoryCondition}
          GROUP BY dp.brand_name, dp.brand_id
          HAVING COUNT(DISTINCT dp.dispensary_id) >= $2
            AND COUNT(*) >= $3
        )
        SELECT
          bd.*,
          ts.total as total_stores,
          tsk.total as market_total_skus
        FROM brand_data bd, total_stores ts, total_skus tsk
        ORDER BY bd.stores_carrying DESC, bd.total_skus DESC
        LIMIT $1
      `, params);

      return result.rows.map(row => {
        const totalStores = parseInt(row.total_stores) || 1;
        const storesCarrying = parseInt(row.stores_carrying) || 0;
        const totalSkus = parseInt(row.total_skus) || 0;
        const marketTotalSkus = parseInt(row.market_total_skus) || 1;

        return {
          brandName: row.brand_name,
          brandId: row.brand_id,
          totalStores,
          storesCarrying,
          penetrationPercent: Math.round((storesCarrying / totalStores) * 1000) / 10,
          totalSkus,
          avgSkusPerStore: storesCarrying > 0
            ? Math.round((totalSkus / storesCarrying) * 10) / 10
            : 0,
          shelfSharePercent: Math.round((totalSkus / marketTotalSkus) * 1000) / 10,
          categories: row.categories || [],
          avgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null,
          inStockSkus: parseInt(row.in_stock) || 0,
        };
      });
    }, 15)).data;
  }

  /**
   * Get penetration trend for a brand (requires historical snapshots)
   */
  async getPenetrationTrend(
    brandName: string,
    days: number = 30
  ): Promise<PenetrationTrend> {
    const key = cacheKey('penetration_trend', { brandName, days });

    return (await this.cache.getOrCompute(key, async () => {
      // Use brand_snapshots table for historical data
      const result = await this.pool.query(`
        SELECT
          snapshot_date as date,
          store_count,
          total_skus
        FROM brand_snapshots
        WHERE brand_name = $1
          AND snapshot_date >= CURRENT_DATE - ($2 || ' days')::INTERVAL
        ORDER BY snapshot_date
      `, [brandName, days]);

      // Get total stores for penetration calculation
      const totalResult = await this.pool.query(
        'SELECT COUNT(*) as total FROM dispensaries'
      );
      const totalStores = parseInt(totalResult.rows[0]?.total) || 1;

      const dataPoints = result.rows.map(row => ({
        date: row.date.toISOString().split('T')[0],
        storeCount: parseInt(row.store_count) || 0,
        skuCount: parseInt(row.total_skus) || 0,
        penetrationPercent: Math.round((parseInt(row.store_count) / totalStores) * 1000) / 10,
      }));

      // Calculate momentum and risk scores
      let momentumScore = 0;
      let riskScore = 0;
      let trend: 'growing' | 'declining' | 'stable' = 'stable';

      if (dataPoints.length >= 2) {
        const first = dataPoints[0];
        const last = dataPoints[dataPoints.length - 1];

        // Momentum: change in store count
        const storeChange = last.storeCount - first.storeCount;
        const storeChangePercent = first.storeCount > 0
          ? (storeChange / first.storeCount) * 100
          : 0;

        // Momentum score: -100 to +100
        momentumScore = Math.max(-100, Math.min(100, storeChangePercent * 10));

        // Risk score: higher if losing stores
        if (storeChange < 0) {
          riskScore = Math.min(100, Math.abs(storeChangePercent) * 5);
        }

        // Determine trend
        if (storeChangePercent > 5) trend = 'growing';
        else if (storeChangePercent < -5) trend = 'declining';
      }

      return {
        brandName,
        dataPoints,
        momentumScore: Math.round(momentumScore),
        riskScore: Math.round(riskScore),
        trend,
      };
    }, 15)).data;
  }

  /**
   * Get shelf share by category for a brand
   */
  async getShelfShareByCategory(brandName: string): Promise<ShelfShare[]> {
    const key = cacheKey('shelf_share_category', { brandName });

    return (await this.cache.getOrCompute(key, async () => {
      const result = await this.pool.query(`
        WITH category_totals AS (
          SELECT
            type as category,
            COUNT(*) as total_skus
          FROM dutchie_products
          WHERE type IS NOT NULL
          GROUP BY type
        ),
        brand_by_category AS (
          SELECT
            type as category,
            COUNT(*) as sku_count
          FROM dutchie_products
          WHERE brand_name = $1
            AND type IS NOT NULL
          GROUP BY type
        ),
        ranked AS (
          SELECT
            ct.category,
            COALESCE(bc.sku_count, 0) as sku_count,
            ct.total_skus,
            RANK() OVER (PARTITION BY ct.category ORDER BY bc.sku_count DESC NULLS LAST) as rank
          FROM category_totals ct
          LEFT JOIN brand_by_category bc ON ct.category = bc.category
        )
        SELECT
          r.category,
          r.sku_count,
          r.total_skus as category_total_skus,
          ROUND((r.sku_count::NUMERIC / r.total_skus) * 100, 2) as shelf_share_pct,
          (SELECT COUNT(*) + 1 FROM (
            SELECT brand_name, COUNT(*) as cnt
            FROM dutchie_products
            WHERE type = r.category AND brand_name IS NOT NULL
            GROUP BY brand_name
            HAVING COUNT(*) > r.sku_count
          ) t) as rank
        FROM ranked r
        WHERE r.sku_count > 0
        ORDER BY r.shelf_share_pct DESC
      `, [brandName]);

      return result.rows.map(row => ({
        brandName,
        category: row.category,
        skuCount: parseInt(row.sku_count) || 0,
        categoryTotalSkus: parseInt(row.category_total_skus) || 0,
        shelfSharePercent: parseFloat(row.shelf_share_pct) || 0,
        rank: parseInt(row.rank) || 0,
      }));
    }, 15)).data;
  }

  /**
   * Get brand presence by state/region
   */
  async getBrandPresenceByState(brandName: string): Promise<BrandPresenceByState[]> {
    const key = cacheKey('brand_presence_state', { brandName });

    return (await this.cache.getOrCompute(key, async () => {
      const result = await this.pool.query(`
        SELECT
          d.state,
          COUNT(DISTINCT dp.dispensary_id) as store_count,
          COUNT(*) as sku_count,
          AVG(extract_min_price(dp.latest_raw_payload)) as avg_price
        FROM dutchie_products dp
        JOIN dispensaries d ON dp.dispensary_id = d.id
        WHERE dp.brand_name = $1
        GROUP BY d.state
        ORDER BY store_count DESC
      `, [brandName]);

      return result.rows.map(row => ({
        state: row.state,
        storeCount: parseInt(row.store_count) || 0,
        skuCount: parseInt(row.sku_count) || 0,
        avgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null,
      }));
    }, 15)).data;
  }

  /**
   * Get stores carrying a brand
   */
  async getStoresCarryingBrand(brandName: string): Promise<Array<{
    storeId: number;
    storeName: string;
    city: string;
    state: string;
    skuCount: number;
    avgPrice: number | null;
    categories: string[];
  }>> {
    const key = cacheKey('stores_carrying_brand', { brandName });

    return (await this.cache.getOrCompute(key, async () => {
      const result = await this.pool.query(`
        SELECT
          d.id as store_id,
          d.name as store_name,
          d.city,
          d.state,
          COUNT(*) as sku_count,
          AVG(extract_min_price(dp.latest_raw_payload)) as avg_price,
          ARRAY_AGG(DISTINCT dp.type) FILTER (WHERE dp.type IS NOT NULL) as categories
        FROM dutchie_products dp
        JOIN dispensaries d ON dp.dispensary_id = d.id
WHERE dp.brand_name = $1
|
||||
GROUP BY d.id, d.name, d.city, d.state
|
||||
ORDER BY sku_count DESC
|
||||
`, [brandName]);
|
||||
|
||||
return result.rows.map(row => ({
|
||||
storeId: row.store_id,
|
||||
storeName: row.store_name,
|
||||
city: row.city,
|
||||
state: row.state,
|
||||
skuCount: parseInt(row.sku_count) || 0,
|
||||
avgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null,
|
||||
categories: row.categories || [],
|
||||
}));
|
||||
}, 15)).data;
|
||||
}
|
||||
|
||||
/**
|
||||
* Get penetration heatmap data (state-based)
|
||||
*/
|
||||
async getPenetrationHeatmap(
|
||||
brandName?: string
|
||||
): Promise<Array<{
|
||||
state: string;
|
||||
totalStores: number;
|
||||
storesWithBrand: number;
|
||||
penetrationPercent: number;
|
||||
totalSkus: number;
|
||||
}>> {
|
||||
const key = cacheKey('penetration_heatmap', { brandName });
|
||||
|
||||
return (await this.cache.getOrCompute(key, async () => {
|
||||
if (brandName) {
|
||||
const result = await this.pool.query(`
|
||||
WITH state_totals AS (
|
||||
SELECT state, COUNT(*) as total_stores
|
||||
FROM dispensaries
|
||||
GROUP BY state
|
||||
),
|
||||
brand_by_state AS (
|
||||
SELECT
|
||||
d.state,
|
||||
COUNT(DISTINCT dp.dispensary_id) as stores_with_brand,
|
||||
COUNT(*) as total_skus
|
||||
FROM dutchie_products dp
|
||||
JOIN dispensaries d ON dp.dispensary_id = d.id
|
||||
WHERE dp.brand_name = $1
|
||||
GROUP BY d.state
|
||||
)
|
||||
SELECT
|
||||
st.state,
|
||||
st.total_stores,
|
||||
COALESCE(bs.stores_with_brand, 0) as stores_with_brand,
|
||||
ROUND(COALESCE(bs.stores_with_brand, 0)::NUMERIC / st.total_stores * 100, 1) as penetration_pct,
|
||||
COALESCE(bs.total_skus, 0) as total_skus
|
||||
FROM state_totals st
|
||||
LEFT JOIN brand_by_state bs ON st.state = bs.state
|
||||
ORDER BY penetration_pct DESC
|
||||
`, [brandName]);
|
||||
|
||||
return result.rows.map(row => ({
|
||||
state: row.state,
|
||||
totalStores: parseInt(row.total_stores) || 0,
|
||||
storesWithBrand: parseInt(row.stores_with_brand) || 0,
|
||||
penetrationPercent: parseFloat(row.penetration_pct) || 0,
|
||||
totalSkus: parseInt(row.total_skus) || 0,
|
||||
}));
|
||||
} else {
|
||||
// Overall market data by state
|
||||
const result = await this.pool.query(`
|
||||
SELECT
|
||||
d.state,
|
||||
COUNT(DISTINCT d.id) as total_stores,
|
||||
COUNT(DISTINCT dp.brand_name) as brand_count,
|
||||
COUNT(*) as total_skus
|
||||
FROM dispensaries d
|
||||
LEFT JOIN dutchie_products dp ON d.id = dp.dispensary_id
|
||||
GROUP BY d.state
|
||||
ORDER BY total_stores DESC
|
||||
`);
|
||||
|
||||
return result.rows.map(row => ({
|
||||
state: row.state,
|
||||
totalStores: parseInt(row.total_stores) || 0,
|
||||
storesWithBrand: parseInt(row.brand_count) || 0, // Using brand count here
|
||||
penetrationPercent: 100, // Full penetration for overall view
|
||||
totalSkus: parseInt(row.total_skus) || 0,
|
||||
}));
|
||||
}
|
||||
}, 30)).data;
|
||||
}
|
||||
}
|
||||
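As a quick sanity check on the heatmap math above, here is a minimal standalone sketch of the penetration calculation (a hypothetical helper mirroring the SQL's `ROUND(stores_with_brand::NUMERIC / total_stores * 100, 1)`; not part of the service):

```typescript
// Penetration = share of stores in a state that carry the brand,
// rounded to one decimal place like the SQL above.
function penetrationPercent(storesWithBrand: number, totalStores: number): number {
  if (totalStores <= 0) return 0; // guard against empty states
  return Math.round((storesWithBrand / totalStores) * 1000) / 10;
}

console.log(penetrationPercent(47, 188)); // 25 — brand is in a quarter of stores
```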
@@ -1,534 +0,0 @@
/**
 * Price Trend Analytics Service
 *
 * Provides time-series price analytics including:
 * - Price over time for products
 * - Average MSRP/Wholesale by period
 * - Price volatility scoring
 * - Price compression detection
 *
 * Phase 3: Analytics Dashboards
 */

import { Pool } from 'pg';
import { AnalyticsCache, cacheKey } from './cache';

export interface PricePoint {
  date: string;
  minPrice: number | null;
  maxPrice: number | null;
  avgPrice: number | null;
  wholesalePrice: number | null;
  sampleSize: number;
}

export interface PriceTrend {
  productId?: number;
  storeId?: number;
  brandName?: string;
  category?: string;
  dataPoints: PricePoint[];
  summary: {
    currentAvg: number | null;
    previousAvg: number | null;
    changePercent: number | null;
    trend: 'up' | 'down' | 'stable';
    volatilityScore: number | null;
  };
}

export interface PriceSummary {
  avg7d: number | null;
  avg30d: number | null;
  avg90d: number | null;
  wholesaleAvg7d: number | null;
  wholesaleAvg30d: number | null;
  wholesaleAvg90d: number | null;
  minPrice: number | null;
  maxPrice: number | null;
  priceRange: number | null;
  volatilityScore: number | null;
}

export interface PriceCompressionResult {
  category: string;
  brands: Array<{
    brandName: string;
    avgPrice: number;
    priceDistance: number; // distance from category mean
  }>;
  compressionScore: number; // 0-100, higher = more compressed
  standardDeviation: number;
}

export interface PriceFilters {
  storeId?: number;
  brandName?: string;
  category?: string;
  state?: string;
  days?: number;
}

export class PriceTrendService {
  private pool: Pool;
  private cache: AnalyticsCache;

  constructor(pool: Pool, cache: AnalyticsCache) {
    this.pool = pool;
    this.cache = cache;
  }
  /**
   * Get price trend for a specific product
   */
  async getProductPriceTrend(
    productId: number,
    storeId?: number,
    days: number = 30
  ): Promise<PriceTrend> {
    const key = cacheKey('price_trend_product', { productId, storeId, days });

    return (await this.cache.getOrCompute(key, async () => {
      // Try to get from snapshots first
      const snapshotResult = await this.pool.query(`
        SELECT
          DATE(crawled_at) as date,
          MIN(rec_min_price_cents) / 100.0 as min_price,
          MAX(rec_max_price_cents) / 100.0 as max_price,
          AVG(rec_min_price_cents) / 100.0 as avg_price,
          AVG(wholesale_min_price_cents) / 100.0 as wholesale_price,
          COUNT(*) as sample_size
        FROM dutchie_product_snapshots
        WHERE dutchie_product_id = $1
          AND crawled_at >= NOW() - ($2 || ' days')::INTERVAL
          ${storeId ? 'AND dispensary_id = $3' : ''}
        GROUP BY DATE(crawled_at)
        ORDER BY date
      `, storeId ? [productId, days, storeId] : [productId, days]);

      let dataPoints: PricePoint[] = snapshotResult.rows.map(row => ({
        date: row.date.toISOString().split('T')[0],
        minPrice: parseFloat(row.min_price) || null,
        maxPrice: parseFloat(row.max_price) || null,
        avgPrice: parseFloat(row.avg_price) || null,
        wholesalePrice: parseFloat(row.wholesale_price) || null,
        sampleSize: parseInt(row.sample_size),
      }));

      // If no snapshots, get current price from product
      if (dataPoints.length === 0) {
        const productResult = await this.pool.query(`
          SELECT
            extract_min_price(latest_raw_payload) as min_price,
            extract_max_price(latest_raw_payload) as max_price,
            extract_wholesale_price(latest_raw_payload) as wholesale_price
          FROM dutchie_products
          WHERE id = $1
        `, [productId]);

        if (productResult.rows.length > 0) {
          const row = productResult.rows[0];
          dataPoints = [{
            date: new Date().toISOString().split('T')[0],
            minPrice: parseFloat(row.min_price) || null,
            maxPrice: parseFloat(row.max_price) || null,
            avgPrice: parseFloat(row.min_price) || null,
            wholesalePrice: parseFloat(row.wholesale_price) || null,
            sampleSize: 1,
          }];
        }
      }

      const summary = this.calculatePriceSummary(dataPoints);

      return {
        productId,
        storeId,
        dataPoints,
        summary,
      };
    }, 15)).data;
  }
  /**
   * Get price trends by brand
   */
  async getBrandPriceTrend(
    brandName: string,
    filters: PriceFilters = {}
  ): Promise<PriceTrend> {
    const { storeId, category, state, days = 30 } = filters;
    const key = cacheKey('price_trend_brand', { brandName, storeId, category, state, days });

    return (await this.cache.getOrCompute(key, async () => {
      // Use current product data aggregated by date
      const result = await this.pool.query(`
        SELECT
          DATE(dp.updated_at) as date,
          MIN(extract_min_price(dp.latest_raw_payload)) as min_price,
          MAX(extract_max_price(dp.latest_raw_payload)) as max_price,
          AVG(extract_min_price(dp.latest_raw_payload)) as avg_price,
          AVG(extract_wholesale_price(dp.latest_raw_payload)) as wholesale_price,
          COUNT(*) as sample_size
        FROM dutchie_products dp
        JOIN dispensaries d ON dp.dispensary_id = d.id
        WHERE dp.brand_name = $1
          AND dp.updated_at >= NOW() - ($2 || ' days')::INTERVAL
          ${storeId ? 'AND dp.dispensary_id = $3' : ''}
          ${category ? `AND dp.type = $${storeId ? 4 : 3}` : ''}
          ${state ? `AND d.state = $${storeId ? (category ? 5 : 4) : (category ? 4 : 3)}` : ''}
        GROUP BY DATE(dp.updated_at)
        ORDER BY date
      `, this.buildParams([brandName, days], { storeId, category, state }));

      const dataPoints: PricePoint[] = result.rows.map(row => ({
        date: row.date.toISOString().split('T')[0],
        minPrice: parseFloat(row.min_price) || null,
        maxPrice: parseFloat(row.max_price) || null,
        avgPrice: parseFloat(row.avg_price) || null,
        wholesalePrice: parseFloat(row.wholesale_price) || null,
        sampleSize: parseInt(row.sample_size),
      }));

      return {
        brandName,
        storeId,
        category,
        dataPoints,
        summary: this.calculatePriceSummary(dataPoints),
      };
    }, 15)).data;
  }
  /**
   * Get price trends by category
   */
  async getCategoryPriceTrend(
    category: string,
    filters: PriceFilters = {}
  ): Promise<PriceTrend> {
    const { storeId, brandName, state, days = 30 } = filters;
    const key = cacheKey('price_trend_category', { category, storeId, brandName, state, days });

    return (await this.cache.getOrCompute(key, async () => {
      const result = await this.pool.query(`
        SELECT
          DATE(dp.updated_at) as date,
          MIN(extract_min_price(dp.latest_raw_payload)) as min_price,
          MAX(extract_max_price(dp.latest_raw_payload)) as max_price,
          AVG(extract_min_price(dp.latest_raw_payload)) as avg_price,
          AVG(extract_wholesale_price(dp.latest_raw_payload)) as wholesale_price,
          COUNT(*) as sample_size
        FROM dutchie_products dp
        JOIN dispensaries d ON dp.dispensary_id = d.id
        WHERE dp.type = $1
          AND dp.updated_at >= NOW() - ($2 || ' days')::INTERVAL
          ${storeId ? 'AND dp.dispensary_id = $3' : ''}
          ${brandName ? `AND dp.brand_name = $${storeId ? 4 : 3}` : ''}
          ${state ? `AND d.state = $${storeId ? (brandName ? 5 : 4) : (brandName ? 4 : 3)}` : ''}
        GROUP BY DATE(dp.updated_at)
        ORDER BY date
      `, this.buildParams([category, days], { storeId, brandName, state }));

      const dataPoints: PricePoint[] = result.rows.map(row => ({
        date: row.date.toISOString().split('T')[0],
        minPrice: parseFloat(row.min_price) || null,
        maxPrice: parseFloat(row.max_price) || null,
        avgPrice: parseFloat(row.avg_price) || null,
        wholesalePrice: parseFloat(row.wholesale_price) || null,
        sampleSize: parseInt(row.sample_size),
      }));

      return {
        category,
        storeId,
        brandName,
        dataPoints,
        summary: this.calculatePriceSummary(dataPoints),
      };
    }, 15)).data;
  }
  /**
   * Get price summary statistics
   */
  async getPriceSummary(filters: PriceFilters = {}): Promise<PriceSummary> {
    const { storeId, brandName, category, state } = filters;
    const key = cacheKey('price_summary', filters as Record<string, unknown>);

    return (await this.cache.getOrCompute(key, async () => {
      const whereConditions: string[] = [];
      const params: (string | number)[] = [];
      let paramIndex = 1;

      if (storeId) {
        whereConditions.push(`dp.dispensary_id = $${paramIndex++}`);
        params.push(storeId);
      }
      if (brandName) {
        whereConditions.push(`dp.brand_name = $${paramIndex++}`);
        params.push(brandName);
      }
      if (category) {
        whereConditions.push(`dp.type = $${paramIndex++}`);
        params.push(category);
      }
      if (state) {
        whereConditions.push(`d.state = $${paramIndex++}`);
        params.push(state);
      }

      const whereClause = whereConditions.length > 0
        ? 'WHERE ' + whereConditions.join(' AND ')
        : '';

      const result = await this.pool.query(`
        WITH prices AS (
          SELECT
            extract_min_price(dp.latest_raw_payload) as min_price,
            extract_max_price(dp.latest_raw_payload) as max_price,
            extract_wholesale_price(dp.latest_raw_payload) as wholesale_price
          FROM dutchie_products dp
          JOIN dispensaries d ON dp.dispensary_id = d.id
          ${whereClause}
        )
        SELECT
          AVG(min_price) as avg_price,
          AVG(wholesale_price) as avg_wholesale,
          MIN(min_price) as min_price,
          MAX(max_price) as max_price,
          STDDEV(min_price) as std_dev
        FROM prices
        WHERE min_price IS NOT NULL
      `, params);

      const row = result.rows[0];
      const avgPrice = parseFloat(row.avg_price) || null;
      const stdDev = parseFloat(row.std_dev) || null;
      const volatility = avgPrice && stdDev ? (stdDev / avgPrice) * 100 : null;

      return {
        avg7d: avgPrice, // Using current data as proxy
        avg30d: avgPrice,
        avg90d: avgPrice,
        wholesaleAvg7d: parseFloat(row.avg_wholesale) || null,
        wholesaleAvg30d: parseFloat(row.avg_wholesale) || null,
        wholesaleAvg90d: parseFloat(row.avg_wholesale) || null,
        minPrice: parseFloat(row.min_price) || null,
        maxPrice: parseFloat(row.max_price) || null,
        priceRange: row.max_price && row.min_price
          ? parseFloat(row.max_price) - parseFloat(row.min_price)
          : null,
        volatilityScore: volatility ? Math.round(volatility * 10) / 10 : null,
      };
    }, 30)).data;
  }
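The incremental `$${paramIndex++}` pattern above keeps the SQL placeholders and the parameter array in lockstep, so each optional filter adds exactly one placeholder and one value. A minimal standalone sketch of the same pattern (hypothetical `buildWhere` helper, not part of the service):

```typescript
// Builds a WHERE clause whose $n placeholders match the params array by position.
function buildWhere(filters: Record<string, string | number | undefined>): {
  whereClause: string;
  params: (string | number)[];
} {
  const conditions: string[] = [];
  const params: (string | number)[] = [];
  let i = 1;
  for (const [column, value] of Object.entries(filters)) {
    if (value !== undefined) {
      conditions.push(`${column} = $${i++}`); // placeholder index advances with params
      params.push(value);
    }
  }
  return {
    whereClause: conditions.length ? 'WHERE ' + conditions.join(' AND ') : '',
    params,
  };
}

const q = buildWhere({ 'dp.brand_name': 'Acme', 'd.state': 'AZ' });
console.log(q.whereClause); // WHERE dp.brand_name = $1 AND d.state = $2
```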
  /**
   * Detect price compression in a category
   */
  async detectPriceCompression(
    category: string,
    state?: string
  ): Promise<PriceCompressionResult> {
    const key = cacheKey('price_compression', { category, state });

    return (await this.cache.getOrCompute(key, async () => {
      const result = await this.pool.query(`
        WITH brand_prices AS (
          SELECT
            dp.brand_name,
            AVG(extract_min_price(dp.latest_raw_payload)) as avg_price,
            COUNT(*) as sku_count
          FROM dutchie_products dp
          JOIN dispensaries d ON dp.dispensary_id = d.id
          WHERE dp.type = $1
            AND dp.brand_name IS NOT NULL
            ${state ? 'AND d.state = $2' : ''}
          GROUP BY dp.brand_name
          HAVING COUNT(*) >= 3
        ),
        stats AS (
          SELECT
            AVG(avg_price) as category_avg,
            STDDEV(avg_price) as std_dev
          FROM brand_prices
          WHERE avg_price IS NOT NULL
        )
        SELECT
          bp.brand_name,
          bp.avg_price,
          ABS(bp.avg_price - s.category_avg) as price_distance,
          s.category_avg,
          s.std_dev
        FROM brand_prices bp, stats s
        WHERE bp.avg_price IS NOT NULL
        ORDER BY bp.avg_price
      `, state ? [category, state] : [category]);

      if (result.rows.length === 0) {
        return {
          category,
          brands: [],
          compressionScore: 0,
          standardDeviation: 0,
        };
      }

      const categoryAvg = parseFloat(result.rows[0].category_avg) || 0;
      const stdDev = parseFloat(result.rows[0].std_dev) || 0;

      // Compression score: lower std dev relative to mean = more compression.
      // Scale to 0-100 where 100 = very compressed.
      const cv = categoryAvg > 0 ? (stdDev / categoryAvg) * 100 : 0;
      const compressionScore = Math.max(0, Math.min(100, 100 - cv));

      const brands = result.rows.map(row => ({
        brandName: row.brand_name,
        avgPrice: parseFloat(row.avg_price) || 0,
        priceDistance: parseFloat(row.price_distance) || 0,
      }));

      return {
        category,
        brands,
        compressionScore: Math.round(compressionScore),
        standardDeviation: Math.round(stdDev * 100) / 100,
      };
    }, 30)).data;
  }
  /**
   * Get global price statistics
   */
  async getGlobalPriceStats(): Promise<{
    totalProductsWithPrice: number;
    avgPrice: number | null;
    medianPrice: number | null;
    priceByCategory: Array<{ category: string; avgPrice: number; count: number }>;
    priceByState: Array<{ state: string; avgPrice: number; count: number }>;
  }> {
    const key = 'global_price_stats';

    return (await this.cache.getOrCompute(key, async () => {
      const [countResult, categoryResult, stateResult] = await Promise.all([
        this.pool.query(`
          SELECT
            COUNT(*) FILTER (WHERE extract_min_price(latest_raw_payload) IS NOT NULL) as with_price,
            AVG(extract_min_price(latest_raw_payload)) as avg_price,
            PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY extract_min_price(latest_raw_payload)) as median
          FROM dutchie_products
        `),
        this.pool.query(`
          SELECT
            type as category,
            AVG(extract_min_price(latest_raw_payload)) as avg_price,
            COUNT(*) as count
          FROM dutchie_products
          WHERE type IS NOT NULL
            AND extract_min_price(latest_raw_payload) IS NOT NULL
          GROUP BY type
          ORDER BY avg_price DESC
        `),
        this.pool.query(`
          SELECT
            d.state,
            AVG(extract_min_price(dp.latest_raw_payload)) as avg_price,
            COUNT(*) as count
          FROM dutchie_products dp
          JOIN dispensaries d ON dp.dispensary_id = d.id
          WHERE extract_min_price(dp.latest_raw_payload) IS NOT NULL
          GROUP BY d.state
          ORDER BY avg_price DESC
        `),
      ]);

      return {
        totalProductsWithPrice: parseInt(countResult.rows[0]?.with_price || '0'),
        avgPrice: parseFloat(countResult.rows[0]?.avg_price) || null,
        medianPrice: parseFloat(countResult.rows[0]?.median) || null,
        priceByCategory: categoryResult.rows.map(r => ({
          category: r.category,
          avgPrice: parseFloat(r.avg_price) || 0,
          count: parseInt(r.count),
        })),
        priceByState: stateResult.rows.map(r => ({
          state: r.state,
          avgPrice: parseFloat(r.avg_price) || 0,
          count: parseInt(r.count),
        })),
      };
    }, 30)).data;
  }
  // ============================================================
  // HELPER METHODS
  // ============================================================

  private calculatePriceSummary(dataPoints: PricePoint[]): PriceTrend['summary'] {
    if (dataPoints.length === 0) {
      return {
        currentAvg: null,
        previousAvg: null,
        changePercent: null,
        trend: 'stable',
        volatilityScore: null,
      };
    }

    const prices = dataPoints
      .map(d => d.avgPrice)
      .filter((p): p is number => p !== null);

    if (prices.length === 0) {
      return {
        currentAvg: null,
        previousAvg: null,
        changePercent: null,
        trend: 'stable',
        volatilityScore: null,
      };
    }

    const currentAvg = prices[prices.length - 1];
    const midpoint = Math.floor(prices.length / 2);
    const previousAvg = prices.length > 1 ? prices[midpoint] : currentAvg;

    const changePercent = previousAvg > 0
      ? ((currentAvg - previousAvg) / previousAvg) * 100
      : null;

    // Calculate volatility (coefficient of variation)
    const mean = prices.reduce((a, b) => a + b, 0) / prices.length;
    const variance = prices.reduce((sum, p) => sum + Math.pow(p - mean, 2), 0) / prices.length;
    const stdDev = Math.sqrt(variance);
    const volatilityScore = mean > 0 ? (stdDev / mean) * 100 : null;

    let trend: 'up' | 'down' | 'stable' = 'stable';
    if (changePercent !== null) {
      if (changePercent > 5) trend = 'up';
      else if (changePercent < -5) trend = 'down';
    }

    return {
      currentAvg: Math.round(currentAvg * 100) / 100,
      previousAvg: Math.round(previousAvg * 100) / 100,
      changePercent: changePercent !== null ? Math.round(changePercent * 10) / 10 : null,
      trend,
      volatilityScore: volatilityScore !== null ? Math.round(volatilityScore * 10) / 10 : null,
    };
  }

  private buildParams(
    baseParams: (string | number)[],
    optionalParams: Record<string, string | number | undefined>
  ): (string | number)[] {
    // Values are appended in the insertion order of optionalParams,
    // which must match the order of the $n placeholders in the SQL.
    const params = [...baseParams];
    for (const value of Object.values(optionalParams)) {
      if (value !== undefined) {
        params.push(value);
      }
    }
    return params;
  }
}
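Both the volatility score in `calculatePriceSummary` and the compression score in `detectPriceCompression` reduce to the coefficient of variation (standard deviation as a percentage of the mean). A standalone sketch of that math, assuming the same population-variance formulas as the service code above:

```typescript
// CV = (population std dev / mean) * 100; null when undefined.
function coefficientOfVariation(prices: number[]): number | null {
  if (prices.length === 0) return null;
  const mean = prices.reduce((a, b) => a + b, 0) / prices.length;
  if (mean <= 0) return null;
  const variance = prices.reduce((s, p) => s + (p - mean) ** 2, 0) / prices.length;
  return (Math.sqrt(variance) / mean) * 100;
}

// 0-100, higher = brand prices clustered tightly around the mean.
function compressionScore(cv: number): number {
  return Math.max(0, Math.min(100, 100 - cv));
}

const cv = coefficientOfVariation([20, 25, 30]); // ~16.33
if (cv !== null) console.log(Math.round(compressionScore(cv))); // prints 84
```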
@@ -1,587 +0,0 @@
/**
 * Store Change Tracking Service
 *
 * Tracks changes at the store level including:
 * - New/lost brands
 * - New/discontinued products
 * - Stock status transitions
 * - Price changes
 * - Category movement leaderboards
 *
 * Phase 3: Analytics Dashboards
 */

import { Pool } from 'pg';
import { AnalyticsCache, cacheKey } from './cache';

export interface StoreChangeSummary {
  storeId: number;
  storeName: string;
  city: string;
  state: string;
  brandsAdded7d: number;
  brandsAdded30d: number;
  brandsLost7d: number;
  brandsLost30d: number;
  productsAdded7d: number;
  productsAdded30d: number;
  productsDiscontinued7d: number;
  productsDiscontinued30d: number;
  priceDrops7d: number;
  priceIncreases7d: number;
  restocks7d: number;
  stockOuts7d: number;
}

export interface StoreChangeEvent {
  id: number;
  storeId: number;
  storeName: string;
  eventType: string;
  eventDate: string;
  brandName: string | null;
  productName: string | null;
  category: string | null;
  oldValue: string | null;
  newValue: string | null;
  metadata: Record<string, unknown> | null;
}

export interface BrandChange {
  brandName: string;
  changeType: 'added' | 'removed';
  date: string;
  skuCount: number;
  categories: string[];
}

export interface ProductChange {
  productId: number;
  productName: string;
  brandName: string | null;
  category: string | null;
  changeType: 'added' | 'discontinued' | 'price_drop' | 'price_increase' | 'restocked' | 'out_of_stock';
  date: string;
  oldValue?: string;
  newValue?: string;
}

export interface CategoryLeaderboard {
  category: string;
  storeId: number;
  storeName: string;
  skuCount: number;
  brandCount: number;
  avgPrice: number | null;
  changePercent7d: number;
  rank: number;
}

export interface StoreFilters {
  storeId?: number;
  state?: string;
  days?: number;
  eventType?: string;
}

export class StoreChangeService {
  private pool: Pool;
  private cache: AnalyticsCache;

  constructor(pool: Pool, cache: AnalyticsCache) {
    this.pool = pool;
    this.cache = cache;
  }
  /**
   * Get change summary for a store
   */
  async getStoreChangeSummary(
    storeId: number
  ): Promise<StoreChangeSummary | null> {
    const key = cacheKey('store_change_summary', { storeId });

    return (await this.cache.getOrCompute(key, async () => {
      // Get store info
      const storeResult = await this.pool.query(`
        SELECT id, name, city, state FROM dispensaries WHERE id = $1
      `, [storeId]);

      if (storeResult.rows.length === 0) return null;
      const store = storeResult.rows[0];

      // Get change event counts
      const eventsResult = await this.pool.query(`
        SELECT
          event_type,
          COUNT(*) FILTER (WHERE event_date >= CURRENT_DATE - INTERVAL '7 days') as count_7d,
          COUNT(*) FILTER (WHERE event_date >= CURRENT_DATE - INTERVAL '30 days') as count_30d
        FROM store_change_events
        WHERE store_id = $1
        GROUP BY event_type
      `, [storeId]);

      const counts: Record<string, { count_7d: number; count_30d: number }> = {};
      eventsResult.rows.forEach(row => {
        counts[row.event_type] = {
          count_7d: parseInt(row.count_7d) || 0,
          count_30d: parseInt(row.count_30d) || 0,
        };
      });

      return {
        storeId: store.id,
        storeName: store.name,
        city: store.city,
        state: store.state,
        brandsAdded7d: counts['brand_added']?.count_7d || 0,
        brandsAdded30d: counts['brand_added']?.count_30d || 0,
        brandsLost7d: counts['brand_removed']?.count_7d || 0,
        brandsLost30d: counts['brand_removed']?.count_30d || 0,
        productsAdded7d: counts['product_added']?.count_7d || 0,
        productsAdded30d: counts['product_added']?.count_30d || 0,
        productsDiscontinued7d: counts['product_removed']?.count_7d || 0,
        productsDiscontinued30d: counts['product_removed']?.count_30d || 0,
        priceDrops7d: counts['price_drop']?.count_7d || 0,
        priceIncreases7d: counts['price_increase']?.count_7d || 0,
        restocks7d: counts['restocked']?.count_7d || 0,
        stockOuts7d: counts['out_of_stock']?.count_7d || 0,
      };
    }, 15)).data;
  }
  /**
   * Get recent change events for a store
   */
  async getStoreChangeEvents(
    storeId: number,
    filters: { eventType?: string; days?: number; limit?: number } = {}
  ): Promise<StoreChangeEvent[]> {
    const { eventType, days = 30, limit = 100 } = filters;
    const key = cacheKey('store_change_events', { storeId, eventType, days, limit });

    return (await this.cache.getOrCompute(key, async () => {
      const params: (string | number)[] = [storeId, days, limit];
      let eventTypeCondition = '';

      if (eventType) {
        eventTypeCondition = 'AND event_type = $4';
        params.push(eventType);
      }

      const result = await this.pool.query(`
        SELECT
          sce.id,
          sce.store_id,
          d.name as store_name,
          sce.event_type,
          sce.event_date,
          sce.brand_name,
          sce.product_name,
          sce.category,
          sce.old_value,
          sce.new_value,
          sce.metadata
        FROM store_change_events sce
        JOIN dispensaries d ON sce.store_id = d.id
        WHERE sce.store_id = $1
          AND sce.event_date >= CURRENT_DATE - ($2 || ' days')::INTERVAL
          ${eventTypeCondition}
        ORDER BY sce.event_date DESC, sce.id DESC
        LIMIT $3
      `, params);

      return result.rows.map(row => ({
        id: row.id,
        storeId: row.store_id,
        storeName: row.store_name,
        eventType: row.event_type,
        eventDate: row.event_date.toISOString().split('T')[0],
        brandName: row.brand_name,
        productName: row.product_name,
        category: row.category,
        oldValue: row.old_value,
        newValue: row.new_value,
        metadata: row.metadata,
      }));
    }, 5)).data;
  }
  /**
   * Get new brands added to a store
   */
  async getNewBrands(
    storeId: number,
    days: number = 30
  ): Promise<BrandChange[]> {
    const key = cacheKey('new_brands', { storeId, days });

    return (await this.cache.getOrCompute(key, async () => {
      const result = await this.pool.query(`
        SELECT
          brand_name,
          event_date,
          metadata
        FROM store_change_events
        WHERE store_id = $1
          AND event_type = 'brand_added'
          AND event_date >= CURRENT_DATE - ($2 || ' days')::INTERVAL
        ORDER BY event_date DESC
      `, [storeId, days]);

      return result.rows.map(row => ({
        brandName: row.brand_name,
        changeType: 'added' as const,
        date: row.event_date.toISOString().split('T')[0],
        skuCount: row.metadata?.sku_count || 0,
        categories: row.metadata?.categories || [],
      }));
    }, 15)).data;
  }
  /**
   * Get brands lost from a store
   */
  async getLostBrands(
    storeId: number,
    days: number = 30
  ): Promise<BrandChange[]> {
    const key = cacheKey('lost_brands', { storeId, days });

    return (await this.cache.getOrCompute(key, async () => {
      const result = await this.pool.query(`
        SELECT
          brand_name,
          event_date,
          metadata
        FROM store_change_events
        WHERE store_id = $1
          AND event_type = 'brand_removed'
          AND event_date >= CURRENT_DATE - ($2 || ' days')::INTERVAL
        ORDER BY event_date DESC
      `, [storeId, days]);

      return result.rows.map(row => ({
        brandName: row.brand_name,
        changeType: 'removed' as const,
        date: row.event_date.toISOString().split('T')[0],
        skuCount: row.metadata?.sku_count || 0,
        categories: row.metadata?.categories || [],
      }));
    }, 15)).data;
  }
  /**
   * Get product changes for a store
   */
  async getProductChanges(
    storeId: number,
    changeType?: 'added' | 'discontinued' | 'price_drop' | 'price_increase' | 'restocked' | 'out_of_stock',
    days: number = 7
  ): Promise<ProductChange[]> {
    const key = cacheKey('product_changes', { storeId, changeType, days });

    return (await this.cache.getOrCompute(key, async () => {
      const eventTypeMap: Record<string, string> = {
        'added': 'product_added',
        'discontinued': 'product_removed',
        'price_drop': 'price_drop',
        'price_increase': 'price_increase',
        'restocked': 'restocked',
        'out_of_stock': 'out_of_stock',
      };

      const params: (string | number)[] = [storeId, days];
      let eventCondition = '';

      if (changeType) {
        eventCondition = 'AND event_type = $3';
        params.push(eventTypeMap[changeType]);
      }

      const result = await this.pool.query(`
        SELECT
          product_id,
          product_name,
          brand_name,
          category,
          event_type,
          event_date,
          old_value,
          new_value
        FROM store_change_events
        WHERE store_id = $1
          AND event_date >= CURRENT_DATE - ($2 || ' days')::INTERVAL
          AND product_id IS NOT NULL
          ${eventCondition}
        ORDER BY event_date DESC
        LIMIT 100
      `, params);

      const reverseMap: Record<string, ProductChange['changeType']> = {
        'product_added': 'added',
        'product_removed': 'discontinued',
        'price_drop': 'price_drop',
        'price_increase': 'price_increase',
        'restocked': 'restocked',
        'out_of_stock': 'out_of_stock',
      };

      return result.rows.map(row => ({
        productId: row.product_id,
        productName: row.product_name,
        brandName: row.brand_name,
        category: row.category,
        changeType: reverseMap[row.event_type] || 'added',
        date: row.event_date.toISOString().split('T')[0],
|
||||
oldValue: row.old_value,
|
||||
newValue: row.new_value,
|
||||
}));
|
||||
}, 5)).data;
|
||||
}
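The two lookup tables inside `getProductChanges` are intended to be inverses: the API-facing `changeType` maps to a stored `event_type` and back. A standalone consistency check, with the tables copied from the method above:

```typescript
// changeType -> event_type table, as used when building the WHERE clause.
const eventTypeMap: Record<string, string> = {
  'added': 'product_added',
  'discontinued': 'product_removed',
  'price_drop': 'price_drop',
  'price_increase': 'price_increase',
  'restocked': 'restocked',
  'out_of_stock': 'out_of_stock',
};

// event_type -> changeType table, as used when mapping result rows.
const reverseMap: Record<string, string> = {
  'product_added': 'added',
  'product_removed': 'discontinued',
  'price_drop': 'price_drop',
  'price_increase': 'price_increase',
  'restocked': 'restocked',
  'out_of_stock': 'out_of_stock',
};

// Every API-facing changeType should survive a round trip through the DB event_type.
for (const changeType of Object.keys(eventTypeMap)) {
  if (reverseMap[eventTypeMap[changeType]] !== changeType) {
    throw new Error(`round-trip mismatch for ${changeType}`);
  }
}
```

If a new event type is added to one table but not the other, rows for it silently fall back to `'added'` in the mapper, so a check like this is worth keeping next to the tables.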

  /**
   * Get category leaderboard across stores
   */
  async getCategoryLeaderboard(
    category: string,
    limit: number = 20
  ): Promise<CategoryLeaderboard[]> {
    const key = cacheKey('category_leaderboard', { category, limit });

    return (await this.cache.getOrCompute(key, async () => {
      const result = await this.pool.query(`
        WITH store_category_stats AS (
          SELECT
            dp.dispensary_id as store_id,
            d.name as store_name,
            COUNT(*) as sku_count,
            COUNT(DISTINCT dp.brand_name) as brand_count,
            AVG(extract_min_price(dp.latest_raw_payload)) as avg_price
          FROM dutchie_products dp
          JOIN dispensaries d ON dp.dispensary_id = d.id
          WHERE dp.type = $1
          GROUP BY dp.dispensary_id, d.name
        )
        SELECT
          scs.*,
          RANK() OVER (ORDER BY scs.sku_count DESC) as rank
        FROM store_category_stats scs
        ORDER BY scs.sku_count DESC
        LIMIT $2
      `, [category, limit]);

      return result.rows.map(row => ({
        category,
        storeId: row.store_id,
        storeName: row.store_name,
        skuCount: parseInt(row.sku_count) || 0,
        brandCount: parseInt(row.brand_count) || 0,
        avgPrice: row.avg_price ? Math.round(parseFloat(row.avg_price) * 100) / 100 : null,
        changePercent7d: 0, // Would need historical data
        rank: parseInt(row.rank) || 0,
      }));
    }, 15)).data;
  }

  /**
   * Get stores with most activity (changes)
   */
  async getMostActiveStores(
    days: number = 7,
    limit: number = 10
  ): Promise<Array<{
    storeId: number;
    storeName: string;
    city: string;
    state: string;
    totalChanges: number;
    brandsChanged: number;
    productsChanged: number;
    priceChanges: number;
    stockChanges: number;
  }>> {
    const key = cacheKey('most_active_stores', { days, limit });

    return (await this.cache.getOrCompute(key, async () => {
      const result = await this.pool.query(`
        SELECT
          d.id as store_id,
          d.name as store_name,
          d.city,
          d.state,
          COUNT(*) as total_changes,
          COUNT(*) FILTER (WHERE sce.event_type IN ('brand_added', 'brand_removed')) as brands_changed,
          COUNT(*) FILTER (WHERE sce.event_type IN ('product_added', 'product_removed')) as products_changed,
          COUNT(*) FILTER (WHERE sce.event_type IN ('price_drop', 'price_increase')) as price_changes,
          COUNT(*) FILTER (WHERE sce.event_type IN ('restocked', 'out_of_stock')) as stock_changes
        FROM store_change_events sce
        JOIN dispensaries d ON sce.store_id = d.id
        WHERE sce.event_date >= CURRENT_DATE - ($1 || ' days')::INTERVAL
        GROUP BY d.id, d.name, d.city, d.state
        ORDER BY total_changes DESC
        LIMIT $2
      `, [days, limit]);

      return result.rows.map(row => ({
        storeId: row.store_id,
        storeName: row.store_name,
        city: row.city,
        state: row.state,
        totalChanges: parseInt(row.total_changes) || 0,
        brandsChanged: parseInt(row.brands_changed) || 0,
        productsChanged: parseInt(row.products_changed) || 0,
        priceChanges: parseInt(row.price_changes) || 0,
        stockChanges: parseInt(row.stock_changes) || 0,
      }));
    }, 15)).data;
  }

  /**
   * Compare two stores
   */
  async compareStores(
    storeId1: number,
    storeId2: number
  ): Promise<{
    store1: { id: number; name: string; brands: string[]; categories: string[]; skuCount: number };
    store2: { id: number; name: string; brands: string[]; categories: string[]; skuCount: number };
    sharedBrands: string[];
    uniqueToStore1: string[];
    uniqueToStore2: string[];
    categoryComparison: Array<{
      category: string;
      store1Skus: number;
      store2Skus: number;
      difference: number;
    }>;
  }> {
    const key = cacheKey('compare_stores', { storeId1, storeId2 });

    return (await this.cache.getOrCompute(key, async () => {
      const [store1Data, store2Data] = await Promise.all([
        this.pool.query(`
          SELECT
            d.id, d.name,
            ARRAY_AGG(DISTINCT dp.brand_name) FILTER (WHERE dp.brand_name IS NOT NULL) as brands,
            ARRAY_AGG(DISTINCT dp.type) FILTER (WHERE dp.type IS NOT NULL) as categories,
            COUNT(*) as sku_count
          FROM dispensaries d
          LEFT JOIN dutchie_products dp ON d.id = dp.dispensary_id
          WHERE d.id = $1
          GROUP BY d.id, d.name
        `, [storeId1]),
        this.pool.query(`
          SELECT
            d.id, d.name,
            ARRAY_AGG(DISTINCT dp.brand_name) FILTER (WHERE dp.brand_name IS NOT NULL) as brands,
            ARRAY_AGG(DISTINCT dp.type) FILTER (WHERE dp.type IS NOT NULL) as categories,
            COUNT(*) as sku_count
          FROM dispensaries d
          LEFT JOIN dutchie_products dp ON d.id = dp.dispensary_id
          WHERE d.id = $1
          GROUP BY d.id, d.name
        `, [storeId2]),
      ]);

      const s1 = store1Data.rows[0];
      const s2 = store2Data.rows[0];

      const brands1Array: string[] = (s1?.brands || []).filter((b: string | null): b is string => b !== null);
      const brands2Array: string[] = (s2?.brands || []).filter((b: string | null): b is string => b !== null);
      const brands1 = new Set(brands1Array);
      const brands2 = new Set(brands2Array);

      const sharedBrands: string[] = brands1Array.filter(b => brands2.has(b));
      const uniqueToStore1: string[] = brands1Array.filter(b => !brands2.has(b));
      const uniqueToStore2: string[] = brands2Array.filter(b => !brands1.has(b));

      // Category comparison
      const categoryResult = await this.pool.query(`
        WITH store1_cats AS (
          SELECT type as category, COUNT(*) as sku_count
          FROM dutchie_products WHERE dispensary_id = $1 AND type IS NOT NULL
          GROUP BY type
        ),
        store2_cats AS (
          SELECT type as category, COUNT(*) as sku_count
          FROM dutchie_products WHERE dispensary_id = $2 AND type IS NOT NULL
          GROUP BY type
        ),
        all_cats AS (
          SELECT category FROM store1_cats
          UNION
          SELECT category FROM store2_cats
        )
        SELECT
          ac.category,
          COALESCE(s1.sku_count, 0) as store1_skus,
          COALESCE(s2.sku_count, 0) as store2_skus
        FROM all_cats ac
        LEFT JOIN store1_cats s1 ON ac.category = s1.category
        LEFT JOIN store2_cats s2 ON ac.category = s2.category
        ORDER BY (COALESCE(s1.sku_count, 0) + COALESCE(s2.sku_count, 0)) DESC
      `, [storeId1, storeId2]);

      return {
        store1: {
          id: s1?.id || storeId1,
          name: s1?.name || 'Unknown',
          brands: s1?.brands || [],
          categories: s1?.categories || [],
          skuCount: parseInt(s1?.sku_count) || 0,
        },
        store2: {
          id: s2?.id || storeId2,
          name: s2?.name || 'Unknown',
          brands: s2?.brands || [],
          categories: s2?.categories || [],
          skuCount: parseInt(s2?.sku_count) || 0,
        },
        sharedBrands,
        uniqueToStore1,
        uniqueToStore2,
        categoryComparison: categoryResult.rows.map(row => ({
          category: row.category,
          store1Skus: parseInt(row.store1_skus) || 0,
          store2Skus: parseInt(row.store2_skus) || 0,
          difference: (parseInt(row.store1_skus) || 0) - (parseInt(row.store2_skus) || 0),
        })),
      };
    }, 15)).data;
  }

  /**
   * Record a change event (used by crawler/worker)
   */
  async recordChangeEvent(event: {
    storeId: number;
    eventType: string;
    brandName?: string;
    productId?: number;
    productName?: string;
    category?: string;
    oldValue?: string;
    newValue?: string;
    metadata?: Record<string, unknown>;
  }): Promise<void> {
    await this.pool.query(`
      INSERT INTO store_change_events
        (store_id, event_type, event_date, brand_name, product_id, product_name, category, old_value, new_value, metadata)
      VALUES ($1, $2, CURRENT_DATE, $3, $4, $5, $6, $7, $8, $9)
    `, [
      event.storeId,
      event.eventType,
      event.brandName || null,
      event.productId || null,
      event.productName || null,
      event.category || null,
      event.oldValue || null,
      event.newValue || null,
      event.metadata ? JSON.stringify(event.metadata) : null,
    ]);

    // Invalidate cache
    await this.cache.invalidatePattern(`store_change_summary:storeId=${event.storeId}`);
  }
}
@@ -1,266 +0,0 @@
/**
 * LEGACY SERVICE - AZDHS Import
 *
 * DEPRECATED: This service creates its own database pool.
 * Future implementations should use the canonical CannaiQ connection.
 *
 * Imports Arizona dispensaries from the main database's dispensaries table
 * (which was populated from AZDHS data) into the isolated Dutchie AZ database.
 *
 * This establishes the canonical list of AZ dispensaries to match against Dutchie.
 *
 * DO NOT:
 * - Run this in automated jobs
 * - Use DATABASE_URL directly
 */

import { Pool } from 'pg';
import { query as dutchieQuery } from '../db/connection';
import { Dispensary } from '../types';

// Single database connection (cannaiq in cannaiq-postgres container)
// Use CANNAIQ_DB_* env vars or defaults
const MAIN_DB_CONNECTION = process.env.CANNAIQ_DB_URL ||
  `postgresql://${process.env.CANNAIQ_DB_USER || 'dutchie'}:${process.env.CANNAIQ_DB_PASS || 'dutchie_local_pass'}@${process.env.CANNAIQ_DB_HOST || 'localhost'}:${process.env.CANNAIQ_DB_PORT || '54320'}/${process.env.CANNAIQ_DB_NAME || 'cannaiq'}`;

/**
 * AZDHS dispensary record from the main database
 */
interface AZDHSDispensary {
  id: number;
  azdhs_id: number;
  name: string;
  company_name?: string;
  address?: string;
  city: string;
  state: string;
  zip?: string;
  latitude?: number;
  longitude?: number;
  dba_name?: string;
  phone?: string;
  email?: string;
  website?: string;
  google_rating?: string;
  google_review_count?: number;
  slug: string;
  menu_provider?: string;
  product_provider?: string;
  created_at: Date;
  updated_at: Date;
}

/**
 * Import result statistics
 */
interface ImportResult {
  total: number;
  imported: number;
  skipped: number;
  errors: string[];
}

/**
 * Create a temporary connection to the main database
 */
function getMainDBPool(): Pool {
  console.warn('[AZDHS Import] LEGACY: Using separate pool. Should use canonical CannaiQ connection.');
  return new Pool({
    connectionString: MAIN_DB_CONNECTION,
    max: 5,
    idleTimeoutMillis: 30000,
    connectionTimeoutMillis: 5000,
  });
}

/**
 * Fetch all AZ dispensaries from the main database
 */
async function fetchAZDHSDispensaries(): Promise<AZDHSDispensary[]> {
  const pool = getMainDBPool();

  try {
    const result = await pool.query<AZDHSDispensary>(`
      SELECT
        id, azdhs_id, name, company_name, address, city, state, zip,
        latitude, longitude, dba_name, phone, email, website,
        google_rating, google_review_count, slug,
        menu_provider, product_provider,
        created_at, updated_at
      FROM dispensaries
      WHERE state = 'AZ'
      ORDER BY id
    `);

    return result.rows;
  } finally {
    await pool.end();
  }
}

/**
 * Import a single dispensary into the Dutchie AZ database
 */
async function importDispensary(disp: AZDHSDispensary): Promise<number> {
  const result = await dutchieQuery<{ id: number }>(
    `
    INSERT INTO dispensaries (
      platform, name, slug, city, state, postal_code, address,
      latitude, longitude, is_delivery, is_pickup, raw_metadata, updated_at
    ) VALUES (
      $1, $2, $3, $4, $5, $6, $7,
      $8, $9, $10, $11, $12, NOW()
    )
    ON CONFLICT (platform, slug, city, state) DO UPDATE SET
      name = EXCLUDED.name,
      postal_code = EXCLUDED.postal_code,
      address = EXCLUDED.address,
      latitude = EXCLUDED.latitude,
      longitude = EXCLUDED.longitude,
      raw_metadata = EXCLUDED.raw_metadata,
      updated_at = NOW()
    RETURNING id
    `,
    [
      'dutchie', // Will be updated when Dutchie match is found
      disp.dba_name || disp.name,
      disp.slug,
      disp.city,
      disp.state,
      disp.zip,
      disp.address,
      disp.latitude,
      disp.longitude,
      false, // is_delivery - unknown
      true, // is_pickup - assume true
      JSON.stringify({
        azdhs_id: disp.azdhs_id,
        main_db_id: disp.id,
        company_name: disp.company_name,
        phone: disp.phone,
        email: disp.email,
        website: disp.website,
        google_rating: disp.google_rating,
        google_review_count: disp.google_review_count,
        menu_provider: disp.menu_provider,
        product_provider: disp.product_provider,
      }),
    ]
  );

  return result.rows[0].id;
}

/**
 * Import all AZDHS dispensaries into the Dutchie AZ database
 */
export async function importAZDHSDispensaries(): Promise<ImportResult> {
  console.log('[AZDHS Import] Starting import from main database...');

  const result: ImportResult = {
    total: 0,
    imported: 0,
    skipped: 0,
    errors: [],
  };

  try {
    const dispensaries = await fetchAZDHSDispensaries();
    result.total = dispensaries.length;

    console.log(`[AZDHS Import] Found ${dispensaries.length} AZ dispensaries in main DB`);

    for (const disp of dispensaries) {
      try {
        const id = await importDispensary(disp);
        result.imported++;
        console.log(`[AZDHS Import] Imported: ${disp.name} (${disp.city}) -> id=${id}`);
      } catch (error: any) {
        if (error.message.includes('duplicate')) {
          result.skipped++;
        } else {
          result.errors.push(`${disp.name}: ${error.message}`);
        }
      }
    }
  } catch (error: any) {
    result.errors.push(`Failed to fetch from main DB: ${error.message}`);
  }

  console.log(`[AZDHS Import] Complete: ${result.imported} imported, ${result.skipped} skipped, ${result.errors.length} errors`);
  return result;
}

/**
 * Import dispensaries from JSON file (backup export)
 */
export async function importFromJSON(jsonPath: string): Promise<ImportResult> {
  console.log(`[AZDHS Import] Importing from JSON: ${jsonPath}`);

  const result: ImportResult = {
    total: 0,
    imported: 0,
    skipped: 0,
    errors: [],
  };

  try {
    const fs = await import('fs/promises');
    const data = await fs.readFile(jsonPath, 'utf-8');
    const dispensaries: AZDHSDispensary[] = JSON.parse(data);

    result.total = dispensaries.length;
    console.log(`[AZDHS Import] Found ${dispensaries.length} dispensaries in JSON file`);

    for (const disp of dispensaries) {
      try {
        await importDispensary(disp);
        result.imported++;
      } catch (error: any) {
        if (error.message.includes('duplicate')) {
          result.skipped++;
        } else {
          result.errors.push(`${disp.name}: ${error.message}`);
        }
      }
    }
  } catch (error: any) {
    result.errors.push(`Failed to read JSON file: ${error.message}`);
  }

  console.log(`[AZDHS Import] Complete: ${result.imported} imported, ${result.skipped} skipped`);
  return result;
}

/**
 * Get import statistics
 */
export async function getImportStats(): Promise<{
  totalDispensaries: number;
  withPlatformIds: number;
  withoutPlatformIds: number;
  lastImportedAt?: Date;
}> {
  const { rows } = await dutchieQuery<{
    total: string;
    with_platform_id: string;
    without_platform_id: string;
    last_updated: Date;
  }>(`
    SELECT
      COUNT(*) as total,
      COUNT(platform_dispensary_id) as with_platform_id,
      COUNT(*) - COUNT(platform_dispensary_id) as without_platform_id,
      MAX(updated_at) as last_updated
    FROM dispensaries
    WHERE state = 'AZ'
  `);

  const stats = rows[0];
  return {
    totalDispensaries: parseInt(stats.total, 10),
    withPlatformIds: parseInt(stats.with_platform_id, 10),
    withoutPlatformIds: parseInt(stats.without_platform_id, 10),
    lastImportedAt: stats.last_updated,
  };
}
@@ -1,481 +0,0 @@
/**
 * Directory-Based Store Matcher
 *
 * Scrapes provider directory pages (Curaleaf, Sol, etc.) to get store lists,
 * then matches them to existing dispensaries by fuzzy name/city/address matching.
 *
 * This allows us to:
 * 1. Find specific store URLs for directory-style websites
 * 2. Match stores confidently by name+city
 * 3. Mark non-Dutchie providers as not_crawlable until we build crawlers
 */

import { query } from '../db/connection';

// ============================================================
// TYPES
// ============================================================

export interface DirectoryStore {
  name: string;
  city: string;
  state: string;
  address: string | null;
  storeUrl: string;
}

export interface MatchResult {
  directoryStore: DirectoryStore;
  dispensaryId: number | null;
  dispensaryName: string | null;
  confidence: 'high' | 'medium' | 'low' | 'none';
  matchReason: string;
}

export interface DirectoryMatchReport {
  provider: string;
  totalDirectoryStores: number;
  highConfidenceMatches: number;
  mediumConfidenceMatches: number;
  lowConfidenceMatches: number;
  unmatched: number;
  results: MatchResult[];
}

// ============================================================
// NORMALIZATION FUNCTIONS
// ============================================================

/**
 * Normalize a string for comparison:
 * - Lowercase
 * - Remove common suffixes (dispensary, cannabis, etc.)
 * - Remove punctuation
 * - Collapse whitespace
 */
function normalizeForComparison(str: string): string {
  if (!str) return '';

  return str
    .toLowerCase()
    .replace(/\s+(dispensary|cannabis|marijuana|medical|recreational|shop|store|flower|wellness)(\s|$)/gi, ' ')
    .replace(/[^\w\s]/g, ' ') // Remove punctuation
    .replace(/\s+/g, ' ') // Collapse whitespace
    .trim();
}
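A standalone illustration of the normalization chain above (the function is re-declared here so the snippet runs on its own):

```typescript
// Copy of the matcher's normalization chain: lowercase, strip common suffixes,
// strip punctuation, collapse whitespace.
function normalizeForComparison(str: string): string {
  if (!str) return '';
  return str
    .toLowerCase()
    .replace(/\s+(dispensary|cannabis|marijuana|medical|recreational|shop|store|flower|wellness)(\s|$)/gi, ' ')
    .replace(/[^\w\s]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

console.log(normalizeForComparison('Curaleaf Dispensary Phoenix')); // "curaleaf phoenix"
console.log(normalizeForComparison('Curaleaf - Phoenix'));          // "curaleaf phoenix"
console.log(normalizeForComparison('Sol Flower Dispensary'));       // "sol dispensary"
```

Note the quirk in the last case: because each suffix match consumes its trailing space and `String.replace` does not rescan replaced text, only one of two consecutive suffixes is stripped.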

/**
 * Normalize city name for comparison
 */
function normalizeCity(city: string): string {
  if (!city) return '';

  return city
    .toLowerCase()
    .replace(/[^\w\s]/g, '')
    .trim();
}

/**
 * Calculate similarity between two strings (0-1)
 * Uses Levenshtein distance normalized by max length
 */
function stringSimilarity(a: string, b: string): number {
  if (!a || !b) return 0;
  if (a === b) return 1;

  const longer = a.length > b.length ? a : b;
  const shorter = a.length > b.length ? b : a;

  if (longer.length === 0) return 1;

  const distance = levenshteinDistance(longer, shorter);
  return (longer.length - distance) / longer.length;
}

/**
 * Levenshtein distance between two strings
 */
function levenshteinDistance(a: string, b: string): number {
  const matrix: number[][] = [];

  for (let i = 0; i <= b.length; i++) {
    matrix[i] = [i];
  }

  for (let j = 0; j <= a.length; j++) {
    matrix[0][j] = j;
  }

  for (let i = 1; i <= b.length; i++) {
    for (let j = 1; j <= a.length; j++) {
      if (b.charAt(i - 1) === a.charAt(j - 1)) {
        matrix[i][j] = matrix[i - 1][j - 1];
      } else {
        matrix[i][j] = Math.min(
          matrix[i - 1][j - 1] + 1, // substitution
          matrix[i][j - 1] + 1, // insertion
          matrix[i - 1][j] + 1 // deletion
        );
      }
    }
  }

  return matrix[b.length][a.length];
}
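A quick standalone check of the similarity scoring above, with both functions re-declared (in condensed form) so the snippet runs on its own:

```typescript
// Condensed copy of the matcher's dynamic-programming Levenshtein distance.
function levenshteinDistance(a: string, b: string): number {
  const matrix: number[][] = [];
  for (let i = 0; i <= b.length; i++) matrix[i] = [i];
  for (let j = 0; j <= a.length; j++) matrix[0][j] = j;
  for (let i = 1; i <= b.length; i++) {
    for (let j = 1; j <= a.length; j++) {
      matrix[i][j] = b.charAt(i - 1) === a.charAt(j - 1)
        ? matrix[i - 1][j - 1]
        : Math.min(matrix[i - 1][j - 1] + 1, matrix[i][j - 1] + 1, matrix[i - 1][j] + 1);
    }
  }
  return matrix[b.length][a.length];
}

// Similarity = edits saved relative to the longer string's length (0..1).
function stringSimilarity(a: string, b: string): number {
  if (!a || !b) return 0;
  if (a === b) return 1;
  const longer = a.length > b.length ? a : b;
  const shorter = a.length > b.length ? b : a;
  return (longer.length - levenshteinDistance(longer, shorter)) / longer.length;
}

console.log(levenshteinDistance('kitten', 'sitting'));         // 3
console.log(stringSimilarity('kitten', 'sitting').toFixed(2)); // "0.57"
```

This normalization by the longer length means the score only clears the matcher's `> 0.8` name threshold when the strings are already close, which is why the suffix-stripping normalization above matters so much before comparison.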
|
||||
|
||||
/**
|
||||
* Check if string contains another (with normalization)
|
||||
*/
|
||||
function containsNormalized(haystack: string, needle: string): boolean {
|
||||
return normalizeForComparison(haystack).includes(normalizeForComparison(needle));
|
||||
}
|
||||
|
||||
// ============================================================
|
||||
// PROVIDER DIRECTORY SCRAPERS
|
||||
// ============================================================
|
||||
|
||||
/**
|
||||
* Sol Flower (livewithsol.com) - Static HTML, easy to scrape
|
||||
*/
|
||||
export async function scrapeSolDirectory(): Promise<DirectoryStore[]> {
|
||||
console.log('[DirectoryMatcher] Scraping Sol Flower directory...');
|
||||
|
||||
try {
|
||||
const response = await fetch('https://www.livewithsol.com/locations/', {
|
||||
headers: {
|
||||
'User-Agent':
|
||||
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
|
||||
Accept: 'text/html',
|
||||
},
|
||||
});
|
||||
|
||||
if (!response.ok) {
|
||||
throw new Error(`HTTP ${response.status}`);
|
||||
}
|
||||
|
||||
const html = await response.text();
|
||||
|
||||
// Extract store entries from HTML
|
||||
// Sol's structure: Each location has name, address in specific divs
|
||||
const stores: DirectoryStore[] = [];
|
||||
|
||||
// Pattern to find location cards
|
||||
// Format: <a href="/locations/slug/">NAME</a> with address nearby
|
||||
const locationRegex =
|
||||
/<a[^>]+href="(\/locations\/[^"]+)"[^>]*>([^<]+)<\/a>[\s\S]*?(\d+[^<]+(?:Ave|St|Blvd|Dr|Rd|Way)[^<]*)/gi;
|
||||
|
||||
let match;
|
||||
while ((match = locationRegex.exec(html)) !== null) {
|
||||
const [, path, name, address] = match;
|
||||
|
||||
// Extract city from common Arizona cities
|
||||
let city = 'Unknown';
|
||||
const cityPatterns = [
|
||||
{ pattern: /phoenix/i, city: 'Phoenix' },
|
||||
{ pattern: /scottsdale/i, city: 'Scottsdale' },
|
||||
{ pattern: /tempe/i, city: 'Tempe' },
|
||||
{ pattern: /tucson/i, city: 'Tucson' },
|
||||
{ pattern: /mesa/i, city: 'Mesa' },
|
||||
{ pattern: /sun city/i, city: 'Sun City' },
|
||||
{ pattern: /glendale/i, city: 'Glendale' },
|
||||
];
|
||||
|
||||
for (const { pattern, city: cityName } of cityPatterns) {
|
||||
if (pattern.test(name) || pattern.test(address)) {
|
||||
city = cityName;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
stores.push({
|
||||
name: name.trim(),
|
||||
city,
|
||||
state: 'AZ',
|
||||
address: address.trim(),
|
||||
storeUrl: `https://www.livewithsol.com${path}`,
|
||||
});
|
||||
}
|
||||
|
||||
// If regex didn't work, use known hardcoded values (fallback)
|
||||
if (stores.length === 0) {
|
||||
console.log('[DirectoryMatcher] Using hardcoded Sol locations');
|
||||
return [
|
||||
{ name: 'Sol Flower 32nd & Shea', city: 'Phoenix', state: 'AZ', address: '3217 E Shea Blvd Suite 1 A', storeUrl: 'https://www.livewithsol.com/locations/deer-valley/' },
|
||||
{ name: 'Sol Flower Scottsdale Airpark', city: 'Scottsdale', state: 'AZ', address: '14980 N 78th Way Ste 204', storeUrl: 'https://www.livewithsol.com/locations/scottsdale-airpark/' },
|
||||
{ name: 'Sol Flower Sun City', city: 'Sun City', state: 'AZ', address: '13650 N 99th Ave', storeUrl: 'https://www.livewithsol.com/locations/sun-city/' },
|
||||
{ name: 'Sol Flower Tempe McClintock', city: 'Tempe', state: 'AZ', address: '1322 N McClintock Dr', storeUrl: 'https://www.livewithsol.com/locations/tempe-mcclintock/' },
|
||||
{ name: 'Sol Flower Tempe University', city: 'Tempe', state: 'AZ', address: '2424 W University Dr', storeUrl: 'https://www.livewithsol.com/locations/tempe-university/' },
|
||||
{ name: 'Sol Flower Foothills Tucson', city: 'Tucson', state: 'AZ', address: '6026 N Oracle Rd', storeUrl: 'https://www.livewithsol.com/locations/foothills-tucson/' },
|
||||
{ name: 'Sol Flower South Tucson', city: 'Tucson', state: 'AZ', address: '3000 W Valencia Rd Ste 210', storeUrl: 'https://www.livewithsol.com/locations/south-tucson/' },
|
||||
{ name: 'Sol Flower North Tucson', city: 'Tucson', state: 'AZ', address: '4837 N 1st Ave', storeUrl: 'https://www.livewithsol.com/locations/north-tucson/' },
|
||||
{ name: 'Sol Flower Casas Adobes', city: 'Tucson', state: 'AZ', address: '6437 N Oracle Rd', storeUrl: 'https://www.livewithsol.com/locations/casas-adobes/' },
|
||||
];
|
||||
}
|
||||
|
||||
console.log(`[DirectoryMatcher] Found ${stores.length} Sol Flower locations`);
|
||||
return stores;
|
||||
} catch (error: any) {
|
||||
console.error('[DirectoryMatcher] Error scraping Sol directory:', error.message);
|
||||
// Return hardcoded fallback
|
||||
return [
|
||||
{ name: 'Sol Flower 32nd & Shea', city: 'Phoenix', state: 'AZ', address: '3217 E Shea Blvd Suite 1 A', storeUrl: 'https://www.livewithsol.com/locations/deer-valley/' },
|
||||
{ name: 'Sol Flower Scottsdale Airpark', city: 'Scottsdale', state: 'AZ', address: '14980 N 78th Way Ste 204', storeUrl: 'https://www.livewithsol.com/locations/scottsdale-airpark/' },
|
||||
{ name: 'Sol Flower Sun City', city: 'Sun City', state: 'AZ', address: '13650 N 99th Ave', storeUrl: 'https://www.livewithsol.com/locations/sun-city/' },
|
||||
{ name: 'Sol Flower Tempe McClintock', city: 'Tempe', state: 'AZ', address: '1322 N McClintock Dr', storeUrl: 'https://www.livewithsol.com/locations/tempe-mcclintock/' },
|
||||
{ name: 'Sol Flower Tempe University', city: 'Tempe', state: 'AZ', address: '2424 W University Dr', storeUrl: 'https://www.livewithsol.com/locations/tempe-university/' },
|
||||
{ name: 'Sol Flower Foothills Tucson', city: 'Tucson', state: 'AZ', address: '6026 N Oracle Rd', storeUrl: 'https://www.livewithsol.com/locations/foothills-tucson/' },
|
||||
{ name: 'Sol Flower South Tucson', city: 'Tucson', state: 'AZ', address: '3000 W Valencia Rd Ste 210', storeUrl: 'https://www.livewithsol.com/locations/south-tucson/' },
|
||||
{ name: 'Sol Flower North Tucson', city: 'Tucson', state: 'AZ', address: '4837 N 1st Ave', storeUrl: 'https://www.livewithsol.com/locations/north-tucson/' },
|
||||
{ name: 'Sol Flower Casas Adobes', city: 'Tucson', state: 'AZ', address: '6437 N Oracle Rd', storeUrl: 'https://www.livewithsol.com/locations/casas-adobes/' },
|
||||
];
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Curaleaf - Has age-gate, so we need hardcoded AZ locations
|
||||
* In production, this would use Playwright to bypass age-gate
|
||||
*/
|
||||
export async function scrapeCuraleafDirectory(): Promise<DirectoryStore[]> {
|
||||
console.log('[DirectoryMatcher] Using hardcoded Curaleaf AZ locations (age-gate blocks simple fetch)...');
|
||||
|
||||
// Hardcoded Arizona Curaleaf locations from public knowledge
|
||||
// These would be scraped via Playwright in production
|
||||
return [
|
    { name: 'Curaleaf Phoenix Camelback', city: 'Phoenix', state: 'AZ', address: '4811 E Camelback Rd', storeUrl: 'https://curaleaf.com/stores/curaleaf-az-phoenix-camelback' },
    { name: 'Curaleaf Phoenix Midtown', city: 'Phoenix', state: 'AZ', address: '1928 E Highland Ave', storeUrl: 'https://curaleaf.com/stores/curaleaf-az-phoenix-midtown' },
    { name: 'Curaleaf Glendale East', city: 'Glendale', state: 'AZ', address: '5150 W Glendale Ave', storeUrl: 'https://curaleaf.com/stores/curaleaf-az-glendale-east' },
    { name: 'Curaleaf Glendale West', city: 'Glendale', state: 'AZ', address: '6501 W Glendale Ave', storeUrl: 'https://curaleaf.com/stores/curaleaf-az-glendale-west' },
    { name: 'Curaleaf Gilbert', city: 'Gilbert', state: 'AZ', address: '1736 E Williams Field Rd', storeUrl: 'https://curaleaf.com/stores/curaleaf-az-gilbert' },
    { name: 'Curaleaf Mesa', city: 'Mesa', state: 'AZ', address: '1540 S Power Rd', storeUrl: 'https://curaleaf.com/stores/curaleaf-az-mesa' },
    { name: 'Curaleaf Tempe', city: 'Tempe', state: 'AZ', address: '1815 E Broadway Rd', storeUrl: 'https://curaleaf.com/stores/curaleaf-az-tempe' },
    { name: 'Curaleaf Scottsdale', city: 'Scottsdale', state: 'AZ', address: '8904 E Indian Bend Rd', storeUrl: 'https://curaleaf.com/stores/curaleaf-az-scottsdale' },
    { name: 'Curaleaf Tucson Prince', city: 'Tucson', state: 'AZ', address: '3955 W Prince Rd', storeUrl: 'https://curaleaf.com/stores/curaleaf-az-tucson-prince' },
    { name: 'Curaleaf Tucson Midvale', city: 'Tucson', state: 'AZ', address: '2936 N Midvale Park Rd', storeUrl: 'https://curaleaf.com/stores/curaleaf-az-tucson-midvale' },
    { name: 'Curaleaf Sedona', city: 'Sedona', state: 'AZ', address: '525 AZ-179', storeUrl: 'https://curaleaf.com/stores/curaleaf-az-sedona' },
    { name: 'Curaleaf Youngtown', city: 'Youngtown', state: 'AZ', address: '11125 W Grand Ave', storeUrl: 'https://curaleaf.com/stores/curaleaf-az-youngtown' },
  ];
}

// ============================================================
// MATCHING LOGIC
// ============================================================

interface Dispensary {
  id: number;
  name: string;
  city: string | null;
  state: string | null;
  address: string | null;
  menu_type: string | null;
  menu_url: string | null;
  website: string | null;
}

/**
 * Match a directory store to an existing dispensary
 */
function matchStoreToDispensary(store: DirectoryStore, dispensaries: Dispensary[]): MatchResult {
  const normalizedStoreName = normalizeForComparison(store.name);
  const normalizedStoreCity = normalizeCity(store.city);

  let bestMatch: Dispensary | null = null;
  let bestScore = 0;
  let matchReason = '';

  for (const disp of dispensaries) {
    const normalizedDispName = normalizeForComparison(disp.name);
    const normalizedDispCity = normalizeCity(disp.city || '');

    let score = 0;
    const reasons: string[] = [];

    // 1. Name similarity (max 50 points)
    const nameSimilarity = stringSimilarity(normalizedStoreName, normalizedDispName);
    score += nameSimilarity * 50;
    if (nameSimilarity > 0.8) reasons.push(`name_match(${(nameSimilarity * 100).toFixed(0)}%)`);

    // 2. City match (25 points for exact, 15 for partial)
    if (normalizedStoreCity && normalizedDispCity) {
      if (normalizedStoreCity === normalizedDispCity) {
        score += 25;
        reasons.push('city_exact');
      } else if (
        normalizedStoreCity.includes(normalizedDispCity) ||
        normalizedDispCity.includes(normalizedStoreCity)
      ) {
        score += 15;
        reasons.push('city_partial');
      }
    }

    // 3. Address contains street name (15 points)
    if (store.address && disp.address) {
      const storeStreet = store.address.toLowerCase().split(/\s+/).slice(1, 4).join(' ');
      const dispStreet = disp.address.toLowerCase().split(/\s+/).slice(1, 4).join(' ');
      if (storeStreet && dispStreet && stringSimilarity(storeStreet, dispStreet) > 0.7) {
        score += 15;
        reasons.push('address_match');
      }
    }

    // 4. Brand name in dispensary name (10 points)
    const brandName = store.name.split(' ')[0].toLowerCase(); // e.g., "Curaleaf", "Sol"
    if (disp.name.toLowerCase().includes(brandName)) {
      score += 10;
      reasons.push('brand_match');
    }

    if (score > bestScore) {
      bestScore = score;
      bestMatch = disp;
      matchReason = reasons.join(', ');
    }
  }

  // Determine confidence level
  let confidence: 'high' | 'medium' | 'low' | 'none';
  if (bestScore >= 70) {
    confidence = 'high';
  } else if (bestScore >= 50) {
    confidence = 'medium';
  } else if (bestScore >= 30) {
    confidence = 'low';
  } else {
    confidence = 'none';
  }

  return {
    directoryStore: store,
    dispensaryId: bestMatch?.id || null,
    dispensaryName: bestMatch?.name || null,
    confidence,
    matchReason: matchReason || 'no_match',
  };
}
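The weights above cap at 100 points (50 name + 25 city + 15 address + 10 brand). As a standalone sketch of the confidence bucketing with hypothetical inputs (not the module itself):

```typescript
// Standalone copy of the confidence thresholds above; the weights cap at
// 50 (name) + 25 (city) + 15 (address) + 10 (brand) = 100 points.
function confidenceFor(score: number): 'high' | 'medium' | 'low' | 'none' {
  if (score >= 70) return 'high';
  if (score >= 50) return 'medium';
  if (score >= 30) return 'low';
  return 'none';
}

// Hypothetical store: exact name, exact city, brand hit, no address match.
const score = 50 + 25 + 10;
console.log(score, confidenceFor(score)); // 85 high
```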

// ============================================================
// MAIN FUNCTIONS
// ============================================================

/**
 * Run directory matching for a provider and update database
 * Only applies high-confidence matches automatically
 */
export async function matchDirectoryToDispensaries(
  provider: 'curaleaf' | 'sol',
  dryRun: boolean = true
): Promise<DirectoryMatchReport> {
  console.log(`[DirectoryMatcher] Running ${provider} directory matching (dryRun=${dryRun})...`);

  // Get directory stores
  let directoryStores: DirectoryStore[];
  if (provider === 'curaleaf') {
    directoryStores = await scrapeCuraleafDirectory();
  } else if (provider === 'sol') {
    directoryStores = await scrapeSolDirectory();
  } else {
    throw new Error(`Unknown provider: ${provider}`);
  }

  // Get all AZ dispensaries from database
  const { rows: dispensaries } = await query<Dispensary>(
    `SELECT id, name, city, state, address, menu_type, menu_url, website
     FROM dispensaries
     WHERE state = 'AZ'`
  );

  console.log(`[DirectoryMatcher] Matching ${directoryStores.length} directory stores against ${dispensaries.length} dispensaries`);

  // Match each directory store
  const results: MatchResult[] = [];
  for (const store of directoryStores) {
    const match = matchStoreToDispensary(store, dispensaries);
    results.push(match);

    // Only apply high-confidence matches if not dry run
    if (!dryRun && match.confidence === 'high' && match.dispensaryId) {
      await applyDirectoryMatch(match.dispensaryId, provider, store);
    }
  }

  // Count results
  const report: DirectoryMatchReport = {
    provider,
    totalDirectoryStores: directoryStores.length,
    highConfidenceMatches: results.filter((r) => r.confidence === 'high').length,
    mediumConfidenceMatches: results.filter((r) => r.confidence === 'medium').length,
    lowConfidenceMatches: results.filter((r) => r.confidence === 'low').length,
    unmatched: results.filter((r) => r.confidence === 'none').length,
    results,
  };

  console.log(`[DirectoryMatcher] ${provider} matching complete:`);
  console.log(`  - High confidence: ${report.highConfidenceMatches}`);
  console.log(`  - Medium confidence: ${report.mediumConfidenceMatches}`);
  console.log(`  - Low confidence: ${report.lowConfidenceMatches}`);
  console.log(`  - Unmatched: ${report.unmatched}`);

  return report;
}

/**
 * Apply a directory match to a dispensary
 */
async function applyDirectoryMatch(
  dispensaryId: number,
  provider: string,
  store: DirectoryStore
): Promise<void> {
  console.log(`[DirectoryMatcher] Applying match: dispensary ${dispensaryId} -> ${store.storeUrl}`);

  await query(
    `
    UPDATE dispensaries SET
      menu_type = $1,
      menu_url = $2,
      platform_dispensary_id = NULL,
      provider_detection_data = COALESCE(provider_detection_data, '{}'::jsonb) ||
        jsonb_build_object(
          'detected_provider', $1::text,
          'detection_method', 'directory_match'::text,
          'detected_at', NOW(),
          'directory_store_name', $3::text,
          'directory_store_url', $2::text,
          'directory_store_city', $4::text,
          'directory_store_address', $5::text,
          'not_crawlable', true,
          'not_crawlable_reason', $6::text
        ),
      updated_at = NOW()
    WHERE id = $7
    `,
    [
      provider,
      store.storeUrl,
      store.name,
      store.city,
      store.address,
      `${provider} proprietary menu - no crawler available`,
      dispensaryId,
    ]
  );
}

/**
 * Preview matches without applying them
 */
export async function previewDirectoryMatches(
  provider: 'curaleaf' | 'sol'
): Promise<DirectoryMatchReport> {
  return matchDirectoryToDispensaries(provider, true);
}

/**
 * Apply high-confidence matches
 */
export async function applyHighConfidenceMatches(
  provider: 'curaleaf' | 'sol'
): Promise<DirectoryMatchReport> {
  return matchDirectoryToDispensaries(provider, false);
}
@@ -1,592 +0,0 @@
/**
 * Dutchie AZ Discovery Service
 *
 * Discovers and manages dispensaries from Dutchie for Arizona.
 */

import { query, getClient } from '../db/connection';
import { discoverArizonaDispensaries, resolveDispensaryId, resolveDispensaryIdWithDetails, ResolveDispensaryResult } from './graphql-client';
import { Dispensary } from '../types';

/**
 * Upsert a dispensary record
 */
async function upsertDispensary(dispensary: Partial<Dispensary>): Promise<number> {
  const result = await query<{ id: number }>(
    `
    INSERT INTO dispensaries (
      platform, name, slug, city, state, postal_code, address,
      latitude, longitude, platform_dispensary_id,
      is_delivery, is_pickup, raw_metadata, updated_at
    ) VALUES (
      $1, $2, $3, $4, $5, $6, $7,
      $8, $9, $10,
      $11, $12, $13, NOW()
    )
    ON CONFLICT (platform, slug, city, state) DO UPDATE SET
      name = EXCLUDED.name,
      postal_code = EXCLUDED.postal_code,
      address = EXCLUDED.address,
      latitude = EXCLUDED.latitude,
      longitude = EXCLUDED.longitude,
      platform_dispensary_id = COALESCE(EXCLUDED.platform_dispensary_id, dispensaries.platform_dispensary_id),
      is_delivery = EXCLUDED.is_delivery,
      is_pickup = EXCLUDED.is_pickup,
      raw_metadata = EXCLUDED.raw_metadata,
      updated_at = NOW()
    RETURNING id
    `,
    [
      dispensary.platform || 'dutchie',
      dispensary.name,
      dispensary.slug,
      dispensary.city,
      dispensary.state || 'AZ',
      dispensary.postalCode,
      dispensary.address,
      dispensary.latitude,
      dispensary.longitude,
      dispensary.platformDispensaryId,
      dispensary.isDelivery || false,
      dispensary.isPickup ?? true, // ?? (not ||) so an explicit false is preserved
      dispensary.rawMetadata ? JSON.stringify(dispensary.rawMetadata) : null,
    ]
  );

  return result.rows[0].id;
}

/**
 * Normalize a raw discovery result to Dispensary
 */
function normalizeDispensary(raw: any): Partial<Dispensary> {
  return {
    platform: 'dutchie',
    name: raw.name || raw.Name || '',
    slug: raw.slug || raw.cName || raw.id || '',
    city: raw.city || raw.address?.city || '',
    state: 'AZ',
    postalCode: raw.postalCode || raw.address?.postalCode || raw.address?.zip,
    address: raw.streetAddress || raw.address?.streetAddress,
    latitude: raw.latitude || raw.location?.lat,
    longitude: raw.longitude || raw.location?.lng,
    platformDispensaryId: raw.dispensaryId || raw.id || null,
    isDelivery: raw.isDelivery || raw.delivery || false,
    isPickup: raw.isPickup ?? raw.pickup ?? true, // ?? (not ||) so an explicit false is preserved
    rawMetadata: raw,
  };
}

/**
 * Import dispensaries from the existing dispensaries table (from AZDHS data)
 * This creates records in the dutchie_az database for AZ dispensaries
 */
export async function importFromExistingDispensaries(): Promise<{ imported: number }> {
  console.log('[Discovery] Importing from existing dispensaries table...');

  // This is a workaround - we'll use the dispensaries we already know about
  // and try to resolve their Dutchie IDs
  const knownDispensaries = [
    { name: 'Deeply Rooted', slug: 'AZ-Deeply-Rooted', city: 'Phoenix', state: 'AZ' },
    { name: 'Curaleaf Gilbert', slug: 'curaleaf-gilbert', city: 'Gilbert', state: 'AZ' },
    { name: 'Zen Leaf Prescott', slug: 'AZ-zen-leaf-prescott', city: 'Prescott', state: 'AZ' },
    // Add more known Dutchie stores here
  ];

  let imported = 0;

  for (const disp of knownDispensaries) {
    try {
      const id = await upsertDispensary({
        platform: 'dutchie',
        name: disp.name,
        slug: disp.slug,
        city: disp.city,
        state: disp.state,
      });
      imported++;
      console.log(`[Discovery] Imported: ${disp.name} (id=${id})`);
    } catch (error: any) {
      console.error(`[Discovery] Failed to import ${disp.name}:`, error.message);
    }
  }

  return { imported };
}

/**
 * Discover all Arizona Dutchie dispensaries via GraphQL
 */
export async function discoverDispensaries(): Promise<{ discovered: number; errors: string[] }> {
  console.log('[Discovery] Starting Arizona dispensary discovery...');
  const errors: string[] = [];
  let discovered = 0;

  try {
    const rawDispensaries = await discoverArizonaDispensaries();
    console.log(`[Discovery] Found ${rawDispensaries.length} dispensaries from GraphQL`);

    for (const raw of rawDispensaries) {
      try {
        const normalized = normalizeDispensary(raw);
        if (normalized.name && normalized.slug && normalized.city) {
          await upsertDispensary(normalized);
          discovered++;
        }
      } catch (error: any) {
        errors.push(`${raw.name || raw.slug}: ${error.message}`);
      }
    }
  } catch (error: any) {
    errors.push(`Discovery failed: ${error.message}`);
  }

  console.log(`[Discovery] Completed: ${discovered} dispensaries, ${errors.length} errors`);
  return { discovered, errors };
}

/**
 * Check if a string looks like a MongoDB ObjectId (24 hex chars)
 */
export function isObjectId(value: string): boolean {
  return /^[a-f0-9]{24}$/i.test(value);
}

/**
 * Extract cName (slug) or platform_dispensary_id from a Dutchie menu_url
 *
 * Supports formats:
 * - https://dutchie.com/embedded-menu/<cName>        -> returns { type: 'cName', value: '<cName>' }
 * - https://dutchie.com/dispensary/<cName>           -> returns { type: 'cName', value: '<cName>' }
 * - https://dutchie.com/api/v2/embedded-menu/<id>.js -> returns { type: 'platformId', value: '<id>' }
 *
 * For backward compatibility, extractCNameFromMenuUrl still returns just the string value.
 */
export interface MenuUrlExtraction {
  type: 'cName' | 'platformId';
  value: string;
}

export function extractFromMenuUrl(menuUrl: string | null | undefined): MenuUrlExtraction | null {
  if (!menuUrl) return null;

  try {
    const url = new URL(menuUrl);
    const pathname = url.pathname;

    // Match /api/v2/embedded-menu/<id>.js - this contains the platform_dispensary_id directly
    const apiMatch = pathname.match(/^\/api\/v2\/embedded-menu\/([a-f0-9]{24})\.js$/i);
    if (apiMatch) {
      return { type: 'platformId', value: apiMatch[1] };
    }

    // Match /embedded-menu/<cName> or /dispensary/<cName>
    const embeddedMatch = pathname.match(/^\/embedded-menu\/([^/?]+)/);
    if (embeddedMatch) {
      const value = embeddedMatch[1];
      // Check if it's actually an ObjectId (some URLs use the ID directly)
      if (isObjectId(value)) {
        return { type: 'platformId', value };
      }
      return { type: 'cName', value };
    }

    const dispensaryMatch = pathname.match(/^\/dispensary\/([^/?]+)/);
    if (dispensaryMatch) {
      const value = dispensaryMatch[1];
      if (isObjectId(value)) {
        return { type: 'platformId', value };
      }
      return { type: 'cName', value };
    }

    return null;
  } catch {
    return null;
  }
}

/**
 * Extract cName (slug) from a Dutchie menu_url
 * Backward compatible - use extractFromMenuUrl for full info
 */
export function extractCNameFromMenuUrl(menuUrl: string | null | undefined): string | null {
  const extraction = extractFromMenuUrl(menuUrl);
  return extraction?.value || null;
}
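The three URL formats documented above can be exercised with a self-contained copy of the same regexes (a sketch for illustration; `sketchExtract` is hypothetical, the real module exports extractFromMenuUrl):

```typescript
// Self-contained copy of the menu_url extraction rules documented above.
function sketchExtract(menuUrl: string): { type: 'cName' | 'platformId'; value: string } | null {
  const pathname = new URL(menuUrl).pathname;
  // /api/v2/embedded-menu/<24-hex-id>.js carries the platform ID directly
  const api = pathname.match(/^\/api\/v2\/embedded-menu\/([a-f0-9]{24})\.js$/i);
  if (api) return { type: 'platformId', value: api[1] };
  // /embedded-menu/<cName> and /dispensary/<cName> carry the slug,
  // unless the path segment itself is a 24-hex ObjectId
  const slug = pathname.match(/^\/(?:embedded-menu|dispensary)\/([^/?]+)/);
  if (!slug) return null;
  const value = slug[1];
  return /^[a-f0-9]{24}$/i.test(value)
    ? { type: 'platformId', value }
    : { type: 'cName', value };
}

console.log(sketchExtract('https://dutchie.com/embedded-menu/AZ-Deeply-Rooted'));
// { type: 'cName', value: 'AZ-Deeply-Rooted' }
```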

/**
 * Resolve platform dispensary IDs for all dispensaries that don't have one
 * CRITICAL: Uses cName extracted from menu_url, NOT the slug column!
 *
 * Uses the new resolveDispensaryIdWithDetails which:
 * 1. Extracts dispensaryId from window.reactEnv in the embedded menu page (preferred)
 * 2. Falls back to GraphQL if reactEnv extraction fails
 * 3. Returns HTTP status so we can mark 403/404 stores as not_crawlable
 */
export async function resolvePlatformDispensaryIds(): Promise<{ resolved: number; failed: number; skipped: number; notCrawlable: number }> {
  console.log('[Discovery] Resolving platform dispensary IDs...');

  const { rows: dispensaries } = await query<any>(
    `
    SELECT id, name, slug, menu_url, menu_type, platform_dispensary_id, crawl_status
    FROM dispensaries
    WHERE menu_type = 'dutchie'
      AND platform_dispensary_id IS NULL
      AND menu_url IS NOT NULL
      AND (crawl_status IS NULL OR crawl_status != 'not_crawlable')
    ORDER BY id
    `
  );

  let resolved = 0;
  let failed = 0;
  let skipped = 0;
  let notCrawlable = 0;

  for (const dispensary of dispensaries) {
    try {
      // Extract cName from menu_url - this is the CORRECT way to get the Dutchie slug
      const cName = extractCNameFromMenuUrl(dispensary.menu_url);

      if (!cName) {
        console.log(`[Discovery] Skipping ${dispensary.name}: Could not extract cName from menu_url: ${dispensary.menu_url}`);
        skipped++;
        continue;
      }

      console.log(`[Discovery] Resolving ID for: ${dispensary.name} (cName=${cName}, menu_url=${dispensary.menu_url})`);

      // Use the new detailed resolver that extracts from reactEnv first
      const result = await resolveDispensaryIdWithDetails(cName);

      if (result.dispensaryId) {
        // SUCCESS: Store resolved
        await query(
          `
          UPDATE dispensaries
          SET platform_dispensary_id = $1,
              platform_dispensary_id_resolved_at = NOW(),
              crawl_status = 'ready',
              crawl_status_reason = $2,
              crawl_status_updated_at = NOW(),
              last_tested_menu_url = $3,
              last_http_status = $4,
              updated_at = NOW()
          WHERE id = $5
          `,
          [
            result.dispensaryId,
            `Resolved from ${result.source || 'page'}`,
            dispensary.menu_url,
            result.httpStatus,
            dispensary.id,
          ]
        );
        resolved++;
        console.log(`[Discovery] Resolved: ${cName} -> ${result.dispensaryId} (source: ${result.source})`);
      } else if (result.httpStatus === 403 || result.httpStatus === 404) {
        // NOT CRAWLABLE: Store removed or not accessible
        await query(
          `
          UPDATE dispensaries
          SET platform_dispensary_id = NULL,
              crawl_status = 'not_crawlable',
              crawl_status_reason = $1,
              crawl_status_updated_at = NOW(),
              last_tested_menu_url = $2,
              last_http_status = $3,
              updated_at = NOW()
          WHERE id = $4
          `,
          [
            result.error || `HTTP ${result.httpStatus}: Removed from Dutchie`,
            dispensary.menu_url,
            result.httpStatus,
            dispensary.id,
          ]
        );
        notCrawlable++;
        console.log(`[Discovery] Marked not crawlable: ${cName} (HTTP ${result.httpStatus})`);
      } else {
        // FAILED: Could not resolve but page loaded
        await query(
          `
          UPDATE dispensaries
          SET crawl_status = 'not_ready',
              crawl_status_reason = $1,
              crawl_status_updated_at = NOW(),
              last_tested_menu_url = $2,
              last_http_status = $3,
              updated_at = NOW()
          WHERE id = $4
          `,
          [
            result.error || 'Could not extract dispensaryId from page',
            dispensary.menu_url,
            result.httpStatus,
            dispensary.id,
          ]
        );
        failed++;
        console.log(`[Discovery] Could not resolve: ${cName} - ${result.error}`);
      }

      // Delay between requests
      await new Promise((r) => setTimeout(r, 2000));
    } catch (error: any) {
      failed++;
      console.error(`[Discovery] Error resolving ${dispensary.name}:`, error.message);
    }
  }

  console.log(`[Discovery] Completed: ${resolved} resolved, ${failed} failed, ${skipped} skipped, ${notCrawlable} not crawlable`);
  return { resolved, failed, skipped, notCrawlable };
}

// Use shared dispensary columns (handles optional columns like provider_detection_data)
import { DISPENSARY_COLUMNS } from '../db/dispensary-columns';

/**
 * Get all dispensaries
 */
export async function getAllDispensaries(): Promise<Dispensary[]> {
  const { rows } = await query(
    `SELECT ${DISPENSARY_COLUMNS} FROM dispensaries WHERE menu_type = 'dutchie' ORDER BY name`
  );
  return rows.map(mapDbRowToDispensary);
}

/**
 * Map snake_case DB row to camelCase Dispensary object
 * CRITICAL: DB returns snake_case (platform_dispensary_id) but TypeScript expects camelCase (platformDispensaryId)
 * This function is exported for use in other modules that query dispensaries directly.
 *
 * NOTE: The consolidated dispensaries table column mappings:
 * - zip → postalCode
 * - menu_type → menuType (keep platform as 'dutchie')
 * - last_crawl_at → lastCrawledAt
 * - platform_dispensary_id → platformDispensaryId
 */
export function mapDbRowToDispensary(row: any): Dispensary {
  // Extract website from raw_metadata if available (field may not exist in all environments)
  let rawMetadata = undefined;
  if (row.raw_metadata !== undefined) {
    rawMetadata = typeof row.raw_metadata === 'string'
      ? JSON.parse(row.raw_metadata)
      : row.raw_metadata;
  }
  const website = row.website || rawMetadata?.website || undefined;

  return {
    id: row.id,
    platform: row.platform || 'dutchie', // keep platform as-is, default to 'dutchie'
    name: row.name,
    dbaName: row.dbaName || row.dba_name || undefined, // dba_name column is optional
    slug: row.slug,
    city: row.city,
    state: row.state,
    postalCode: row.postalCode || row.zip || row.postal_code,
    latitude: row.latitude ? parseFloat(row.latitude) : undefined,
    longitude: row.longitude ? parseFloat(row.longitude) : undefined,
    address: row.address,
    platformDispensaryId: row.platformDispensaryId || row.platform_dispensary_id, // CRITICAL mapping!
    isDelivery: row.is_delivery,
    isPickup: row.is_pickup,
    rawMetadata: rawMetadata,
    lastCrawledAt: row.lastCrawledAt || row.last_crawl_at, // use last_crawl_at
    productCount: row.product_count,
    createdAt: row.created_at,
    updatedAt: row.updated_at,
    menuType: row.menuType || row.menu_type,
    menuUrl: row.menuUrl || row.menu_url,
    scrapeEnabled: row.scrapeEnabled ?? row.scrape_enabled,
    providerDetectionData: row.provider_detection_data,
    platformDispensaryIdResolvedAt: row.platform_dispensary_id_resolved_at,
    website,
  };
}

/**
 * Get dispensary by ID
 * NOTE: Uses SQL aliases to map snake_case → camelCase directly
 */
export async function getDispensaryById(id: number): Promise<Dispensary | null> {
  const { rows } = await query(
    `
    SELECT
      id,
      name,
      slug,
      city,
      state,
      zip AS "postalCode",
      address,
      latitude,
      longitude,
      menu_type AS "menuType",
      menu_url AS "menuUrl",
      platform_dispensary_id AS "platformDispensaryId",
      website,
      provider_detection_data AS "providerDetectionData",
      created_at,
      updated_at
    FROM dispensaries
    WHERE id = $1
    `,
    [id]
  );
  if (!rows[0]) return null;
  return mapDbRowToDispensary(rows[0]);
}

/**
 * Get dispensaries with platform IDs (ready for crawling)
 */
export async function getDispensariesWithPlatformIds(): Promise<Dispensary[]> {
  const { rows } = await query(
    `
    SELECT ${DISPENSARY_COLUMNS} FROM dispensaries
    WHERE menu_type = 'dutchie' AND platform_dispensary_id IS NOT NULL
    ORDER BY name
    `
  );
  return rows.map(mapDbRowToDispensary);
}

/**
 * Re-resolve a single dispensary's platform ID
 * Clears the existing ID and re-resolves from the menu_url cName
 */
export async function reResolveDispensaryPlatformId(dispensaryId: number): Promise<{
  success: boolean;
  platformId: string | null;
  cName: string | null;
  error?: string;
}> {
  console.log(`[Discovery] Re-resolving platform ID for dispensary ${dispensaryId}...`);

  const dispensary = await getDispensaryById(dispensaryId);
  if (!dispensary) {
    return { success: false, platformId: null, cName: null, error: 'Dispensary not found' };
  }

  const cName = extractCNameFromMenuUrl(dispensary.menuUrl);
  if (!cName) {
    console.log(`[Discovery] Could not extract cName from menu_url: ${dispensary.menuUrl}`);
    return {
      success: false,
      platformId: null,
      cName: null,
      error: `Could not extract cName from menu_url: ${dispensary.menuUrl}`,
    };
  }

  console.log(`[Discovery] Extracted cName: ${cName} from menu_url: ${dispensary.menuUrl}`);

  try {
    const platformId = await resolveDispensaryId(cName);

    if (platformId) {
      await query(
        `
        UPDATE dispensaries
        SET platform_dispensary_id = $1,
            platform_dispensary_id_resolved_at = NOW(),
            updated_at = NOW()
        WHERE id = $2
        `,
        [platformId, dispensaryId]
      );
      console.log(`[Discovery] Resolved: ${cName} -> ${platformId}`);
      return { success: true, platformId, cName };
    } else {
      // Clear the invalid platform ID and mark as not crawlable
      await query(
        `
        UPDATE dispensaries
        SET platform_dispensary_id = NULL,
            provider_detection_data = COALESCE(provider_detection_data, '{}'::jsonb) ||
              '{"resolution_error": "cName no longer exists on Dutchie", "not_crawlable": true}'::jsonb,
            updated_at = NOW()
        WHERE id = $1
        `,
        [dispensaryId]
      );
      console.log(`[Discovery] Could not resolve: ${cName} - marked as not crawlable`);
      return {
        success: false,
        platformId: null,
        cName,
        error: `cName "${cName}" no longer exists on Dutchie`,
      };
    }
  } catch (error: any) {
    console.error(`[Discovery] Error resolving ${cName}:`, error.message);
    return { success: false, platformId: null, cName, error: error.message };
  }
}

/**
 * Update menu_url for a dispensary and re-resolve platform ID
 */
export async function updateMenuUrlAndResolve(dispensaryId: number, newMenuUrl: string): Promise<{
  success: boolean;
  platformId: string | null;
  cName: string | null;
  error?: string;
}> {
  console.log(`[Discovery] Updating menu_url for dispensary ${dispensaryId} to: ${newMenuUrl}`);

  const cName = extractCNameFromMenuUrl(newMenuUrl);
  if (!cName) {
    return {
      success: false,
      platformId: null,
      cName: null,
      error: `Could not extract cName from new menu_url: ${newMenuUrl}`,
    };
  }

  // Update the menu_url first
  await query(
    `
    UPDATE dispensaries
    SET menu_url = $1,
        menu_type = 'dutchie',
        platform_dispensary_id = NULL,
        updated_at = NOW()
    WHERE id = $2
    `,
    [newMenuUrl, dispensaryId]
  );

  // Now resolve the platform ID with the new cName
  return await reResolveDispensaryPlatformId(dispensaryId);
}

/**
 * Mark a dispensary as not crawlable (when resolution fails permanently)
 */
export async function markDispensaryNotCrawlable(dispensaryId: number, reason: string): Promise<void> {
  await query(
    `
    UPDATE dispensaries
    SET platform_dispensary_id = NULL,
        provider_detection_data = COALESCE(provider_detection_data, '{}'::jsonb) ||
          jsonb_build_object('not_crawlable', true, 'not_crawlable_reason', $1::text, 'not_crawlable_at', NOW()::text),
        updated_at = NOW()
    WHERE id = $2
    `,
    [reason, dispensaryId]
  );
  console.log(`[Discovery] Marked dispensary ${dispensaryId} as not crawlable: ${reason}`);
}

/**
 * Get the cName for a dispensary (extracted from menu_url)
 */
export function getDispensaryCName(dispensary: Dispensary): string | null {
  return extractCNameFromMenuUrl(dispensary.menuUrl);
}
@@ -1,491 +0,0 @@
/**
 * Error Taxonomy Module
 *
 * Standardized error codes and classification for crawler reliability.
 * All crawl results must use these codes for consistent error handling.
 *
 * Phase 1: Crawler Reliability & Stabilization
 */

// ============================================================
// ERROR CODES
// ============================================================

/**
 * Standardized error codes for all crawl operations.
 * These codes are stored in the database for analytics and debugging.
 */
export const CrawlErrorCode = {
  // Success states
  SUCCESS: 'SUCCESS',

  // Rate limiting
  RATE_LIMITED: 'RATE_LIMITED', // 429 responses

  // Proxy issues
  BLOCKED_PROXY: 'BLOCKED_PROXY', // 407 or proxy-related blocks
  PROXY_TIMEOUT: 'PROXY_TIMEOUT', // Proxy connection timeout

  // Content issues
  HTML_CHANGED: 'HTML_CHANGED', // Page structure changed
  NO_PRODUCTS: 'NO_PRODUCTS', // Empty response (valid but no data)
  PARSE_ERROR: 'PARSE_ERROR', // Failed to parse response

  // Network issues
  TIMEOUT: 'TIMEOUT', // Request timeout
  NETWORK_ERROR: 'NETWORK_ERROR', // Connection failed
  DNS_ERROR: 'DNS_ERROR', // DNS resolution failed

  // Authentication
  AUTH_FAILED: 'AUTH_FAILED', // Authentication/session issues

  // Server errors
  SERVER_ERROR: 'SERVER_ERROR', // 5xx responses
  SERVICE_UNAVAILABLE: 'SERVICE_UNAVAILABLE', // 503

  // Configuration issues
  INVALID_CONFIG: 'INVALID_CONFIG', // Bad store configuration
  MISSING_PLATFORM_ID: 'MISSING_PLATFORM_ID', // No platform_dispensary_id

  // Unknown
  UNKNOWN_ERROR: 'UNKNOWN_ERROR', // Catch-all for unclassified errors
} as const;

export type CrawlErrorCodeType = typeof CrawlErrorCode[keyof typeof CrawlErrorCode];
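A consumer of this taxonomy might derive retry delays from the per-code metadata defined below; the `nextDelayMs` helper here is hypothetical, not part of the module:

```typescript
// Hypothetical helper: turn an error's metadata into a retry decision.
// Mirrors the retryable/backoffMultiplier fields of ErrorMetadata below.
interface RetryMeta {
  retryable: boolean;
  backoffMultiplier: number;
}

// Returns the next backoff delay in ms, or null when the code is not
// retryable (e.g. HTML_CHANGED needs a selector update, not a retry).
function nextDelayMs(meta: RetryMeta, baseMs: number, attempt: number): number | null {
  if (!meta.retryable) return null;
  return Math.round(baseMs * Math.pow(meta.backoffMultiplier, attempt));
}

const rateLimited: RetryMeta = { retryable: true, backoffMultiplier: 2.0 };
console.log(nextDelayMs(rateLimited, 1000, 2)); // 4000
```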
|
||||
|
||||
// ============================================================
|
||||
// ERROR CLASSIFICATION
|
||||
// ============================================================
|
||||
|
||||
/**
 * Error metadata for each error code
 */
interface ErrorMetadata {
  code: CrawlErrorCodeType;
  retryable: boolean;
  rotateProxy: boolean;
  rotateUserAgent: boolean;
  backoffMultiplier: number;
  severity: 'low' | 'medium' | 'high' | 'critical';
  description: string;
}

/**
 * Metadata for each error code - defines retry behavior
 */
export const ERROR_METADATA: Record<CrawlErrorCodeType, ErrorMetadata> = {
  [CrawlErrorCode.SUCCESS]: {
    code: CrawlErrorCode.SUCCESS,
    retryable: false,
    rotateProxy: false,
    rotateUserAgent: false,
    backoffMultiplier: 0,
    severity: 'low',
    description: 'Crawl completed successfully',
  },

  [CrawlErrorCode.RATE_LIMITED]: {
    code: CrawlErrorCode.RATE_LIMITED,
    retryable: true,
    rotateProxy: true,
    rotateUserAgent: true,
    backoffMultiplier: 2.0,
    severity: 'medium',
    description: 'Rate limited by target (429)',
  },

  [CrawlErrorCode.BLOCKED_PROXY]: {
    code: CrawlErrorCode.BLOCKED_PROXY,
    retryable: true,
    rotateProxy: true,
    rotateUserAgent: true,
    backoffMultiplier: 1.5,
    severity: 'medium',
    description: 'Proxy blocked or rejected (407)',
  },

  [CrawlErrorCode.PROXY_TIMEOUT]: {
    code: CrawlErrorCode.PROXY_TIMEOUT,
    retryable: true,
    rotateProxy: true,
    rotateUserAgent: false,
    backoffMultiplier: 1.0,
    severity: 'low',
    description: 'Proxy connection timed out',
  },

  [CrawlErrorCode.HTML_CHANGED]: {
    code: CrawlErrorCode.HTML_CHANGED,
    retryable: false,
    rotateProxy: false,
    rotateUserAgent: false,
    backoffMultiplier: 1.0,
    severity: 'high',
    description: 'Page structure changed - needs selector update',
  },

  [CrawlErrorCode.NO_PRODUCTS]: {
    code: CrawlErrorCode.NO_PRODUCTS,
    retryable: true,
    rotateProxy: false,
    rotateUserAgent: false,
    backoffMultiplier: 1.0,
    severity: 'low',
    description: 'No products returned (may be temporary)',
  },

  [CrawlErrorCode.PARSE_ERROR]: {
    code: CrawlErrorCode.PARSE_ERROR,
    retryable: true,
    rotateProxy: false,
    rotateUserAgent: false,
    backoffMultiplier: 1.0,
    severity: 'medium',
    description: 'Failed to parse response data',
  },

  [CrawlErrorCode.TIMEOUT]: {
    code: CrawlErrorCode.TIMEOUT,
    retryable: true,
    rotateProxy: true,
    rotateUserAgent: false,
    backoffMultiplier: 1.5,
    severity: 'medium',
    description: 'Request timed out',
  },

  [CrawlErrorCode.NETWORK_ERROR]: {
    code: CrawlErrorCode.NETWORK_ERROR,
    retryable: true,
    rotateProxy: true,
    rotateUserAgent: false,
    backoffMultiplier: 1.0,
    severity: 'medium',
    description: 'Network connection failed',
  },

  [CrawlErrorCode.DNS_ERROR]: {
    code: CrawlErrorCode.DNS_ERROR,
    retryable: true,
    rotateProxy: true,
    rotateUserAgent: false,
    backoffMultiplier: 1.0,
    severity: 'medium',
    description: 'DNS resolution failed',
  },

  [CrawlErrorCode.AUTH_FAILED]: {
    code: CrawlErrorCode.AUTH_FAILED,
    retryable: true,
    rotateProxy: false,
    rotateUserAgent: true,
    backoffMultiplier: 2.0,
    severity: 'high',
    description: 'Authentication or session failed',
  },

  [CrawlErrorCode.SERVER_ERROR]: {
    code: CrawlErrorCode.SERVER_ERROR,
    retryable: true,
    rotateProxy: false,
    rotateUserAgent: false,
    backoffMultiplier: 1.5,
    severity: 'medium',
    description: 'Server error (5xx)',
  },

  [CrawlErrorCode.SERVICE_UNAVAILABLE]: {
    code: CrawlErrorCode.SERVICE_UNAVAILABLE,
    retryable: true,
    rotateProxy: false,
    rotateUserAgent: false,
    backoffMultiplier: 2.0,
    severity: 'high',
    description: 'Service temporarily unavailable (503)',
  },

  [CrawlErrorCode.INVALID_CONFIG]: {
    code: CrawlErrorCode.INVALID_CONFIG,
    retryable: false,
    rotateProxy: false,
    rotateUserAgent: false,
    backoffMultiplier: 0,
    severity: 'critical',
    description: 'Invalid store configuration',
  },

  [CrawlErrorCode.MISSING_PLATFORM_ID]: {
    code: CrawlErrorCode.MISSING_PLATFORM_ID,
    retryable: false,
    rotateProxy: false,
    rotateUserAgent: false,
    backoffMultiplier: 0,
    severity: 'critical',
    description: 'Missing platform_dispensary_id',
  },

  [CrawlErrorCode.UNKNOWN_ERROR]: {
    code: CrawlErrorCode.UNKNOWN_ERROR,
    retryable: true,
    rotateProxy: false,
    rotateUserAgent: false,
    backoffMultiplier: 1.0,
    severity: 'high',
    description: 'Unknown/unclassified error',
  },
};
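// Illustrative use of the metadata table (hypothetical call site, not part of
// this module): the table drives retry decisions. A 429 maps to RATE_LIMITED,
// which is retryable with proxy + user-agent rotation and a 2x backoff:
//
//   const meta = ERROR_METADATA[CrawlErrorCode.RATE_LIMITED];
//   // meta.retryable === true, meta.rotateProxy === true, meta.backoffMultiplier === 2.0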
// ============================================================
// ERROR CLASSIFICATION FUNCTIONS
// ============================================================

/**
 * Classify an error into a standardized error code.
 *
 * @param error - The error to classify (Error object or message string)
 * @param httpStatus - Optional HTTP status code, checked before message patterns
 * @returns Standardized error code
 */
export function classifyError(
  error: Error | string | null,
  httpStatus?: number
): CrawlErrorCodeType {
  // Check HTTP status first
  if (httpStatus) {
    if (httpStatus === 429) return CrawlErrorCode.RATE_LIMITED;
    if (httpStatus === 407) return CrawlErrorCode.BLOCKED_PROXY;
    if (httpStatus === 401 || httpStatus === 403) return CrawlErrorCode.AUTH_FAILED;
    if (httpStatus === 503) return CrawlErrorCode.SERVICE_UNAVAILABLE;
    if (httpStatus >= 500) return CrawlErrorCode.SERVER_ERROR;
  }

  if (!error) return CrawlErrorCode.UNKNOWN_ERROR;

  const message = typeof error === 'string' ? error.toLowerCase() : error.message.toLowerCase();

  // Rate limiting patterns
  if (message.includes('rate limit') || message.includes('too many requests') || message.includes('429')) {
    return CrawlErrorCode.RATE_LIMITED;
  }

  // Proxy patterns
  if (message.includes('proxy') && (message.includes('block') || message.includes('reject') || message.includes('407'))) {
    return CrawlErrorCode.BLOCKED_PROXY;
  }

  // Timeout patterns
  if (message.includes('timeout') || message.includes('timed out') || message.includes('etimedout')) {
    if (message.includes('proxy')) {
      return CrawlErrorCode.PROXY_TIMEOUT;
    }
    return CrawlErrorCode.TIMEOUT;
  }

  // Network patterns
  if (message.includes('econnrefused') || message.includes('econnreset') || message.includes('network')) {
    return CrawlErrorCode.NETWORK_ERROR;
  }

  // DNS patterns
  if (message.includes('enotfound') || message.includes('dns') || message.includes('getaddrinfo')) {
    return CrawlErrorCode.DNS_ERROR;
  }

  // Auth patterns
  if (message.includes('auth') || message.includes('unauthorized') || message.includes('forbidden') || message.includes('401') || message.includes('403')) {
    return CrawlErrorCode.AUTH_FAILED;
  }

  // HTML change patterns
  if (message.includes('selector') || message.includes('element not found') || message.includes('structure changed')) {
    return CrawlErrorCode.HTML_CHANGED;
  }

  // Parse patterns
  if (message.includes('parse') || message.includes('json') || message.includes('syntax')) {
    return CrawlErrorCode.PARSE_ERROR;
  }

  // No products patterns
  if (message.includes('no products') || message.includes('empty') || message.includes('0 products')) {
    return CrawlErrorCode.NO_PRODUCTS;
  }

  // Server error patterns
  if (message.includes('500') || message.includes('502') || message.includes('503') || message.includes('504')) {
    return CrawlErrorCode.SERVER_ERROR;
  }

  // Config patterns
  if (message.includes('config') || message.includes('invalid') || message.includes('missing')) {
    if (message.includes('platform') || message.includes('dispensary_id')) {
      return CrawlErrorCode.MISSING_PLATFORM_ID;
    }
    return CrawlErrorCode.INVALID_CONFIG;
  }

  return CrawlErrorCode.UNKNOWN_ERROR;
}
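// Illustrative classifications (hypothetical inputs, not part of this module).
// The HTTP status is checked first; message patterns are the fallback:
//
//   classifyError(null, 429);                     // -> RATE_LIMITED
//   classifyError(new Error('ETIMEDOUT'));        // -> TIMEOUT
//   classifyError(new Error('proxy timed out'));  // -> PROXY_TIMEOUT
//   classifyError('getaddrinfo ENOTFOUND host');  // -> DNS_ERROR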
/**
 * Get metadata for an error code
 */
export function getErrorMetadata(code: CrawlErrorCodeType): ErrorMetadata {
  return ERROR_METADATA[code] || ERROR_METADATA[CrawlErrorCode.UNKNOWN_ERROR];
}

/**
 * Check if an error is retryable
 */
export function isRetryable(code: CrawlErrorCodeType): boolean {
  return getErrorMetadata(code).retryable;
}

/**
 * Check if proxy should be rotated for this error
 */
export function shouldRotateProxy(code: CrawlErrorCodeType): boolean {
  return getErrorMetadata(code).rotateProxy;
}

/**
 * Check if user agent should be rotated for this error
 */
export function shouldRotateUserAgent(code: CrawlErrorCodeType): boolean {
  return getErrorMetadata(code).rotateUserAgent;
}

/**
 * Get backoff multiplier for this error
 */
export function getBackoffMultiplier(code: CrawlErrorCodeType): number {
  return getErrorMetadata(code).backoffMultiplier;
}
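// One possible way (an assumption, not defined in this module) to turn the
// multiplier into an actual retry delay is exponential scaling per attempt:
//
//   const delayMs = baseDelayMs * getBackoffMultiplier(code) ** attempt;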
// ============================================================
// CRAWL RESULT TYPE
// ============================================================

/**
 * Standardized crawl result with error taxonomy
 */
export interface CrawlResult {
  success: boolean;
  dispensaryId: number;

  // Error info
  errorCode: CrawlErrorCodeType;
  errorMessage?: string;
  httpStatus?: number;

  // Timing
  startedAt: Date;
  finishedAt: Date;
  durationMs: number;

  // Context
  attemptNumber: number;
  proxyUsed?: string;
  userAgentUsed?: string;

  // Metrics (on success)
  productsFound?: number;
  productsUpserted?: number;
  snapshotsCreated?: number;
  imagesDownloaded?: number;

  // Metadata
  metadata?: Record<string, any>;
}

/**
 * Create a success result
 */
export function createSuccessResult(
  dispensaryId: number,
  startedAt: Date,
  metrics: {
    productsFound: number;
    productsUpserted: number;
    snapshotsCreated: number;
    imagesDownloaded?: number;
  },
  context?: {
    attemptNumber?: number;
    proxyUsed?: string;
    userAgentUsed?: string;
  }
): CrawlResult {
  const finishedAt = new Date();
  return {
    success: true,
    dispensaryId,
    errorCode: CrawlErrorCode.SUCCESS,
    startedAt,
    finishedAt,
    durationMs: finishedAt.getTime() - startedAt.getTime(),
    attemptNumber: context?.attemptNumber || 1,
    proxyUsed: context?.proxyUsed,
    userAgentUsed: context?.userAgentUsed,
    ...metrics,
  };
}

/**
 * Create a failure result
 */
export function createFailureResult(
  dispensaryId: number,
  startedAt: Date,
  error: Error | string,
  httpStatus?: number,
  context?: {
    attemptNumber?: number;
    proxyUsed?: string;
    userAgentUsed?: string;
  }
): CrawlResult {
  const finishedAt = new Date();
  const errorCode = classifyError(error, httpStatus);
  const errorMessage = typeof error === 'string' ? error : error.message;

  return {
    success: false,
    dispensaryId,
    errorCode,
    errorMessage,
    httpStatus,
    startedAt,
    finishedAt,
    durationMs: finishedAt.getTime() - startedAt.getTime(),
    attemptNumber: context?.attemptNumber || 1,
    proxyUsed: context?.proxyUsed,
    userAgentUsed: context?.userAgentUsed,
  };
}

// ============================================================
// LOGGING HELPERS
// ============================================================

/**
 * Format error code for logging
 */
export function formatErrorForLog(result: CrawlResult): string {
  const metadata = getErrorMetadata(result.errorCode);
  const retryInfo = metadata.retryable ? '(retryable)' : '(non-retryable)';
  const proxyInfo = result.proxyUsed ? ` via ${result.proxyUsed}` : '';

  if (result.success) {
    return `[${result.errorCode}] Crawl successful: ${result.productsFound} products${proxyInfo}`;
  }

  return `[${result.errorCode}] ${result.errorMessage}${proxyInfo} ${retryInfo}`;
}

/**
 * Get user-friendly error description
 */
export function getErrorDescription(code: CrawlErrorCodeType): string {
  return getErrorMetadata(code).description;
}
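// Illustrative crawl-loop usage (hypothetical crawlStore and scheduleRetry
// helpers, not part of this module):
//
//   const startedAt = new Date();
//   try {
//     const metrics = await crawlStore(dispensaryId);
//     return createSuccessResult(dispensaryId, startedAt, metrics);
//   } catch (err: any) {
//     const result = createFailureResult(dispensaryId, startedAt, err, err.httpStatus);
//     if (isRetryable(result.errorCode)) scheduleRetry(result);
//     return result;
//   }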
@@ -1,712 +0,0 @@
/**
 * Dutchie GraphQL Client
 *
 * Uses Puppeteer to establish a session (get CF cookies), then makes
 * SERVER-SIDE fetch calls to api-gw.dutchie.com with those cookies.
 *
 * DUTCHIE FETCH RULES:
 * 1. Server-side only - use axios (never browser fetch with CORS)
 * 2. For slug resolution use dispensaryFilter.cNameOrID; for FilteredProducts
 *    use productsFilter.dispensaryId directly (see buildFilterVariables)
 * 3. Headers must mimic Chrome: User-Agent, Origin, Referer
 * 4. If 403, extract CF cookies from Puppeteer session and include them
 * 5. Log status codes, error bodies, and product counts
 */

import axios, { AxiosError } from 'axios';
import puppeteer from 'puppeteer-extra';
import type { Browser, Page, Protocol } from 'puppeteer';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import {
  DutchieRawProduct,
  DutchiePOSChild,
  CrawlMode,
} from '../types';
import { dutchieConfig, GRAPHQL_HASHES, ARIZONA_CENTERPOINTS } from '../config/dutchie';

puppeteer.use(StealthPlugin());

// Re-export for backward compatibility
export { GRAPHQL_HASHES, ARIZONA_CENTERPOINTS };

// ============================================================
// SESSION MANAGEMENT - Get CF cookies via Puppeteer
// ============================================================

interface SessionCredentials {
  cookies: string; // Cookie header string
  userAgent: string;
  browser: Browser;
  page: Page; // Keep page reference for extracting dispensaryId
  dispensaryId?: string; // Extracted from window.reactEnv if available
  httpStatus?: number; // HTTP status code from navigation
}
/**
 * Create a session by navigating to the embedded menu page
 * and extracting CF clearance cookies for server-side requests.
 * Also extracts dispensaryId from window.reactEnv if available.
 */
async function createSession(cName: string): Promise<SessionCredentials> {
  const browser = await puppeteer.launch({
    headless: 'new',
    args: dutchieConfig.browserArgs,
  });

  const page = await browser.newPage();
  const userAgent = dutchieConfig.userAgent;

  await page.setUserAgent(userAgent);
  await page.setViewport({ width: 1920, height: 1080 });
  await page.evaluateOnNewDocument(() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => false });
    (window as any).chrome = { runtime: {} };
  });

  // Navigate to the embedded menu page for this dispensary
  const embeddedMenuUrl = `https://dutchie.com/embedded-menu/${cName}`;
  console.log(`[GraphQL Client] Loading ${embeddedMenuUrl} to get CF cookies...`);

  let httpStatus: number | undefined;
  let dispensaryId: string | undefined;

  try {
    const response = await page.goto(embeddedMenuUrl, {
      waitUntil: 'networkidle2',
      timeout: dutchieConfig.navigationTimeout,
    });
    httpStatus = response?.status();
    await new Promise((r) => setTimeout(r, dutchieConfig.pageLoadDelay));

    // Try to extract dispensaryId from window.reactEnv
    try {
      dispensaryId = await page.evaluate(() => {
        return (window as any).reactEnv?.dispensaryId || null;
      });
      if (dispensaryId) {
        console.log(`[GraphQL Client] Extracted dispensaryId from reactEnv: ${dispensaryId}`);
      }
    } catch (evalError: any) {
      console.log(`[GraphQL Client] Could not extract dispensaryId from reactEnv: ${evalError.message}`);
    }
  } catch (error: any) {
    console.warn(`[GraphQL Client] Navigation warning: ${error.message}`);
    // Continue anyway - we may have gotten cookies
  }

  // Extract cookies
  const cookies = await page.cookies();
  const cookieString = cookies.map((c: Protocol.Network.Cookie) => `${c.name}=${c.value}`).join('; ');

  console.log(`[GraphQL Client] Got ${cookies.length} cookies, HTTP status: ${httpStatus}`);
  if (cookies.length > 0) {
    console.log(`[GraphQL Client] Cookie names: ${cookies.map(c => c.name).join(', ')}`);
  }

  return { cookies: cookieString, userAgent, browser, page, dispensaryId, httpStatus };
}
/**
 * Close session (browser)
 */
async function closeSession(session: SessionCredentials): Promise<void> {
  await session.browser.close();
}

// ============================================================
// SERVER-SIDE GRAPHQL FETCH USING AXIOS
// ============================================================

/**
 * Build headers that mimic a real browser request
 */
function buildHeaders(session: SessionCredentials, cName: string): Record<string, string> {
  const embeddedMenuUrl = `https://dutchie.com/embedded-menu/${cName}`;

  return {
    'accept': 'application/json, text/plain, */*',
    'accept-language': 'en-US,en;q=0.9',
    'accept-encoding': 'gzip, deflate, br',
    'content-type': 'application/json',
    'origin': 'https://dutchie.com',
    'referer': embeddedMenuUrl,
    'user-agent': session.userAgent,
    'apollographql-client-name': 'Marketplace (production)',
    'sec-ch-ua': '"Chromium";v="120", "Google Chrome";v="120", "Not-A.Brand";v="99"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-site',
    ...(session.cookies ? { 'cookie': session.cookies } : {}),
  };
}

/**
 * Execute GraphQL query server-side using axios
 * Uses cookies from the browser session to bypass CF
 */
async function executeGraphQL(
  session: SessionCredentials,
  operationName: string,
  variables: any,
  hash: string,
  cName: string
): Promise<any> {
  const endpoint = dutchieConfig.graphqlEndpoint;
  const headers = buildHeaders(session, cName);

  // Build request body for POST
  const body = {
    operationName,
    variables,
    extensions: {
      persistedQuery: { version: 1, sha256Hash: hash },
    },
  };

  console.log(`[GraphQL Client] POST: ${operationName} -> ${endpoint}`);
  console.log(`[GraphQL Client] Variables: ${JSON.stringify(variables).slice(0, 300)}...`);

  try {
    const response = await axios.post(endpoint, body, {
      headers,
      timeout: 30000,
      validateStatus: () => true, // Don't throw on non-2xx
    });

    // Log response details
    console.log(`[GraphQL Client] Response status: ${response.status}`);

    if (response.status !== 200) {
      const bodyPreview = typeof response.data === 'string'
        ? response.data.slice(0, 500)
        : JSON.stringify(response.data).slice(0, 500);
      console.error(`[GraphQL Client] HTTP ${response.status}: ${bodyPreview}`);
      throw new Error(`HTTP ${response.status}`);
    }

    // Check for GraphQL errors
    if (response.data?.errors && response.data.errors.length > 0) {
      console.error(`[GraphQL Client] GraphQL errors: ${JSON.stringify(response.data.errors[0])}`);
    }

    return response.data;
  } catch (error: any) {
    if (axios.isAxiosError(error)) {
      const axiosError = error as AxiosError;
      console.error(`[GraphQL Client] Axios error: ${axiosError.message}`);
      if (axiosError.response) {
        console.error(`[GraphQL Client] Response status: ${axiosError.response.status}`);
        console.error(`[GraphQL Client] Response data: ${JSON.stringify(axiosError.response.data).slice(0, 500)}`);
      }
      if (axiosError.code) {
        console.error(`[GraphQL Client] Error code: ${axiosError.code}`);
      }
    } else {
      console.error(`[GraphQL Client] Error: ${error.message}`);
    }
    throw error;
  }
}
// ============================================================
// DISPENSARY ID RESOLUTION
// ============================================================

/**
 * Resolution result with HTTP status for error handling
 */
export interface ResolveDispensaryResult {
  dispensaryId: string | null;
  httpStatus?: number;
  error?: string;
  source?: 'reactEnv' | 'graphql';
}

/**
 * Resolve a dispensary slug to its internal platform ID.
 *
 * STRATEGY:
 * 1. Navigate to embedded menu page and extract window.reactEnv.dispensaryId (preferred)
 * 2. Fall back to GraphQL GetAddressBasedDispensaryData query if reactEnv fails
 *
 * Returns the dispensaryId (platform_dispensary_id) or null if not found.
 * On 403/404 the detailed variant returns the httpStatus and an error message
 * so the caller can mark the store as not_crawlable.
 */
export async function resolveDispensaryId(slug: string): Promise<string | null> {
  const result = await resolveDispensaryIdWithDetails(slug);
  return result.dispensaryId;
}

/**
 * Resolve a dispensary slug with full details (HTTP status, source, error).
 * Use this when you need to know WHY resolution failed.
 */
export async function resolveDispensaryIdWithDetails(slug: string): Promise<ResolveDispensaryResult> {
  console.log(`[GraphQL Client] Resolving dispensary ID for slug: ${slug}`);

  const session = await createSession(slug);

  try {
    // Check HTTP status first - if 403/404, the store is not crawlable
    if (session.httpStatus && (session.httpStatus === 403 || session.httpStatus === 404)) {
      console.log(`[GraphQL Client] Page returned HTTP ${session.httpStatus} for ${slug} - not crawlable`);
      return {
        dispensaryId: null,
        httpStatus: session.httpStatus,
        error: `HTTP ${session.httpStatus}: Store removed or not accessible`,
        source: 'reactEnv',
      };
    }

    // PREFERRED: Use dispensaryId from window.reactEnv (extracted during createSession)
    if (session.dispensaryId) {
      console.log(`[GraphQL Client] Resolved ${slug} -> ${session.dispensaryId} (from reactEnv)`);
      return {
        dispensaryId: session.dispensaryId,
        httpStatus: session.httpStatus,
        source: 'reactEnv',
      };
    }

    // FALLBACK: Try GraphQL query
    console.log(`[GraphQL Client] reactEnv.dispensaryId not found for ${slug}, trying GraphQL...`);

    const variables = {
      dispensaryFilter: {
        cNameOrID: slug,
      },
    };

    const result = await executeGraphQL(
      session,
      'GetAddressBasedDispensaryData',
      variables,
      GRAPHQL_HASHES.GetAddressBasedDispensaryData,
      slug
    );

    const dispensaryId = result?.data?.dispensaryBySlug?.id ||
      result?.data?.dispensary?.id ||
      result?.data?.getAddressBasedDispensaryData?.dispensary?.id;

    if (dispensaryId) {
      console.log(`[GraphQL Client] Resolved ${slug} -> ${dispensaryId} (from GraphQL)`);
      return {
        dispensaryId,
        httpStatus: session.httpStatus,
        source: 'graphql',
      };
    }

    console.log(`[GraphQL Client] Could not resolve ${slug}, GraphQL response:`, JSON.stringify(result).slice(0, 300));
    return {
      dispensaryId: null,
      httpStatus: session.httpStatus,
      error: 'Could not extract dispensaryId from reactEnv or GraphQL',
    };
  } finally {
    await closeSession(session);
  }
}
/**
 * Discover Arizona dispensaries via geo-based query
 */
export async function discoverArizonaDispensaries(): Promise<any[]> {
  console.log('[GraphQL Client] Discovering Arizona dispensaries...');

  // Establish the session via a known store's embedded menu page
  const session = await createSession('AZ-Deeply-Rooted');
  const allDispensaries: any[] = [];
  const seenIds = new Set<string>();

  try {
    for (const centerpoint of ARIZONA_CENTERPOINTS) {
      console.log(`[GraphQL Client] Scanning ${centerpoint.name}...`);

      const variables = {
        dispensariesFilter: {
          latitude: centerpoint.lat,
          longitude: centerpoint.lng,
          distance: 100,
          state: 'AZ',
        },
      };

      try {
        const result = await executeGraphQL(
          session,
          'ConsumerDispensaries',
          variables,
          GRAPHQL_HASHES.ConsumerDispensaries,
          'AZ-Deeply-Rooted'
        );

        const dispensaries = result?.data?.consumerDispensaries || [];

        for (const d of dispensaries) {
          const id = d.id || d.dispensaryId;
          if (id && !seenIds.has(id)) {
            seenIds.add(id);
            allDispensaries.push(d);
          }
        }

        console.log(`[GraphQL Client] Found ${dispensaries.length} in ${centerpoint.name} (${allDispensaries.length} total unique)`);
      } catch (error: any) {
        console.warn(`[GraphQL Client] Error scanning ${centerpoint.name}: ${error.message}`);
      }

      // Delay between requests
      await new Promise((r) => setTimeout(r, 1000));
    }
  } finally {
    await closeSession(session);
  }

  console.log(`[GraphQL Client] Discovery complete: ${allDispensaries.length} dispensaries`);
  return allDispensaries;
}
// ============================================================
// PRODUCT FILTERING VARIABLES
// ============================================================

/**
 * Build filter variables for FilteredProducts query
 *
 * CRITICAL: Uses dispensaryId directly (the MongoDB ObjectId, e.g. "6405ef617056e8014d79101b"),
 * NOT dispensaryFilter.cNameOrID!
 *
 * The actual browser request structure is:
 * {
 *   "productsFilter": {
 *     "dispensaryId": "6405ef617056e8014d79101b",
 *     "pricingType": "rec",
 *     "Status": "Active", // Mode A only
 *     "strainTypes": [],
 *     "subcategories": [],
 *     "types": [],
 *     "useCache": true,
 *     ...
 *   },
 *   "page": 0,
 *   "perPage": 100
 * }
 *
 * Mode A = UI parity (Status: "Active")
 * Mode B = MAX COVERAGE (no Status filter)
 */
function buildFilterVariables(
  platformDispensaryId: string,
  pricingType: 'rec' | 'med',
  crawlMode: CrawlMode,
  page: number,
  perPage: number
): any {
  const isModeA = crawlMode === 'mode_a';

  // Per CLAUDE.md Rule #11: Use simple productsFilter with dispensaryId directly
  // Do NOT use dispensaryFilter.cNameOrID - that's outdated
  const productsFilter: Record<string, any> = {
    dispensaryId: platformDispensaryId,
    pricingType: pricingType,
  };

  // Mode A: Only active products (UI parity) - Status: "Active"
  // Mode B: MAX COVERAGE (OOS/inactive) - omit Status entirely
  if (isModeA) {
    productsFilter.Status = 'Active';
  }
  // Mode B: No Status filter = returns all products including OOS/inactive

  return {
    productsFilter,
    page,
    perPage,
  };
}
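// Illustrative output (ObjectId taken from the doc comment above): the two
// crawl modes differ only in the Status field of productsFilter.
//
//   buildFilterVariables('6405ef617056e8014d79101b', 'rec', 'mode_a', 0, 100)
//   // -> { productsFilter: { dispensaryId: '6405ef617056e8014d79101b',
//   //      pricingType: 'rec', Status: 'Active' }, page: 0, perPage: 100 }
//   // 'mode_b' omits Status, so OOS/inactive products are included.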
// ============================================================
// PRODUCT FETCHING WITH PAGINATION
// ============================================================

/**
 * Fetch products for a single mode with pagination
 */
async function fetchProductsForMode(
  session: SessionCredentials,
  platformDispensaryId: string,
  cName: string,
  pricingType: 'rec' | 'med',
  crawlMode: CrawlMode
): Promise<{ products: DutchieRawProduct[]; totalCount: number; crawlMode: CrawlMode }> {
  const perPage = dutchieConfig.perPage;
  const maxPages = dutchieConfig.maxPages;
  const maxRetries = dutchieConfig.maxRetries;
  const pageDelayMs = dutchieConfig.pageDelayMs;

  const allProducts: DutchieRawProduct[] = [];
  let pageNum = 0;
  let totalCount = 0;
  let consecutiveEmptyPages = 0;

  console.log(`[GraphQL Client] Fetching products for ${cName} (platformId: ${platformDispensaryId}, ${pricingType}, ${crawlMode})...`);

  while (pageNum < maxPages) {
    const variables = buildFilterVariables(platformDispensaryId, pricingType, crawlMode, pageNum, perPage);

    let result: any = null;
    let lastError: Error | null = null;

    // Retry logic
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        result = await executeGraphQL(
          session,
          'FilteredProducts',
          variables,
          GRAPHQL_HASHES.FilteredProducts,
          cName
        );
        lastError = null;
        break;
      } catch (error: any) {
        lastError = error;
        console.warn(`[GraphQL Client] Page ${pageNum} attempt ${attempt + 1} failed: ${error.message}`);
        if (attempt < maxRetries) {
          await new Promise((r) => setTimeout(r, 1000 * (attempt + 1)));
        }
      }
    }

    if (lastError) {
      console.error(`[GraphQL Client] Page ${pageNum} failed after ${maxRetries + 1} attempts`);
      break;
    }

    if (result?.errors) {
      console.error('[GraphQL Client] GraphQL errors:', JSON.stringify(result.errors));
      break;
    }

    // Log response shape on first page
    if (pageNum === 0) {
      console.log(`[GraphQL Client] Response keys: ${Object.keys(result || {}).join(', ')}`);
      if (result?.data) {
        console.log(`[GraphQL Client] data keys: ${Object.keys(result.data || {}).join(', ')}`);
      }
      if (!result?.data?.filteredProducts) {
        console.log(`[GraphQL Client] WARNING: No filteredProducts in response!`);
        console.log(`[GraphQL Client] Full response: ${JSON.stringify(result).slice(0, 1000)}`);
      }
    }

    const products = result?.data?.filteredProducts?.products || [];
    const queryInfo = result?.data?.filteredProducts?.queryInfo;

    if (queryInfo?.totalCount) {
      totalCount = queryInfo.totalCount;
    }

    console.log(
      `[GraphQL Client] Page ${pageNum}: ${products.length} products (total so far: ${allProducts.length + products.length}/${totalCount})`
    );

    if (products.length === 0) {
      consecutiveEmptyPages++;
      if (consecutiveEmptyPages >= 2) {
        console.log('[GraphQL Client] Multiple empty pages, stopping pagination');
        break;
      }
    } else {
      consecutiveEmptyPages = 0;
      allProducts.push(...products);
    }

    // Stop if incomplete page (last page)
    if (products.length < perPage) {
      console.log(`[GraphQL Client] Incomplete page (${products.length} < ${perPage}), stopping`);
      break;
    }

    pageNum++;
    await new Promise((r) => setTimeout(r, pageDelayMs));
  }

  console.log(`[GraphQL Client] Fetched ${allProducts.length} total products (${crawlMode})`);
  return { products: allProducts, totalCount: totalCount || allProducts.length, crawlMode };
}
// ============================================================
|
||||
// LEGACY SINGLE-MODE INTERFACE
|
||||
// ============================================================
|
||||
|
||||
/**
|
||||
* Fetch all products for a dispensary (single mode)
|
||||
*/
|
||||
export async function fetchAllProducts(
|
||||
platformDispensaryId: string,
|
||||
pricingType: 'rec' | 'med' = 'rec',
|
||||
options: {
|
||||
perPage?: number;
|
||||
maxPages?: number;
|
||||
menuUrl?: string;
|
||||
crawlMode?: CrawlMode;
|
||||
cName?: string;
|
||||
} = {}
|
||||
): Promise<{ products: DutchieRawProduct[]; totalCount: number; crawlMode: CrawlMode }> {
|
||||
const { crawlMode = 'mode_a' } = options;
|
||||
|
||||
// cName is now REQUIRED - no default fallback to avoid using wrong store's session
|
||||
const cName = options.cName;
|
||||
if (!cName) {
|
||||
throw new Error('[GraphQL Client] cName is required for fetchAllProducts - cannot use another store\'s session');
|
||||
}
|
||||
|
||||
const session = await createSession(cName);
|
||||
|
||||
try {
|
||||
return await fetchProductsForMode(session, platformDispensaryId, cName, pricingType, crawlMode);
|
||||
} finally {
|
||||
await closeSession(session);
|
||||
}
|
||||
}
|
||||
|
||||
// ============================================================
|
||||
// MODE A+B MERGING
|
||||
// ============================================================
|
||||
|
||||
/**
|
||||
* Merge POSMetaData.children arrays from Mode A and Mode B products
|
||||
*/
|
||||
function mergeProductOptions(
|
||||
modeAProduct: DutchieRawProduct,
|
||||
modeBProduct: DutchieRawProduct
|
||||
): DutchiePOSChild[] {
|
||||
const modeAChildren = modeAProduct.POSMetaData?.children || [];
|
||||
const modeBChildren = modeBProduct.POSMetaData?.children || [];
|
||||
|
||||
const getOptionKey = (child: DutchiePOSChild): string => {
|
||||
return child.canonicalID || child.canonicalSKU || child.canonicalPackageId || child.option || '';
|
||||
};
|
||||
|
||||
const mergedMap = new Map<string, DutchiePOSChild>();
|
||||
|
||||
for (const child of modeAChildren) {
|
||||
const key = getOptionKey(child);
|
||||
if (key) mergedMap.set(key, child);
|
||||
}
|
||||
|
||||
for (const child of modeBChildren) {
|
||||
const key = getOptionKey(child);
|
||||
if (key && !mergedMap.has(key)) {
|
||||
mergedMap.set(key, child);
|
||||
}
|
||||
}
|
||||
|
||||
return Array.from(mergedMap.values());
|
||||
}
|
||||
|
||||
/**
|
||||
* Merge a Mode A product with a Mode B product
|
||||
*/
|
||||
function mergeProducts(
|
||||
modeAProduct: DutchieRawProduct,
|
||||
modeBProduct: DutchieRawProduct | undefined
|
||||
): DutchieRawProduct {
|
||||
if (!modeBProduct) {
|
||||
return modeAProduct;
|
||||
}
|
||||
|
||||
const mergedChildren = mergeProductOptions(modeAProduct, modeBProduct);
|
||||
|
||||
return {
|
||||
...modeAProduct,
|
||||
POSMetaData: {
|
||||
...modeAProduct.POSMetaData,
|
||||
children: mergedChildren,
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
// ============================================================
|
||||
// MAIN EXPORT: TWO-MODE CRAWL
|
||||
// ============================================================
|
||||
|
||||
/**
|
||||
* Fetch products using BOTH crawl modes with SINGLE session
|
||||
* Runs Mode A then Mode B, merges results
|
||||
*/
|
||||
export async function fetchAllProductsBothModes(
|
||||
platformDispensaryId: string,
|
||||
pricingType: 'rec' | 'med' = 'rec',
|
||||
options: {
|
||||
perPage?: number;
|
||||
maxPages?: number;
|
||||
menuUrl?: string;
|
||||
cName?: string;
|
||||
} = {}
|
||||
): Promise<{
|
||||
modeA: { products: DutchieRawProduct[]; totalCount: number };
|
||||
modeB: { products: DutchieRawProduct[]; totalCount: number };
|
||||
merged: { products: DutchieRawProduct[]; totalCount: number };
|
||||
}> {
|
||||
// cName is now REQUIRED - no default fallback to avoid using wrong store's session
|
||||
const cName = options.cName;
|
||||
if (!cName) {
|
||||
throw new Error('[GraphQL Client] cName is required for fetchAllProductsBothModes - cannot use another store\'s session');
|
||||
}
|
||||
|
||||
console.log(`[GraphQL Client] Running two-mode crawl for ${cName} (${pricingType})...`);
|
||||
console.log(`[GraphQL Client] Platform ID: ${platformDispensaryId}, cName: ${cName}`);
|
||||
|
||||
const session = await createSession(cName);
|
||||
|
||||
try {
|
||||
// Mode A (UI parity)
|
||||
const modeAResult = await fetchProductsForMode(session, platformDispensaryId, cName, pricingType, 'mode_a');
|
||||
|
||||
// Delay between modes
|
||||
await new Promise((r) => setTimeout(r, dutchieConfig.modeDelayMs));
|
||||
|
||||
// Mode B (MAX COVERAGE)
|
||||
const modeBResult = await fetchProductsForMode(session, platformDispensaryId, cName, pricingType, 'mode_b');
|
||||
|
||||
// Merge results
|
||||
const modeBMap = new Map<string, DutchieRawProduct>();
|
||||
for (const product of modeBResult.products) {
|
||||
modeBMap.set(product._id, product);
|
||||
}
|
||||
|
||||
const productMap = new Map<string, DutchieRawProduct>();
|
||||
|
||||
// Add Mode A products, merging with Mode B if exists
|
||||
for (const product of modeAResult.products) {
|
||||
const modeBProduct = modeBMap.get(product._id);
|
||||
const mergedProduct = mergeProducts(product, modeBProduct);
|
||||
productMap.set(product._id, mergedProduct);
|
||||
}
|
||||
|
||||
// Add Mode B products not in Mode A
|
||||
for (const product of modeBResult.products) {
|
||||
if (!productMap.has(product._id)) {
|
||||
productMap.set(product._id, product);
|
||||
}
|
||||
}
|
||||
|
||||
const mergedProducts = Array.from(productMap.values());
|
||||
|
||||
console.log(`[GraphQL Client] Merged: ${mergedProducts.length} unique products`);
|
||||
console.log(`[GraphQL Client] Mode A: ${modeAResult.products.length}, Mode B: ${modeBResult.products.length}`);
|
||||
|
||||
return {
|
||||
modeA: { products: modeAResult.products, totalCount: modeAResult.totalCount },
|
||||
modeB: { products: modeBResult.products, totalCount: modeBResult.totalCount },
|
||||
merged: { products: mergedProducts, totalCount: mergedProducts.length },
|
||||
};
|
||||
} finally {
|
||||
await closeSession(session);
|
||||
}
|
||||
}
|
||||
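The two-mode crawl above dedupes by `_id`, keeping the Mode A record when both modes return a product and appending Mode B-only products. A minimal standalone sketch of that merge semantics (simplified product shape and sample data are hypothetical, for illustration only):

```typescript
// Sketch of the Mode A/B merge: dedupe by `_id`,
// Mode A records win, Mode B-only records are appended.
interface RawProduct {
  _id: string;
  name: string;
}

function mergeModes(modeA: RawProduct[], modeB: RawProduct[]): RawProduct[] {
  const byId = new Map<string, RawProduct>();
  for (const p of modeA) byId.set(p._id, p);   // Mode A takes precedence
  for (const p of modeB) {
    if (!byId.has(p._id)) byId.set(p._id, p);  // Mode B fills the gaps
  }
  return Array.from(byId.values());
}

const merged = mergeModes(
  [{ _id: 'a', name: 'Gummy (mode A)' }],
  [{ _id: 'a', name: 'Gummy (mode B)' }, { _id: 'b', name: 'Tincture' }]
);
console.log(merged.map((p) => p.name)); // [ 'Gummy (mode A)', 'Tincture' ]
```

In the real code the per-product merge additionally combines `POSMetaData.children` option arrays rather than discarding the Mode B copy outright.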
@@ -1,665 +0,0 @@
/**
 * Job Queue Service
 *
 * DB-backed job queue with claiming/locking for distributed workers.
 * Ensures only one worker processes a given store at a time.
 */

import { query, getClient } from '../db/connection';
import { v4 as uuidv4 } from 'uuid';
import * as os from 'os';
import { DEFAULT_CONFIG } from './store-validator';

// Minimum gap between crawls for the same dispensary (in minutes)
const MIN_CRAWL_GAP_MINUTES = DEFAULT_CONFIG.minCrawlGapMinutes; // 2 minutes

// ============================================================
// TYPES
// ============================================================

export interface QueuedJob {
  id: number;
  jobType: string;
  dispensaryId: number | null;
  status: 'pending' | 'running' | 'completed' | 'failed';
  priority: number;
  retryCount: number;
  maxRetries: number;
  claimedBy: string | null;
  claimedAt: Date | null;
  workerHostname: string | null;
  startedAt: Date | null;
  completedAt: Date | null;
  errorMessage: string | null;
  productsFound: number;
  productsUpserted: number;
  snapshotsCreated: number;
  currentPage: number;
  totalPages: number | null;
  lastHeartbeatAt: Date | null;
  metadata: Record<string, any> | null;
  createdAt: Date;
}

export interface EnqueueJobOptions {
  jobType: string;
  dispensaryId?: number;
  priority?: number;
  metadata?: Record<string, any>;
  maxRetries?: number;
}

export interface ClaimJobOptions {
  workerId: string;
  jobTypes?: string[];
  lockDurationMinutes?: number;
}

export interface JobProgress {
  productsFound?: number;
  productsUpserted?: number;
  snapshotsCreated?: number;
  currentPage?: number;
  totalPages?: number;
}

// ============================================================
// WORKER IDENTITY
// ============================================================

let _workerId: string | null = null;

/**
 * Get or create a unique worker ID for this process.
 * In Kubernetes, uses POD_NAME for clarity; otherwise generates a unique ID.
 */
export function getWorkerId(): string {
  if (!_workerId) {
    // Prefer POD_NAME in K8s (set via fieldRef)
    const podName = process.env.POD_NAME;
    if (podName) {
      _workerId = podName;
    } else {
      const hostname = os.hostname();
      const pid = process.pid;
      const uuid = uuidv4().slice(0, 8);
      _workerId = `${hostname}-${pid}-${uuid}`;
    }
  }
  return _workerId;
}

/**
 * Get hostname for worker tracking.
 * In Kubernetes, uses POD_NAME; otherwise uses os.hostname().
 */
export function getWorkerHostname(): string {
  return process.env.POD_NAME || os.hostname();
}

// ============================================================
// JOB ENQUEUEING
// ============================================================

export interface EnqueueResult {
  jobId: number | null;
  skipped: boolean;
  reason?: 'already_queued' | 'too_soon' | 'error';
  message?: string;
}

/**
 * Enqueue a new job for processing.
 * Returns null if a pending/running job already exists for this dispensary,
 * or if a job was completed/failed within the minimum gap period.
 */
export async function enqueueJob(options: EnqueueJobOptions): Promise<number | null> {
  const result = await enqueueJobWithReason(options);
  return result.jobId;
}

/**
 * Enqueue a new job with detailed result info.
 * Enforces:
 * 1. No duplicate pending/running jobs for the same dispensary
 * 2. A minimum 2-minute gap between crawls for the same dispensary
 */
export async function enqueueJobWithReason(options: EnqueueJobOptions): Promise<EnqueueResult> {
  const {
    jobType,
    dispensaryId,
    priority = 0,
    metadata,
    maxRetries = 3,
  } = options;

  // Check if there's already a pending/running job for this dispensary
  if (dispensaryId) {
    const { rows: existing } = await query<any>(
      `SELECT id FROM dispensary_crawl_jobs
       WHERE dispensary_id = $1 AND status IN ('pending', 'running')
       LIMIT 1`,
      [dispensaryId]
    );

    if (existing.length > 0) {
      console.log(`[JobQueue] Skipping enqueue - job already exists for dispensary ${dispensaryId}`);
      return {
        jobId: null,
        skipped: true,
        reason: 'already_queued',
        message: `Job already pending/running for dispensary ${dispensaryId}`,
      };
    }

    // Check minimum gap since last job (2 minutes)
    const { rows: recent } = await query<any>(
      `SELECT id, created_at, status
       FROM dispensary_crawl_jobs
       WHERE dispensary_id = $1
       ORDER BY created_at DESC
       LIMIT 1`,
      [dispensaryId]
    );

    if (recent.length > 0) {
      const lastJobTime = new Date(recent[0].created_at);
      const minGapMs = MIN_CRAWL_GAP_MINUTES * 60 * 1000;
      const timeSinceLastJob = Date.now() - lastJobTime.getTime();

      if (timeSinceLastJob < minGapMs) {
        const waitSeconds = Math.ceil((minGapMs - timeSinceLastJob) / 1000);
        console.log(`[JobQueue] Skipping enqueue - minimum ${MIN_CRAWL_GAP_MINUTES}min gap not met for dispensary ${dispensaryId}. Wait ${waitSeconds}s`);
        return {
          jobId: null,
          skipped: true,
          reason: 'too_soon',
          message: `Minimum ${MIN_CRAWL_GAP_MINUTES}-minute gap required. Try again in ${waitSeconds} seconds.`,
        };
      }
    }
  }

  try {
    const { rows } = await query<any>(
      `INSERT INTO dispensary_crawl_jobs (job_type, dispensary_id, status, priority, max_retries, metadata, created_at)
       VALUES ($1, $2, 'pending', $3, $4, $5, NOW())
       RETURNING id`,
      [jobType, dispensaryId || null, priority, maxRetries, metadata ? JSON.stringify(metadata) : null]
    );

    const jobId = rows[0].id;
    console.log(`[JobQueue] Enqueued job ${jobId} (type=${jobType}, dispensary=${dispensaryId})`);
    return { jobId, skipped: false };
  } catch (error: any) {
    // Handle database trigger rejection for minimum gap
    if (error.message?.includes('Minimum') && error.message?.includes('gap')) {
      console.log(`[JobQueue] DB rejected - minimum gap not met for dispensary ${dispensaryId}`);
      return {
        jobId: null,
        skipped: true,
        reason: 'too_soon',
        message: error.message,
      };
    }
    throw error;
  }
}

export interface BulkEnqueueResult {
  enqueued: number;
  skipped: number;
  skippedReasons: {
    alreadyQueued: number;
    tooSoon: number;
  };
}

/**
 * Bulk enqueue jobs for multiple dispensaries.
 * Skips dispensaries that already have pending/running jobs
 * or have jobs within the minimum gap period.
 */
export async function bulkEnqueueJobs(
  jobType: string,
  dispensaryIds: number[],
  options: { priority?: number; metadata?: Record<string, any> } = {}
): Promise<BulkEnqueueResult> {
  const { priority = 0, metadata } = options;

  // Get dispensaries that already have pending/running jobs
  const { rows: existing } = await query<any>(
    `SELECT DISTINCT dispensary_id FROM dispensary_crawl_jobs
     WHERE dispensary_id = ANY($1) AND status IN ('pending', 'running')`,
    [dispensaryIds]
  );
  const existingSet = new Set(existing.map((r: any) => r.dispensary_id));

  // Get dispensaries that have recent jobs within the minimum gap
  const { rows: recent } = await query<any>(
    `SELECT DISTINCT dispensary_id FROM dispensary_crawl_jobs
     WHERE dispensary_id = ANY($1)
       AND created_at > NOW() - ($2 || ' minutes')::INTERVAL
       AND dispensary_id NOT IN (
         SELECT dispensary_id FROM dispensary_crawl_jobs
         WHERE dispensary_id = ANY($1) AND status IN ('pending', 'running')
       )`,
    [dispensaryIds, MIN_CRAWL_GAP_MINUTES]
  );
  const recentSet = new Set(recent.map((r: any) => r.dispensary_id));

  // Filter out dispensaries with existing or recent jobs
  const toEnqueue = dispensaryIds.filter(id => !existingSet.has(id) && !recentSet.has(id));

  if (toEnqueue.length === 0) {
    return {
      enqueued: 0,
      skipped: dispensaryIds.length,
      skippedReasons: {
        alreadyQueued: existingSet.size,
        tooSoon: recentSet.size,
      },
    };
  }

  // Bulk insert - each row needs 4 params: job_type, dispensary_id, priority, metadata
  const metadataJson = metadata ? JSON.stringify(metadata) : null;
  const values = toEnqueue.map((_, i) => {
    const offset = i * 4;
    return `($${offset + 1}, $${offset + 2}, 'pending', $${offset + 3}, 3, $${offset + 4}, NOW())`;
  }).join(', ');

  const params: any[] = [];
  toEnqueue.forEach(dispensaryId => {
    params.push(jobType, dispensaryId, priority, metadataJson);
  });

  await query(
    `INSERT INTO dispensary_crawl_jobs (job_type, dispensary_id, status, priority, max_retries, metadata, created_at)
     VALUES ${values}`,
    params
  );

  console.log(`[JobQueue] Bulk enqueued ${toEnqueue.length} jobs, skipped ${existingSet.size} (queued) + ${recentSet.size} (recent)`);
  return {
    enqueued: toEnqueue.length,
    skipped: existingSet.size + recentSet.size,
    skippedReasons: {
      alreadyQueued: existingSet.size,
      tooSoon: recentSet.size,
    },
  };
}

// ============================================================
// JOB CLAIMING (with locking)
// ============================================================

/**
 * Claim the next available job from the queue.
 * Uses SELECT ... FOR UPDATE SKIP LOCKED to prevent double-claims.
 */
export async function claimNextJob(options: ClaimJobOptions): Promise<QueuedJob | null> {
  const { workerId, jobTypes, lockDurationMinutes = 30 } = options;
  const hostname = getWorkerHostname();

  const client = await getClient();

  try {
    await client.query('BEGIN');

    // Build job type filter
    let typeFilter = '';
    const params: any[] = [workerId, hostname, lockDurationMinutes];
    let paramIndex = 4;

    if (jobTypes && jobTypes.length > 0) {
      typeFilter = `AND job_type = ANY($${paramIndex})`;
      params.push(jobTypes);
      paramIndex++;
    }

    // Claim the next pending job using FOR UPDATE SKIP LOCKED.
    // This atomically selects and locks a row, skipping any already locked by other workers.
    const { rows } = await client.query(
      `UPDATE dispensary_crawl_jobs
       SET
         status = 'running',
         claimed_by = $1,
         claimed_at = NOW(),
         worker_id = $1,
         worker_hostname = $2,
         started_at = NOW(),
         locked_until = NOW() + ($3 || ' minutes')::INTERVAL,
         last_heartbeat_at = NOW(),
         updated_at = NOW()
       WHERE id = (
         SELECT id FROM dispensary_crawl_jobs
         WHERE status = 'pending'
         ${typeFilter}
         ORDER BY priority DESC, created_at ASC
         FOR UPDATE SKIP LOCKED
         LIMIT 1
       )
       RETURNING *`,
      params
    );

    await client.query('COMMIT');

    if (rows.length === 0) {
      return null;
    }

    const job = mapDbRowToJob(rows[0]);
    console.log(`[JobQueue] Worker ${workerId} claimed job ${job.id} (type=${job.jobType}, dispensary=${job.dispensaryId})`);
    return job;
  } catch (error) {
    await client.query('ROLLBACK');
    throw error;
  } finally {
    client.release();
  }
}

// ============================================================
// JOB PROGRESS & COMPLETION
// ============================================================

/**
 * Update job progress (for live monitoring)
 */
export async function updateJobProgress(jobId: number, progress: JobProgress): Promise<void> {
  const updates: string[] = ['last_heartbeat_at = NOW()', 'updated_at = NOW()'];
  const params: any[] = [];
  let paramIndex = 1;

  if (progress.productsFound !== undefined) {
    updates.push(`products_found = $${paramIndex++}`);
    params.push(progress.productsFound);
  }
  if (progress.productsUpserted !== undefined) {
    updates.push(`products_upserted = $${paramIndex++}`);
    params.push(progress.productsUpserted);
  }
  if (progress.snapshotsCreated !== undefined) {
    updates.push(`snapshots_created = $${paramIndex++}`);
    params.push(progress.snapshotsCreated);
  }
  if (progress.currentPage !== undefined) {
    updates.push(`current_page = $${paramIndex++}`);
    params.push(progress.currentPage);
  }
  if (progress.totalPages !== undefined) {
    updates.push(`total_pages = $${paramIndex++}`);
    params.push(progress.totalPages);
  }

  params.push(jobId);

  await query(
    `UPDATE dispensary_crawl_jobs SET ${updates.join(', ')} WHERE id = $${paramIndex}`,
    params
  );
}

/**
 * Send a heartbeat to keep the job alive (prevents timeout)
 */
export async function heartbeat(jobId: number): Promise<void> {
  await query(
    `UPDATE dispensary_crawl_jobs
     SET last_heartbeat_at = NOW(), locked_until = NOW() + INTERVAL '30 minutes'
     WHERE id = $1 AND status = 'running'`,
    [jobId]
  );
}

/**
 * Mark job as completed.
 *
 * Stores visibility tracking stats (visibilityLostCount, visibilityRestoredCount)
 * in the metadata JSONB column for dashboard analytics.
 */
export async function completeJob(
  jobId: number,
  result: {
    productsFound?: number;
    productsUpserted?: number;
    snapshotsCreated?: number;
    visibilityLostCount?: number;
    visibilityRestoredCount?: number;
  }
): Promise<void> {
  // Build metadata with visibility stats if provided
  const metadata: Record<string, any> = {};
  if (result.visibilityLostCount !== undefined) {
    metadata.visibilityLostCount = result.visibilityLostCount;
  }
  if (result.visibilityRestoredCount !== undefined) {
    metadata.visibilityRestoredCount = result.visibilityRestoredCount;
  }
  if (result.snapshotsCreated !== undefined) {
    metadata.snapshotsCreated = result.snapshotsCreated;
  }

  await query(
    `UPDATE dispensary_crawl_jobs
     SET
       status = 'completed',
       completed_at = NOW(),
       products_found = COALESCE($2, products_found),
       products_updated = COALESCE($3, products_updated),
       metadata = COALESCE(metadata, '{}'::jsonb) || $4::jsonb,
       updated_at = NOW()
     WHERE id = $1`,
    [
      jobId,
      result.productsFound,
      result.productsUpserted,
      JSON.stringify(metadata),
    ]
  );
  console.log(`[JobQueue] Job ${jobId} completed`);
}

/**
 * Mark job as failed.
 * Re-queues the job for retry if attempts remain; returns true if it will
 * be retried, false if it failed permanently.
 */
export async function failJob(jobId: number, errorMessage: string): Promise<boolean> {
  // Check if we should retry
  const { rows } = await query<any>(
    `SELECT retry_count, max_retries FROM dispensary_crawl_jobs WHERE id = $1`,
    [jobId]
  );

  if (rows.length === 0) return false;

  const { retry_count, max_retries } = rows[0];

  if (retry_count < max_retries) {
    // Re-queue for retry
    await query(
      `UPDATE dispensary_crawl_jobs
       SET
         status = 'pending',
         retry_count = retry_count + 1,
         claimed_by = NULL,
         claimed_at = NULL,
         worker_id = NULL,
         worker_hostname = NULL,
         started_at = NULL,
         locked_until = NULL,
         last_heartbeat_at = NULL,
         error_message = $2,
         updated_at = NOW()
       WHERE id = $1`,
      [jobId, errorMessage]
    );
    console.log(`[JobQueue] Job ${jobId} failed, re-queued for retry (${retry_count + 1}/${max_retries})`);
    return true; // Will retry
  } else {
    // Mark as failed permanently
    await query(
      `UPDATE dispensary_crawl_jobs
       SET
         status = 'failed',
         completed_at = NOW(),
         error_message = $2,
         updated_at = NOW()
       WHERE id = $1`,
      [jobId, errorMessage]
    );
    console.log(`[JobQueue] Job ${jobId} failed permanently after ${retry_count} retries`);
    return false; // No more retries
  }
}

// ============================================================
// QUEUE MONITORING
// ============================================================

/**
 * Get queue statistics
 */
export async function getQueueStats(): Promise<{
  pending: number;
  running: number;
  completed1h: number;
  failed1h: number;
  activeWorkers: number;
  avgDurationSeconds: number | null;
}> {
  const { rows } = await query<any>(`SELECT * FROM v_queue_stats`);
  const stats = rows[0] || {};

  return {
    pending: parseInt(stats.pending_jobs || '0', 10),
    running: parseInt(stats.running_jobs || '0', 10),
    completed1h: parseInt(stats.completed_1h || '0', 10),
    failed1h: parseInt(stats.failed_1h || '0', 10),
    activeWorkers: parseInt(stats.active_workers || '0', 10),
    avgDurationSeconds: stats.avg_duration_seconds ? parseFloat(stats.avg_duration_seconds) : null,
  };
}

/**
 * Get active workers
 */
export async function getActiveWorkers(): Promise<Array<{
  workerId: string;
  hostname: string | null;
  currentJobs: number;
  totalProductsFound: number;
  totalProductsUpserted: number;
  totalSnapshots: number;
  firstClaimedAt: Date;
  lastHeartbeat: Date | null;
}>> {
  const { rows } = await query<any>(`SELECT * FROM v_active_workers`);

  return rows.map((row: any) => ({
    workerId: row.worker_id,
    hostname: row.worker_hostname,
    currentJobs: parseInt(row.current_jobs || '0', 10),
    totalProductsFound: parseInt(row.total_products_found || '0', 10),
    totalProductsUpserted: parseInt(row.total_products_upserted || '0', 10),
    totalSnapshots: parseInt(row.total_snapshots || '0', 10),
    firstClaimedAt: new Date(row.first_claimed_at),
    lastHeartbeat: row.last_heartbeat ? new Date(row.last_heartbeat) : null,
  }));
}

/**
 * Get running jobs with worker info
 */
export async function getRunningJobs(): Promise<QueuedJob[]> {
  const { rows } = await query<any>(
    `SELECT cj.*, d.name as dispensary_name, d.city
     FROM dispensary_crawl_jobs cj
     LEFT JOIN dispensaries d ON cj.dispensary_id = d.id
     WHERE cj.status = 'running'
     ORDER BY cj.started_at DESC`
  );

  return rows.map(mapDbRowToJob);
}

/**
 * Recover stale jobs (workers that died without completing)
 */
export async function recoverStaleJobs(staleMinutes: number = 15): Promise<number> {
  const { rowCount } = await query(
    `UPDATE dispensary_crawl_jobs
     SET
       status = 'pending',
       claimed_by = NULL,
       claimed_at = NULL,
       worker_id = NULL,
       worker_hostname = NULL,
       started_at = NULL,
       locked_until = NULL,
       error_message = 'Recovered from stale worker',
       retry_count = retry_count + 1,
       updated_at = NOW()
     WHERE status = 'running'
       AND last_heartbeat_at < NOW() - ($1 || ' minutes')::INTERVAL
       AND retry_count < max_retries`,
    [staleMinutes]
  );

  if (rowCount && rowCount > 0) {
    console.log(`[JobQueue] Recovered ${rowCount} stale jobs`);
  }
  return rowCount || 0;
}

/**
 * Clean up old completed/failed jobs
 */
export async function cleanupOldJobs(olderThanDays: number = 7): Promise<number> {
  const { rowCount } = await query(
    `DELETE FROM dispensary_crawl_jobs
     WHERE status IN ('completed', 'failed')
       AND completed_at < NOW() - ($1 || ' days')::INTERVAL`,
    [olderThanDays]
  );

  if (rowCount && rowCount > 0) {
    console.log(`[JobQueue] Cleaned up ${rowCount} old jobs`);
  }
  return rowCount || 0;
}

// ============================================================
// HELPERS
// ============================================================

function mapDbRowToJob(row: any): QueuedJob {
  return {
    id: row.id,
    jobType: row.job_type,
    dispensaryId: row.dispensary_id,
    status: row.status,
    priority: row.priority || 0,
    retryCount: row.retry_count || 0,
    maxRetries: row.max_retries || 3,
    claimedBy: row.claimed_by,
    claimedAt: row.claimed_at ? new Date(row.claimed_at) : null,
    workerHostname: row.worker_hostname,
    startedAt: row.started_at ? new Date(row.started_at) : null,
    completedAt: row.completed_at ? new Date(row.completed_at) : null,
    errorMessage: row.error_message,
    productsFound: row.products_found || 0,
    productsUpserted: row.products_upserted || 0,
    snapshotsCreated: row.snapshots_created || 0,
    currentPage: row.current_page || 0,
    totalPages: row.total_pages,
    lastHeartbeatAt: row.last_heartbeat_at ? new Date(row.last_heartbeat_at) : null,
    metadata: row.metadata,
    createdAt: new Date(row.created_at),
    // Add extra fields from join if present
    ...(row.dispensary_name && { dispensaryName: row.dispensary_name }),
    ...(row.city && { city: row.city }),
  };
}
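The bulk insert in `bulkEnqueueJobs` builds positional placeholders, four per row, so the VALUES list and the flattened parameter array stay in lockstep. A minimal standalone sketch of that pattern (pure string building, no database required; the table and column names mirror the source, everything else is illustrative):

```typescript
// Sketch of the positional-placeholder pattern used by bulkEnqueueJobs:
// each row consumes 4 params, so row i maps to $((i*4)+1) .. $((i*4)+4).
function buildBulkInsert(
  jobType: string,
  dispensaryIds: number[],
  priority: number
): { sql: string; params: any[] } {
  const values = dispensaryIds
    .map((_, i) => {
      const o = i * 4;
      return `($${o + 1}, $${o + 2}, 'pending', $${o + 3}, 3, $${o + 4}, NOW())`;
    })
    .join(', ');

  // Flatten parameters in the same per-row order as the placeholders above
  const params: any[] = [];
  for (const id of dispensaryIds) params.push(jobType, id, priority, null);

  return {
    sql: `INSERT INTO dispensary_crawl_jobs (job_type, dispensary_id, status, priority, max_retries, metadata, created_at) VALUES ${values}`,
    params,
  };
}

const { sql, params } = buildBulkInsert('crawl', [101, 102], 5);
console.log(params.length); // 8
```

Keeping the placeholder offsets and the parameter push order derived from the same per-row layout is what prevents the classic off-by-one bug in hand-built multi-row inserts.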
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -1,435 +0,0 @@
/**
 * Unified Retry Manager
 *
 * Handles retry logic with exponential backoff, jitter, and
 * intelligent error-based decisions (rotate proxy, rotate UA, etc.)
 *
 * Phase 1: Crawler Reliability & Stabilization
 */

import {
  CrawlErrorCodeType,
  CrawlErrorCode,
  classifyError,
  getErrorMetadata,
  isRetryable,
  shouldRotateProxy,
  shouldRotateUserAgent,
  getBackoffMultiplier,
} from './error-taxonomy';
import { DEFAULT_CONFIG } from './store-validator';

// ============================================================
// RETRY CONFIGURATION
// ============================================================

export interface RetryConfig {
  maxRetries: number;
  baseBackoffMs: number;
  maxBackoffMs: number;
  backoffMultiplier: number;
  jitterFactor: number; // 0.0 - 1.0 (percentage of backoff to randomize)
}

export const DEFAULT_RETRY_CONFIG: RetryConfig = {
  maxRetries: DEFAULT_CONFIG.maxRetries,
  baseBackoffMs: DEFAULT_CONFIG.baseBackoffMs,
  maxBackoffMs: DEFAULT_CONFIG.maxBackoffMs,
  backoffMultiplier: DEFAULT_CONFIG.backoffMultiplier,
  jitterFactor: 0.25, // +/- 25% jitter
};

// ============================================================
// RETRY CONTEXT
// ============================================================

/**
 * Context for tracking retry state across attempts
 */
export interface RetryContext {
  attemptNumber: number;
  maxAttempts: number;
  lastErrorCode: CrawlErrorCodeType | null;
  lastHttpStatus: number | null;
  totalBackoffMs: number;
  proxyRotated: boolean;
  userAgentRotated: boolean;
  startedAt: Date;
}

/**
 * Decision about what to do after an error
 */
export interface RetryDecision {
  shouldRetry: boolean;
  reason: string;
  backoffMs: number;
  rotateProxy: boolean;
  rotateUserAgent: boolean;
  errorCode: CrawlErrorCodeType;
  attemptNumber: number;
}

// ============================================================
// RETRY MANAGER CLASS
// ============================================================

export class RetryManager {
  private config: RetryConfig;
  private context: RetryContext;

  constructor(config: Partial<RetryConfig> = {}) {
    this.config = { ...DEFAULT_RETRY_CONFIG, ...config };
    this.context = this.createInitialContext();
  }

  /**
   * Create initial retry context
   */
  private createInitialContext(): RetryContext {
    return {
      attemptNumber: 0,
      maxAttempts: this.config.maxRetries + 1, // +1 for the initial attempt
      lastErrorCode: null,
      lastHttpStatus: null,
      totalBackoffMs: 0,
      proxyRotated: false,
      userAgentRotated: false,
      startedAt: new Date(),
    };
  }

  /**
   * Reset retry state for a new operation
   */
  reset(): void {
    this.context = this.createInitialContext();
  }

  /**
   * Get current attempt number (1-based)
   */
  getAttemptNumber(): number {
    return this.context.attemptNumber + 1;
  }

  /**
   * Check if we should attempt (call before each attempt)
   */
  shouldAttempt(): boolean {
    return this.context.attemptNumber < this.context.maxAttempts;
  }

  /**
   * Record an attempt (call at the start of each attempt)
   */
  recordAttempt(): void {
    this.context.attemptNumber++;
  }

  /**
   * Evaluate an error and decide what to do
   */
  evaluateError(
    error: Error | string | null,
    httpStatus?: number
  ): RetryDecision {
    const errorCode = classifyError(error, httpStatus);
|
||||
const metadata = getErrorMetadata(errorCode);
|
||||
const attemptNumber = this.context.attemptNumber;
|
||||
|
||||
// Update context
|
||||
this.context.lastErrorCode = errorCode;
|
||||
this.context.lastHttpStatus = httpStatus || null;
|
||||
|
||||
// Check if error is retryable
|
||||
if (!isRetryable(errorCode)) {
|
||||
return {
|
||||
shouldRetry: false,
|
||||
reason: `Error ${errorCode} is not retryable: ${metadata.description}`,
|
||||
backoffMs: 0,
|
||||
rotateProxy: false,
|
||||
rotateUserAgent: false,
|
||||
errorCode,
|
||||
attemptNumber,
|
||||
};
|
||||
}
|
||||
|
||||
// Check if we've exhausted retries
|
||||
if (!this.shouldAttempt()) {
|
||||
return {
|
||||
shouldRetry: false,
|
||||
reason: `Max retries (${this.config.maxRetries}) exhausted`,
|
||||
backoffMs: 0,
|
||||
rotateProxy: false,
|
||||
rotateUserAgent: false,
|
||||
errorCode,
|
||||
attemptNumber,
|
||||
};
|
||||
}
|
||||
|
||||
// Calculate backoff with exponential increase and jitter
|
||||
const baseBackoff = this.calculateBackoff(attemptNumber, errorCode);
|
||||
const backoffWithJitter = this.addJitter(baseBackoff);
|
||||
|
||||
// Track total backoff
|
||||
this.context.totalBackoffMs += backoffWithJitter;
|
||||
|
||||
// Determine rotation needs
|
||||
const rotateProxy = shouldRotateProxy(errorCode);
|
||||
const rotateUserAgent = shouldRotateUserAgent(errorCode);
|
||||
|
||||
if (rotateProxy) this.context.proxyRotated = true;
|
||||
if (rotateUserAgent) this.context.userAgentRotated = true;
|
||||
|
||||
const rotationInfo = [];
|
||||
if (rotateProxy) rotationInfo.push('rotate proxy');
|
||||
if (rotateUserAgent) rotationInfo.push('rotate UA');
|
||||
const rotationStr = rotationInfo.length > 0 ? ` (${rotationInfo.join(', ')})` : '';
|
||||
|
||||
return {
|
||||
shouldRetry: true,
|
||||
reason: `Retrying after ${errorCode}${rotationStr}, backoff ${backoffWithJitter}ms`,
|
||||
backoffMs: backoffWithJitter,
|
||||
rotateProxy,
|
||||
rotateUserAgent,
|
||||
errorCode,
|
||||
attemptNumber,
|
||||
};
|
||||
}
|
||||
|
||||
/**
|
||||
* Calculate exponential backoff for an attempt
|
||||
*/
|
||||
private calculateBackoff(attemptNumber: number, errorCode: CrawlErrorCodeType): number {
|
||||
// Base exponential: baseBackoff * multiplier^(attempt-1)
|
||||
const exponential = this.config.baseBackoffMs *
|
||||
Math.pow(this.config.backoffMultiplier, attemptNumber - 1);
|
||||
|
||||
// Apply error-specific multiplier
|
||||
const errorMultiplier = getBackoffMultiplier(errorCode);
|
||||
const adjusted = exponential * errorMultiplier;
|
||||
|
||||
// Cap at max backoff
|
||||
return Math.min(adjusted, this.config.maxBackoffMs);
|
||||
}
|
||||
|
||||
/**
|
||||
* Add jitter to backoff to prevent thundering herd
|
||||
*/
|
||||
private addJitter(backoffMs: number): number {
|
||||
const jitterRange = backoffMs * this.config.jitterFactor;
|
||||
// Random between -jitterRange and +jitterRange
|
||||
const jitter = (Math.random() * 2 - 1) * jitterRange;
|
||||
return Math.max(0, Math.round(backoffMs + jitter));
|
||||
}
|
||||
|
||||
/**
|
||||
* Get retry context summary
|
||||
*/
|
||||
getSummary(): RetryContextSummary {
|
||||
const elapsedMs = Date.now() - this.context.startedAt.getTime();
|
||||
return {
|
||||
attemptsMade: this.context.attemptNumber,
|
||||
maxAttempts: this.context.maxAttempts,
|
||||
lastErrorCode: this.context.lastErrorCode,
|
||||
lastHttpStatus: this.context.lastHttpStatus,
|
||||
totalBackoffMs: this.context.totalBackoffMs,
|
||||
totalElapsedMs: elapsedMs,
|
||||
proxyWasRotated: this.context.proxyRotated,
|
||||
userAgentWasRotated: this.context.userAgentRotated,
|
||||
};
|
||||
}
|
||||
}
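The backoff math in `calculateBackoff` plus `addJitter` can be sketched standalone. This is an illustrative reduction, not the class itself: the default values mirror `DEFAULT_RETRY_CONFIG`, and the `errorMultiplier` parameter stands in for whatever `getBackoffMultiplier` would return for a given error code.

```typescript
// Standalone sketch of the exponential-backoff-with-jitter calculation.
// Defaults mirror DEFAULT_RETRY_CONFIG; errorMultiplier is a stand-in
// for the error-specific multiplier from the taxonomy.
function backoffWithJitter(
  attemptNumber: number,
  baseBackoffMs = 1000,
  multiplier = 2.0,
  maxBackoffMs = 60000,
  jitterFactor = 0.25,
  errorMultiplier = 1.0
): number {
  // baseBackoff * multiplier^(attempt-1), scaled by the error multiplier
  const exponential = baseBackoffMs * Math.pow(multiplier, attemptNumber - 1);
  // Cap before jitter, exactly as calculateBackoff does
  const capped = Math.min(exponential * errorMultiplier, maxBackoffMs);
  // Jitter is uniform in [-jitterFactor, +jitterFactor] of the capped value
  const jitter = (Math.random() * 2 - 1) * capped * jitterFactor;
  return Math.max(0, Math.round(capped + jitter));
}
```

With the defaults, attempt 1 waits roughly 1s, attempt 2 roughly 2s, attempt 3 roughly 4s, each randomized by up to 25% in either direction, and nothing ever exceeds the 60s cap plus its jitter band.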

export interface RetryContextSummary {
  attemptsMade: number;
  maxAttempts: number;
  lastErrorCode: CrawlErrorCodeType | null;
  lastHttpStatus: number | null;
  totalBackoffMs: number;
  totalElapsedMs: number;
  proxyWasRotated: boolean;
  userAgentWasRotated: boolean;
}

// ============================================================
// CONVENIENCE FUNCTIONS
// ============================================================

/**
 * Sleep for specified milliseconds
 */
export function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}

/**
 * Execute a function with automatic retry logic
 */
export async function withRetry<T>(
  fn: (attemptNumber: number) => Promise<T>,
  config: Partial<RetryConfig> = {},
  callbacks?: {
    onRetry?: (decision: RetryDecision) => void | Promise<void>;
    onRotateProxy?: () => void | Promise<void>;
    onRotateUserAgent?: () => void | Promise<void>;
  }
): Promise<{ result: T; summary: RetryContextSummary }> {
  const manager = new RetryManager(config);

  while (manager.shouldAttempt()) {
    manager.recordAttempt();
    const attemptNumber = manager.getAttemptNumber();

    try {
      const result = await fn(attemptNumber);
      return { result, summary: manager.getSummary() };
    } catch (error) {
      const err = error instanceof Error ? error : new Error(String(error));
      const httpStatus = (error as any)?.status || (error as any)?.statusCode;

      const decision = manager.evaluateError(err, httpStatus);

      if (!decision.shouldRetry) {
        // Re-throw with enhanced context
        const enhancedError = new RetryExhaustedError(
          `${err.message} (${decision.reason})`,
          err,
          manager.getSummary()
        );
        throw enhancedError;
      }

      // Notify callbacks
      if (callbacks?.onRetry) {
        await callbacks.onRetry(decision);
      }
      if (decision.rotateProxy && callbacks?.onRotateProxy) {
        await callbacks.onRotateProxy();
      }
      if (decision.rotateUserAgent && callbacks?.onRotateUserAgent) {
        await callbacks.onRotateUserAgent();
      }

      // Log retry decision
      console.log(
        `[RetryManager] Attempt ${attemptNumber} failed: ${decision.errorCode}. ` +
        `${decision.reason}. Waiting ${decision.backoffMs}ms before retry.`
      );

      // Wait before retry
      await sleep(decision.backoffMs);
    }
  }

  // Should not reach here, but handle edge case
  throw new RetryExhaustedError(
    'Max retries exhausted',
    null,
    manager.getSummary()
  );
}
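For orientation, the control flow of `withRetry` reduces to the loop below. This is a simplified, synchronous sketch only: the real function awaits `fn`, consults `evaluateError` for a `RetryDecision`, fires the rotation callbacks, and sleeps for `decision.backoffMs` between attempts.

```typescript
// Simplified synchronous sketch of withRetry's attempt loop.
// retrySync and its parameters are illustrative, not part of the module API.
function retrySync<T>(
  fn: (attempt: number) => T,
  maxRetries = 3
): { result: T; attempts: number } {
  let lastError: unknown;
  // maxRetries + 1 total attempts, matching maxAttempts in RetryContext
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    try {
      return { result: fn(attempt), attempts: attempt };
    } catch (e) {
      lastError = e; // the real loop consults evaluateError() and backs off here
    }
  }
  throw lastError;
}
```

A function that fails twice and then succeeds returns on its third attempt; only when all `maxRetries + 1` attempts throw does the last error propagate (as `RetryExhaustedError` in the real implementation).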

// ============================================================
// CUSTOM ERROR CLASS
// ============================================================

export class RetryExhaustedError extends Error {
  public readonly originalError: Error | null;
  public readonly summary: RetryContextSummary;
  public readonly errorCode: CrawlErrorCodeType;

  constructor(
    message: string,
    originalError: Error | null,
    summary: RetryContextSummary
  ) {
    super(message);
    this.name = 'RetryExhaustedError';
    this.originalError = originalError;
    this.summary = summary;
    this.errorCode = summary.lastErrorCode || CrawlErrorCode.UNKNOWN_ERROR;
  }
}

// ============================================================
// BACKOFF CALCULATOR (for external use)
// ============================================================

/**
 * Calculate next crawl time based on consecutive failures
 */
export function calculateNextCrawlDelay(
  consecutiveFailures: number,
  baseFrequencyMinutes: number,
  maxBackoffMultiplier: number = 4.0
): number {
  // Each failure doubles the delay, up to max multiplier
  const multiplier = Math.min(
    Math.pow(2, consecutiveFailures),
    maxBackoffMultiplier
  );

  const delayMinutes = baseFrequencyMinutes * multiplier;

  // Add jitter (0-10% of delay)
  const jitterMinutes = delayMinutes * Math.random() * 0.1;

  return Math.round(delayMinutes + jitterMinutes);
}
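The delay formula above can be re-derived standalone for quick sanity checks; this sketch repeats the same math (multiplier = min(2^failures, cap), plus 0-10% positive jitter) outside the module.

```typescript
// Re-derivation of calculateNextCrawlDelay's math for illustration.
function nextCrawlDelayMinutes(
  failures: number,
  baseMinutes: number,
  cap = 4.0
): number {
  // Doubling per failure, capped at maxBackoffMultiplier
  const multiplier = Math.min(Math.pow(2, failures), cap);
  const delay = baseMinutes * multiplier;
  // Jitter adds 0-10% on top (never subtracts)
  return Math.round(delay + delay * Math.random() * 0.1);
}
```

With a 240-minute base frequency, a healthy store (0 failures) stays near 240 minutes, while 2 or more consecutive failures hit the 4x cap at roughly 960 minutes plus jitter.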

/**
 * Calculate next crawl timestamp
 */
export function calculateNextCrawlAt(
  consecutiveFailures: number,
  baseFrequencyMinutes: number
): Date {
  const delayMinutes = calculateNextCrawlDelay(consecutiveFailures, baseFrequencyMinutes);
  return new Date(Date.now() + delayMinutes * 60 * 1000);
}

// ============================================================
// STATUS DETERMINATION
// ============================================================

/**
 * Determine crawl status based on failure count
 */
export function determineCrawlStatus(
  consecutiveFailures: number,
  thresholds: { degraded: number; failed: number } = { degraded: 3, failed: 10 }
): 'active' | 'degraded' | 'failed' {
  if (consecutiveFailures >= thresholds.failed) {
    return 'failed';
  }
  if (consecutiveFailures >= thresholds.degraded) {
    return 'degraded';
  }
  return 'active';
}
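The default thresholds above map failure counts to status bands as follows; this tiny mirror of the logic makes the boundaries explicit (0-2 active, 3-9 degraded, 10+ failed).

```typescript
// Mirror of determineCrawlStatus with the default thresholds baked in,
// shown only to make the band boundaries explicit.
function statusFor(failures: number): 'active' | 'degraded' | 'failed' {
  if (failures >= 10) return 'failed';
  if (failures >= 3) return 'degraded';
  return 'active';
}
```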

/**
 * Determine if store should be auto-recovered
 * (Called periodically to check if failed stores can be retried)
 */
export function shouldAttemptRecovery(
  lastFailureAt: Date | null,
  consecutiveFailures: number,
  recoveryIntervalHours: number = 24
): boolean {
  if (!lastFailureAt) return true;

  // Wait longer for more failures
  const waitHours = recoveryIntervalHours * Math.min(consecutiveFailures, 5);
  const recoveryTime = new Date(lastFailureAt.getTime() + waitHours * 60 * 60 * 1000);

  return new Date() >= recoveryTime;
}
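The recovery window grows linearly with failure count, capped at five intervals. A standalone version of just the due-time computation, for illustration:

```typescript
// Standalone form of shouldAttemptRecovery's wait calculation:
// wait recoveryIntervalHours * min(failures, 5) hours after the last failure.
function recoveryDueAt(
  lastFailureAt: Date,
  failures: number,
  intervalHours = 24
): Date {
  const waitHours = intervalHours * Math.min(failures, 5);
  return new Date(lastFailureAt.getTime() + waitHours * 60 * 60 * 1000);
}
```

With the 24-hour default, 2 failures wait 48 hours and anything at or beyond 5 failures waits the maximum 120 hours.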

// ============================================================
// SINGLETON INSTANCE
// ============================================================

export const retryManager = new RetryManager();
@@ -1,465 +0,0 @@

/**
 * Store Configuration Validator
 *
 * Validates and sanitizes store configurations before crawling.
 * Applies defaults for missing values and logs warnings.
 *
 * Phase 1: Crawler Reliability & Stabilization
 */

import { CrawlErrorCode, CrawlErrorCodeType } from './error-taxonomy';

// ============================================================
// DEFAULT CONFIGURATION
// ============================================================

/**
 * Default crawl configuration values
 */
export const DEFAULT_CONFIG = {
  // Scheduling
  crawlFrequencyMinutes: 240, // 4 hours
  minCrawlGapMinutes: 2, // Minimum 2 minutes between crawls

  // Retries
  maxRetries: 3,
  baseBackoffMs: 1000, // 1 second
  maxBackoffMs: 60000, // 1 minute
  backoffMultiplier: 2.0, // Exponential backoff

  // Timeouts
  requestTimeoutMs: 30000, // 30 seconds
  pageLoadTimeoutMs: 60000, // 60 seconds

  // Limits
  maxProductsPerPage: 100,
  maxPages: 50,

  // Proxy
  proxyRotationEnabled: true,
  proxyRotationOnFailure: true,

  // User Agent
  userAgentRotationEnabled: true,
  userAgentRotationOnFailure: true,
} as const;

// ============================================================
// STORE CONFIG INTERFACE
// ============================================================

/**
 * Raw store configuration from database
 */
export interface RawStoreConfig {
  id: number;
  name: string;
  slug?: string;
  platform?: string;
  menuType?: string;
  platformDispensaryId?: string;
  menuUrl?: string;
  website?: string;

  // Crawl config
  crawlFrequencyMinutes?: number;
  maxRetries?: number;
  currentProxyId?: number;
  currentUserAgent?: string;

  // Status
  crawlStatus?: string;
  consecutiveFailures?: number;
  backoffMultiplier?: number;
  lastCrawlAt?: Date;
  lastSuccessAt?: Date;
  lastFailureAt?: Date;
  lastErrorCode?: string;
  nextCrawlAt?: Date;
}

/**
 * Validated and sanitized store configuration
 */
export interface ValidatedStoreConfig {
  id: number;
  name: string;
  slug: string;
  platform: string;
  menuType: string;
  platformDispensaryId: string;
  menuUrl: string;

  // Crawl config (with defaults applied)
  crawlFrequencyMinutes: number;
  maxRetries: number;
  currentProxyId: number | null;
  currentUserAgent: string | null;

  // Status
  crawlStatus: 'active' | 'degraded' | 'paused' | 'failed';
  consecutiveFailures: number;
  backoffMultiplier: number;
  lastCrawlAt: Date | null;
  lastSuccessAt: Date | null;
  lastFailureAt: Date | null;
  lastErrorCode: CrawlErrorCodeType | null;
  nextCrawlAt: Date | null;

  // Validation metadata
  isValid: boolean;
  validationErrors: ValidationError[];
  validationWarnings: ValidationWarning[];
}

// ============================================================
// VALIDATION TYPES
// ============================================================

export interface ValidationError {
  field: string;
  message: string;
  code: CrawlErrorCodeType;
}

export interface ValidationWarning {
  field: string;
  message: string;
  appliedDefault?: any;
}

export interface ValidationResult {
  isValid: boolean;
  config: ValidatedStoreConfig | null;
  errors: ValidationError[];
  warnings: ValidationWarning[];
}

// ============================================================
// VALIDATOR CLASS
// ============================================================

export class StoreValidator {
  private errors: ValidationError[] = [];
  private warnings: ValidationWarning[] = [];

  /**
   * Validate and sanitize a store configuration
   */
  validate(raw: RawStoreConfig): ValidationResult {
    this.errors = [];
    this.warnings = [];

    // Required field validation
    this.validateRequired(raw);

    // If critical errors, return early
    if (this.errors.length > 0) {
      return {
        isValid: false,
        config: null,
        errors: this.errors,
        warnings: this.warnings,
      };
    }

    // Build validated config with defaults
    const config = this.buildValidatedConfig(raw);

    return {
      isValid: this.errors.length === 0,
      config,
      errors: this.errors,
      warnings: this.warnings,
    };
  }

  /**
   * Validate required fields
   */
  private validateRequired(raw: RawStoreConfig): void {
    if (!raw.id) {
      this.addError('id', 'Store ID is required', CrawlErrorCode.INVALID_CONFIG);
    }

    if (!raw.name) {
      this.addError('name', 'Store name is required', CrawlErrorCode.INVALID_CONFIG);
    }

    if (!raw.platformDispensaryId) {
      this.addError(
        'platformDispensaryId',
        'Platform dispensary ID is required for crawling',
        CrawlErrorCode.MISSING_PLATFORM_ID
      );
    }

    if (!raw.menuType || raw.menuType === 'unknown') {
      this.addError(
        'menuType',
        'Menu type must be detected before crawling',
        CrawlErrorCode.INVALID_CONFIG
      );
    }
  }

  /**
   * Build validated config with defaults applied
   */
  private buildValidatedConfig(raw: RawStoreConfig): ValidatedStoreConfig {
    // Slug
    const slug = raw.slug || this.generateSlug(raw.name);
    if (!raw.slug) {
      this.addWarning('slug', 'Slug was missing, generated from name', slug);
    }

    // Platform
    const platform = raw.platform || 'dutchie';
    if (!raw.platform) {
      this.addWarning('platform', 'Platform was missing, defaulting to dutchie', platform);
    }

    // Menu URL
    const menuUrl = raw.menuUrl || this.generateMenuUrl(raw.platformDispensaryId!, platform);
    if (!raw.menuUrl) {
      this.addWarning('menuUrl', 'Menu URL was missing, generated from platform ID', menuUrl);
    }

    // Crawl frequency
    const crawlFrequencyMinutes = this.validateNumeric(
      raw.crawlFrequencyMinutes,
      'crawlFrequencyMinutes',
      DEFAULT_CONFIG.crawlFrequencyMinutes,
      60, // min: 1 hour
      1440 // max: 24 hours
    );

    // Max retries
    const maxRetries = this.validateNumeric(
      raw.maxRetries,
      'maxRetries',
      DEFAULT_CONFIG.maxRetries,
      1, // min
      10 // max
    );

    // Backoff multiplier
    const backoffMultiplier = this.validateNumeric(
      raw.backoffMultiplier,
      'backoffMultiplier',
      1.0,
      1.0, // min
      10.0 // max
    );

    // Crawl status
    const crawlStatus = this.validateCrawlStatus(raw.crawlStatus);

    // Consecutive failures
    const consecutiveFailures = Math.max(0, raw.consecutiveFailures || 0);

    // Last error code
    const lastErrorCode = this.validateErrorCode(raw.lastErrorCode);

    return {
      id: raw.id,
      name: raw.name,
      slug,
      platform,
      menuType: raw.menuType!,
      platformDispensaryId: raw.platformDispensaryId!,
      menuUrl,

      crawlFrequencyMinutes,
      maxRetries,
      currentProxyId: raw.currentProxyId || null,
      currentUserAgent: raw.currentUserAgent || null,

      crawlStatus,
      consecutiveFailures,
      backoffMultiplier,
      lastCrawlAt: raw.lastCrawlAt || null,
      lastSuccessAt: raw.lastSuccessAt || null,
      lastFailureAt: raw.lastFailureAt || null,
      lastErrorCode,
      nextCrawlAt: raw.nextCrawlAt || null,

      isValid: true,
      validationErrors: [],
      validationWarnings: this.warnings,
    };
  }

  /**
   * Validate numeric value with bounds
   */
  private validateNumeric(
    value: number | undefined,
    field: string,
    defaultValue: number,
    min: number,
    max: number
  ): number {
    if (value === undefined || value === null) {
      this.addWarning(field, `Missing, defaulting to ${defaultValue}`, defaultValue);
      return defaultValue;
    }

    if (value < min) {
      this.addWarning(field, `Value ${value} below minimum ${min}, using minimum`, min);
      return min;
    }

    if (value > max) {
      this.addWarning(field, `Value ${value} above maximum ${max}, using maximum`, max);
      return max;
    }

    return value;
  }

  /**
   * Validate crawl status
   */
  private validateCrawlStatus(status?: string): 'active' | 'degraded' | 'paused' | 'failed' {
    const validStatuses = ['active', 'degraded', 'paused', 'failed'];
    if (!status || !validStatuses.includes(status)) {
      if (status) {
        this.addWarning('crawlStatus', `Invalid status "${status}", defaulting to active`, 'active');
      }
      return 'active';
    }
    return status as 'active' | 'degraded' | 'paused' | 'failed';
  }

  /**
   * Validate error code
   */
  private validateErrorCode(code?: string): CrawlErrorCodeType | null {
    if (!code) return null;
    const validCodes = Object.values(CrawlErrorCode);
    if (!validCodes.includes(code as CrawlErrorCodeType)) {
      this.addWarning('lastErrorCode', `Invalid error code "${code}"`, null);
      return CrawlErrorCode.UNKNOWN_ERROR;
    }
    return code as CrawlErrorCodeType;
  }

  /**
   * Generate slug from name
   */
  private generateSlug(name: string): string {
    return name
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, '-')
      .replace(/^-+|-+$/g, '')
      .substring(0, 100);
  }
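The slug normalization above (lowercase, collapse non-alphanumeric runs to hyphens, trim edge hyphens, truncate to 100 characters) behaves like this standalone copy; the store name used below is made up for illustration.

```typescript
// Standalone copy of generateSlug's normalization chain.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // runs of non-alphanumerics become one hyphen
    .replace(/^-+|-+$/g, '')     // strip leading/trailing hyphens
    .substring(0, 100);          // cap length
}
```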

  /**
   * Generate menu URL from platform ID
   */
  private generateMenuUrl(platformId: string, platform: string): string {
    if (platform === 'dutchie') {
      return `https://dutchie.com/embedded-menu/${platformId}`;
    }
    return `https://${platform}.com/menu/${platformId}`;
  }

  /**
   * Add validation error
   */
  private addError(field: string, message: string, code: CrawlErrorCodeType): void {
    this.errors.push({ field, message, code });
    console.warn(`[StoreValidator] ERROR ${field}: ${message}`);
  }

  /**
   * Add validation warning
   */
  private addWarning(field: string, message: string, appliedDefault?: any): void {
    this.warnings.push({ field, message, appliedDefault });
    // Log at debug level - warnings are expected for incomplete configs
    console.debug(`[StoreValidator] WARNING ${field}: ${message}`);
  }
}

// ============================================================
// CONVENIENCE FUNCTIONS
// ============================================================

/**
 * Validate a single store config
 */
export function validateStoreConfig(raw: RawStoreConfig): ValidationResult {
  const validator = new StoreValidator();
  return validator.validate(raw);
}

/**
 * Validate multiple store configs
 */
export function validateStoreConfigs(raws: RawStoreConfig[]): {
  valid: ValidatedStoreConfig[];
  invalid: { raw: RawStoreConfig; errors: ValidationError[] }[];
  warnings: { storeId: number; warnings: ValidationWarning[] }[];
} {
  const valid: ValidatedStoreConfig[] = [];
  const invalid: { raw: RawStoreConfig; errors: ValidationError[] }[] = [];
  const warnings: { storeId: number; warnings: ValidationWarning[] }[] = [];

  for (const raw of raws) {
    const result = validateStoreConfig(raw);

    if (result.isValid && result.config) {
      valid.push(result.config);
      if (result.warnings.length > 0) {
        warnings.push({ storeId: raw.id, warnings: result.warnings });
      }
    } else {
      invalid.push({ raw, errors: result.errors });
    }
  }

  return { valid, invalid, warnings };
}

/**
 * Quick check if a store is crawlable
 */
export function isCrawlable(raw: RawStoreConfig): boolean {
  return !!(
    raw.id &&
    raw.name &&
    raw.platformDispensaryId &&
    raw.menuType &&
    raw.menuType !== 'unknown' &&
    raw.crawlStatus !== 'failed' &&
    raw.crawlStatus !== 'paused'
  );
}

/**
 * Get reason why store is not crawlable
 */
export function getNotCrawlableReason(raw: RawStoreConfig): string | null {
  if (!raw.platformDispensaryId) {
    return 'Missing platform_dispensary_id';
  }
  if (!raw.menuType || raw.menuType === 'unknown') {
    return 'Menu type not detected';
  }
  if (raw.crawlStatus === 'failed') {
    return 'Store is marked as failed';
  }
  if (raw.crawlStatus === 'paused') {
    return 'Crawling is paused';
  }
  return null;
}

// ============================================================
// SINGLETON INSTANCE
// ============================================================

export const storeValidator = new StoreValidator();
@@ -1,750 +0,0 @@
|
||||
/**
|
||||
* Worker Service
|
||||
*
|
||||
* Polls the job queue and processes crawl jobs.
|
||||
* Each worker instance runs independently, claiming jobs atomically.
|
||||
*
|
||||
* Phase 1: Enhanced with self-healing logic, error taxonomy, and retry management.
|
||||
*/
|
||||
|
||||
import {
|
||||
claimNextJob,
|
||||
completeJob,
|
||||
failJob,
|
||||
updateJobProgress,
|
||||
heartbeat,
|
||||
getWorkerId,
|
||||
getWorkerHostname,
|
||||
recoverStaleJobs,
|
||||
QueuedJob,
|
||||
} from './job-queue';
|
||||
import { crawlDispensaryProducts } from './product-crawler';
|
||||
import { mapDbRowToDispensary } from './discovery';
|
||||
import { query } from '../db/connection';
|
||||
|
||||
// Phase 1: Error taxonomy and retry management
|
||||
import {
|
||||
CrawlErrorCode,
|
||||
CrawlErrorCodeType,
|
||||
classifyError,
|
||||
isRetryable,
|
||||
shouldRotateProxy,
|
||||
shouldRotateUserAgent,
|
||||
createSuccessResult,
|
||||
createFailureResult,
|
||||
CrawlResult,
|
||||
} from './error-taxonomy';
|
||||
import {
|
||||
RetryManager,
|
||||
RetryDecision,
|
||||
calculateNextCrawlAt,
|
||||
determineCrawlStatus,
|
||||
shouldAttemptRecovery,
|
||||
sleep,
|
||||
} from './retry-manager';
|
||||
import {
|
||||
CrawlRotator,
|
||||
userAgentRotator,
|
||||
updateDispensaryRotation,
|
||||
} from './proxy-rotator';
|
||||
import { DEFAULT_CONFIG, validateStoreConfig, isCrawlable } from './store-validator';
|
||||
|
||||
// Use shared dispensary columns (handles optional columns like provider_detection_data)
|
||||
// NOTE: Using WITH_FAILED variant for worker compatibility checks
|
||||
import { DISPENSARY_COLUMNS_WITH_FAILED as DISPENSARY_COLUMNS } from '../db/dispensary-columns';
|
||||
|
||||
// ============================================================
|
||||
// WORKER CONFIG
|
||||
// ============================================================
|
||||
|
||||
const POLL_INTERVAL_MS = 5000; // Check for jobs every 5 seconds
|
||||
const HEARTBEAT_INTERVAL_MS = 60000; // Send heartbeat every 60 seconds
|
||||
const STALE_CHECK_INTERVAL_MS = 300000; // Check for stale jobs every 5 minutes
|
||||
const SHUTDOWN_GRACE_PERIOD_MS = 30000; // Wait 30s for job to complete on shutdown
|
||||
|
||||
// ============================================================
|
||||
// WORKER STATE
|
||||
// ============================================================
|
||||
|
||||
let isRunning = false;
|
||||
let currentJob: QueuedJob | null = null;
|
||||
let pollTimer: NodeJS.Timeout | null = null;
|
||||
let heartbeatTimer: NodeJS.Timeout | null = null;
|
||||
let staleCheckTimer: NodeJS.Timeout | null = null;
|
||||
let shutdownPromise: Promise<void> | null = null;
|
||||
|
||||
// ============================================================
|
||||
// WORKER LIFECYCLE
|
||||
// ============================================================
|
||||
|
||||
/**
|
||||
* Start the worker
|
||||
*/
|
||||
export async function startWorker(): Promise<void> {
|
||||
if (isRunning) {
|
||||
console.log('[Worker] Already running');
|
||||
return;
|
||||
}
|
||||
|
||||
const workerId = getWorkerId();
|
||||
const hostname = getWorkerHostname();
|
||||
|
||||
console.log(`[Worker] Starting worker ${workerId} on ${hostname}`);
|
||||
isRunning = true;
|
||||
|
||||
// Set up graceful shutdown
|
||||
setupShutdownHandlers();
|
||||
|
||||
// Start polling for jobs
|
||||
pollTimer = setInterval(pollForJobs, POLL_INTERVAL_MS);
|
||||
|
||||
// Start stale job recovery (only one worker should do this, but it's idempotent)
|
||||
staleCheckTimer = setInterval(async () => {
|
||||
try {
|
||||
await recoverStaleJobs(15);
|
||||
} catch (error) {
|
||||
console.error('[Worker] Error recovering stale jobs:', error);
|
||||
}
|
||||
}, STALE_CHECK_INTERVAL_MS);
|
||||
|
||||
// Immediately poll for a job
|
||||
await pollForJobs();
|
||||
|
||||
console.log(`[Worker] Worker ${workerId} started, polling every ${POLL_INTERVAL_MS}ms`);
|
||||
}
|
||||
|
||||
/**
|
||||
* Stop the worker gracefully
|
||||
*/
|
||||
export async function stopWorker(): Promise<void> {
|
||||
if (!isRunning) return;
|
||||
|
||||
console.log('[Worker] Stopping worker...');
|
||||
isRunning = false;
|
||||
|
||||
// Clear timers
|
||||
if (pollTimer) {
|
||||
clearInterval(pollTimer);
|
||||
pollTimer = null;
|
||||
}
|
||||
if (heartbeatTimer) {
|
||||
clearInterval(heartbeatTimer);
|
||||
heartbeatTimer = null;
|
||||
}
|
||||
if (staleCheckTimer) {
|
||||
clearInterval(staleCheckTimer);
|
||||
staleCheckTimer = null;
|
||||
}
|
||||
|
||||
// Wait for current job to complete
|
||||
if (currentJob) {
|
||||
console.log(`[Worker] Waiting for job ${currentJob.id} to complete...`);
|
||||
const startWait = Date.now();
|
||||
|
||||
while (currentJob && Date.now() - startWait < SHUTDOWN_GRACE_PERIOD_MS) {
|
||||
await new Promise(r => setTimeout(r, 1000));
|
||||
}
|
||||
|
||||
if (currentJob) {
|
||||
console.log(`[Worker] Job ${currentJob.id} did not complete in time, marking for retry`);
|
||||
await failJob(currentJob.id, 'Worker shutdown');
|
||||
}
|
||||
}
|
||||
|
||||
console.log('[Worker] Worker stopped');
|
||||
}
|
||||
|
||||
/**
|
||||
* Get worker status
|
||||
*/
|
||||
export function getWorkerStatus(): {
|
||||
isRunning: boolean;
|
||||
workerId: string;
|
||||
hostname: string;
|
||||
currentJob: QueuedJob | null;
|
||||
} {
|
||||
return {
|
||||
isRunning,
|
||||
workerId: getWorkerId(),
|
||||
hostname: getWorkerHostname(),
|
||||
currentJob,
|
||||
};
|
||||
}
|
||||
|
||||
// ============================================================
// JOB PROCESSING
// ============================================================

/**
 * Poll for and process the next available job
 */
async function pollForJobs(): Promise<void> {
  if (!isRunning || currentJob) {
    return; // Already processing a job
  }

  try {
    const workerId = getWorkerId();

    // Try to claim a job
    const job = await claimNextJob({
      workerId,
      jobTypes: ['dutchie_product_crawl', 'menu_detection', 'menu_detection_single'],
      lockDurationMinutes: 30,
    });

    if (!job) {
      return; // No jobs available
    }

    currentJob = job;
    console.log(`[Worker] Processing job ${job.id} (type=${job.jobType}, dispensary=${job.dispensaryId})`);

    // Start heartbeat for this job
    heartbeatTimer = setInterval(async () => {
      if (currentJob) {
        try {
          await heartbeat(currentJob.id);
        } catch (error) {
          console.error('[Worker] Heartbeat error:', error);
        }
      }
    }, HEARTBEAT_INTERVAL_MS);

    // Process the job
    await processJob(job);

  } catch (error: any) {
    console.error('[Worker] Error polling for jobs:', error);

    if (currentJob) {
      try {
        await failJob(currentJob.id, error.message);
      } catch (failError) {
        console.error('[Worker] Error failing job:', failError);
      }
    }
  } finally {
    // Clear heartbeat timer
    if (heartbeatTimer) {
      clearInterval(heartbeatTimer);
      heartbeatTimer = null;
    }
    currentJob = null;
  }
}

/**
 * Process a single job
 */
async function processJob(job: QueuedJob): Promise<void> {
  try {
    switch (job.jobType) {
      case 'dutchie_product_crawl':
        await processProductCrawlJob(job);
        break;

      case 'menu_detection':
        await processMenuDetectionJob(job);
        break;

      case 'menu_detection_single':
        await processSingleDetectionJob(job);
        break;

      default:
        throw new Error(`Unknown job type: ${job.jobType}`);
    }
  } catch (error: any) {
    console.error(`[Worker] Job ${job.id} failed:`, error);
    await failJob(job.id, error.message);
  }
}

// Thresholds for crawl status transitions
const DEGRADED_THRESHOLD = 3;  // Mark as degraded after 3 consecutive failures
const FAILED_THRESHOLD = 10;   // Mark as failed after 10 consecutive failures

// For backwards compatibility
const MAX_CONSECUTIVE_FAILURES = FAILED_THRESHOLD;

/**
 * Record a successful crawl - resets failure counter and restores active status
 */
async function recordCrawlSuccess(
  dispensaryId: number,
  result: CrawlResult
): Promise<void> {
  // Calculate next crawl time (use store's frequency or default)
  const { rows: storeRows } = await query<any>(
    `SELECT crawl_frequency_minutes FROM dispensaries WHERE id = $1`,
    [dispensaryId]
  );
  const frequencyMinutes = storeRows[0]?.crawl_frequency_minutes || DEFAULT_CONFIG.crawlFrequencyMinutes;
  const nextCrawlAt = calculateNextCrawlAt(0, frequencyMinutes);

  // Reset failure state and schedule next crawl
  await query(
    `UPDATE dispensaries
     SET consecutive_failures = 0,
         crawl_status = 'active',
         backoff_multiplier = 1.0,
         last_crawl_at = NOW(),
         last_success_at = NOW(),
         last_error_code = NULL,
         next_crawl_at = $2,
         total_attempts = COALESCE(total_attempts, 0) + 1,
         total_successes = COALESCE(total_successes, 0) + 1,
         updated_at = NOW()
     WHERE id = $1`,
    [dispensaryId, nextCrawlAt]
  );

  // Log to crawl_attempts table for analytics
  await logCrawlAttempt(dispensaryId, result);

  console.log(`[Worker] Dispensary ${dispensaryId} crawl success. Next crawl at ${nextCrawlAt.toISOString()}`);
}

/**
 * Record a crawl failure with self-healing logic
 * - Rotates proxy/UA based on error type
 * - Transitions through: active -> degraded -> failed
 * - Calculates backoff for next attempt
 */
async function recordCrawlFailure(
  dispensaryId: number,
  errorMessage: string,
  errorCode?: CrawlErrorCodeType,
  httpStatus?: number,
  context?: {
    proxyUsed?: string;
    userAgentUsed?: string;
    attemptNumber?: number;
  }
): Promise<{ wasFlagged: boolean; newStatus: string; shouldRotateProxy: boolean; shouldRotateUA: boolean }> {
  // Classify the error if not provided
  const code = errorCode || classifyError(errorMessage, httpStatus);

  // Get current state
  const { rows: storeRows } = await query<any>(
    `SELECT
       consecutive_failures,
       crawl_status,
       backoff_multiplier,
       crawl_frequency_minutes,
       current_proxy_id,
       current_user_agent
     FROM dispensaries WHERE id = $1`,
    [dispensaryId]
  );

  if (storeRows.length === 0) {
    return { wasFlagged: false, newStatus: 'unknown', shouldRotateProxy: false, shouldRotateUA: false };
  }

  const store = storeRows[0];
  const currentFailures = (store.consecutive_failures || 0) + 1;
  const frequencyMinutes = store.crawl_frequency_minutes || DEFAULT_CONFIG.crawlFrequencyMinutes;

  // Determine if we should rotate proxy/UA based on error type
  const rotateProxy = shouldRotateProxy(code);
  const rotateUA = shouldRotateUserAgent(code);

  // Get new proxy/UA if rotation is needed
  let newProxyId = store.current_proxy_id;
  let newUserAgent = store.current_user_agent;

  if (rotateUA) {
    newUserAgent = userAgentRotator.getNext();
    console.log(`[Worker] Rotating user agent for dispensary ${dispensaryId} after ${code}`);
  }

  // Determine new crawl status
  const newStatus = determineCrawlStatus(currentFailures, {
    degraded: DEGRADED_THRESHOLD,
    failed: FAILED_THRESHOLD,
  });

  // Calculate backoff multiplier and next crawl time
  const newBackoffMultiplier = Math.min(
    (store.backoff_multiplier || 1.0) * 1.5,
    4.0 // Max 4x backoff
  );
  const nextCrawlAt = calculateNextCrawlAt(currentFailures, frequencyMinutes);

  // Update dispensary with new failure state
  if (newStatus === 'failed') {
    // Mark as failed - won't be crawled again until manual intervention
    await query(
      `UPDATE dispensaries
       SET consecutive_failures = $2,
           crawl_status = $3,
           backoff_multiplier = $4,
           last_failure_at = NOW(),
           last_error_code = $5,
           failed_at = NOW(),
           failure_notes = $6,
           next_crawl_at = NULL,
           current_proxy_id = $7,
           current_user_agent = $8,
           total_attempts = COALESCE(total_attempts, 0) + 1,
           updated_at = NOW()
       WHERE id = $1`,
      [
        dispensaryId,
        currentFailures,
        newStatus,
        newBackoffMultiplier,
        code,
        `Auto-flagged after ${currentFailures} consecutive failures. Last error: ${errorMessage}`,
        newProxyId,
        newUserAgent,
      ]
    );
    console.log(`[Worker] Dispensary ${dispensaryId} marked as FAILED after ${currentFailures} failures (${code})`);
  } else {
    // Update failure count but keep crawling (active or degraded)
    await query(
      `UPDATE dispensaries
       SET consecutive_failures = $2,
           crawl_status = $3,
           backoff_multiplier = $4,
           last_failure_at = NOW(),
           last_error_code = $5,
           next_crawl_at = $6,
           current_proxy_id = $7,
           current_user_agent = $8,
           total_attempts = COALESCE(total_attempts, 0) + 1,
           updated_at = NOW()
       WHERE id = $1`,
      [
        dispensaryId,
        currentFailures,
        newStatus,
        newBackoffMultiplier,
        code,
        nextCrawlAt,
        newProxyId,
        newUserAgent,
      ]
    );

    if (newStatus === 'degraded') {
      console.log(`[Worker] Dispensary ${dispensaryId} marked as DEGRADED (${currentFailures}/${FAILED_THRESHOLD} failures). Next crawl: ${nextCrawlAt.toISOString()}`);
    } else {
      console.log(`[Worker] Dispensary ${dispensaryId} failure recorded (${currentFailures}/${DEGRADED_THRESHOLD}). Next crawl: ${nextCrawlAt.toISOString()}`);
    }
  }

  // Log to crawl_attempts table
  const result = createFailureResult(
    dispensaryId,
    new Date(),
    errorMessage,
    httpStatus,
    context
  );
  await logCrawlAttempt(dispensaryId, result);

  return {
    wasFlagged: newStatus === 'failed',
    newStatus,
    shouldRotateProxy: rotateProxy,
    shouldRotateUA: rotateUA,
  };
}

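The failure handling above combines two mechanisms: a status ladder (active, degraded at 3 consecutive failures, failed at 10) and a backoff multiplier that grows 1.5x per failure and is capped at 4x. A minimal standalone sketch of that arithmetic, with hypothetical helper names (the real `determineCrawlStatus` and `calculateNextCrawlAt` are imported elsewhere in this module):

```typescript
// Hypothetical standalone versions of the helpers used above, for illustration only.
type CrawlStatus = 'active' | 'degraded' | 'failed';

function statusForFailures(
  failures: number,
  thresholds = { degraded: 3, failed: 10 }
): CrawlStatus {
  if (failures >= thresholds.failed) return 'failed';
  if (failures >= thresholds.degraded) return 'degraded';
  return 'active';
}

// Backoff multiplier grows 1.5x per failure, capped at 4x (as in recordCrawlFailure).
function nextBackoff(current: number): number {
  return Math.min(current * 1.5, 4.0);
}

// Walk the ladder: failures 1-2 stay active, 3-9 are degraded, 10 flags as failed.
let multiplier = 1.0;
for (let failures = 1; failures <= 10; failures++) {
  multiplier = nextBackoff(multiplier);
  console.log(failures, statusForFailures(failures), multiplier.toFixed(2));
}
```

The cap means the multiplier plateaus at 4.0 by the fourth consecutive failure, so a misbehaving store never backs off more than 4x its normal crawl frequency before it is flagged outright.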
/**
 * Log a crawl attempt to the crawl_attempts table for analytics
 */
async function logCrawlAttempt(
  dispensaryId: number,
  result: CrawlResult
): Promise<void> {
  try {
    await query(
      `INSERT INTO crawl_attempts (
         dispensary_id, started_at, finished_at, duration_ms,
         error_code, error_message, http_status,
         attempt_number, proxy_used, user_agent_used,
         products_found, products_upserted, snapshots_created,
         created_at
       ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, NOW())`,
      [
        dispensaryId,
        result.startedAt,
        result.finishedAt,
        result.durationMs,
        result.errorCode,
        result.errorMessage || null,
        result.httpStatus || null,
        result.attemptNumber,
        result.proxyUsed || null,
        result.userAgentUsed || null,
        result.productsFound || 0,
        result.productsUpserted || 0,
        result.snapshotsCreated || 0,
      ]
    );
  } catch (error) {
    // Don't fail the job if logging fails
    console.error(`[Worker] Failed to log crawl attempt for dispensary ${dispensaryId}:`, error);
  }
}

/**
 * Process a product crawl job for a single dispensary
 */
async function processProductCrawlJob(job: QueuedJob): Promise<void> {
  const startedAt = new Date();
  const userAgent = userAgentRotator.getCurrent();

  if (!job.dispensaryId) {
    throw new Error('Product crawl job requires dispensary_id');
  }

  // Get dispensary details
  const { rows } = await query<any>(
    `SELECT ${DISPENSARY_COLUMNS} FROM dispensaries WHERE id = $1`,
    [job.dispensaryId]
  );

  if (rows.length === 0) {
    throw new Error(`Dispensary ${job.dispensaryId} not found`);
  }

  const dispensary = mapDbRowToDispensary(rows[0]);
  const rawDispensary = rows[0];

  // Check if dispensary is already flagged as failed
  if (rawDispensary.failed_at) {
    console.log(`[Worker] Skipping dispensary ${job.dispensaryId} - already flagged as failed`);
    await completeJob(job.id, { productsFound: 0, productsUpserted: 0 });
    return;
  }

  // Check crawl status - skip if paused or failed
  if (rawDispensary.crawl_status === 'paused' || rawDispensary.crawl_status === 'failed') {
    console.log(`[Worker] Skipping dispensary ${job.dispensaryId} - crawl_status is ${rawDispensary.crawl_status}`);
    await completeJob(job.id, { productsFound: 0, productsUpserted: 0 });
    return;
  }

  if (!dispensary.platformDispensaryId) {
    // Record failure with error taxonomy
    const { wasFlagged } = await recordCrawlFailure(
      job.dispensaryId,
      'Missing platform_dispensary_id',
      CrawlErrorCode.MISSING_PLATFORM_ID,
      undefined,
      { userAgentUsed: userAgent, attemptNumber: job.retryCount + 1 }
    );
    if (wasFlagged) {
      await completeJob(job.id, { productsFound: 0, productsUpserted: 0 });
      return;
    }
    throw new Error(`Dispensary ${job.dispensaryId} has no platform_dispensary_id`);
  }

  // Get crawl options from job metadata
  const pricingType = job.metadata?.pricingType || 'rec';
  const useBothModes = job.metadata?.useBothModes !== false;

  try {
    // Crawl the dispensary
    const result = await crawlDispensaryProducts(dispensary, pricingType, {
      useBothModes,
      onProgress: async (progress) => {
        // Update progress for live monitoring
        await updateJobProgress(job.id, {
          productsFound: progress.productsFound,
          productsUpserted: progress.productsUpserted,
          snapshotsCreated: progress.snapshotsCreated,
          currentPage: progress.currentPage,
          totalPages: progress.totalPages,
        });
      },
    });

    if (result.success) {
      // Success! Create result and record
      const crawlResult = createSuccessResult(
        job.dispensaryId,
        startedAt,
        {
          productsFound: result.productsFetched,
          productsUpserted: result.productsUpserted,
          snapshotsCreated: result.snapshotsCreated,
        },
        {
          attemptNumber: job.retryCount + 1,
          userAgentUsed: userAgent,
        }
      );
      await recordCrawlSuccess(job.dispensaryId, crawlResult);
      await completeJob(job.id, {
        productsFound: result.productsFetched,
        productsUpserted: result.productsUpserted,
        snapshotsCreated: result.snapshotsCreated,
        // Visibility tracking stats for dashboard
        visibilityLostCount: result.visibilityLostCount || 0,
        visibilityRestoredCount: result.visibilityRestoredCount || 0,
      });
    } else {
      // Crawl returned failure - classify error and record
      const errorCode = classifyError(result.errorMessage || 'Crawl failed', result.httpStatus);
      const { wasFlagged } = await recordCrawlFailure(
        job.dispensaryId,
        result.errorMessage || 'Crawl failed',
        errorCode,
        result.httpStatus,
        { userAgentUsed: userAgent, attemptNumber: job.retryCount + 1 }
      );

      if (wasFlagged) {
        // Dispensary is now flagged - complete the job
        await completeJob(job.id, { productsFound: 0, productsUpserted: 0 });
      } else if (!isRetryable(errorCode)) {
        // Non-retryable error - complete as failed
        await completeJob(job.id, { productsFound: 0, productsUpserted: 0 });
      } else {
        // Retryable error - let job queue handle retry
        throw new Error(result.errorMessage || 'Crawl failed');
      }
    }
  } catch (error: any) {
    // Record the failure with error taxonomy
    const errorCode = classifyError(error.message);
    const { wasFlagged } = await recordCrawlFailure(
      job.dispensaryId,
      error.message,
      errorCode,
      undefined,
      { userAgentUsed: userAgent, attemptNumber: job.retryCount + 1 }
    );

    if (wasFlagged) {
      // Dispensary is now flagged - complete the job
      await completeJob(job.id, { productsFound: 0, productsUpserted: 0 });
    } else if (!isRetryable(errorCode)) {
      // Non-retryable error - complete as failed
      await completeJob(job.id, { productsFound: 0, productsUpserted: 0 });
    } else {
      throw error;
    }
  }
}

/**
 * Process a menu detection job (bulk)
 */
async function processMenuDetectionJob(job: QueuedJob): Promise<void> {
  const { executeMenuDetectionJob } = await import('./menu-detection');

  const config = job.metadata || {};
  const result = await executeMenuDetectionJob(config);

  if (result.status === 'error') {
    throw new Error(result.errorMessage || 'Menu detection failed');
  }

  await completeJob(job.id, {
    productsFound: result.itemsProcessed,
    productsUpserted: result.itemsSucceeded,
  });
}

/**
 * Process a single dispensary menu detection job
 * This is the parallelizable version - each worker can detect one dispensary at a time
 */
async function processSingleDetectionJob(job: QueuedJob): Promise<void> {
  if (!job.dispensaryId) {
    throw new Error('Single detection job requires dispensary_id');
  }

  const { detectAndResolveDispensary } = await import('./menu-detection');

  // Get dispensary details
  const { rows } = await query<any>(
    `SELECT ${DISPENSARY_COLUMNS} FROM dispensaries WHERE id = $1`,
    [job.dispensaryId]
  );

  if (rows.length === 0) {
    throw new Error(`Dispensary ${job.dispensaryId} not found`);
  }

  const dispensary = rows[0];

  // Skip if already detected or failed
  if (dispensary.failed_at) {
    console.log(`[Worker] Skipping dispensary ${job.dispensaryId} - already flagged as failed`);
    await completeJob(job.id, { productsFound: 0, productsUpserted: 0 });
    return;
  }

  if (dispensary.menu_type && dispensary.menu_type !== 'unknown') {
    console.log(`[Worker] Skipping dispensary ${job.dispensaryId} - already detected as ${dispensary.menu_type}`);
    await completeJob(job.id, { productsFound: 0, productsUpserted: 1 });
    return;
  }

  console.log(`[Worker] Detecting menu for dispensary ${job.dispensaryId} (${dispensary.name})...`);

  try {
    const result = await detectAndResolveDispensary(job.dispensaryId);

    if (result.success) {
      console.log(`[Worker] Dispensary ${job.dispensaryId}: detected ${result.detectedProvider}, platformId=${result.platformDispensaryId || 'none'}`);
      await completeJob(job.id, {
        productsFound: 1,
        productsUpserted: result.platformDispensaryId ? 1 : 0,
      });
    } else {
      // Detection failed - record failure
      await recordCrawlFailure(job.dispensaryId, result.error || 'Detection failed');
      throw new Error(result.error || 'Detection failed');
    }
  } catch (error: any) {
    // Record the failure. recordCrawlFailure returns an object, so destructure
    // wasFlagged rather than testing the (always-truthy) result itself.
    const { wasFlagged } = await recordCrawlFailure(job.dispensaryId, error.message);
    if (wasFlagged) {
      // Dispensary is now flagged - complete the job rather than fail it
      await completeJob(job.id, { productsFound: 0, productsUpserted: 0 });
    } else {
      throw error;
    }
  }
}

// ============================================================
// SHUTDOWN HANDLING
// ============================================================

function setupShutdownHandlers(): void {
  const shutdown = async (signal: string) => {
    if (shutdownPromise) return shutdownPromise;

    console.log(`\n[Worker] Received ${signal}, shutting down...`);
    shutdownPromise = stopWorker();
    await shutdownPromise;
    process.exit(0);
  };

  process.on('SIGTERM', () => shutdown('SIGTERM'));
  process.on('SIGINT', () => shutdown('SIGINT'));
}

// ============================================================
// STANDALONE WORKER ENTRY POINT
// ============================================================

if (require.main === module) {
  // Run as standalone worker
  startWorker().catch((error) => {
    console.error('[Worker] Fatal error:', error);
    process.exit(1);
  });
}
@@ -1,751 +0,0 @@
/**
 * Dutchie AZ Data Types
 *
 * Complete TypeScript interfaces for the isolated Dutchie Arizona data pipeline.
 * These types map directly to Dutchie's GraphQL FilteredProducts response.
 */

// ============================================================
// GRAPHQL RESPONSE TYPES (from Dutchie API)
// ============================================================

/**
 * Raw Dutchie brand object from GraphQL
 */
export interface DutchieBrand {
  id: string;
  _id?: string;
  name: string;
  parentBrandId?: string;
  imageUrl?: string;
  description?: string;
  __typename?: string;
}

/**
 * Raw Dutchie image object from GraphQL
 */
export interface DutchieImage {
  url: string;
  description?: string;
  active?: boolean;
  __typename?: string;
}

/**
 * POSMetaData.children - option-level inventory/pricing
 */
export interface DutchiePOSChild {
  activeBatchTags?: any;
  canonicalBrandId?: string;
  canonicalBrandName?: string;
  canonicalCategory?: string;
  canonicalCategoryId?: string;
  canonicalEffectivePotencyMg?: number;
  canonicalID?: string;
  canonicalPackageId?: string;
  canonicalImgUrl?: string;
  canonicalLabResultUrl?: string;
  canonicalName?: string;
  canonicalSKU?: string;
  canonicalProductTags?: string[];
  canonicalStrainId?: string;
  canonicalVendorId?: string;
  kioskQuantityAvailable?: number;
  medPrice?: number;
  option?: string;
  packageQuantity?: number;
  price?: number;
  quantity?: number;
  quantityAvailable?: number;
  recEquivalent?: number;
  recPrice?: number;
  standardEquivalent?: number;
  __typename?: string;
}

/**
 * POSMetaData object from GraphQL
 */
export interface DutchiePOSMetaData {
  activeBatchTags?: any;
  canonicalBrandId?: string;
  canonicalBrandName?: string;
  canonicalCategory?: string;
  canonicalCategoryId?: string;
  canonicalID?: string;
  canonicalPackageId?: string;
  canonicalImgUrl?: string;
  canonicalLabResultUrl?: string;
  canonicalName?: string;
  canonicalProductTags?: string[];
  canonicalSKU?: string;
  canonicalStrainId?: string;
  canonicalVendorId?: string;
  children?: DutchiePOSChild[];
  integrationID?: string;
  __typename?: string;
}

/**
 * THC/CBD Content structure
 */
export interface DutchiePotencyContent {
  unit?: string;
  range?: number[];
}

/**
 * CannabinoidV2 structure
 */
export interface DutchieCannabinoidV2 {
  value: number;
  unit: string;
  cannabinoid: {
    name: string;
  };
}

/**
 * Special data structure
 */
export interface DutchieSpecialData {
  saleSpecials?: Array<{
    specialId: string;
    specialName: string;
    discount: number;
    percentDiscount: boolean;
    dollarDiscount: boolean;
    specialType: string;
  }>;
  bogoSpecials?: any;
}

/**
 * Complete raw product from Dutchie GraphQL FilteredProducts
 */
export interface DutchieRawProduct {
  _id: string;
  id?: string;
  AdditionalOptions?: any;
  duplicatedProductId?: string;
  libraryProductId?: string;
  libraryProductScore?: number;

  // Brand
  brand?: DutchieBrand;
  brandId?: string;
  brandName?: string;
  brandLogo?: string;

  // Potency
  CBD?: number;
  CBDContent?: DutchiePotencyContent;
  THC?: number;
  THCContent?: DutchiePotencyContent;
  cannabinoidsV2?: DutchieCannabinoidV2[];

  // Flags
  certificateOfAnalysisEnabled?: boolean;
  collectionCardBadge?: string;
  comingSoon?: boolean;
  featured?: boolean;
  medicalOnly?: boolean;
  recOnly?: boolean;
  nonArmsLength?: boolean;
  vapeTaxApplicable?: boolean;
  useBetterPotencyTaxes?: boolean;

  // Timestamps
  createdAt?: string;
  updatedAt?: string;

  // Dispensary
  DispensaryID: string;
  enterpriseProductId?: string;

  // Images
  Image?: string;
  images?: DutchieImage[];

  // Measurements
  measurements?: {
    netWeight?: {
      unit: string;
      values: number[];
    };
    volume?: any;
  };
  weight?: number | string;

  // Product identity
  Name: string;
  cName: string;
  pastCNames?: string[];

  // Options
  Options?: string[];
  rawOptions?: string[];
  limitsPerCustomer?: any;
  manualInventory?: boolean;

  // POS data
  POSMetaData?: DutchiePOSMetaData;

  // Pricing
  Prices?: number[];
  recPrices?: number[];
  medicalPrices?: number[];
  recSpecialPrices?: number[];
  medicalSpecialPrices?: number[];
  wholesalePrices?: number[];
  pricingTierData?: any;
  specialIdsPerOption?: any;

  // Specials
  special?: boolean;
  specialData?: DutchieSpecialData;

  // Classification
  Status?: string;
  strainType?: string;
  subcategory?: string;
  type?: string;
  provider?: string;
  effects?: Record<string, any>;

  // Threshold flags
  isBelowThreshold?: boolean;
  isBelowKioskThreshold?: boolean;
  optionsBelowThreshold?: boolean;
  optionsBelowKioskThreshold?: boolean;

  // Misc
  bottleDepositTaxCents?: number;
  __typename?: string;
}

// ============================================================
// DERIVED TYPES
// ============================================================

/**
 * StockStatus - derived from POSMetaData.children quantityAvailable
 * - 'in_stock': At least one option has quantityAvailable > 0
 * - 'out_of_stock': All options have quantityAvailable === 0
 * - 'unknown': No POSMetaData.children or quantityAvailable data
 * - 'missing_from_feed': Product was not present in the latest crawl feed
 */
export type StockStatus = 'in_stock' | 'out_of_stock' | 'unknown' | 'missing_from_feed';

/**
 * CrawlMode - defines how products are fetched from Dutchie
 * - 'mode_a': UI parity - Status: 'Active', threshold removal ON
 * - 'mode_b': MAX COVERAGE - No Status filter, bypass thresholds
 */
export type CrawlMode = 'mode_a' | 'mode_b';

/**
 * Per-option stock status type
 */
export type OptionStockStatus = 'in_stock' | 'out_of_stock' | 'unknown';

/**
 * Get available quantity for a single option
 * Priority: quantityAvailable > kioskQuantityAvailable > quantity
 */
export function getOptionQuantity(child: DutchiePOSChild): number | null {
  if (typeof child.quantityAvailable === 'number') return child.quantityAvailable;
  if (typeof child.kioskQuantityAvailable === 'number') return child.kioskQuantityAvailable;
  if (typeof child.quantity === 'number') return child.quantity;
  return null; // No quantity data available
}

/**
 * Derive stock status for a single option
 * Returns: 'in_stock' if qty > 0, 'out_of_stock' if qty === 0, 'unknown' if no data
 */
export function deriveOptionStockStatus(child: DutchiePOSChild): OptionStockStatus {
  const qty = getOptionQuantity(child);
  if (qty === null) return 'unknown';
  return qty > 0 ? 'in_stock' : 'out_of_stock';
}

/**
 * Derive product-level stock status from POSMetaData.children
 *
 * Logic per spec:
 * - If ANY child is "in_stock" → product is "in_stock"
 * - Else if ALL children are "out_of_stock" → product is "out_of_stock"
 * - Else → product is "unknown"
 *
 * IMPORTANT: Threshold flags (isBelowThreshold, etc.) do NOT override stock status.
 * They only indicate "low stock" - if qty > 0, status stays "in_stock".
 */
export function deriveStockStatus(product: DutchieRawProduct): StockStatus {
  const children = product.POSMetaData?.children;

  // No children data - unknown
  if (!children || children.length === 0) {
    return 'unknown';
  }

  // Get stock status for each option
  const optionStatuses = children.map(deriveOptionStockStatus);

  // If ANY option is in_stock → product is in_stock
  if (optionStatuses.some(status => status === 'in_stock')) {
    return 'in_stock';
  }

  // If ALL options are out_of_stock → product is out_of_stock
  if (optionStatuses.every(status => status === 'out_of_stock')) {
    return 'out_of_stock';
  }

  // Otherwise (mix of out_of_stock and unknown) → unknown
  return 'unknown';
}

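The derivation rules in the doc comment above can be exercised with a pared-down sketch. The inline `Child` type and the helper names here are illustrative stand-ins, not the exported interfaces, so the snippet stays self-contained:

```typescript
// Pared-down illustration of the option/product stock derivation described above.
type Child = { quantityAvailable?: number; kioskQuantityAvailable?: number; quantity?: number };

// Mirrors getOptionQuantity's priority: quantityAvailable > kioskQuantityAvailable > quantity.
function optionQty(c: Child): number | null {
  if (typeof c.quantityAvailable === 'number') return c.quantityAvailable;
  if (typeof c.kioskQuantityAvailable === 'number') return c.kioskQuantityAvailable;
  if (typeof c.quantity === 'number') return c.quantity;
  return null;
}

function productStatus(children: Child[]): 'in_stock' | 'out_of_stock' | 'unknown' {
  if (children.length === 0) return 'unknown';
  const statuses = children.map(c => {
    const q = optionQty(c);
    return q === null ? 'unknown' : q > 0 ? 'in_stock' : 'out_of_stock';
  });
  // ANY in_stock wins; ALL out_of_stock means out_of_stock; any mix with unknown stays unknown.
  if (statuses.some(s => s === 'in_stock')) return 'in_stock';
  if (statuses.every(s => s === 'out_of_stock')) return 'out_of_stock';
  return 'unknown';
}

// One in-stock option wins even when siblings are empty or lack data.
console.log(productStatus([{ quantityAvailable: 0 }, { quantityAvailable: 3 }])); // in_stock
console.log(productStatus([{ quantityAvailable: 0 }, {}]));                       // unknown
```

Note the asymmetry: a single in-stock option is decisive, but declaring a product out of stock requires positive evidence from every option.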
/**
 * Calculate total quantity available across all options
 * Returns null if no children data (unknown inventory), 0 if children exist but all have 0 qty
 */
export function calculateTotalQuantity(product: DutchieRawProduct): number | null {
  const children = product.POSMetaData?.children;
  // No children = unknown inventory, return null (NOT 0)
  if (!children || children.length === 0) return null;

  // Check if any child has quantity data
  const hasAnyQtyData = children.some(child => getOptionQuantity(child) !== null);
  if (!hasAnyQtyData) return null; // All children lack qty data = unknown

  return children.reduce((sum, child) => {
    const qty = getOptionQuantity(child);
    return sum + (qty ?? 0);
  }, 0);
}

/**
 * Calculate total kiosk quantity available across all options
 */
export function calculateTotalKioskQuantity(product: DutchieRawProduct): number | null {
  const children = product.POSMetaData?.children;
  if (!children || children.length === 0) return null;

  const hasAnyKioskQty = children.some(child => typeof child.kioskQuantityAvailable === 'number');
  if (!hasAnyKioskQty) return null;

  return children.reduce((sum, child) => sum + (child.kioskQuantityAvailable ?? 0), 0);
}

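The null-vs-0 contract above matters downstream: null means the feed carried no inventory data at all, while 0 means every option positively reported empty. A quick sketch of that contract with a hypothetical minimal reimplementation (not the exported function, which operates on the full product type):

```typescript
// Minimal stand-in for calculateTotalQuantity's contract, for illustration only:
// null = unknown inventory, 0 = known-empty, N = summed quantity.
type Child = { quantityAvailable?: number };

function totalQty(children: Child[]): number | null {
  if (children.length === 0) return null; // no option data at all
  if (!children.some(c => typeof c.quantityAvailable === 'number')) {
    return null; // options exist but none carry a quantity field
  }
  return children.reduce((sum, c) => sum + (c.quantityAvailable ?? 0), 0);
}

console.log(totalQty([]));                         // null
console.log(totalQty([{}, {}]));                   // null
console.log(totalQty([{ quantityAvailable: 0 }])); // 0
```

Collapsing null into 0 here would silently mark every data-poor product as out of stock, which is exactly the confusion the comment "return null (NOT 0)" guards against.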
// ============================================================
// DATABASE ENTITY TYPES
// ============================================================

/**
 * Dispensary - represents a Dutchie store in Arizona
 */
export interface Dispensary {
  id: number;
  platform: 'dutchie';
  name: string;
  dbaName?: string;
  slug: string;
  city: string;
  state: string;
  postalCode?: string;
  latitude?: number;
  longitude?: number;
  address?: string;
  platformDispensaryId?: string; // Resolved internal ID (e.g., "6405ef617056e8014d79101b")
  isDelivery?: boolean;
  isPickup?: boolean;
  rawMetadata?: any; // Full discovery node
  lastCrawledAt?: Date;
  productCount?: number;
  createdAt: Date;
  updatedAt: Date;
  menuType?: string;
  menuUrl?: string;
  scrapeEnabled?: boolean;
  providerDetectionData?: any;
  platformDispensaryIdResolvedAt?: Date;
  website?: string; // The dispensary's own website (from raw_metadata or direct column)
}

/**
 * DutchieProduct - canonical product identity per store
 */
export interface DutchieProduct {
  id: number;
  dispensaryId: number;
  platform: 'dutchie';

  externalProductId: string;    // from _id or id
  platformDispensaryId: string; // mirror of Dispensary.platformDispensaryId
  cName?: string;               // cName / slug
  name: string;                 // Name

  // Brand
  brandName?: string;
  brandId?: string;
  brandLogoUrl?: string;

  // Classification
  type?: string;
  subcategory?: string;
  strainType?: string;
  provider?: string;

  // Potency
  thc?: number;
  thcContent?: number;
  cbd?: number;
  cbdContent?: number;
  cannabinoidsV2?: DutchieCannabinoidV2[];
  effects?: Record<string, any>;

  // Status / flags
  status?: string;
  medicalOnly: boolean;
  recOnly: boolean;
  featured: boolean;
  comingSoon: boolean;
  certificateOfAnalysisEnabled: boolean;

  isBelowThreshold: boolean;
  isBelowKioskThreshold: boolean;
  optionsBelowThreshold: boolean;
  optionsBelowKioskThreshold: boolean;

  // Derived stock status (from POSMetaData.children quantityAvailable)
  stockStatus: StockStatus;
  totalQuantityAvailable?: number | null; // null = unknown (no children), 0 = all OOS

  // Images
  primaryImageUrl?: string;
  images?: DutchieImage[];

  // Misc
  measurements?: any;
  weight?: string;
  pastCNames?: string[];

  createdAtDutchie?: Date;
  updatedAtDutchie?: Date;

  latestRawPayload?: any; // Full product node from last crawl

  createdAt: Date;
  updatedAt: Date;
}

/**
|
||||
* DutchieProductOptionSnapshot - child-level option data from POSMetaData.children
|
||||
*/
|
||||
export interface DutchieProductOptionSnapshot {
|
||||
optionId: string; // canonicalID or canonicalPackageId or canonicalSKU
|
||||
canonicalId?: string;
|
||||
canonicalPackageId?: string;
|
||||
canonicalSKU?: string;
|
||||
canonicalName?: string;
|
||||
|
||||
canonicalCategory?: string;
|
||||
canonicalCategoryId?: string;
|
||||
canonicalBrandId?: string;
|
||||
canonicalBrandName?: string;
|
||||
canonicalStrainId?: string;
|
||||
canonicalVendorId?: string;
|
||||
|
||||
optionLabel?: string; // from option field
|
||||
packageQuantity?: number;
|
||||
recEquivalent?: number;
|
||||
standardEquivalent?: number;
|
||||
|
||||
priceCents?: number; // price * 100
|
||||
recPriceCents?: number; // recPrice * 100
|
||||
medPriceCents?: number; // medPrice * 100
|
||||
|
||||
quantity?: number;
|
||||
quantityAvailable?: number;
|
||||
kioskQuantityAvailable?: number;
|
||||
|
||||
activeBatchTags?: any;
|
||||
canonicalImgUrl?: string;
|
||||
canonicalLabResultUrl?: string;
|
||||
canonicalEffectivePotencyMg?: number;
|
||||
|
||||
rawChildPayload?: any; // Full POSMetaData.children node
|
||||
}
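The `*Cents` fields store prices as integer cents (`price * 100`). A naive multiply can drift under binary floating point, so a rounding step is the safe conversion. This is a sketch of the idea, not necessarily the crawler's exact code:

```typescript
// Convert a dollar price from the feed to integer cents.
// Math.round guards against float drift: 19.99 * 100 can evaluate to
// 1998.9999999999998 rather than exactly 1999.
function toCents(price: number): number {
  return Math.round(price * 100);
}

console.log(toCents(19.99)); // 1999
console.log(toCents(5));     // 500
```

Storing cents as integers keeps min/max aggregations and equality checks exact.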

/**
 * DutchieProductSnapshot - per crawl, includes options[]
 */
export interface DutchieProductSnapshot {
  id: number;
  dutchieProductId: number;
  dispensaryId: number;
  platformDispensaryId: string;
  externalProductId: string;
  pricingType: 'rec' | 'med' | 'unknown';
  crawlMode: CrawlMode; // Which crawl mode captured this snapshot

  status?: string;
  featured: boolean;
  special: boolean;
  medicalOnly: boolean;
  recOnly: boolean;

  // Flag indicating if product was present in feed (false = missing_from_feed snapshot)
  isPresentInFeed: boolean;

  // Derived stock status for this snapshot
  stockStatus: StockStatus;

  // Price summary (aggregated from children, in cents)
  recMinPriceCents?: number;
  recMaxPriceCents?: number;
  recMinSpecialPriceCents?: number;
  medMinPriceCents?: number;
  medMaxPriceCents?: number;
  medMinSpecialPriceCents?: number;
  wholesaleMinPriceCents?: number;

  // Inventory summary (aggregated from POSMetaData.children)
  totalQuantityAvailable?: number | null; // null = unknown (no children), 0 = all OOS
  totalKioskQuantityAvailable?: number | null;
  manualInventory: boolean;
  isBelowThreshold: boolean;
  isBelowKioskThreshold: boolean;

  // Option-level data
  options: DutchieProductOptionSnapshot[];

  // Full raw product node at this crawl time
  rawPayload: any;

  crawledAt: Date;
  createdAt: Date;
  updatedAt: Date;
}

/**
 * CrawlJob - tracks crawl execution status
 */
export interface CrawlJob {
  id: number;
  jobType: 'discovery' | 'product_crawl' | 'resolve_ids';
  dispensaryId?: number;
  status: 'pending' | 'running' | 'completed' | 'failed';
  startedAt?: Date;
  completedAt?: Date;
  errorMessage?: string;
  productsFound?: number;
  snapshotsCreated?: number;
  metadata?: any;
  createdAt: Date;
  updatedAt: Date;
}

/**
 * JobSchedule - recurring job configuration with jitter support
 * Times "wander" around the clock due to random jitter after each run
 */
export type JobStatus = 'success' | 'error' | 'partial' | 'running' | null;

export interface JobSchedule {
  id: number;
  jobName: string;
  description?: string;
  enabled: boolean;

  // Timing configuration
  baseIntervalMinutes: number; // e.g., 240 (4 hours)
  jitterMinutes: number; // e.g., 30 (±30 minutes)

  // Worker identity
  workerName?: string; // e.g., "Alice", "Henry", "Bella", "Oscar"
  workerRole?: string; // e.g., "Store Discovery Worker", "GraphQL Product Sync"

  // Last run tracking
  lastRunAt?: Date;
  lastStatus?: JobStatus;
  lastErrorMessage?: string;
  lastDurationMs?: number;

  // Next run (calculated with jitter)
  nextRunAt?: Date;

  // Job-specific config
  jobConfig?: Record<string, any>;

  createdAt: Date;
  updatedAt: Date;
}
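`baseIntervalMinutes` plus a random jitter is why run times "wander" around the clock: each completed run schedules the next one at the base interval shifted by a fresh random offset. A sketch of how `nextRunAt` could be derived from these fields (assumed logic; the real scheduler may differ):

```typescript
// Compute the next run time: base interval ± jitter minutes from "now".
function computeNextRunAt(
  now: Date,
  baseIntervalMinutes: number,
  jitterMinutes: number,
): Date {
  // Uniform jitter in [-jitterMinutes, +jitterMinutes].
  const jitter = (Math.random() * 2 - 1) * jitterMinutes;
  const deltaMs = (baseIntervalMinutes + jitter) * 60_000;
  return new Date(now.getTime() + deltaMs);
}

// e.g. base 240 min ± 30 min → somewhere between 3h30m and 4h30m from now.
const next = computeNextRunAt(new Date(), 240, 30);
```

Because the offset is re-drawn after every run, crawls do not hit the target site at the same wall-clock times each day.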

/**
 * JobRunLog - history of job executions
 */
export interface JobRunLog {
  id: number;
  scheduleId: number;
  jobName: string;
  status: 'pending' | 'running' | 'success' | 'error' | 'partial';
  startedAt?: Date;
  completedAt?: Date;
  durationMs?: number;
  errorMessage?: string;

  // Worker identity (propagated from schedule)
  workerName?: string; // e.g., "Alice", "Henry", "Bella", "Oscar"
  runRole?: string; // e.g., "Store Discovery Worker"

  // Results summary
  itemsProcessed?: number;
  itemsSucceeded?: number;
  itemsFailed?: number;

  metadata?: any;
  createdAt: Date;
}

// ============================================================
// GRAPHQL OPERATION TYPES
// ============================================================

export interface FilteredProductsVariables {
  includeEnterpriseSpecials: boolean;
  productsFilter: {
    dispensaryId: string;
    pricingType: 'rec' | 'med';
    strainTypes?: string[];
    subcategories?: string[];
    Status?: string;
    types?: string[];
    useCache?: boolean;
    isDefaultSort?: boolean;
    sortBy?: string;
    sortDirection?: number;
    bypassOnlineThresholds?: boolean;
    isKioskMenu?: boolean;
    removeProductsBelowOptionThresholds?: boolean;
  };
  page: number;
  perPage: number;
}

export interface GetAddressBasedDispensaryDataVariables {
  input: {
    dispensaryId: string; // The slug like "AZ-Deeply-Rooted"
  };
}

export interface ConsumerDispensariesVariables {
  filter: {
    lat: number;
    lng: number;
    radius: number; // in meters or km
    isDelivery?: boolean;
    searchText?: string;
  };
}

// ============================================================
// API RESPONSE TYPES
// ============================================================

export interface DashboardStats {
  dispensaryCount: number;
  productCount: number;
  snapshotCount24h: number;
  lastCrawlTime?: Date;
  failedJobCount: number;
  brandCount: number;
  categoryCount: number;
}

export interface CategorySummary {
  type: string;
  subcategory: string;
  productCount: number;
  dispensaryCount: number;
  avgPrice?: number;
}

export interface BrandSummary {
  brandName: string;
  brandId?: string;
  brandLogoUrl?: string;
  productCount: number;
  dispensaryCount: number;
}

// ============================================================
// CRAWLER PROFILE TYPES
// ============================================================

/**
 * DispensaryCrawlerProfile - per-store crawler configuration
 *
 * Allows each dispensary to have customized crawler settings without
 * affecting shared crawler logic. A dispensary can have multiple profiles
 * but only one is active at a time (via dispensaries.active_crawler_profile_id).
 */
export interface DispensaryCrawlerProfile {
  id: number;
  dispensaryId: number;
  profileName: string;
  crawlerType: string; // 'dutchie', 'treez', 'jane', 'sandbox', 'custom'
  profileKey: string | null; // Optional key for per-store module mapping
  config: Record<string, any>; // Crawler-specific configuration
  timeoutMs: number | null;
  downloadImages: boolean;
  trackStock: boolean;
  version: number;
  enabled: boolean;
  createdAt: Date;
  updatedAt: Date;
}

/**
 * DispensaryCrawlerProfileCreate - input type for creating a new profile
 */
export interface DispensaryCrawlerProfileCreate {
  dispensaryId: number;
  profileName: string;
  crawlerType: string;
  profileKey?: string | null;
  config?: Record<string, any>;
  timeoutMs?: number | null;
  downloadImages?: boolean;
  trackStock?: boolean;
  version?: number;
  enabled?: boolean;
}

/**
 * DispensaryCrawlerProfileUpdate - input type for updating an existing profile
 */
export interface DispensaryCrawlerProfileUpdate {
  profileName?: string;
  crawlerType?: string;
  profileKey?: string | null;
  config?: Record<string, any>;
  timeoutMs?: number | null;
  downloadImages?: boolean;
  trackStock?: boolean;
  version?: number;
  enabled?: boolean;
}

/**
 * CrawlerProfileOptions - runtime options derived from a profile
 * Used when invoking the actual crawler
 */
export interface CrawlerProfileOptions {
  timeoutMs: number;
  downloadImages: boolean;
  trackStock: boolean;
  config: Record<string, any>;
}
@@ -669,12 +669,4 @@ export async function syncRecentCrawls(
  return { synced, errors };
}

// ============================================================
// EXPORTS
// ============================================================

export {
  CrawlResult,
  SyncOptions,
  SyncResult,
};
// Types CrawlResult, SyncOptions, and SyncResult are already exported at their declarations

@@ -6,6 +6,7 @@ import { initializeMinio, isMinioEnabled } from './utils/minio';
import { initializeImageStorage } from './utils/image-storage';
import { logger } from './services/logger';
import { cleanupOrphanedJobs } from './services/proxyTestQueue';
import healthRoutes from './routes/health';

dotenv.config();

@@ -58,22 +59,15 @@ import scraperMonitorRoutes from './routes/scraper-monitor';
import apiTokensRoutes from './routes/api-tokens';
import apiPermissionsRoutes from './routes/api-permissions';
import parallelScrapeRoutes from './routes/parallel-scrape';
import scheduleRoutes from './routes/schedule';
import crawlerSandboxRoutes from './routes/crawler-sandbox';
import versionRoutes from './routes/version';
import publicApiRoutes from './routes/public-api';
import usersRoutes from './routes/users';
import staleProcessesRoutes from './routes/stale-processes';
import orchestratorAdminRoutes from './routes/orchestrator-admin';
import adminRoutes from './routes/admin';
import healthRoutes from './routes/health';
import workersRoutes from './routes/workers';
import { dutchieAZRouter, startScheduler as startDutchieAZScheduler, initializeDefaultSchedules } from './dutchie-az';
import { getPool } from './dutchie-az/db/connection';
import { createAnalyticsRouter } from './dutchie-az/routes/analytics';
import { createMultiStateRoutes } from './multi-state';
import { trackApiUsage, checkRateLimit } from './middleware/apiTokenTracker';
import { startCrawlScheduler } from './services/crawl-scheduler';
import { validateWordPressPermissions } from './middleware/wordpressPermissions';
import { markTrustedDomains } from './middleware/trustedDomains';
import { createSystemRouter, createPrometheusRouter } from './system/routes';
@@ -81,7 +75,7 @@ import { createPortalRoutes } from './portals';
import { createStatesRouter } from './routes/states';
import { createAnalyticsV2Router } from './routes/analytics-v2';
import { createDiscoveryRoutes } from './discovery';
import { createDutchieDiscoveryRoutes, promoteDiscoveryLocation } from './dutchie-az/discovery';
import { getPool } from './db/pool';

// Consumer API routes (findadispo.com, findagram.co)
import consumerAuthRoutes from './routes/consumer-auth';
@@ -132,41 +126,22 @@ app.use('/api/scraper-monitor', scraperMonitorRoutes);
app.use('/api/api-tokens', apiTokensRoutes);
app.use('/api/api-permissions', apiPermissionsRoutes);
app.use('/api/parallel-scrape', parallelScrapeRoutes);
app.use('/api/schedule', scheduleRoutes);
app.use('/api/crawler-sandbox', crawlerSandboxRoutes);
app.use('/api/version', versionRoutes);
app.use('/api/users', usersRoutes);
app.use('/api/stale-processes', staleProcessesRoutes);
// Admin routes - operator actions (crawl triggers, health checks)
app.use('/api/admin', adminRoutes);
// Admin routes - orchestrator actions
app.use('/api/admin/orchestrator', orchestratorAdminRoutes);

// SEO orchestrator routes
app.use('/api/seo', seoRoutes);

// Provider-agnostic worker management routes (replaces /api/dutchie-az/admin/schedules)
// Provider-agnostic worker management routes
app.use('/api/workers', workersRoutes);
// Monitor routes - aliased from workers for convenience
app.use('/api/monitor', workersRoutes);
console.log('[Workers] Routes registered at /api/workers and /api/monitor');

// Market data pipeline routes (provider-agnostic)
app.use('/api/markets', dutchieAZRouter);
// Legacy aliases (deprecated - remove after frontend migration)
app.use('/api/az', dutchieAZRouter);
app.use('/api/dutchie-az', dutchieAZRouter);

// Phase 3: Analytics Dashboards - price trends, penetration, category growth, etc.
try {
  const analyticsRouter = createAnalyticsRouter(getPool());
  app.use('/api/markets/analytics', analyticsRouter);
  // Legacy alias for backwards compatibility
  app.use('/api/az/analytics', analyticsRouter);
  console.log('[Analytics] Routes registered at /api/markets/analytics');
} catch (error) {
  console.warn('[Analytics] Failed to register routes:', error);
}

// Phase 3: Analytics V2 - Enhanced analytics with rec/med state segmentation
try {
  const analyticsV2Router = createAnalyticsV2Router(getPool());
@@ -239,43 +214,7 @@ try {
}

// Platform-specific Discovery Routes
// Uses neutral slugs to avoid trademark issues in URLs:
//   dt = Dutchie, jn = Jane, wm = Weedmaps, etc.
// Routes: /api/discovery/platforms/:platformSlug/*
try {
  const dtDiscoveryRoutes = createDutchieDiscoveryRoutes(getPool());
  app.use('/api/discovery/platforms/dt', dtDiscoveryRoutes);
  console.log('[Discovery] Platform routes registered at /api/discovery/platforms/dt');
} catch (error) {
  console.warn('[Discovery] Failed to register platform routes:', error);
}

// Orchestrator promotion endpoint (platform-agnostic)
// Route: /api/orchestrator/platforms/:platformSlug/promote/:id
app.post('/api/orchestrator/platforms/:platformSlug/promote/:id', async (req, res) => {
  try {
    const { platformSlug, id } = req.params;

    // Validate platform slug
    const validPlatforms = ['dt']; // dt = Dutchie
    if (!validPlatforms.includes(platformSlug)) {
      return res.status(400).json({
        success: false,
        error: `Invalid platform slug: ${platformSlug}. Valid slugs: ${validPlatforms.join(', ')}`
      });
    }

    const result = await promoteDiscoveryLocation(getPool(), parseInt(id, 10));
    if (result.success) {
      res.json(result);
    } else {
      res.status(400).json(result);
    }
  } catch (error: any) {
    console.error('[Orchestrator] Promotion error:', error);
    res.status(500).json({ success: false, error: error.message });
  }
});
// TODO: Rebuild with /platforms/dutchie/ module

async function startServer() {
  try {
@@ -288,15 +227,6 @@ async function startServer() {
    // Clean up any orphaned proxy test jobs from previous server runs
    await cleanupOrphanedJobs();

    // Start the crawl scheduler (checks every minute for jobs to run)
    startCrawlScheduler();
    logger.info('system', 'Crawl scheduler started');

    // Start the Dutchie AZ scheduler (enqueues jobs for workers)
    await initializeDefaultSchedules();
    startDutchieAZScheduler();
    logger.info('system', 'Dutchie AZ scheduler started');

    app.listen(PORT, () => {
      logger.info('system', `Server running on port ${PORT}`);
      console.log(`🚀 Server running on port ${PORT}`);

544 backend/src/platforms/dutchie/client.ts Normal file
@@ -0,0 +1,544 @@
/**
 * ============================================================
 * DUTCHIE PLATFORM CLIENT - LOCKED MODULE
 * ============================================================
 *
 * DO NOT MODIFY THIS FILE WITHOUT EXPLICIT AUTHORIZATION.
 *
 * This is the canonical HTTP client for all Dutchie communication.
 * All Dutchie workers (Alice, Bella, etc.) MUST use this client.
 *
 * IMPLEMENTATION:
 * - Uses curl via child_process.execSync (bypasses TLS fingerprinting)
 * - NO Puppeteer, NO axios, NO fetch
 * - Fingerprint rotation on 403
 * - Residential IP compatible
 *
 * USAGE:
 *   import { curlPost, curlGet, executeGraphQL } from '@dutchie/client';
 *
 * ============================================================
 */

import { execSync } from 'child_process';

// ============================================================
// TYPES
// ============================================================

export interface CurlResponse {
  status: number;
  data: any;
  error?: string;
}

export interface Fingerprint {
  userAgent: string;
  acceptLanguage: string;
  secChUa?: string;
  secChUaPlatform?: string;
  secChUaMobile?: string;
}

// ============================================================
// CONFIGURATION
// ============================================================

export const DUTCHIE_CONFIG = {
  graphqlEndpoint: 'https://dutchie.com/api-3/graphql',
  baseUrl: 'https://dutchie.com',
  timeout: 30000,
  maxRetries: 3,
  perPage: 100,
  maxPages: 200,
  pageDelayMs: 500,
  modeDelayMs: 2000,
};

// ============================================================
// PROXY SUPPORT
// ============================================================
// Integrates with the CrawlRotator system from proxy-rotator.ts
// On 403 errors:
//   1. Record failure on current proxy
//   2. Rotate to next proxy
//   3. Retry with new proxy
// ============================================================

import type { CrawlRotator, Proxy } from '../../services/crawl-rotator';

let currentProxy: string | null = null;
let crawlRotator: CrawlRotator | null = null;

/**
 * Set proxy for all Dutchie requests
 * Format: http://user:pass@host:port or socks5://host:port
 */
export function setProxy(proxy: string | null): void {
  currentProxy = proxy;
  if (proxy) {
    console.log(`[Dutchie Client] Proxy set: ${proxy.replace(/:[^:@]+@/, ':***@')}`);
  } else {
    console.log('[Dutchie Client] Proxy disabled (direct connection)');
  }
}

/**
 * Get current proxy URL
 */
export function getProxy(): string | null {
  return currentProxy;
}
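The log lines above mask proxy credentials with `replace(/:[^:@]+@/, ':***@')`: the regex matches the `:password@` segment of a `user:pass@host` URL and swaps the password for `***`, leaving credential-free URLs untouched. The same transform in isolation:

```typescript
// Mask the password portion of a proxy URL before logging.
function maskProxy(proxy: string): string {
  return proxy.replace(/:[^:@]+@/, ':***@');
}

console.log(maskProxy('http://alice:s3cret@10.0.0.1:8080'));
// http://alice:***@10.0.0.1:8080
console.log(maskProxy('socks5://10.0.0.1:1080')); // unchanged: no credentials
```

Note the `://` after the scheme never matches because `[^:@]+` cannot cross the colon that follows the username.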

/**
 * Set CrawlRotator for proxy rotation on 403s
 * This enables automatic proxy rotation when blocked
 */
export function setCrawlRotator(rotator: CrawlRotator | null): void {
  crawlRotator = rotator;
  if (rotator) {
    console.log('[Dutchie Client] CrawlRotator attached - proxy rotation enabled');
    // Set initial proxy from rotator
    const proxy = rotator.proxy.getCurrent();
    if (proxy) {
      currentProxy = rotator.proxy.getProxyUrl(proxy);
      console.log(`[Dutchie Client] Initial proxy: ${currentProxy.replace(/:[^:@]+@/, ':***@')}`);
    }
  }
}

/**
 * Get attached CrawlRotator
 */
export function getCrawlRotator(): CrawlRotator | null {
  return crawlRotator;
}

/**
 * Rotate to next proxy (called on 403)
 */
async function rotateProxyOn403(error?: string): Promise<boolean> {
  if (!crawlRotator) {
    return false;
  }

  // Record failure on current proxy
  await crawlRotator.recordFailure(error || '403 Forbidden');

  // Rotate to next proxy
  const nextProxy = crawlRotator.rotateProxy();
  if (nextProxy) {
    currentProxy = crawlRotator.proxy.getProxyUrl(nextProxy);
    console.log(`[Dutchie Client] Rotated proxy: ${currentProxy.replace(/:[^:@]+@/, ':***@')}`);
    return true;
  }

  console.warn('[Dutchie Client] No more proxies available');
  return false;
}

/**
 * Record success on current proxy
 */
async function recordProxySuccess(responseTimeMs?: number): Promise<void> {
  if (crawlRotator) {
    await crawlRotator.recordSuccess(responseTimeMs);
  }
}

/**
 * Build curl proxy argument
 */
function getProxyArg(): string {
  if (!currentProxy) return '';
  return `--proxy '${currentProxy}'`;
}

export const GRAPHQL_HASHES = {
  FilteredProducts: 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0',
  GetAddressBasedDispensaryData: '13461f73abf7268770dfd05fe7e10c523084b2bb916a929c08efe3d87531977b',
  ConsumerDispensaries: '0a5bfa6ca1d64ae47bcccb7c8077c87147cbc4e6982c17ceec97a2a4948b311b',
  DispensaryInfo: '13461f73abf7268770dfd05fe7e10c523084b2bb916a929c08efe3d87531977b',
};
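The hashes above are Apollo persisted-query identifiers: rather than sending GraphQL query text, the client sends the operation name plus a sha256 hash, and the server looks the query up by hash. A sketch of the request-body shape built from one of these hashes (illustrative variable values):

```typescript
// Shape of an Apollo persisted-query request body (illustrative values).
const body = {
  operationName: 'ConsumerDispensaries',
  variables: { filter: { lat: 33.45, lng: -112.07, radius: 30 } },
  extensions: {
    persistedQuery: {
      version: 1,
      sha256Hash: '0a5bfa6ca1d64ae47bcccb7c8077c87147cbc4e6982c17ceec97a2a4948b311b',
    },
  },
};

const payload = JSON.stringify(body);
console.log(payload.includes('"version":1')); // true
```

`executeGraphQL` later in this file builds exactly this structure before handing it to `curlPost`.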

// ============================================================
// FINGERPRINTS - Browser profiles for anti-detect
// ============================================================

const FINGERPRINTS: Fingerprint[] = [
  // Chrome Windows (latest) - typical residential user, use first
  {
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    acceptLanguage: 'en-US,en;q=0.9',
    secChUa: '"Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"',
    secChUaPlatform: '"Windows"',
    secChUaMobile: '?0',
  },
  // Chrome Mac (latest)
  {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    acceptLanguage: 'en-US,en;q=0.9',
    secChUa: '"Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"',
    secChUaPlatform: '"macOS"',
    secChUaMobile: '?0',
  },
  // Chrome Windows (120)
  {
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    acceptLanguage: 'en-US,en;q=0.9',
    secChUa: '"Chromium";v="120", "Google Chrome";v="120", "Not-A.Brand";v="99"',
    secChUaPlatform: '"Windows"',
    secChUaMobile: '?0',
  },
  // Firefox Windows
  {
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:133.0) Gecko/20100101 Firefox/133.0',
    acceptLanguage: 'en-US,en;q=0.5',
  },
  // Safari Mac
  {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15',
    acceptLanguage: 'en-US,en;q=0.9',
  },
  // Edge Windows
  {
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0',
    acceptLanguage: 'en-US,en;q=0.9',
    secChUa: '"Microsoft Edge";v="131", "Chromium";v="131", "Not_A Brand";v="24"',
    secChUaPlatform: '"Windows"',
    secChUaMobile: '?0',
  },
];

let currentFingerprintIndex = 0;

export function getFingerprint(): Fingerprint {
  return FINGERPRINTS[currentFingerprintIndex];
}

export function rotateFingerprint(): Fingerprint {
  currentFingerprintIndex = (currentFingerprintIndex + 1) % FINGERPRINTS.length;
  const fp = FINGERPRINTS[currentFingerprintIndex];
  console.log(`[Dutchie Client] Rotated to fingerprint: ${fp.userAgent.slice(0, 50)}...`);
  return fp;
}

export function resetFingerprint(): void {
  currentFingerprintIndex = 0;
}
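Rotation is a plain round-robin over the fingerprint pool: the modular increment wraps back to the first profile after the last one. The same pattern on a small standalone pool (hypothetical names, not the real fingerprints):

```typescript
// Round-robin rotation over a small pool, mirroring rotateFingerprint().
const pool = ['chrome-win', 'chrome-mac', 'firefox-win'];
let index = 0;

function rotate(): string {
  index = (index + 1) % pool.length;
  return pool[index];
}

console.log(rotate()); // 'chrome-mac'
console.log(rotate()); // 'firefox-win'
console.log(rotate()); // 'chrome-win' (wrapped around)
```

Because rotation is deterministic, a long run of 403s cycles through every profile before repeating any of them.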

// ============================================================
// CURL HTTP CLIENT
// ============================================================

/**
 * Build headers for Dutchie requests
 */
export function buildHeaders(refererPath: string, fingerprint?: Fingerprint): Record<string, string> {
  const fp = fingerprint || getFingerprint();
  const refererUrl = `https://dutchie.com${refererPath}`;

  const headers: Record<string, string> = {
    'accept': 'application/json, text/plain, */*',
    'accept-language': fp.acceptLanguage,
    'content-type': 'application/json',
    'origin': 'https://dutchie.com',
    'referer': refererUrl,
    'user-agent': fp.userAgent,
    'apollographql-client-name': 'Marketplace (production)',
  };

  if (fp.secChUa) {
    headers['sec-ch-ua'] = fp.secChUa;
    headers['sec-ch-ua-mobile'] = fp.secChUaMobile || '?0';
    headers['sec-ch-ua-platform'] = fp.secChUaPlatform || '"Windows"';
    headers['sec-fetch-dest'] = 'empty';
    headers['sec-fetch-mode'] = 'cors';
    headers['sec-fetch-site'] = 'same-site';
  }

  return headers;
}

/**
 * Execute HTTP POST using curl (bypasses TLS fingerprinting)
 */
export function curlPost(url: string, body: any, headers: Record<string, string>, timeout = 30000): CurlResponse {
  const filteredHeaders = Object.entries(headers)
    .filter(([k]) => k.toLowerCase() !== 'accept-encoding')
    .map(([k, v]) => `-H '${k}: ${v}'`)
    .join(' ');

  const bodyJson = JSON.stringify(body).replace(/'/g, "'\\''");
  const timeoutSec = Math.ceil(timeout / 1000);
  const separator = '___HTTP_STATUS___';
  const proxyArg = getProxyArg();
  const cmd = `curl -s --compressed ${proxyArg} -w '${separator}%{http_code}' --max-time ${timeoutSec} ${filteredHeaders} -d '${bodyJson}' '${url}'`;

  try {
    const output = execSync(cmd, {
      encoding: 'utf-8',
      maxBuffer: 10 * 1024 * 1024,
      timeout: timeout + 5000
    });

    const separatorIndex = output.lastIndexOf(separator);
    if (separatorIndex === -1) {
      const lines = output.trim().split('\n');
      const statusCode = parseInt(lines.pop() || '0', 10);
      const responseBody = lines.join('\n');
      try {
        return { status: statusCode, data: JSON.parse(responseBody) };
      } catch {
        return { status: statusCode, data: responseBody };
      }
    }

    const responseBody = output.slice(0, separatorIndex);
    const statusCode = parseInt(output.slice(separatorIndex + separator.length).trim(), 10);

    try {
      return { status: statusCode, data: JSON.parse(responseBody) };
    } catch {
      return { status: statusCode, data: responseBody };
    }
  } catch (error: any) {
    return {
      status: 0,
      data: null,
      error: error.message || 'curl request failed'
    };
  }
}
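Because the JSON body is wrapped in single quotes on the curl command line, every embedded single quote must be escaped with the standard `'\''` shell idiom (close the quoted string, emit an escaped quote, reopen the string) — that is what `replace(/'/g, "'\\''")` does above. The transform in isolation:

```typescript
// Escape single quotes for embedding inside a single-quoted shell argument.
function shellEscapeSingleQuotes(s: string): string {
  return s.replace(/'/g, "'\\''");
}

const json = JSON.stringify({ name: "O'Reilly" });
console.log(shellEscapeSingleQuotes(json));
// {"name":"O'\''Reilly"}
```

Without this step, a product name containing an apostrophe would terminate the `-d '…'` argument early and break (or worse, inject into) the shell command.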

/**
 * Execute HTTP GET using curl (bypasses TLS fingerprinting)
 * Returns HTML or JSON depending on response content-type
 */
export function curlGet(url: string, headers: Record<string, string>, timeout = 30000): CurlResponse {
  const filteredHeaders = Object.entries(headers)
    .filter(([k]) => k.toLowerCase() !== 'accept-encoding')
    .map(([k, v]) => `-H '${k}: ${v}'`)
    .join(' ');

  const timeoutSec = Math.ceil(timeout / 1000);
  const separator = '___HTTP_STATUS___';
  const proxyArg = getProxyArg();
  const cmd = `curl -s --compressed ${proxyArg} -w '${separator}%{http_code}' --max-time ${timeoutSec} ${filteredHeaders} '${url}'`;

  try {
    const output = execSync(cmd, {
      encoding: 'utf-8',
      maxBuffer: 10 * 1024 * 1024,
      timeout: timeout + 5000
    });

    const separatorIndex = output.lastIndexOf(separator);
    if (separatorIndex === -1) {
      const lines = output.trim().split('\n');
      const statusCode = parseInt(lines.pop() || '0', 10);
      const responseBody = lines.join('\n');
      return { status: statusCode, data: responseBody };
    }

    const responseBody = output.slice(0, separatorIndex);
    const statusCode = parseInt(output.slice(separatorIndex + separator.length).trim(), 10);

    // Try to parse as JSON, otherwise return as string (HTML)
    try {
      return { status: statusCode, data: JSON.parse(responseBody) };
    } catch {
      return { status: statusCode, data: responseBody };
    }
  } catch (error: any) {
    return {
      status: 0,
      data: null,
      error: error.message || 'curl request failed'
    };
  }
}
|
||||
|
||||
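The body/status split relies on a sentinel written by curl's `-w` flag after the response body. A minimal standalone sketch of that parsing step, assuming the same `___HTTP_STATUS___` marker; `splitCurlOutput` is a hypothetical helper name, not part of the client:

```typescript
// Hypothetical helper mirroring how curlGet splits the curl -w output:
// the body comes first, then the sentinel, then the numeric status code.
interface ParsedOutput {
  status: number;
  data: unknown;
}

function splitCurlOutput(output: string, separator = '___HTTP_STATUS___'): ParsedOutput {
  const separatorIndex = output.lastIndexOf(separator);
  if (separatorIndex === -1) {
    // Fallback: treat the last line of output as the status code
    const lines = output.trim().split('\n');
    const status = parseInt(lines.pop() || '0', 10);
    return { status, data: lines.join('\n') };
  }
  const body = output.slice(0, separatorIndex);
  const status = parseInt(output.slice(separatorIndex + separator.length).trim(), 10);
  try {
    // JSON bodies are decoded; anything else (e.g. HTML) stays a string
    return { status, data: JSON.parse(body) };
  } catch {
    return { status, data: body };
  }
}
```

Using `lastIndexOf` matters: a response body could itself contain the sentinel string, and only the final occurrence is the one curl appended.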
// ============================================================
// GRAPHQL EXECUTION
// ============================================================

export interface ExecuteGraphQLOptions {
  maxRetries?: number;
  retryOn403?: boolean;
  cName: string;
}

/**
 * Execute GraphQL query with curl (bypasses TLS fingerprinting)
 */
export async function executeGraphQL(
  operationName: string,
  variables: any,
  hash: string,
  options: ExecuteGraphQLOptions
): Promise<any> {
  const { maxRetries = 3, retryOn403 = true, cName } = options;

  const body = {
    operationName,
    variables,
    extensions: {
      persistedQuery: { version: 1, sha256Hash: hash },
    },
  };

  let lastError: Error | null = null;
  let attempt = 0;

  while (attempt <= maxRetries) {
    const fingerprint = getFingerprint();
    const headers = buildHeaders(`/embedded-menu/${cName}`, fingerprint);

    console.log(`[Dutchie Client] curl POST ${operationName} (attempt ${attempt + 1}/${maxRetries + 1})`);

    const response = curlPost(DUTCHIE_CONFIG.graphqlEndpoint, body, headers, DUTCHIE_CONFIG.timeout);

    console.log(`[Dutchie Client] Response status: ${response.status}`);

    if (response.error) {
      console.error(`[Dutchie Client] curl error: ${response.error}`);
      lastError = new Error(response.error);
      attempt++;
      if (attempt <= maxRetries) {
        await sleep(1000 * attempt);
      }
      continue;
    }

    if (response.status === 200) {
      if (response.data?.errors?.length > 0) {
        console.warn(`[Dutchie Client] GraphQL errors: ${JSON.stringify(response.data.errors[0])}`);
      }
      return response.data;
    }

    if (response.status === 403 && retryOn403) {
      console.warn(`[Dutchie Client] 403 blocked - rotating fingerprint...`);
      rotateFingerprint();
      attempt++;
      await sleep(1000 * attempt);
      continue;
    }

    const bodyPreview = typeof response.data === 'string'
      ? response.data.slice(0, 200)
      : JSON.stringify(response.data).slice(0, 200);
    console.error(`[Dutchie Client] HTTP ${response.status}: ${bodyPreview}`);
    lastError = new Error(`HTTP ${response.status}`);

    attempt++;
    if (attempt <= maxRetries) {
      await sleep(1000 * attempt);
    }
  }

  throw lastError || new Error('Max retries exceeded');
}
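The retry loop sleeps `1000 * attempt` milliseconds after each failed attempt, i.e. a linear backoff of 1s, 2s, 3s for the default `maxRetries = 3`. A small sketch of that schedule (the `backoffSchedule` name is illustrative, not part of the module):

```typescript
// Sketch of the delay schedule used by the retry loops above: after
// `attempt` is incremented, the client sleeps baseMs * attempt, so
// delays grow linearly with the retry count.
function backoffSchedule(maxRetries: number, baseMs = 1000): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    delays.push(baseMs * attempt);
  }
  return delays;
}
```

With the defaults this yields `[1000, 2000, 3000]`; an exponential schedule would be a drop-in change here if 403 pressure warrants it.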
// ============================================================
// HTML PAGE FETCHING
// ============================================================

export interface FetchPageOptions {
  maxRetries?: number;
  retryOn403?: boolean;
}

/**
 * Fetch HTML page from Dutchie (for city pages, dispensary pages, etc.)
 * Returns raw HTML string
 */
export async function fetchPage(
  path: string,
  options: FetchPageOptions = {}
): Promise<{ html: string; status: number } | null> {
  const { maxRetries = 3, retryOn403 = true } = options;
  const url = `${DUTCHIE_CONFIG.baseUrl}${path}`;

  let attempt = 0;

  while (attempt <= maxRetries) {
    const fingerprint = getFingerprint();
    const headers: Record<string, string> = {
      'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
      'accept-language': fingerprint.acceptLanguage,
      'user-agent': fingerprint.userAgent,
    };

    if (fingerprint.secChUa) {
      headers['sec-ch-ua'] = fingerprint.secChUa;
      headers['sec-ch-ua-mobile'] = fingerprint.secChUaMobile || '?0';
      headers['sec-ch-ua-platform'] = fingerprint.secChUaPlatform || '"Windows"';
      headers['sec-fetch-dest'] = 'document';
      headers['sec-fetch-mode'] = 'navigate';
      headers['sec-fetch-site'] = 'none';
      headers['sec-fetch-user'] = '?1';
      headers['upgrade-insecure-requests'] = '1';
    }

    console.log(`[Dutchie Client] curl GET ${path} (attempt ${attempt + 1}/${maxRetries + 1})`);

    const response = curlGet(url, headers, DUTCHIE_CONFIG.timeout);

    console.log(`[Dutchie Client] Response status: ${response.status}`);

    if (response.error) {
      console.error(`[Dutchie Client] curl error: ${response.error}`);
      attempt++;
      if (attempt <= maxRetries) {
        await sleep(1000 * attempt);
      }
      continue;
    }

    if (response.status === 200) {
      return { html: response.data, status: response.status };
    }

    if (response.status === 403 && retryOn403) {
      console.warn(`[Dutchie Client] 403 blocked - rotating fingerprint...`);
      rotateFingerprint();
      attempt++;
      await sleep(1000 * attempt);
      continue;
    }

    console.error(`[Dutchie Client] HTTP ${response.status}`);
    attempt++;
    if (attempt <= maxRetries) {
      await sleep(1000 * attempt);
    }
  }

  return null;
}

/**
 * Extract __NEXT_DATA__ from HTML page
 */
export function extractNextData(html: string): any | null {
  const match = html.match(/<script id="__NEXT_DATA__" type="application\/json">([^<]+)<\/script>/);
  if (match && match[1]) {
    try {
      return JSON.parse(match[1]);
    } catch (e) {
      console.error('[Dutchie Client] Failed to parse __NEXT_DATA__:', e);
      return null;
    }
  }
  return null;
}
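The `__NEXT_DATA__` extraction can be exercised against a synthetic page. The HTML below is illustrative (not a real Dutchie response), and the function body is repeated from the client so the snippet runs standalone:

```typescript
// Illustrative input: Next.js embeds page props as JSON inside a
// <script id="__NEXT_DATA__"> tag; the regex captures that payload.
const sampleHtml =
  '<html><body>' +
  '<script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{"city":"Phoenix"}}}</script>' +
  '</body></html>';

// Same logic as the exported extractNextData, repeated here so the
// snippet is self-contained.
function extractNextData(html: string): any | null {
  const match = html.match(/<script id="__NEXT_DATA__" type="application\/json">([^<]+)<\/script>/);
  if (match && match[1]) {
    try {
      return JSON.parse(match[1]);
    } catch {
      return null;
    }
  }
  return null;
}

const nextData = extractNextData(sampleHtml);
```

Note the `[^<]+` capture assumes the JSON payload contains no literal `<`; Next.js escapes these in practice, but it is a constraint worth knowing if the markup ever changes.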
// ============================================================
// UTILITY
// ============================================================

function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}
49  backend/src/platforms/dutchie/index.ts  Normal file
@@ -0,0 +1,49 @@
/**
 * Dutchie Platform Module
 *
 * Single export point for all Dutchie communication.
 * All Dutchie workers MUST import from this module.
 */

export {
  // HTTP Client
  curlPost,
  curlGet,
  executeGraphQL,
  fetchPage,
  extractNextData,

  // Headers & Fingerprints
  buildHeaders,
  getFingerprint,
  rotateFingerprint,
  resetFingerprint,

  // Proxy
  setProxy,
  getProxy,
  setCrawlRotator,
  getCrawlRotator,

  // Configuration
  DUTCHIE_CONFIG,
  GRAPHQL_HASHES,

  // Types
  type CurlResponse,
  type Fingerprint,
  type ExecuteGraphQLOptions,
  type FetchPageOptions,
} from './client';

// Re-export CrawlRotator types from canonical location
export type { CrawlRotator, Proxy, ProxyStats } from '../../services/crawl-rotator';

// GraphQL Queries
export {
  resolveDispensaryId,
  resolveDispensaryIdWithDetails,
  getDispensaryInfo,
  type ResolveDispensaryResult,
  type DispensaryInfo,
} from './queries';
187  backend/src/platforms/dutchie/queries.ts  Normal file
@@ -0,0 +1,187 @@
/**
 * Dutchie GraphQL Queries
 *
 * High-level GraphQL operations built on top of the client.
 */

import { executeGraphQL, GRAPHQL_HASHES, DUTCHIE_CONFIG } from './client';

// ============================================================
// TYPES
// ============================================================

export interface ResolveDispensaryResult {
  dispensaryId: string | null;
  httpStatus?: number;
  error?: string;
  source?: 'graphql' | 'html';
}

// ============================================================
// DISPENSARY ID RESOLUTION
// ============================================================

/**
 * Resolve a dispensary slug to its internal platform ID via GraphQL
 */
export async function resolveDispensaryId(slug: string): Promise<string | null> {
  const result = await resolveDispensaryIdWithDetails(slug);
  return result.dispensaryId;
}

/**
 * Resolve with full details for error handling
 */
export async function resolveDispensaryIdWithDetails(slug: string): Promise<ResolveDispensaryResult> {
  console.log(`[Dutchie Queries] Resolving dispensary ID for slug: ${slug}`);

  try {
    const variables = {
      dispensaryFilter: {
        cNameOrID: slug,
      },
    };

    const result = await executeGraphQL(
      'GetAddressBasedDispensaryData',
      variables,
      GRAPHQL_HASHES.GetAddressBasedDispensaryData,
      { cName: slug, maxRetries: 3, retryOn403: true }
    );

    const dispensaryId = result?.data?.dispensaryBySlug?.id ||
      result?.data?.dispensary?.id ||
      result?.data?.getAddressBasedDispensaryData?.dispensary?.id;

    if (dispensaryId) {
      console.log(`[Dutchie Queries] Resolved ${slug} -> ${dispensaryId}`);
      return { dispensaryId, source: 'graphql' };
    }

    console.log(`[Dutchie Queries] No dispensaryId in response for ${slug}`);
    return {
      dispensaryId: null,
      error: 'Could not extract dispensaryId from GraphQL response',
    };

  } catch (error: any) {
    const status = error.message?.match(/HTTP (\d+)/)?.[1];
    if (status === '403' || status === '404') {
      return {
        dispensaryId: null,
        httpStatus: parseInt(status),
        error: `HTTP ${status}: Store may be removed or blocked`,
      };
    }

    return {
      dispensaryId: null,
      error: error.message,
    };
  }
}

// ============================================================
// DISPENSARY INFO
// ============================================================

export interface DispensaryInfo {
  id: string;
  name: string;
  slug: string;
  isOpen: boolean;
  timezone: string;
  address: string;
  city: string;
  state: string;
  zip: string;
  phone: string;
  email: string;
  hours: {
    monday?: { open: string; close: string } | null;
    tuesday?: { open: string; close: string } | null;
    wednesday?: { open: string; close: string } | null;
    thursday?: { open: string; close: string } | null;
    friday?: { open: string; close: string } | null;
    saturday?: { open: string; close: string } | null;
    sunday?: { open: string; close: string } | null;
  };
  acceptsCredit: boolean;
  offersCurbside: boolean;
  offersDelivery: boolean;
  offersPickup: boolean;
  featureFlags: string[];
}

/**
 * Get dispensary info including business hours
 */
export async function getDispensaryInfo(cNameOrSlug: string): Promise<DispensaryInfo | null> {
  console.log(`[Dutchie Queries] Getting dispensary info for: ${cNameOrSlug}`);

  try {
    const variables = {
      dispensaryFilter: {
        cNameOrID: cNameOrSlug,
      },
    };

    const result = await executeGraphQL(
      'GetAddressBasedDispensaryData',
      variables,
      GRAPHQL_HASHES.GetAddressBasedDispensaryData,
      { cName: cNameOrSlug, maxRetries: 2, retryOn403: true }
    );

    const dispensary = result?.data?.dispensary ||
      result?.data?.dispensaryBySlug ||
      result?.data?.getAddressBasedDispensaryData?.dispensary;

    if (!dispensary) {
      console.log(`[Dutchie Queries] No dispensary data found for ${cNameOrSlug}`);
      return null;
    }

    const hoursSettings = dispensary.hoursSettings || dispensary.operatingHours || {};

    const parseHours = (dayHours: any) => {
      if (!dayHours || dayHours.isClosed) return null;
      return {
        open: dayHours.openTime || dayHours.open || '',
        close: dayHours.closeTime || dayHours.close || '',
      };
    };

    return {
      id: dispensary.id || dispensary._id || '',
      name: dispensary.name || '',
      slug: dispensary.cName || dispensary.slug || cNameOrSlug,
      isOpen: dispensary.isOpen ?? dispensary.openNow ?? false,
      timezone: dispensary.timezone || '',
      address: dispensary.address || dispensary.location?.address || '',
      city: dispensary.city || dispensary.location?.city || '',
      state: dispensary.state || dispensary.location?.state || '',
      zip: dispensary.zip || dispensary.zipcode || dispensary.location?.zip || '',
      phone: dispensary.phone || dispensary.phoneNumber || '',
      email: dispensary.email || '',
      hours: {
        monday: parseHours(hoursSettings.monday),
        tuesday: parseHours(hoursSettings.tuesday),
        wednesday: parseHours(hoursSettings.wednesday),
        thursday: parseHours(hoursSettings.thursday),
        friday: parseHours(hoursSettings.friday),
        saturday: parseHours(hoursSettings.saturday),
        sunday: parseHours(hoursSettings.sunday),
      },
      acceptsCredit: dispensary.acceptsCreditCards ?? dispensary.creditCardAccepted ?? false,
      offersCurbside: dispensary.offersCurbside ?? dispensary.curbsidePickup ?? false,
      offersDelivery: dispensary.offersDelivery ?? dispensary.delivery ?? false,
      offersPickup: dispensary.offersPickup ?? dispensary.pickup ?? true,
      featureFlags: dispensary.featureFlags || [],
    };

  } catch (error: any) {
    console.error(`[Dutchie Queries] Error getting dispensary info: ${error.message}`);
    return null;
  }
}
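The hours normalization inside `getDispensaryInfo` is worth isolating: closed or missing days map to `null`, and both the `openTime`/`closeTime` and `open`/`close` field shapes are accepted. A standalone copy of that helper:

```typescript
// Standalone version of the parseHours helper used in getDispensaryInfo:
// accepts either {openTime, closeTime} or {open, close}, and returns null
// for missing or explicitly closed days.
type DayHours = { open: string; close: string } | null;

function parseHours(dayHours: any): DayHours {
  if (!dayHours || dayHours.isClosed) return null;
  return {
    open: dayHours.openTime || dayHours.open || '',
    close: dayHours.closeTime || dayHours.close || '',
  };
}
```

This dual-shape tolerance exists because the GraphQL response exposes hours under either `hoursSettings` or `operatingHours`, with differing key names.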
@@ -1,53 +0,0 @@
/**
 * Admin Routes
 *
 * Top-level admin/operator actions (crawl triggers, health checks, etc.)
 *
 * Route semantics:
 *   /api/admin/... = Admin/operator actions
 *   /api/az/...    = Arizona data slice (stores, products, metrics)
 */

import { Router, Request, Response } from 'express';
import { getDispensaryById, crawlSingleDispensary } from '../dutchie-az';

const router = Router();

// ============================================================
// CRAWL TRIGGER
// ============================================================

/**
 * POST /api/admin/crawl/:dispensaryId
 *
 * Trigger a crawl for a specific dispensary.
 * This is the CANONICAL endpoint for triggering crawls.
 *
 * Request body (optional):
 *   - pricingType: 'rec' | 'med' (default: 'rec')
 *   - useBothModes: boolean (default: true)
 *
 * Response:
 *   - On success: crawl result with product counts
 *   - On 404: dispensary not found
 *   - On 500: crawl error
 */
router.post('/crawl/:dispensaryId', async (req: Request, res: Response) => {
  try {
    const { dispensaryId } = req.params;
    const { pricingType = 'rec', useBothModes = true } = req.body;

    // Fetch the dispensary first
    const dispensary = await getDispensaryById(parseInt(dispensaryId, 10));
    if (!dispensary) {
      return res.status(404).json({ error: 'Dispensary not found' });
    }

    const result = await crawlSingleDispensary(dispensary, pricingType, { useBothModes });
    res.json(result);
  } catch (error: any) {
    res.status(500).json({ error: error.message });
  }
});

export default router;
@@ -1,6 +1,6 @@
 import { Router } from 'express';
 import { authMiddleware } from '../auth/middleware';
-import { query as azQuery } from '../dutchie-az/db/connection';
+import { pool } from '../db/pool';

 const router = Router();
 router.use(authMiddleware);
@@ -10,7 +10,7 @@ router.use(authMiddleware);
 router.get('/stats', async (req, res) => {
   try {
     // All stats in a single query using CTEs
-    const result = await azQuery(`
+    const result = await pool.query(`
       WITH dispensary_stats AS (
         SELECT
           COUNT(*) as total,
@@ -93,7 +93,7 @@ router.get('/activity', async (req, res) => {
     const { limit = 20 } = req.query;

     // Recent crawls from dispensaries (with product counts from dutchie_products)
-    const scrapesResult = await azQuery(`
+    const scrapesResult = await pool.query(`
       SELECT
         d.name,
         d.last_crawled_at as last_scraped_at,
@@ -105,7 +105,7 @@ router.get('/activity', async (req, res) => {
     `, [limit]);

     // Recent products from dutchie_products
-    const productsResult = await azQuery(`
+    const productsResult = await pool.query(`
       SELECT
         p.name,
         0 as price,
@@ -11,12 +11,11 @@ const VALID_MENU_TYPES = ['dutchie', 'treez', 'jane', 'weedmaps', 'leafly', 'mea
 // Get all dispensaries
 router.get('/', async (req, res) => {
   try {
-    const { menu_type } = req.query;
+    const { menu_type, city, state } = req.query;

     let query = `
       SELECT
         id,
         azdhs_id,
         name,
         company_name,
         slug,
@@ -25,36 +24,46 @@ router.get('/', async (req, res) => {
         state,
         zip,
         phone,
         email,
         website,
         dba_name,
         google_rating,
         google_review_count,
         status_line,
         azdhs_url,
         latitude,
         longitude,
         menu_url,
         menu_type,
         menu_provider,
         menu_provider_confidence,
         scraper_template,
         last_menu_scrape,
         menu_scrape_status,
         platform,
         platform_dispensary_id,
         product_count,
         last_crawl_at,
         created_at,
         updated_at
       FROM dispensaries
     `;

     const params: any[] = [];
+    const conditions: string[] = [];

     // Filter by menu_type if provided
     if (menu_type) {
-      query += ` WHERE menu_type = $1`;
+      conditions.push(`menu_type = $${params.length + 1}`);
       params.push(menu_type);
     }

+    // Filter by city if provided
+    if (city) {
+      conditions.push(`city ILIKE $${params.length + 1}`);
+      params.push(city);
+    }

+    // Filter by state if provided
+    if (state) {
+      conditions.push(`state = $${params.length + 1}`);
+      params.push(state);
+    }

+    if (conditions.length > 0) {
+      query += ` WHERE ${conditions.join(' AND ')}`;
+    }

     query += ` ORDER BY name`;

     const result = await pool.query(query, params);
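The new filter handling accumulates conditions and parameters in lockstep, so each condition records its placeholder index (`params.length + 1`) before pushing the value, and the numbering stays aligned no matter which filters are present. The pattern in isolation (`buildWhere` is a hypothetical helper name used only for this sketch):

```typescript
// Hypothetical helper showing the conditions/params pattern from the
// dispensaries route: each condition derives its $N placeholder from the
// current params length, then pushes its value.
function buildWhere(filters: { menu_type?: string; city?: string; state?: string }) {
  const params: any[] = [];
  const conditions: string[] = [];

  if (filters.menu_type) {
    conditions.push(`menu_type = $${params.length + 1}`);
    params.push(filters.menu_type);
  }
  if (filters.city) {
    conditions.push(`city ILIKE $${params.length + 1}`);
    params.push(filters.city);
  }
  if (filters.state) {
    conditions.push(`state = $${params.length + 1}`);
    params.push(filters.state);
  }

  const where = conditions.length > 0 ? ` WHERE ${conditions.join(' AND ')}` : '';
  return { where, params };
}
```

This is why the old hard-coded `WHERE menu_type = $1` had to go: with three optional filters, a fixed `$1` would misnumber whenever `menu_type` was absent.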
@@ -82,15 +91,15 @@ router.get('/stats/menu-types', async (req, res) => {
   }
 });

-// Get single dispensary by slug
-router.get('/:slug', async (req, res) => {
+// Get single dispensary by slug or ID
+router.get('/:slugOrId', async (req, res) => {
   try {
-    const { slug } = req.params;
+    const { slugOrId } = req.params;
+    const isNumeric = /^\d+$/.test(slugOrId);

     const result = await pool.query(`
       SELECT
         id,
         azdhs_id,
         name,
         company_name,
         slug,
@@ -99,29 +108,22 @@ router.get('/:slug', async (req, res) => {
         state,
         zip,
         phone,
         email,
         website,
         dba_name,
         google_rating,
         google_review_count,
         status_line,
         azdhs_url,
         latitude,
         longitude,
         menu_url,
         menu_type,
         menu_provider,
         menu_provider_confidence,
         scraper_template,
         scraper_config,
         last_menu_scrape,
         menu_scrape_status,
         platform,
         platform_dispensary_id,
         product_count,
         last_crawl_at,
         raw_metadata,
         created_at,
         updated_at
       FROM dispensaries
-      WHERE slug = $1
-    `, [slug]);
+      WHERE ${isNumeric ? 'id = $1' : 'slug = $1'}
+    `, [isNumeric ? parseInt(slugOrId) : slugOrId]);

     if (result.rows.length === 0) {
       return res.status(404).json({ error: 'Dispensary not found' });
@@ -139,17 +141,22 @@ router.put('/:id', async (req, res) => {
   try {
     const { id } = req.params;
     const {
+      name,
       dba_name,
+      company_name,
       website,
       phone,
-      email,
-      google_rating,
-      google_review_count,
+      address,
+      city,
+      state,
+      zip,
+      latitude,
+      longitude,
       menu_url,
       menu_type,
-      scraper_template,
-      scraper_config,
-      menu_scrape_status
+      platform,
+      platform_dispensary_id,
+      slug,
     } = req.body;

     // Validate menu_type if provided
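The widened `/:slugOrId` route dispatches between an `id` and a `slug` lookup with a purely-numeric test. That decision step, sketched standalone (`resolveLookup` is an illustrative name, not in the route code):

```typescript
// Mirrors the route's dispatch: an all-digit param is treated as a
// numeric id, anything else as a slug. Both column names come from the
// dispensaries table used by the route.
function resolveLookup(slugOrId: string): { column: 'id' | 'slug'; value: number | string } {
  const isNumeric = /^\d+$/.test(slugOrId);
  return isNumeric
    ? { column: 'id', value: parseInt(slugOrId, 10) }
    : { column: 'slug', value: slugOrId };
}
```

The `/^\d+$/` anchor on both ends matters: a slug like `420-cafe` starts with digits but must still resolve by slug.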
@@ -162,32 +169,42 @@ router.put('/:id', async (req, res) => {
     const result = await pool.query(`
       UPDATE dispensaries
       SET
-        dba_name = COALESCE($1, dba_name),
-        website = COALESCE($2, website),
-        phone = COALESCE($3, phone),
-        email = COALESCE($4, email),
-        google_rating = COALESCE($5, google_rating),
-        google_review_count = COALESCE($6, google_review_count),
-        menu_url = COALESCE($7, menu_url),
-        menu_type = COALESCE($8, menu_type),
-        scraper_template = COALESCE($9, scraper_template),
-        scraper_config = COALESCE($10, scraper_config),
-        menu_scrape_status = COALESCE($11, menu_scrape_status),
+        name = COALESCE($1, name),
+        dba_name = COALESCE($2, dba_name),
+        company_name = COALESCE($3, company_name),
+        website = COALESCE($4, website),
+        phone = COALESCE($5, phone),
+        address = COALESCE($6, address),
+        city = COALESCE($7, city),
+        state = COALESCE($8, state),
+        zip = COALESCE($9, zip),
+        latitude = COALESCE($10, latitude),
+        longitude = COALESCE($11, longitude),
+        menu_url = COALESCE($12, menu_url),
+        menu_type = COALESCE($13, menu_type),
+        platform = COALESCE($14, platform),
+        platform_dispensary_id = COALESCE($15, platform_dispensary_id),
+        slug = COALESCE($16, slug),
         updated_at = CURRENT_TIMESTAMP
-      WHERE id = $12
+      WHERE id = $17
       RETURNING *
     `, [
+      name,
       dba_name,
+      company_name,
       website,
       phone,
-      email,
-      google_rating,
-      google_review_count,
+      address,
+      city,
+      state,
+      zip,
+      latitude,
+      longitude,
       menu_url,
       menu_type,
-      scraper_template,
-      scraper_config,
-      menu_scrape_status,
+      platform,
+      platform_dispensary_id,
+      slug,
       id
     ]);
|
||||
@@ -468,6 +485,100 @@ router.patch('/:id/menu-type', async (req, res) => {
|
||||
}
|
||||
});
|
||||
|
||||
// Sync dispensary from discovery (upsert by platform_dispensary_id or slug)
|
||||
// Used by Alice worker to sync discovered dispensaries to DB
|
||||
router.post('/sync', async (req, res) => {
|
||||
try {
|
||||
const {
|
||||
name,
|
||||
slug,
|
||||
city,
|
||||
state,
|
||||
address,
|
||||
postalCode,
|
||||
latitude,
|
||||
longitude,
|
||||
platformDispensaryId,
|
||||
menuType,
|
||||
menuUrl,
|
||||
platform,
|
||||
} = req.body;
|
||||
|
||||
if (!slug || !platformDispensaryId) {
|
||||
return res.status(400).json({ error: 'slug and platformDispensaryId are required' });
|
||||
}
|
||||
|
||||
// Try to find existing by platform_dispensary_id first, then by slug
|
||||
const existingResult = await pool.query(`
|
||||
SELECT id, name, slug, platform_dispensary_id, menu_type
|
||||
FROM dispensaries
|
||||
WHERE platform_dispensary_id = $1
|
||||
OR (slug = $2 AND platform_dispensary_id IS NULL)
|
||||
LIMIT 1
|
||||
`, [platformDispensaryId, slug]);
|
||||
|
||||
if (existingResult.rows.length > 0) {
|
||||
// Update existing
|
||||
const existing = existingResult.rows[0];
|
||||
const result = await pool.query(`
|
||||
UPDATE dispensaries
|
||||
SET
|
||||
name = COALESCE($1, name),
|
||||
slug = COALESCE($2, slug),
|
||||
city = COALESCE($3, city),
|
||||
state = COALESCE($4, state),
|
||||
address = COALESCE($5, address),
|
||||
zip = COALESCE($6, zip),
|
||||
latitude = COALESCE($7, latitude),
|
||||
longitude = COALESCE($8, longitude),
|
||||
platform_dispensary_id = COALESCE($9, platform_dispensary_id),
|
||||
menu_type = COALESCE($10, menu_type),
|
||||
menu_url = COALESCE($11, menu_url),
|
||||
platform = COALESCE($12, platform),
|
||||
updated_at = CURRENT_TIMESTAMP
|
||||
WHERE id = $13
|
||||
RETURNING id, name, slug, platform_dispensary_id, menu_type
|
||||
`, [
|
||||
name, slug, city, state, address, postalCode,
|
||||
latitude, longitude, platformDispensaryId, menuType, menuUrl, platform,
|
||||
existing.id
|
||||
]);
|
||||
|
||||
const updated = result.rows[0];
|
||||
const changed = existing.platform_dispensary_id !== updated.platform_dispensary_id ||
|
||||
existing.menu_type !== updated.menu_type ||
|
||||
existing.slug !== updated.slug;
|
||||
|
||||
return res.json({
|
||||
action: changed ? 'updated' : 'matched',
|
||||
dispensary: updated,
|
||||
});
|
||||
}
|
||||
|
||||
// Insert new
|
||||
const result = await pool.query(`
|
||||
INSERT INTO dispensaries (
|
||||
name, slug, city, state, address, zip,
|
||||
latitude, longitude, platform_dispensary_id, menu_type, menu_url, platform,
|
||||
created_at, updated_at
|
||||
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
|
||||
RETURNING id, name, slug, platform_dispensary_id, menu_type
|
||||
`, [
|
||||
name, slug, city, state, address, postalCode,
|
||||
latitude, longitude, platformDispensaryId, menuType, menuUrl, platform
|
||||
]);
|
||||
|
||||
return res.json({
|
||||
action: 'inserted',
|
||||
dispensary: result.rows[0],
|
||||
});
|
||||
|
||||
} catch (error) {
|
||||
console.error('Error syncing dispensary:', error);
|
||||
res.status(500).json({ error: 'Failed to sync dispensary' });
|
||||
}
|
||||
});
|
||||
|
||||
// Bulk update menu_type for multiple dispensaries
|
||||
router.post('/bulk/menu-type', async (req, res) => {
|
||||
try {
|
||||
|
||||
@@ -14,7 +14,7 @@
  */

 import { Router, Request, Response } from 'express';
-import { getPool, healthCheck as dbHealthCheck } from '../dutchie-az/db/connection';
+import { pool } from '../db/pool';
 import { getRedis } from '../lib/redis';
 import * as fs from 'fs';
 import * as path from 'path';
@@ -119,7 +119,7 @@ async function getApiHealth(): Promise<ApiHealth> {
 async function getDbHealth(): Promise<DbHealth> {
   const start = Date.now();
   try {
-    const pool = getPool();
+    // pool imported from db/pool
     await pool.query('SELECT 1');
     return {
       status: 'ok',
@@ -175,7 +175,7 @@ async function getRedisHealth(): Promise<RedisHealth> {

 async function getWorkersHealth(): Promise<WorkersHealth> {
   try {
-    const pool = getPool();
+    // pool imported from db/pool

     // Get queue stats from v_queue_stats view or equivalent
     const queueStatsResult = await pool.query(`
@@ -248,7 +248,7 @@ async function getWorkersHealth(): Promise<WorkersHealth> {

 async function getCrawlsHealth(): Promise<CrawlsHealth> {
   try {
-    const pool = getPool();
+    // pool imported from db/pool

     // Get crawl statistics
     const statsResult = await pool.query(`
@@ -299,7 +299,7 @@ async function getCrawlsHealth(): Promise<CrawlsHealth> {

 async function getAnalyticsHealth(): Promise<AnalyticsHealth> {
   try {
-    const pool = getPool();
+    // pool imported from db/pool

     // Check analytics/aggregate job runs
     const statsResult = await pool.query(`
@@ -9,7 +9,6 @@

 import { Router, Request, Response, NextFunction } from 'express';
 import { pool } from '../db/pool';
-import { query as dutchieAzQuery } from '../dutchie-az/db/connection';
 import ipaddr from 'ipaddr.js';
 import {
   ApiScope,
@@ -140,7 +139,7 @@ async function validatePublicApiKey(

   try {
     // Query WordPress permissions table with store info
-    const result = await pool.query<ApiKeyPermission>(`
+    const result = await pool.query(`
       SELECT
         p.id,
         p.user_name,
@@ -198,7 +197,7 @@ async function validatePublicApiKey(

   // Resolve the dutchie_az store for wordpress keys
   if (permission.key_type === 'wordpress' && permission.store_name) {
-    const storeResult = await dutchieAzQuery<{ id: number }>(`
+    const storeResult = await pool.query(`
       SELECT id FROM dispensaries
       WHERE LOWER(TRIM(name)) = LOWER(TRIM($1))
          OR LOWER(TRIM(name)) LIKE LOWER(TRIM($1)) || '%'
@@ -439,7 +438,7 @@ router.get('/products', async (req: PublicApiRequest, res: Response) => {

   // Query products with latest snapshot data
   // Note: Price filters use HAVING clause since they reference the snapshot subquery
-  const { rows: products } = await dutchieAzQuery(`
+  const { rows: products } = await pool.query(`
     SELECT
       p.id,
       p.dispensary_id,
@@ -482,7 +481,7 @@ router.get('/products', async (req: PublicApiRequest, res: Response) => {
   `, params);

   // Get total count for pagination (include price filters if specified)
-  const { rows: countRows } = await dutchieAzQuery(`
+  const { rows: countRows } = await pool.query(`
     SELECT COUNT(*) as total FROM dutchie_products p
     LEFT JOIN LATERAL (
       SELECT rec_min_price_cents, special FROM dutchie_product_snapshots
@@ -567,7 +566,7 @@ router.get('/products/:id', async (req: PublicApiRequest, res: Response) => {
   const { id } = req.params;

   // Get product (without dispensary filter to check access afterward)
-  const { rows: products } = await dutchieAzQuery(`
+  const { rows: products } = await pool.query(`
     SELECT
       p.*,
       s.rec_min_price_cents,
@@ -677,7 +676,7 @@ router.get('/categories', async (req: PublicApiRequest, res: Response) => {
     });
   }

-  const { rows: categories } = await dutchieAzQuery(`
+  const { rows: categories } = await pool.query(`
     SELECT
       type as category,
       subcategory,
@@ -733,7 +732,7 @@ router.get('/brands', async (req: PublicApiRequest, res: Response) => {
     });
   }

-  const { rows: brands } = await dutchieAzQuery(`
+  const { rows: brands } = await pool.query(`
     SELECT
       brand_name as brand,
       COUNT(*) as product_count,
@@ -796,7 +795,7 @@ router.get('/specials', async (req: PublicApiRequest, res: Response) => {

   params.push(limitNum, offsetNum);

-  const { rows: products } = await dutchieAzQuery(`
+  const { rows: products } = await pool.query(`
     SELECT
       p.id,
       p.dispensary_id,
@@ -828,7 +827,7 @@ router.get('/specials', async (req: PublicApiRequest, res: Response) => {

   // Get total count
   const countParams = params.slice(0, -2);
-  const { rows: countRows } = await dutchieAzQuery(`
+  const { rows: countRows } = await pool.query(`
     SELECT COUNT(*) as total
     FROM dutchie_products p
     INNER JOIN LATERAL (
@@ -906,7 +905,7 @@ router.get('/dispensaries', async (req: PublicApiRequest, res: Response) => {
   }

   // Get single dispensary for wordpress key
-  const { rows: dispensaries } = await dutchieAzQuery(`
+  const { rows: dispensaries } = await pool.query(`
     SELECT
       d.id,
       d.name,
@@ -1013,7 +1012,7 @@ router.get('/dispensaries', async (req: PublicApiRequest, res: Response) => {
   const limitNum = Math.min(parseInt(limit as string, 10) || 100, 500);
   const offsetNum = parseInt(offset as string, 10) || 0;

-  const { rows: dispensaries } = await dutchieAzQuery(`
+  const { rows: dispensaries } = await pool.query(`
     SELECT
       d.id,
       d.name,
@@ -1051,7 +1050,7 @@ router.get('/dispensaries', async (req: PublicApiRequest, res: Response) => {
     LIMIT $${paramIndex} OFFSET $${paramIndex + 1}
   `, [...params, limitNum, offsetNum]);

-  const { rows: countRows } = await dutchieAzQuery(`
+  const { rows: countRows } = await pool.query(`
     SELECT COUNT(*) as total
     FROM dispensaries d
     LEFT JOIN LATERAL (
@@ -1178,7 +1177,7 @@ router.get('/search', async (req: PublicApiRequest, res: Response) => {

   params.push(limitNum, offsetNum);

-  const { rows: products } = await dutchieAzQuery(`
+  const { rows: products } = await pool.query(`
     SELECT
       p.id,
       p.dispensary_id,
@@ -1221,7 +1220,7 @@ router.get('/search', async (req: PublicApiRequest, res: Response) => {
@@ -1221,7 +1220,7 @@ router.get('/search', async (req: PublicApiRequest, res: Response) => {
|
||||
|
||||
// Count query (without relevance param)
|
||||
const countParams = params.slice(0, paramIndex - 3); // Remove relevance, limit, offset
|
||||
const { rows: countRows } = await dutchieAzQuery(`
|
||||
const { rows: countRows } = await pool.query(`
|
||||
SELECT COUNT(*) as total
|
||||
FROM dutchie_products p
|
||||
${whereClause}
|
||||
@@ -1302,7 +1301,7 @@ router.get('/menu', async (req: PublicApiRequest, res: Response) => {
|
||||
}
|
||||
|
||||
// Get counts by category
|
||||
const { rows: categoryCounts } = await dutchieAzQuery(`
|
||||
const { rows: categoryCounts } = await pool.query(`
|
||||
SELECT
|
||||
type as category,
|
||||
COUNT(*) as total,
|
||||
@@ -1314,7 +1313,7 @@ router.get('/menu', async (req: PublicApiRequest, res: Response) => {
|
||||
`, params);
|
||||
|
||||
// Get overall stats
|
||||
const { rows: stats } = await dutchieAzQuery(`
|
||||
const { rows: stats } = await pool.query(`
|
||||
SELECT
|
||||
COUNT(*) as total_products,
|
||||
COUNT(*) FILTER (WHERE stock_status = 'in_stock') as in_stock_count,
|
||||
@@ -1326,7 +1325,7 @@ router.get('/menu', async (req: PublicApiRequest, res: Response) => {
|
||||
`, params);
|
||||
|
||||
// Get specials count
|
||||
const { rows: specialsCount } = await dutchieAzQuery(`
|
||||
const { rows: specialsCount } = await pool.query(`
|
||||
SELECT COUNT(*) as count
|
||||
FROM dutchie_products p
|
||||
INNER JOIN LATERAL (
|
||||
|
||||
@@ -1,986 +0,0 @@
import { Router, Request, Response } from 'express';
import { authMiddleware, requireRole } from '../auth/middleware';
import {
  getGlobalSchedule,
  updateGlobalSchedule,
  getStoreScheduleStatuses,
  getStoreSchedule,
  updateStoreSchedule,
  getAllRecentJobs,
  getRecentJobs,
  triggerManualCrawl,
  triggerAllStoresCrawl,
  cancelJob,
  restartCrawlScheduler,
  setSchedulerMode,
  getSchedulerMode,
} from '../services/crawl-scheduler';
import {
  runStoreCrawlOrchestrator,
  runBatchOrchestrator,
  getStoresDueForOrchestration,
} from '../services/store-crawl-orchestrator';
import {
  runDispensaryOrchestrator,
  runBatchDispensaryOrchestrator,
  getDispensariesDueForOrchestration,
  ensureAllDispensariesHaveSchedules,
} from '../services/dispensary-orchestrator';
import { pool } from '../db/pool';
import { resolveDispensaryId } from '../dutchie-az/services/graphql-client';

const router = Router();
router.use(authMiddleware);

// ============================================
// Global Schedule Endpoints
// ============================================

/**
 * GET /api/schedule/global
 * Get global schedule settings
 */
router.get('/global', async (req: Request, res: Response) => {
  try {
    const schedules = await getGlobalSchedule();
    res.json({ schedules });
  } catch (error: any) {
    console.error('Error fetching global schedule:', error);
    res.status(500).json({ error: 'Failed to fetch global schedule' });
  }
});

/**
 * PUT /api/schedule/global/:type
 * Update global schedule setting
 */
router.put('/global/:type', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const { type } = req.params;
    const { enabled, interval_hours, run_time } = req.body;

    if (type !== 'global_interval' && type !== 'daily_special') {
      return res.status(400).json({ error: 'Invalid schedule type' });
    }

    const schedule = await updateGlobalSchedule(type, {
      enabled,
      interval_hours,
      run_time
    });

    // Restart scheduler to apply changes
    await restartCrawlScheduler();

    res.json({ schedule, message: 'Schedule updated and scheduler restarted' });
  } catch (error: any) {
    console.error('Error updating global schedule:', error);
    res.status(500).json({ error: 'Failed to update global schedule' });
  }
});

// ============================================
// Store Schedule Endpoints
// ============================================

/**
 * GET /api/schedule/stores
 * Get all store schedule statuses
 */
router.get('/stores', async (req: Request, res: Response) => {
  try {
    const stores = await getStoreScheduleStatuses();
    res.json({ stores });
  } catch (error: any) {
    console.error('Error fetching store schedules:', error);
    res.status(500).json({ error: 'Failed to fetch store schedules' });
  }
});

/**
 * GET /api/schedule/stores/:storeId
 * Get schedule for a specific store
 */
router.get('/stores/:storeId', async (req: Request, res: Response) => {
  try {
    const storeId = parseInt(req.params.storeId);
    if (isNaN(storeId)) {
      return res.status(400).json({ error: 'Invalid store ID' });
    }

    const schedule = await getStoreSchedule(storeId);
    res.json({ schedule });
  } catch (error: any) {
    console.error('Error fetching store schedule:', error);
    res.status(500).json({ error: 'Failed to fetch store schedule' });
  }
});

/**
 * PUT /api/schedule/stores/:storeId
 * Update schedule for a specific store
 */
router.put('/stores/:storeId', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const storeId = parseInt(req.params.storeId);
    if (isNaN(storeId)) {
      return res.status(400).json({ error: 'Invalid store ID' });
    }

    const {
      enabled,
      interval_hours,
      daily_special_enabled,
      daily_special_time,
      priority
    } = req.body;

    const schedule = await updateStoreSchedule(storeId, {
      enabled,
      interval_hours,
      daily_special_enabled,
      daily_special_time,
      priority
    });

    res.json({ schedule });
  } catch (error: any) {
    console.error('Error updating store schedule:', error);
    res.status(500).json({ error: 'Failed to update store schedule' });
  }
});

// ============================================
// Job Queue Endpoints
// ============================================

/**
 * GET /api/schedule/jobs
 * Get recent jobs
 */
router.get('/jobs', async (req: Request, res: Response) => {
  try {
    const limit = parseInt(req.query.limit as string) || 50;
    const jobs = await getAllRecentJobs(Math.min(limit, 200));
    res.json({ jobs });
  } catch (error: any) {
    console.error('Error fetching jobs:', error);
    res.status(500).json({ error: 'Failed to fetch jobs' });
  }
});

/**
 * GET /api/schedule/jobs/store/:storeId
 * Get recent jobs for a specific store
 */
router.get('/jobs/store/:storeId', async (req: Request, res: Response) => {
  try {
    const storeId = parseInt(req.params.storeId);
    if (isNaN(storeId)) {
      return res.status(400).json({ error: 'Invalid store ID' });
    }

    const limit = parseInt(req.query.limit as string) || 10;
    const jobs = await getRecentJobs(storeId, Math.min(limit, 100));
    res.json({ jobs });
  } catch (error: any) {
    console.error('Error fetching store jobs:', error);
    res.status(500).json({ error: 'Failed to fetch store jobs' });
  }
});

/**
 * POST /api/schedule/jobs/:jobId/cancel
 * Cancel a pending job
 */
router.post('/jobs/:jobId/cancel', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const jobId = parseInt(req.params.jobId);
    if (isNaN(jobId)) {
      return res.status(400).json({ error: 'Invalid job ID' });
    }

    const cancelled = await cancelJob(jobId);
    if (cancelled) {
      res.json({ success: true, message: 'Job cancelled' });
    } else {
      res.status(400).json({ error: 'Job could not be cancelled (may not be pending)' });
    }
  } catch (error: any) {
    console.error('Error cancelling job:', error);
    res.status(500).json({ error: 'Failed to cancel job' });
  }
});

// ============================================
// Manual Trigger Endpoints
// ============================================

/**
 * POST /api/schedule/trigger/store/:storeId
 * Manually trigger orchestrated crawl for a specific store
 * Uses the intelligent orchestrator which:
 * - Checks provider detection status
 * - Runs detection if needed
 * - Queues appropriate crawl type (production/sandbox)
 */
router.post('/trigger/store/:storeId', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const storeId = parseInt(req.params.storeId);
    if (isNaN(storeId)) {
      return res.status(400).json({ error: 'Invalid store ID' });
    }

    // Use the orchestrator instead of simple triggerManualCrawl
    const result = await runStoreCrawlOrchestrator(storeId);

    res.json({
      result,
      message: result.summary,
      success: result.status === 'success' || result.status === 'sandbox_only',
    });
  } catch (error: any) {
    console.error('Error triggering orchestrated crawl:', error);
    res.status(500).json({ error: 'Failed to trigger crawl' });
  }
});

/**
 * POST /api/schedule/trigger/store/:storeId/legacy
 * Legacy: Simple job queue trigger (no orchestration)
 */
router.post('/trigger/store/:storeId/legacy', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const storeId = parseInt(req.params.storeId);
    if (isNaN(storeId)) {
      return res.status(400).json({ error: 'Invalid store ID' });
    }

    const job = await triggerManualCrawl(storeId);
    res.json({ job, message: 'Crawl job created' });
  } catch (error: any) {
    console.error('Error triggering manual crawl:', error);
    res.status(500).json({ error: 'Failed to trigger crawl' });
  }
});

/**
 * POST /api/schedule/trigger/all
 * Manually trigger crawls for all stores
 */
router.post('/trigger/all', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const jobsCreated = await triggerAllStoresCrawl();
    res.json({ jobs_created: jobsCreated, message: `Created ${jobsCreated} crawl jobs` });
  } catch (error: any) {
    console.error('Error triggering all crawls:', error);
    res.status(500).json({ error: 'Failed to trigger crawls' });
  }
});

/**
 * POST /api/schedule/restart
 * Restart the scheduler
 */
router.post('/restart', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    await restartCrawlScheduler();
    res.json({ message: 'Scheduler restarted', mode: getSchedulerMode() });
  } catch (error: any) {
    console.error('Error restarting scheduler:', error);
    res.status(500).json({ error: 'Failed to restart scheduler' });
  }
});

// ============================================
// Scheduler Mode Endpoints
// ============================================

/**
 * GET /api/schedule/mode
 * Get current scheduler mode
 */
router.get('/mode', async (req: Request, res: Response) => {
  try {
    const mode = getSchedulerMode();
    res.json({ mode });
  } catch (error: any) {
    console.error('Error getting scheduler mode:', error);
    res.status(500).json({ error: 'Failed to get scheduler mode' });
  }
});

/**
 * PUT /api/schedule/mode
 * Set scheduler mode (legacy or orchestrator)
 */
router.put('/mode', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const { mode } = req.body;

    if (mode !== 'legacy' && mode !== 'orchestrator') {
      return res.status(400).json({ error: 'Invalid mode. Must be "legacy" or "orchestrator"' });
    }

    setSchedulerMode(mode);

    // Restart scheduler with new mode
    await restartCrawlScheduler();

    res.json({ mode, message: `Scheduler mode set to ${mode} and restarted` });
  } catch (error: any) {
    console.error('Error setting scheduler mode:', error);
    res.status(500).json({ error: 'Failed to set scheduler mode' });
  }
});

/**
 * GET /api/schedule/due
 * Get stores that are due for orchestration
 */
router.get('/due', async (req: Request, res: Response) => {
  try {
    const limit = parseInt(req.query.limit as string) || 10;
    const storeIds = await getStoresDueForOrchestration(Math.min(limit, 50));
    res.json({ stores_due: storeIds, count: storeIds.length });
  } catch (error: any) {
    console.error('Error getting stores due for orchestration:', error);
    res.status(500).json({ error: 'Failed to get stores due' });
  }
});

// ============================================
// Dispensary Schedule Endpoints (NEW - dispensary-centric)
// ============================================

/**
 * GET /api/schedule/dispensaries
 * Get all dispensary schedule statuses with optional filters
 * Query params:
 * - state: filter by state (e.g., 'AZ')
 * - search: search by name or slug
 */
router.get('/dispensaries', async (req: Request, res: Response) => {
  try {
    const { state, search } = req.query;

    // Build dynamic query with optional filters
    const conditions: string[] = [];
    const params: any[] = [];
    let paramIndex = 1;

    if (state) {
      conditions.push(`d.state = $${paramIndex}`);
      params.push(state);
      paramIndex++;
    }

    if (search) {
      conditions.push(`(d.name ILIKE $${paramIndex} OR d.slug ILIKE $${paramIndex})`);
      params.push(`%${search}%`);
      paramIndex++;
    }

    const whereClause = conditions.length > 0 ? `WHERE ${conditions.join(' AND ')}` : '';

    const query = `
      SELECT
        d.id AS dispensary_id,
        d.name AS dispensary_name,
        d.slug AS dispensary_slug,
        d.city,
        d.state,
        d.menu_url,
        d.menu_type,
        d.platform_dispensary_id,
        d.scrape_enabled,
        d.last_crawl_at,
        d.crawl_status,
        d.product_crawler_mode,
        d.product_provider,
        cs.interval_minutes,
        cs.is_active,
        cs.priority,
        cs.last_run_at,
        cs.next_run_at,
        cs.last_status AS schedule_last_status,
        cs.last_error AS schedule_last_error,
        cs.consecutive_failures,
        j.id AS latest_job_id,
        j.status AS latest_job_status,
        j.job_type AS latest_job_type,
        j.started_at AS latest_job_started,
        j.completed_at AS latest_job_completed,
        j.products_found AS latest_products_found,
        j.products_new AS latest_products_created,
        j.products_updated AS latest_products_updated,
        j.error_message AS latest_job_error,
        CASE
          WHEN d.menu_type = 'dutchie' AND d.platform_dispensary_id IS NOT NULL THEN true
          ELSE false
        END AS can_crawl,
        CASE
          WHEN d.menu_type IS NULL OR d.menu_type = 'unknown' THEN 'menu_type not detected'
          WHEN d.menu_type != 'dutchie' THEN 'not dutchie platform'
          WHEN d.platform_dispensary_id IS NULL THEN 'platform ID not resolved'
          WHEN d.scrape_enabled = false THEN 'scraping disabled'
          ELSE 'ready'
        END AS schedule_status_reason
      FROM public.dispensaries d
      LEFT JOIN public.dispensary_crawl_schedule cs ON cs.dispensary_id = d.id
      LEFT JOIN LATERAL (
        SELECT *
        FROM public.dispensary_crawl_jobs dj
        WHERE dj.dispensary_id = d.id
        ORDER BY dj.created_at DESC
        LIMIT 1
      ) j ON true
      ${whereClause}
      ORDER BY cs.priority DESC NULLS LAST, d.name
    `;

    const result = await pool.query(query, params);
    res.json({ dispensaries: result.rows });
  } catch (error: any) {
    console.error('Error fetching dispensary schedules:', error);
    res.status(500).json({ error: 'Failed to fetch dispensary schedules' });
  }
});

/**
 * GET /api/schedule/dispensaries/:id
 * Get schedule for a specific dispensary
 */
router.get('/dispensaries/:id', async (req: Request, res: Response) => {
  try {
    const dispensaryId = parseInt(req.params.id);
    if (isNaN(dispensaryId)) {
      return res.status(400).json({ error: 'Invalid dispensary ID' });
    }

    const result = await pool.query(`
      SELECT * FROM dispensary_crawl_status
      WHERE dispensary_id = $1
    `, [dispensaryId]);

    if (result.rows.length === 0) {
      return res.status(404).json({ error: 'Dispensary not found' });
    }

    res.json({ schedule: result.rows[0] });
  } catch (error: any) {
    console.error('Error fetching dispensary schedule:', error);
    res.status(500).json({ error: 'Failed to fetch dispensary schedule' });
  }
});

/**
 * PUT /api/schedule/dispensaries/:id
 * Update schedule for a specific dispensary
 */
router.put('/dispensaries/:id', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const dispensaryId = parseInt(req.params.id);
    if (isNaN(dispensaryId)) {
      return res.status(400).json({ error: 'Invalid dispensary ID' });
    }

    const {
      is_active,
      interval_minutes,
      priority
    } = req.body;

    // Upsert schedule
    const result = await pool.query(`
      INSERT INTO dispensary_crawl_schedule (dispensary_id, is_active, interval_minutes, priority)
      VALUES ($1, COALESCE($2, TRUE), COALESCE($3, 240), COALESCE($4, 0))
      ON CONFLICT (dispensary_id) DO UPDATE SET
        is_active = COALESCE($2, dispensary_crawl_schedule.is_active),
        interval_minutes = COALESCE($3, dispensary_crawl_schedule.interval_minutes),
        priority = COALESCE($4, dispensary_crawl_schedule.priority),
        updated_at = NOW()
      RETURNING *
    `, [dispensaryId, is_active, interval_minutes, priority]);

    res.json({ schedule: result.rows[0] });
  } catch (error: any) {
    console.error('Error updating dispensary schedule:', error);
    res.status(500).json({ error: 'Failed to update dispensary schedule' });
  }
});

/**
 * GET /api/schedule/dispensary-jobs
 * Get recent dispensary crawl jobs
 */
router.get('/dispensary-jobs', async (req: Request, res: Response) => {
  try {
    const limit = parseInt(req.query.limit as string) || 50;
    const result = await pool.query(`
      SELECT dcj.*, d.name as dispensary_name
      FROM dispensary_crawl_jobs dcj
      JOIN dispensaries d ON d.id = dcj.dispensary_id
      ORDER BY dcj.created_at DESC
      LIMIT $1
    `, [Math.min(limit, 200)]);
    res.json({ jobs: result.rows });
  } catch (error: any) {
    console.error('Error fetching dispensary jobs:', error);
    res.status(500).json({ error: 'Failed to fetch dispensary jobs' });
  }
});

/**
 * GET /api/schedule/dispensary-jobs/:dispensaryId
 * Get recent jobs for a specific dispensary
 */
router.get('/dispensary-jobs/:dispensaryId', async (req: Request, res: Response) => {
  try {
    const dispensaryId = parseInt(req.params.dispensaryId);
    if (isNaN(dispensaryId)) {
      return res.status(400).json({ error: 'Invalid dispensary ID' });
    }

    const limit = parseInt(req.query.limit as string) || 10;
    const result = await pool.query(`
      SELECT dcj.*, d.name as dispensary_name
      FROM dispensary_crawl_jobs dcj
      JOIN dispensaries d ON d.id = dcj.dispensary_id
      WHERE dcj.dispensary_id = $1
      ORDER BY dcj.created_at DESC
      LIMIT $2
    `, [dispensaryId, Math.min(limit, 100)]);

    res.json({ jobs: result.rows });
  } catch (error: any) {
    console.error('Error fetching dispensary jobs:', error);
    res.status(500).json({ error: 'Failed to fetch dispensary jobs' });
  }
});

/**
 * POST /api/schedule/trigger/dispensary/:id
 * Trigger orchestrator for a specific dispensary (Run Now button)
 */
router.post('/trigger/dispensary/:id', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const dispensaryId = parseInt(req.params.id);
    if (isNaN(dispensaryId)) {
      return res.status(400).json({ error: 'Invalid dispensary ID' });
    }

    // Run the dispensary orchestrator
    const result = await runDispensaryOrchestrator(dispensaryId);

    res.json({
      result,
      message: result.summary,
      success: result.status === 'success' || result.status === 'sandbox_only' || result.status === 'detection_only',
    });
  } catch (error: any) {
    console.error('Error triggering dispensary orchestrator:', error);
    res.status(500).json({ error: 'Failed to trigger orchestrator' });
  }
});

/**
 * POST /api/schedule/trigger/dispensaries/batch
 * Trigger orchestrator for multiple dispensaries
 */
router.post('/trigger/dispensaries/batch', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const { dispensary_ids, concurrency } = req.body;

    if (!Array.isArray(dispensary_ids) || dispensary_ids.length === 0) {
      return res.status(400).json({ error: 'dispensary_ids must be a non-empty array' });
    }

    const results = await runBatchDispensaryOrchestrator(
      dispensary_ids,
      concurrency || 3
    );

    const summary = {
      total: results.length,
      success: results.filter(r => r.status === 'success').length,
      sandbox_only: results.filter(r => r.status === 'sandbox_only').length,
      detection_only: results.filter(r => r.status === 'detection_only').length,
      error: results.filter(r => r.status === 'error').length,
    };

    res.json({ results, summary });
  } catch (error: any) {
    console.error('Error triggering batch orchestrator:', error);
    res.status(500).json({ error: 'Failed to trigger batch orchestrator' });
  }
});

/**
 * GET /api/schedule/dispensary-due
 * Get dispensaries that are due for orchestration
 */
router.get('/dispensary-due', async (req: Request, res: Response) => {
  try {
    const limit = parseInt(req.query.limit as string) || 10;
    const dispensaryIds = await getDispensariesDueForOrchestration(Math.min(limit, 50));

    // Get details for the due dispensaries
    if (dispensaryIds.length > 0) {
      const details = await pool.query(`
        SELECT d.id, d.name, d.product_provider, d.product_crawler_mode,
               dcs.next_run_at, dcs.last_status, dcs.priority
        FROM dispensaries d
        LEFT JOIN dispensary_crawl_schedule dcs ON dcs.dispensary_id = d.id
        WHERE d.id = ANY($1)
        ORDER BY COALESCE(dcs.priority, 0) DESC, dcs.last_run_at ASC NULLS FIRST
      `, [dispensaryIds]);

      res.json({ dispensaries_due: details.rows, count: dispensaryIds.length });
    } else {
      res.json({ dispensaries_due: [], count: 0 });
    }
  } catch (error: any) {
    console.error('Error getting dispensaries due for orchestration:', error);
    res.status(500).json({ error: 'Failed to get dispensaries due' });
  }
});

/**
 * POST /api/schedule/dispensaries/bootstrap
 * Ensure all dispensaries have schedule entries
 */
router.post('/dispensaries/bootstrap', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const { interval_minutes } = req.body;

    const result = await ensureAllDispensariesHaveSchedules(interval_minutes || 240);

    res.json({
      message: `Created ${result.created} new schedules, ${result.existing} already existed`,
      created: result.created,
      existing: result.existing,
    });
  } catch (error: any) {
    console.error('Error bootstrapping dispensary schedules:', error);
    res.status(500).json({ error: 'Failed to bootstrap schedules' });
  }
});

// ============================================
|
||||
// Platform ID & Menu Type Detection Endpoints
|
||||
// ============================================
|
||||
|
||||
/**
|
||||
* POST /api/schedule/dispensaries/:id/resolve-platform-id
|
||||
* Resolve the Dutchie platform_dispensary_id from menu_url slug
|
||||
*/
|
||||
router.post('/dispensaries/:id/resolve-platform-id', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
|
||||
try {
|
||||
const dispensaryId = parseInt(req.params.id);
|
||||
if (isNaN(dispensaryId)) {
|
||||
return res.status(400).json({ error: 'Invalid dispensary ID' });
|
||||
}
|
||||
|
||||
// Get dispensary info
|
||||
const dispensaryResult = await pool.query(`
|
||||
SELECT id, name, slug, menu_url, menu_type, platform_dispensary_id
|
||||
FROM dispensaries WHERE id = $1
|
||||
`, [dispensaryId]);
|
||||
|
||||
if (dispensaryResult.rows.length === 0) {
|
||||
return res.status(404).json({ error: 'Dispensary not found' });
|
||||
}
|
||||
|
||||
const dispensary = dispensaryResult.rows[0];
|
||||
|
||||
// Check if already resolved
|
||||
if (dispensary.platform_dispensary_id) {
|
||||
return res.json({
|
||||
success: true,
|
||||
message: 'Platform ID already resolved',
|
||||
platform_dispensary_id: dispensary.platform_dispensary_id,
|
||||
already_resolved: true
|
||||
});
|
||||
}
|
||||
|
||||
// Extract slug from menu_url for Dutchie URLs
|
||||
let slugToResolve = dispensary.slug;
|
||||
if (dispensary.menu_url) {
|
||||
// Match embedded-menu or dispensary URLs
|
||||
const match = dispensary.menu_url.match(/(?:embedded-menu|dispensar(?:y|ies))\/([^\/\?#]+)/i);
|
||||
if (match) {
|
||||
slugToResolve = match[1];
|
||||
}
|
||||
}
|
||||
|
||||
if (!slugToResolve) {
|
||||
return res.status(400).json({
|
||||
error: 'No slug available to resolve platform ID',
|
||||
menu_url: dispensary.menu_url
|
||||
});
|
||||
}
|
||||
|
||||
console.log(`[Schedule] Resolving platform ID for ${dispensary.name} using slug: ${slugToResolve}`);
|
||||
|
||||
// Resolve platform ID using GraphQL client
|
||||
const platformId = await resolveDispensaryId(slugToResolve);
|
||||
|
||||
if (!platformId) {
|
||||
return res.status(404).json({
|
||||
error: 'Could not resolve platform ID',
|
||||
slug_tried: slugToResolve,
|
||||
message: 'The dispensary might not be on Dutchie or the slug is incorrect'
|
||||
});
|
||||
}
|
||||
|
||||
// Update the dispensary with resolved platform ID
|
||||
await pool.query(`
|
||||
UPDATE dispensaries
|
||||
SET platform_dispensary_id = $1,
|
||||
menu_type = COALESCE(menu_type, 'dutchie'),
|
||||
updated_at = NOW()
|
||||
WHERE id = $2
|
||||
`, [platformId, dispensaryId]);
|
||||
|
||||
res.json({
|
||||
success: true,
|
||||
platform_dispensary_id: platformId,
|
||||
slug_resolved: slugToResolve,
|
||||
message: `Platform ID resolved: ${platformId}`
|
||||
});
|
||||
} catch (error: any) {
|
||||
console.error('Error resolving platform ID:', error);
|
||||
res.status(500).json({ error: 'Failed to resolve platform ID', details: error.message });
|
||||
}
|
||||
});
|
||||
|
||||
/**
 * POST /api/schedule/dispensaries/:id/detect-menu-type
 * Detect menu type from menu_url
 */
router.post('/dispensaries/:id/detect-menu-type', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const dispensaryId = parseInt(req.params.id);
    if (isNaN(dispensaryId)) {
      return res.status(400).json({ error: 'Invalid dispensary ID' });
    }

    // Get dispensary info
    const dispensaryResult = await pool.query(`
      SELECT id, name, menu_url, website FROM dispensaries WHERE id = $1
    `, [dispensaryId]);

    if (dispensaryResult.rows.length === 0) {
      return res.status(404).json({ error: 'Dispensary not found' });
    }

    const dispensary = dispensaryResult.rows[0];
    const urlToCheck = dispensary.menu_url || dispensary.website;

    if (!urlToCheck) {
      return res.status(400).json({ error: 'No menu_url or website to detect from' });
    }

    // Detect menu type from URL patterns
    let detectedType: string = 'unknown';

    if (urlToCheck.includes('dutchie.com') || urlToCheck.includes('embedded-menu')) {
      detectedType = 'dutchie';
    } else if (urlToCheck.includes('iheartjane.com') || urlToCheck.includes('jane.co')) {
      detectedType = 'jane';
    } else if (urlToCheck.includes('weedmaps.com')) {
      detectedType = 'weedmaps';
    } else if (urlToCheck.includes('leafly.com')) {
      detectedType = 'leafly';
    } else if (urlToCheck.includes('treez.io') || urlToCheck.includes('treez.co')) {
      detectedType = 'treez';
    } else if (urlToCheck.includes('meadow.com')) {
      detectedType = 'meadow';
    } else if (urlToCheck.includes('blaze.me') || urlToCheck.includes('blazepay')) {
      detectedType = 'blaze';
    } else if (urlToCheck.includes('flowhub.com')) {
      detectedType = 'flowhub';
    } else if (urlToCheck.includes('dispense.app')) {
      detectedType = 'dispense';
    } else if (urlToCheck.includes('covasoft.com')) {
      detectedType = 'cova';
    }

    // Update menu_type
    await pool.query(`
      UPDATE dispensaries
      SET menu_type = $1, updated_at = NOW()
      WHERE id = $2
    `, [detectedType, dispensaryId]);

    res.json({
      success: true,
      menu_type: detectedType,
      url_checked: urlToCheck,
      message: `Menu type detected: ${detectedType}`
    });
  } catch (error: any) {
    console.error('Error detecting menu type:', error);
    res.status(500).json({ error: 'Failed to detect menu type' });
  }
});

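The URL-pattern chain above can be isolated as a table-driven pure function, which makes the precedence between patterns explicit (first match wins, exactly like the `else if` chain). A minimal sketch; the function and table names are illustrative, not part of the route code:

```typescript
type MenuType =
  | 'dutchie' | 'jane' | 'weedmaps' | 'leafly' | 'treez'
  | 'meadow' | 'blaze' | 'flowhub' | 'dispense' | 'cova' | 'unknown';

// Ordered like the else-if chain in the route: earlier entries win.
const MENU_TYPE_PATTERNS: Array<[MenuType, string[]]> = [
  ['dutchie', ['dutchie.com', 'embedded-menu']],
  ['jane', ['iheartjane.com', 'jane.co']],
  ['weedmaps', ['weedmaps.com']],
  ['leafly', ['leafly.com']],
  ['treez', ['treez.io', 'treez.co']],
  ['meadow', ['meadow.com']],
  ['blaze', ['blaze.me', 'blazepay']],
  ['flowhub', ['flowhub.com']],
  ['dispense', ['dispense.app']],
  ['cova', ['covasoft.com']],
];

function detectMenuType(url: string): MenuType {
  for (const [type, needles] of MENU_TYPE_PATTERNS) {
    if (needles.some(n => url.includes(n))) return type;
  }
  return 'unknown';
}
```

Note that the matching is plain substring containment, so `'iheartjane.com'` also satisfies the `'jane.co'` needle; the route's chain behaves the same way.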
/**
 * POST /api/schedule/dispensaries/:id/refresh-detection
 * Combined: detect menu_type AND resolve platform_dispensary_id if dutchie
 */
router.post('/dispensaries/:id/refresh-detection', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const dispensaryId = parseInt(req.params.id);
    if (isNaN(dispensaryId)) {
      return res.status(400).json({ error: 'Invalid dispensary ID' });
    }

    // Get dispensary info
    const dispensaryResult = await pool.query(`
      SELECT id, name, slug, menu_url, website FROM dispensaries WHERE id = $1
    `, [dispensaryId]);

    if (dispensaryResult.rows.length === 0) {
      return res.status(404).json({ error: 'Dispensary not found' });
    }

    const dispensary = dispensaryResult.rows[0];
    const urlToCheck = dispensary.menu_url || dispensary.website;

    if (!urlToCheck) {
      return res.status(400).json({ error: 'No menu_url or website to detect from' });
    }

    // Detect menu type from URL patterns
    let detectedType: string = 'unknown';

    if (urlToCheck.includes('dutchie.com') || urlToCheck.includes('embedded-menu')) {
      detectedType = 'dutchie';
    } else if (urlToCheck.includes('iheartjane.com') || urlToCheck.includes('jane.co')) {
      detectedType = 'jane';
    } else if (urlToCheck.includes('weedmaps.com')) {
      detectedType = 'weedmaps';
    } else if (urlToCheck.includes('leafly.com')) {
      detectedType = 'leafly';
    } else if (urlToCheck.includes('treez.io') || urlToCheck.includes('treez.co')) {
      detectedType = 'treez';
    } else if (urlToCheck.includes('meadow.com')) {
      detectedType = 'meadow';
    } else if (urlToCheck.includes('blaze.me') || urlToCheck.includes('blazepay')) {
      detectedType = 'blaze';
    } else if (urlToCheck.includes('flowhub.com')) {
      detectedType = 'flowhub';
    } else if (urlToCheck.includes('dispense.app')) {
      detectedType = 'dispense';
    } else if (urlToCheck.includes('covasoft.com')) {
      detectedType = 'cova';
    }

    // Update menu_type first
    await pool.query(`
      UPDATE dispensaries SET menu_type = $1, updated_at = NOW() WHERE id = $2
    `, [detectedType, dispensaryId]);

    let platformId: string | null = null;

    // If dutchie, also try to resolve platform ID
    if (detectedType === 'dutchie') {
      let slugToResolve = dispensary.slug;
      const match = urlToCheck.match(/(?:embedded-menu|dispensar(?:y|ies))\/([^\/\?#]+)/i);
      if (match) {
        slugToResolve = match[1];
      }

      if (slugToResolve) {
        try {
          console.log(`[Schedule] Resolving platform ID for ${dispensary.name} using slug: ${slugToResolve}`);
          platformId = await resolveDispensaryId(slugToResolve);

          if (platformId) {
            await pool.query(`
              UPDATE dispensaries SET platform_dispensary_id = $1, updated_at = NOW() WHERE id = $2
            `, [platformId, dispensaryId]);
          }
        } catch (err: any) {
          console.warn(`[Schedule] Failed to resolve platform ID: ${err.message}`);
        }
      }
    }

    res.json({
      success: true,
      menu_type: detectedType,
      platform_dispensary_id: platformId,
      url_checked: urlToCheck,
      can_crawl: detectedType === 'dutchie' && !!platformId
    });
  } catch (error: any) {
    console.error('Error refreshing detection:', error);
    res.status(500).json({ error: 'Failed to refresh detection' });
  }
});

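The slug fallback in the refresh-detection handler hinges on one regex: it pulls the path segment after `embedded-menu/`, `dispensary/`, or `dispensaries/` and stops at `/`, `?`, or `#`. Extracted as a standalone helper (the function name is illustrative; the regex is the one the route uses):

```typescript
// Returns the Dutchie slug embedded in a menu URL, or null if the URL
// does not contain an embedded-menu/dispensary path segment.
function extractDutchieSlug(url: string): string | null {
  const match = url.match(/(?:embedded-menu|dispensar(?:y|ies))\/([^\/\?#]+)/i);
  return match ? match[1] : null;
}
```

Query strings and trailing path segments are excluded by the `[^\/\?#]+` capture, so `.../embedded-menu/green-leaf?x=1` and `.../dispensary/green-leaf/menu` both yield `green-leaf`.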
/**
 * PUT /api/schedule/dispensaries/:id/toggle-active
 * Enable or disable schedule for a dispensary
 */
router.put('/dispensaries/:id/toggle-active', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const dispensaryId = parseInt(req.params.id);
    if (isNaN(dispensaryId)) {
      return res.status(400).json({ error: 'Invalid dispensary ID' });
    }

    const { is_active } = req.body;

    // Upsert schedule with new is_active value
    const result = await pool.query(`
      INSERT INTO dispensary_crawl_schedule (dispensary_id, is_active, interval_minutes, priority)
      VALUES ($1, $2, 240, 0)
      ON CONFLICT (dispensary_id) DO UPDATE SET
        is_active = $2,
        updated_at = NOW()
      RETURNING *
    `, [dispensaryId, is_active]);

    res.json({
      success: true,
      schedule: result.rows[0],
      message: is_active ? 'Schedule enabled' : 'Schedule disabled'
    });
  } catch (error: any) {
    console.error('Error toggling schedule active status:', error);
    res.status(500).json({ error: 'Failed to toggle schedule' });
  }
});

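The toggle endpoint is a single upsert: insert a schedule row with defaults (240-minute interval, priority 0), or flip `is_active` on the existing row while leaving its other columns untouched. The semantics can be sketched with a `Map` standing in for the table (all names here are illustrative):

```typescript
interface Schedule {
  dispensary_id: number;
  is_active: boolean;
  interval_minutes: number;
  priority: number;
}

// Mirrors INSERT ... ON CONFLICT (dispensary_id) DO UPDATE SET is_active = $2.
function toggleActive(table: Map<number, Schedule>, dispensaryId: number, isActive: boolean): Schedule {
  const existing = table.get(dispensaryId);
  const row: Schedule = existing
    ? { ...existing, is_active: isActive } // conflict: only is_active changes
    : { dispensary_id: dispensaryId, is_active: isActive, interval_minutes: 240, priority: 0 }; // insert defaults
  table.set(dispensaryId, row);
  return row;
}
```

A second call for the same dispensary therefore preserves any customized `interval_minutes` or `priority`, which is why the route can be called repeatedly without resetting a tuned schedule.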
/**
 * DELETE /api/schedule/dispensaries/:id/schedule
 * Delete schedule for a dispensary
 */
router.delete('/dispensaries/:id/schedule', requireRole('superadmin', 'admin'), async (req: Request, res: Response) => {
  try {
    const dispensaryId = parseInt(req.params.id);
    if (isNaN(dispensaryId)) {
      return res.status(400).json({ error: 'Invalid dispensary ID' });
    }

    const result = await pool.query(`
      DELETE FROM dispensary_crawl_schedule WHERE dispensary_id = $1 RETURNING id
    `, [dispensaryId]);

    const deleted = (result.rowCount ?? 0) > 0;
    res.json({
      success: true,
      deleted,
      message: deleted ? 'Schedule deleted' : 'No schedule to delete'
    });
  } catch (error: any) {
    console.error('Error deleting schedule:', error);
    res.status(500).json({ error: 'Failed to delete schedule' });
  }
});

export default router;
@@ -1,53 +1,39 @@
+/**
+ * Stores API Routes
+ *
+ * NOTE: "Store" and "Dispensary" are synonyms in CannaiQ.
+ * - This file handles `/api/stores` endpoints
+ * - The DB table is `dispensaries` (NOT `stores`)
+ * - Use these terms interchangeably
+ * - `/api/stores` and `/api/dispensaries` both work
+ */
 import { Router } from 'express';
 import { authMiddleware, requireRole } from '../auth/middleware';
 import { pool } from '../db/pool';
 import { scrapeStore, scrapeCategory, discoverCategories } from '../scraper-v2';

 const router = Router();
 router.use(authMiddleware);

-// Get all stores
-router.get('/', async (req, res) => {
-  try {
-    const result = await pool.query(`
-      SELECT
-        s.*,
-        COUNT(DISTINCT p.id) as product_count,
-        COUNT(DISTINCT c.id) as category_count
-      FROM stores s
-      LEFT JOIN products p ON s.id = p.store_id
-      LEFT JOIN categories c ON s.id = c.store_id
-      GROUP BY s.id
-      ORDER BY s.name
-    `);
-
-    res.json({ stores: result.rows });
-  } catch (error) {
-    console.error('Error fetching stores:', error);
-    res.status(500).json({ error: 'Failed to fetch stores' });
-  }
-});
-
 // Freshness threshold in hours
 const STALE_THRESHOLD_HOURS = 4;

-function calculateFreshness(lastScrapedAt: Date | null): {
-  last_scraped_at: string | null;
+function calculateFreshness(lastCrawlAt: Date | null): {
+  last_crawl_at: string | null;
   is_stale: boolean;
   freshness: string;
-  hours_since_scrape: number | null;
+  hours_since_crawl: number | null;
 } {
-  if (!lastScrapedAt) {
+  if (!lastCrawlAt) {
     return {
-      last_scraped_at: null,
+      last_crawl_at: null,
       is_stale: true,
-      freshness: 'Never scraped',
-      hours_since_scrape: null
+      freshness: 'Never crawled',
+      hours_since_crawl: null
     };
   }

   const now = new Date();
-  const diffMs = now.getTime() - lastScrapedAt.getTime();
+  const diffMs = now.getTime() - lastCrawlAt.getTime();
   const diffHours = diffMs / (1000 * 60 * 60);
   const isStale = diffHours > STALE_THRESHOLD_HOURS;

@@ -64,49 +50,123 @@ function calculateFreshness(lastScrapedAt: Date | null): {
   }

   return {
-    last_scraped_at: lastScrapedAt.toISOString(),
+    last_crawl_at: lastCrawlAt.toISOString(),
     is_stale: isStale,
     freshness: freshnessText,
-    hours_since_scrape: Math.round(diffHours * 10) / 10
+    hours_since_crawl: Math.round(diffHours * 10) / 10
   };
 }

-function detectProvider(dutchieUrl: string | null): string {
-  if (!dutchieUrl) return 'unknown';
-  if (dutchieUrl.includes('dutchie.com')) return 'Dutchie';
-  if (dutchieUrl.includes('iheartjane.com') || dutchieUrl.includes('jane.co')) return 'Jane';
-  if (dutchieUrl.includes('treez.io')) return 'Treez';
-  if (dutchieUrl.includes('weedmaps.com')) return 'Weedmaps';
-  if (dutchieUrl.includes('leafly.com')) return 'Leafly';
+function detectProvider(menuUrl: string | null): string {
+  if (!menuUrl) return 'unknown';
+  if (menuUrl.includes('dutchie.com')) return 'Dutchie';
+  if (menuUrl.includes('iheartjane.com') || menuUrl.includes('jane.co')) return 'Jane';
+  if (menuUrl.includes('treez.io')) return 'Treez';
+  if (menuUrl.includes('weedmaps.com')) return 'Weedmaps';
+  if (menuUrl.includes('leafly.com')) return 'Leafly';
   return 'Custom';
 }

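The freshness calculation in this hunk boils down to one rule: a store is stale if it has never been crawled or if the last crawl is more than `STALE_THRESHOLD_HOURS` (4) hours old, with the elapsed time reported to one decimal place. A deterministic sketch with the clock injected so the arithmetic is testable (function names are illustrative):

```typescript
const STALE_THRESHOLD_HOURS = 4;

// Elapsed hours between two timestamps, rounded to one decimal place,
// matching Math.round(diffHours * 10) / 10 in calculateFreshness.
function hoursSince(last: Date, now: Date): number {
  return Math.round(((now.getTime() - last.getTime()) / (1000 * 60 * 60)) * 10) / 10;
}

function isStale(last: Date | null, now: Date): boolean {
  if (!last) return true; // never crawled counts as stale
  return (now.getTime() - last.getTime()) / (1000 * 60 * 60) > STALE_THRESHOLD_HOURS;
}
```

Injecting `now` instead of calling `new Date()` inside the function is the only deviation from the route helper; it makes boundary cases (exactly 4 hours is still fresh, since the comparison is strict `>`) easy to pin down.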
|
||||
-// Get single store with full details
+// Get all stores (from dispensaries table)
+router.get('/', async (req, res) => {
+  try {
+    const { city, state, menu_type } = req.query;
+
+    let query = `
+      SELECT
+        id,
+        name,
+        slug,
+        city,
+        state,
+        address,
+        zip,
+        phone,
+        website,
+        latitude,
+        longitude,
+        menu_url,
+        menu_type,
+        platform,
+        platform_dispensary_id,
+        product_count,
+        last_crawl_at,
+        created_at,
+        updated_at
+      FROM dispensaries
+    `;
+
+    const params: any[] = [];
+    const conditions: string[] = [];
+
+    if (city) {
+      conditions.push(`city ILIKE $${params.length + 1}`);
+      params.push(city);
+    }
+
+    if (state) {
+      conditions.push(`state = $${params.length + 1}`);
+      params.push(state);
+    }
+
+    if (menu_type) {
+      conditions.push(`menu_type = $${params.length + 1}`);
+      params.push(menu_type);
+    }
+
+    if (conditions.length > 0) {
+      query += ` WHERE ${conditions.join(' AND ')}`;
+    }
+
+    query += ` ORDER BY name`;
+
+    const result = await pool.query(query, params);
+
+    // Add computed fields
+    const stores = result.rows.map(row => ({
+      ...row,
+      provider: detectProvider(row.menu_url),
+      ...calculateFreshness(row.last_crawl_at)
+    }));
+
+    res.json({ stores });
+  } catch (error) {
+    console.error('Error fetching stores:', error);
+    res.status(500).json({ error: 'Failed to fetch stores' });
+  }
+});
+
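The list endpoint builds its WHERE clause incrementally, numbering placeholders from the current length of the params array so SQL and values never drift apart. The pattern in isolation (names are illustrative; the three optional filters match the route):

```typescript
interface StoreFilters {
  city?: string;
  state?: string;
  menu_type?: string;
}

// Builds a parameterized query; placeholder numbers always track params.length.
function buildStoreQuery(filters: StoreFilters): { query: string; params: string[] } {
  let query = 'SELECT * FROM dispensaries';
  const params: string[] = [];
  const conditions: string[] = [];

  if (filters.city) {
    conditions.push(`city ILIKE $${params.length + 1}`);
    params.push(filters.city);
  }
  if (filters.state) {
    conditions.push(`state = $${params.length + 1}`);
    params.push(filters.state);
  }
  if (filters.menu_type) {
    conditions.push(`menu_type = $${params.length + 1}`);
    params.push(filters.menu_type);
  }

  if (conditions.length > 0) {
    query += ` WHERE ${conditions.join(' AND ')}`;
  }
  return { query: query + ' ORDER BY name', params };
}
```

Because user input only ever lands in `params`, never in the SQL string itself, the query stays safe from injection regardless of which filter combination is supplied.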
+// Get single store by ID (from dispensaries table)
 router.get('/:id', async (req, res) => {
   try {
     const { id } = req.params;

-    // Get store with counts and linked dispensary
     const result = await pool.query(`
       SELECT
-        s.*,
-        d.id as dispensary_id,
-        d.name as dispensary_name,
-        d.slug as dispensary_slug,
-        d.state as dispensary_state,
-        d.city as dispensary_city,
-        d.address as dispensary_address,
-        d.menu_provider as dispensary_menu_provider,
-        COUNT(DISTINCT p.id) as product_count,
-        COUNT(DISTINCT c.id) as category_count,
-        COUNT(DISTINCT p.id) FILTER (WHERE p.in_stock = true) as in_stock_count,
-        COUNT(DISTINCT p.id) FILTER (WHERE p.in_stock = false) as out_of_stock_count
-      FROM stores s
-      LEFT JOIN dispensaries d ON s.dispensary_id = d.id
-      LEFT JOIN products p ON s.id = p.store_id
-      LEFT JOIN categories c ON s.id = c.store_id
-      WHERE s.id = $1
-      GROUP BY s.id, d.id, d.name, d.slug, d.state, d.city, d.address, d.menu_provider
+        id,
+        name,
+        slug,
+        city,
+        state,
+        address,
+        zip,
+        phone,
+        website,
+        dba_name,
+        company_name,
+        latitude,
+        longitude,
+        menu_url,
+        menu_type,
+        platform,
+        platform_dispensary_id,
+        product_count,
+        last_crawl_at,
+        raw_metadata,
+        created_at,
+        updated_at
+      FROM dispensaries
+      WHERE id = $1
     `, [id]);

     if (result.rows.length === 0) {
@@ -115,62 +175,19 @@ router.get('/:id', async (req, res) => {

     const store = result.rows[0];

-    // Get recent crawl jobs for this store
-    const jobsResult = await pool.query(`
-      SELECT
-        id, status, job_type, trigger_type,
-        started_at, completed_at,
-        products_found, products_new, products_updated,
-        in_stock_count, out_of_stock_count,
-        error_message
-      FROM crawl_jobs
-      WHERE store_id = $1
-      ORDER BY created_at DESC
-      LIMIT 10
-    `, [id]);
-
-    // Get schedule info if exists
-    const scheduleResult = await pool.query(`
-      SELECT
-        enabled, interval_hours, next_run_at, last_run_at
-      FROM store_crawl_schedule
-      WHERE store_id = $1
-    `, [id]);
-
     // Calculate freshness
-    const freshness = calculateFreshness(store.last_scraped_at);
+    const freshness = calculateFreshness(store.last_crawl_at);

     // Detect provider from URL
-    const provider = detectProvider(store.dutchie_url);
+    const provider = detectProvider(store.menu_url);

     // Build response
     const response = {
       ...store,
       provider,
-      freshness: freshness.freshness,
-      is_stale: freshness.is_stale,
-      hours_since_scrape: freshness.hours_since_scrape,
-      linked_dispensary: store.dispensary_id ? {
-        id: store.dispensary_id,
-        name: store.dispensary_name,
-        slug: store.dispensary_slug,
-        state: store.dispensary_state,
-        city: store.dispensary_city,
-        address: store.dispensary_address,
-        menu_provider: store.dispensary_menu_provider
-      } : null,
-      schedule: scheduleResult.rows[0] || null,
-      recent_jobs: jobsResult.rows
+      ...freshness,
     };

-    // Remove redundant dispensary fields from root
-    delete response.dispensary_name;
-    delete response.dispensary_slug;
-    delete response.dispensary_state;
-    delete response.dispensary_city;
-    delete response.dispensary_address;
-    delete response.dispensary_menu_provider;
-
     res.json(response);
   } catch (error) {
     console.error('Error fetching store:', error);
@@ -178,88 +195,101 @@ router.get('/:id', async (req, res) => {
   }
 });

 // Get store brands
 router.get('/:id/brands', async (req, res) => {
   try {
     const { id } = req.params;

     const result = await pool.query(`
       SELECT name
       FROM brands
       WHERE store_id = $1
       ORDER BY name
     `, [id]);

     const brands = result.rows.map((row: any) => row.name);
     res.json({ brands });
   } catch (error) {
     console.error('Error fetching store brands:', error);
     res.status(500).json({ error: 'Failed to fetch store brands' });
   }
 });

 // Get store specials
 router.get('/:id/specials', async (req, res) => {
   try {
     const { id } = req.params;
     const { date } = req.query;

     // Use provided date or today's date
     const queryDate = date || new Date().toISOString().split('T')[0];

     const result = await pool.query(`
       SELECT
         s.*,
         p.name as product_name,
         p.image_url as product_image
       FROM specials s
       LEFT JOIN products p ON s.product_id = p.id
       WHERE s.store_id = $1 AND s.valid_date = $2
       ORDER BY s.name
     `, [id, queryDate]);

     res.json({ specials: result.rows, date: queryDate });
   } catch (error) {
     console.error('Error fetching store specials:', error);
     res.status(500).json({ error: 'Failed to fetch store specials' });
   }
 });

-// Create store
+// Create store (into dispensaries table)
 router.post('/', requireRole('superadmin', 'admin'), async (req, res) => {
   try {
-    const { name, slug, dutchie_url, active, scrape_enabled } = req.body;
+    const {
+      name,
+      slug,
+      city,
+      state,
+      address,
+      zip,
+      phone,
+      website,
+      menu_url,
+      menu_type,
+      platform,
+      platform_dispensary_id,
+      latitude,
+      longitude
+    } = req.body;
+
+    if (!name || !slug || !city || !state) {
+      return res.status(400).json({ error: 'name, slug, city, and state are required' });
+    }

     const result = await pool.query(`
-      INSERT INTO stores (name, slug, dutchie_url, active, scrape_enabled)
-      VALUES ($1, $2, $3, $4, $5)
+      INSERT INTO dispensaries (
+        name, slug, city, state, address, zip, phone, website,
+        menu_url, menu_type, platform, platform_dispensary_id,
+        latitude, longitude, created_at, updated_at
+      )
+      VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
       RETURNING *
-    `, [name, slug, dutchie_url, active ?? true, scrape_enabled ?? true]);
+    `, [
+      name, slug, city, state, address, zip, phone, website,
+      menu_url, menu_type, platform || 'dutchie', platform_dispensary_id,
+      latitude, longitude
+    ]);

     res.status(201).json(result.rows[0]);
-  } catch (error) {
+  } catch (error: any) {
     console.error('Error creating store:', error);
-    res.status(500).json({ error: 'Failed to create store' });
+    if (error.code === '23505') { // unique violation
+      res.status(409).json({ error: 'Store with this slug already exists' });
+    } else {
+      res.status(500).json({ error: 'Failed to create store' });
+    }
   }
 });

-// Update store
+// Update store (in dispensaries table)
 router.put('/:id', requireRole('superadmin', 'admin'), async (req, res) => {
   try {
     const { id } = req.params;
-    const { name, slug, dutchie_url, active, scrape_enabled } = req.body;
+    const {
+      name,
+      slug,
+      city,
+      state,
+      address,
+      zip,
+      phone,
+      website,
+      menu_url,
+      menu_type,
+      platform,
+      platform_dispensary_id,
+      latitude,
+      longitude
+    } = req.body;

     const result = await pool.query(`
-      UPDATE stores
-      SET name = COALESCE($1, name),
-          slug = COALESCE($2, slug),
-          dutchie_url = COALESCE($3, dutchie_url),
-          active = COALESCE($4, active),
-          scrape_enabled = COALESCE($5, scrape_enabled),
-          updated_at = CURRENT_TIMESTAMP
-      WHERE id = $6
+      UPDATE dispensaries
+      SET
+        name = COALESCE($1, name),
+        slug = COALESCE($2, slug),
+        city = COALESCE($3, city),
+        state = COALESCE($4, state),
+        address = COALESCE($5, address),
+        zip = COALESCE($6, zip),
+        phone = COALESCE($7, phone),
+        website = COALESCE($8, website),
+        menu_url = COALESCE($9, menu_url),
+        menu_type = COALESCE($10, menu_type),
+        platform = COALESCE($11, platform),
+        platform_dispensary_id = COALESCE($12, platform_dispensary_id),
+        latitude = COALESCE($13, latitude),
+        longitude = COALESCE($14, longitude),
+        updated_at = CURRENT_TIMESTAMP
+      WHERE id = $15
       RETURNING *
-    `, [name, slug, dutchie_url, active, scrape_enabled, id]);
+    `, [
+      name, slug, city, state, address, zip, phone, website,
+      menu_url, menu_type, platform, platform_dispensary_id,
+      latitude, longitude, id
+    ]);

     if (result.rows.length === 0) {
       return res.status(404).json({ error: 'Store not found' });
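The PUT handler above leans on `COALESCE($n, column)`: any field absent from the request body arrives as `null`/`undefined` and the column keeps its current value, so clients can send partial updates. The same merge rule, sketched in TypeScript with `??` playing the role of `COALESCE` (the interface and field subset are illustrative):

```typescript
interface StoreRow {
  name: string;
  city: string;
  menu_url: string | null;
}

// patch fields that are null or undefined leave the current value untouched,
// exactly like COALESCE($n, column) in the UPDATE statement.
function coalesceUpdate(current: StoreRow, patch: Partial<StoreRow>): StoreRow {
  return {
    name: patch.name ?? current.name,
    city: patch.city ?? current.city,
    menu_url: patch.menu_url ?? current.menu_url,
  };
}
```

One consequence of this pattern (in SQL and in the sketch alike): a client cannot use PUT to set a column back to NULL, because a null patch value always means "keep the old value".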
@@ -272,12 +302,12 @@ router.put('/:id', requireRole('superadmin', 'admin'), async (req, res) => {
   }
 });

-// Delete store
+// Delete store (from dispensaries table)
 router.delete('/:id', requireRole('superadmin'), async (req, res) => {
   try {
     const { id } = req.params;

-    const result = await pool.query('DELETE FROM stores WHERE id = $1 RETURNING *', [id]);
+    const result = await pool.query('DELETE FROM dispensaries WHERE id = $1 RETURNING *', [id]);

     if (result.rows.length === 0) {
       return res.status(404).json({ error: 'Store not found' });
@@ -290,135 +320,55 @@ router.delete('/:id', requireRole('superadmin'), async (req, res) => {
   }
 });

-// Trigger scrape for a store
-router.post('/:id/scrape', requireRole('superadmin', 'admin'), async (req, res) => {
-  try {
-    const { id } = req.params;
-    const { parallel = 3, userAgent } = req.body; // Default to 3 parallel scrapers
-
-    const storeResult = await pool.query('SELECT id FROM stores WHERE id = $1', [id]);
-    if (storeResult.rows.length === 0) {
-      return res.status(404).json({ error: 'Store not found' });
-    }
-
-    scrapeStore(parseInt(id), parseInt(parallel), userAgent).catch(err => {
-      console.error('Background scrape error:', err);
-    });
-
-    res.json({
-      message: 'Scrape started',
-      parallel: parseInt(parallel),
-      userAgent: userAgent || 'random'
-    });
-  } catch (error) {
-    console.error('Error triggering scrape:', error);
-    res.status(500).json({ error: 'Failed to trigger scrape' });
-  }
-});
-
-// Download missing images for a store
-router.post('/:id/download-images', requireRole('superadmin', 'admin'), async (req, res) => {
+// Get products for a store (uses dutchie_products table)
+router.get('/:id/products', async (req, res) => {
   try {
     const { id } = req.params;

-    const storeResult = await pool.query('SELECT id, name FROM stores WHERE id = $1', [id]);
-    if (storeResult.rows.length === 0) {
-      return res.status(404).json({ error: 'Store not found' });
-    }
-
-    const store = storeResult.rows[0];
-
-    const productsResult = await pool.query(`
-      SELECT id, name, image_url
-      FROM products
-      WHERE store_id = $1
-        AND image_url IS NOT NULL
-        AND local_image_path IS NULL
+    const result = await pool.query(`
+      SELECT
+        id,
+        name,
+        brand_name,
+        type,
+        subcategory,
+        stock_status,
+        thc_content,
+        cbd_content,
+        primary_image_url,
+        external_product_id,
+        created_at,
+        updated_at
+      FROM dutchie_products
+      WHERE dispensary_id = $1
+      ORDER BY name
     `, [id]);

-    (async () => {
-      const { uploadImageFromUrl } = await import('../utils/minio');
-      let downloaded = 0;
-
-      for (const product of productsResult.rows) {
-        try {
-          console.log(`📸 Downloading image for: ${product.name}`);
-          const localPath = await uploadImageFromUrl(product.image_url, product.id);
-
-          await pool.query(`
-            UPDATE products
-            SET local_image_path = $1
-            WHERE id = $2
-          `, [localPath, product.id]);
-
-          downloaded++;
-        } catch (error) {
-          console.error(`Failed to download image for ${product.name}:`, error);
-        }
-      }
-
-      console.log(`✅ Downloaded ${downloaded} of ${productsResult.rows.length} missing images for ${store.name}`);
-    })().catch(err => console.error('Background image download error:', err));
-
-    res.json({
-      message: 'Image download started',
-      total_missing: productsResult.rows.length
-    });
+    res.json({ products: result.rows });
   } catch (error) {
-    console.error('Error triggering image download:', error);
-    res.status(500).json({ error: 'Failed to trigger image download' });
+    console.error('Error fetching store products:', error);
+    res.status(500).json({ error: 'Failed to fetch products' });
   }
 });

-// Discover categories for a store
-router.post('/:id/discover-categories', requireRole('superadmin', 'admin'), async (req, res) => {
+// Get brands for a store
+router.get('/:id/brands', async (req, res) => {
   try {
     const { id } = req.params;

-    const storeResult = await pool.query('SELECT id FROM stores WHERE id = $1', [id]);
-    if (storeResult.rows.length === 0) {
-      return res.status(404).json({ error: 'Store not found' });
-    }
-
-    discoverCategories(parseInt(id)).catch(err => {
-      console.error('Background category discovery error:', err);
-    });
-
-    res.json({ message: 'Category discovery started' });
-  } catch (error) {
-    console.error('Error triggering category discovery:', error);
-    res.status(500).json({ error: 'Failed to trigger category discovery' });
-  }
-});
-
-// Debug scraper
-router.post('/:id/debug-scrape', requireRole('superadmin', 'admin'), async (req, res) => {
-  try {
-    const { id } = req.params;
-    console.log('Debug scrape triggered for store:', id);
-
-    const categoryResult = await pool.query(`
-      SELECT c.dutchie_url, c.name
-      FROM categories c
-      WHERE c.store_id = $1 AND c.slug = 'edibles'
-      LIMIT 1
+    const result = await pool.query(`
+      SELECT DISTINCT brand_name as name, COUNT(*) as product_count
+      FROM dutchie_products
+      WHERE dispensary_id = $1 AND brand_name IS NOT NULL
+      GROUP BY brand_name
+      ORDER BY product_count DESC, brand_name
     `, [id]);

-    if (categoryResult.rows.length === 0) {
-      return res.status(404).json({ error: 'Edibles category not found' });
-    }
-
-    console.log('Found category:', categoryResult.rows[0]);
-
-    const { debugDutchiePage } = await import('../services/scraper-debug');
-    debugDutchiePage(categoryResult.rows[0].dutchie_url).catch(err => {
-      console.error('Debug error:', err);
-    });
-
-    res.json({ message: 'Debug started, check logs', url: categoryResult.rows[0].dutchie_url });
+    const brands = result.rows.map((row: any) => row.name);
+    res.json({ brands, details: result.rows });
   } catch (error) {
-    console.error('Debug endpoint error:', error);
-    res.status(500).json({ error: 'Failed to debug' });
+    console.error('Error fetching store brands:', error);
+    res.status(500).json({ error: 'Failed to fetch store brands' });
   }
 });

@@ -20,7 +20,7 @@
|
||||
*/
|
||||
|
||||
import { Router, Request, Response } from 'express';
|
||||
import { getPool } from '../dutchie-az/db/connection';
|
||||
import { pool } from '../db/pool';
|
||||
|
||||
const router = Router();
|
||||
|
||||
@@ -112,7 +112,7 @@ function extractRunRole(jobName: string, jobConfig: any): string {
|
||||
*/
|
||||
router.get('/', async (_req: Request, res: Response) => {
|
||||
try {
|
||||
const pool = getPool();
|
||||
// pool imported from db/pool
|
||||
const { rows } = await pool.query(`
|
||||
SELECT
|
||||
id,
|
||||
@@ -158,7 +158,7 @@ router.get('/', async (_req: Request, res: Response) => {
|
||||
*/
|
||||
router.get('/active', async (_req: Request, res: Response) => {
|
||||
try {
|
||||
const pool = getPool();
|
||||
// pool imported from db/pool
|
||||
const { rows } = await pool.query(`
|
||||
SELECT DISTINCT ON (claimed_by)
|
||||
claimed_by as worker_id,
|
||||
@@ -193,7 +193,7 @@ router.get('/active', async (_req: Request, res: Response) => {
|
||||
*/
|
||||
router.get('/schedule', async (req: Request, res: Response) => {
|
||||
// Delegate to main workers endpoint
|
||||
const pool = getPool();
|
||||
// pool imported from db/pool
|
||||
try {
|
||||
const { rows } = await pool.query(`
|
||||
SELECT
|
||||
@@ -223,7 +223,7 @@ router.get('/schedule', async (req: Request, res: Response) => {
|
||||
router.get('/:workerIdOrName', async (req: Request, res: Response) => {
|
||||
try {
    const { workerIdOrName } = req.params;
-    const pool = getPool();
+    // pool imported from db/pool

    // Try to find by ID or job_name
    const { rows } = await pool.query(`

@@ -278,7 +278,7 @@ router.get('/:workerIdOrName', async (req: Request, res: Response) => {
 router.get('/:workerIdOrName/scope', async (req: Request, res: Response) => {
   try {
     const { workerIdOrName } = req.params;
-    const pool = getPool();
+    // pool imported from db/pool

     const { rows } = await pool.query(`
       SELECT job_config

@@ -304,7 +304,7 @@ router.get('/:workerIdOrName/scope', async (req: Request, res: Response) => {
 router.get('/:workerIdOrName/stats', async (req: Request, res: Response) => {
   try {
     const { workerIdOrName } = req.params;
-    const pool = getPool();
+    // pool imported from db/pool

     // Get schedule info
     const scheduleResult = await pool.query(`

@@ -357,7 +357,7 @@ router.get('/:workerIdOrName/logs', async (req: Request, res: Response) => {
   try {
     const { workerIdOrName } = req.params;
     const limit = parseInt(req.query.limit as string) || 20;
-    const pool = getPool();
+    // pool imported from db/pool

     // Get schedule info
     const scheduleResult = await pool.query(`

@@ -419,7 +419,7 @@ router.get('/:workerIdOrName/logs', async (req: Request, res: Response) => {
 router.post('/:workerIdOrName/trigger', async (req: Request, res: Response) => {
   try {
     const { workerIdOrName } = req.params;
-    const pool = getPool();
+    // pool imported from db/pool

     // Get schedule info
     const scheduleResult = await pool.query(`

@@ -458,7 +458,7 @@ router.get('/jobs', async (req: Request, res: Response) => {
   try {
     const limit = parseInt(req.query.limit as string) || 50;
     const status = req.query.status as string | undefined;
-    const pool = getPool();
+    // pool imported from db/pool

     let query = `
       SELECT

@@ -518,7 +518,7 @@ router.get('/jobs', async (req: Request, res: Response) => {
  */
 router.get('/active-jobs', async (req: Request, res: Response) => {
   try {
-    const pool = getPool();
+    // pool imported from db/pool

     const { rows } = await pool.query(`
       SELECT

@@ -563,7 +563,7 @@ router.get('/active-jobs', async (req: Request, res: Response) => {
  */
 router.get('/summary', async (req: Request, res: Response) => {
   try {
-    const pool = getPool();
+    // pool imported from db/pool

     // Get summary stats
     const [scheduleStats, jobStats, activeJobs] = await Promise.all([
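The hunks above drop the per-handler `getPool()` calls in favor of a `pool` imported from `db/pool`. A minimal sketch of the lazy-singleton pattern such a module typically uses — all names and the stub factory below are assumptions for illustration, not the repository's actual `db/pool` code (which would wrap `new Pool(...)` from `pg`):

```typescript
// Generic lazy singleton: the factory runs once, on first use.
function lazySingleton<T>(create: () => T): () => T {
  let instance: T | undefined;
  return () => {
    if (instance === undefined) {
      instance = create(); // constructed exactly once
    }
    return instance;
  };
}

// Stub shape standing in for pg's Pool so the sketch runs without a database.
interface StubPool {
  query(sql: string): Promise<{ rows: unknown[] }>;
}

// In a real db/pool.ts this factory would be `() => new Pool({ connectionString })`.
const getPool = lazySingleton<StubPool>(() => ({
  query: async (_sql: string) => ({ rows: [] }),
}));
```

Every caller then shares the same pool instance, which is what makes a module-level `import { pool } from './db/pool'` safe.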
@@ -38,10 +38,10 @@ export class NavigationDiscovery {
     logger.info('categories', `Starting category discovery for store ${storeId}`);

     try {
-      // Get store info
+      // Get dispensary info (store = dispensary)
       const storeResult = await pool.query(`
-        SELECT id, name, slug, dutchie_url
-        FROM stores
+        SELECT id, name, slug, menu_url as dutchie_url
+        FROM dispensaries
         WHERE id = $1
       `, [storeId]);

@@ -1,439 +0,0 @@
// ============================================================================
// DEPRECATED: This scraper writes to the LEGACY products table.
// DO NOT USE - All Dutchie crawling must use the new dutchie-az pipeline.
//
// New pipeline location: src/dutchie-az/services/product-crawler.ts
// - Uses fetch-based GraphQL (no Puppeteer needed)
// - Writes to isolated dutchie_az_* tables with snapshot model
// - Tracks stockStatus, isPresentInFeed, missing_from_feed
// ============================================================================

/**
 * @deprecated DEPRECATED - Use src/dutchie-az/services/product-crawler.ts instead.
 * This scraper writes to the legacy products table, not the new dutchie_az tables.
 *
 * Makes direct GraphQL requests from within the browser context to:
 * 1. Bypass Cloudflare (using browser session)
 * 2. Fetch ALL products including out-of-stock (Status: null)
 * 3. Paginate through complete menu
 */

import puppeteer from 'puppeteer-extra';
import type { Browser, Page } from 'puppeteer';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Pool } from 'pg';
import { DutchieProduct, NormalizedProduct, normalizeDutchieProduct } from './dutchie-graphql';

puppeteer.use(StealthPlugin());

// GraphQL persisted query hashes
const GRAPHQL_HASHES = {
  FilteredProducts: 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0',
  GetAddressBasedDispensaryData: '13461f73abf7268770dfd05fe7e10c523084b2bb916a929c08efe3d87531977b',
};

interface FetchResult {
  products: DutchieProduct[];
  dispensaryId: string;
  totalProducts: number;
  activeCount: number;
  inactiveCount: number;
}

/**
 * Fetch all products via in-page GraphQL requests
 * This includes both in-stock and out-of-stock items
 */
export async function fetchAllDutchieProducts(
  menuUrl: string,
  options: {
    headless?: boolean | 'new';
    timeout?: number;
    perPage?: number;
    includeOutOfStock?: boolean;
  } = {}
): Promise<FetchResult> {
  const {
    headless = 'new',
    timeout = 90000,
    perPage = 100,
    includeOutOfStock = true,
  } = options;

  let browser: Browser | undefined;

  try {
    browser = await puppeteer.launch({
      headless,
      args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-dev-shm-usage',
        '--disable-blink-features=AutomationControlled',
      ],
    });

    const page = await browser.newPage();

    // Stealth configuration
    await page.setUserAgent(
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    );
    await page.setViewport({ width: 1920, height: 1080 });
    await page.evaluateOnNewDocument(() => {
      Object.defineProperty(navigator, 'webdriver', { get: () => false });
      (window as any).chrome = { runtime: {} };
    });

    // Navigate to menu page to establish session
    console.log('[DutchieGraphQL] Loading menu page to establish session...');
    await page.goto(menuUrl, {
      waitUntil: 'networkidle2',
      timeout,
    });

    // Get dispensary ID from page
    const dispensaryId = await page.evaluate(() => {
      const env = (window as any).reactEnv;
      return env?.dispensaryId || env?.retailerId || '';
    });

    if (!dispensaryId) {
      throw new Error('Could not determine dispensaryId from page');
    }

    console.log(`[DutchieGraphQL] Dispensary ID: ${dispensaryId}`);

    // Fetch all products via in-page GraphQL requests
    const allProducts: DutchieProduct[] = [];
    let page_num = 0;
    let hasMore = true;

    while (hasMore) {
      console.log(`[DutchieGraphQL] Fetching page ${page_num} (perPage=${perPage})...`);

      const result = await page.evaluate(
        async (dispensaryId: string, page_num: number, perPage: number, includeOutOfStock: boolean, hash: string) => {
          const variables = {
            includeEnterpriseSpecials: false,
            productsFilter: {
              dispensaryId,
              pricingType: 'rec',
              Status: includeOutOfStock ? null : 'Active', // null = include out-of-stock
              types: [],
              useCache: false, // Don't cache to get fresh data
              isDefaultSort: true,
              sortBy: 'popularSortIdx',
              sortDirection: 1,
              bypassOnlineThresholds: true,
              isKioskMenu: false,
              removeProductsBelowOptionThresholds: false,
            },
            page: page_num,
            perPage,
          };

          const qs = new URLSearchParams({
            operationName: 'FilteredProducts',
            variables: JSON.stringify(variables),
            extensions: JSON.stringify({
              persistedQuery: { version: 1, sha256Hash: hash },
            }),
          });

          const response = await fetch(`https://dutchie.com/graphql?${qs.toString()}`, {
            method: 'GET',
            headers: {
              'content-type': 'application/json',
              'apollographql-client-name': 'Marketplace (production)',
            },
            credentials: 'include', // Include cookies/session
          });

          if (!response.ok) {
            throw new Error(`HTTP ${response.status}`);
          }

          return response.json();
        },
        dispensaryId,
        page_num,
        perPage,
        includeOutOfStock,
        GRAPHQL_HASHES.FilteredProducts
      );

      if (result.errors) {
        console.error('[DutchieGraphQL] GraphQL errors:', result.errors);
        break;
      }

      const products = result?.data?.filteredProducts?.products || [];
      console.log(`[DutchieGraphQL] Page ${page_num}: ${products.length} products`);

      if (products.length === 0) {
        hasMore = false;
      } else {
        allProducts.push(...products);
        page_num++;

        // Safety limit
        if (page_num > 50) {
          console.log('[DutchieGraphQL] Reached page limit, stopping');
          hasMore = false;
        }
      }
    }

    // Count active vs inactive
    const activeCount = allProducts.filter((p) => p.Status === 'Active').length;
    const inactiveCount = allProducts.filter((p) => p.Status !== 'Active').length;

    console.log(`[DutchieGraphQL] Total: ${allProducts.length} products (${activeCount} active, ${inactiveCount} inactive)`);

    return {
      products: allProducts,
      dispensaryId,
      totalProducts: allProducts.length,
      activeCount,
      inactiveCount,
    };
  } finally {
    if (browser) {
      await browser.close();
    }
  }
}

/**
 * Upsert products to database
 */
export async function upsertProductsDirect(
  pool: Pool,
  storeId: number,
  products: NormalizedProduct[]
): Promise<{ inserted: number; updated: number }> {
  const client = await pool.connect();
  let inserted = 0;
  let updated = 0;

  try {
    await client.query('BEGIN');

    for (const product of products) {
      const result = await client.query(
        `
        INSERT INTO products (
          store_id, external_id, slug, name, enterprise_product_id,
          brand, brand_external_id, brand_logo_url,
          subcategory, strain_type, canonical_category,
          price, rec_price, med_price, rec_special_price, med_special_price,
          is_on_special, special_name, discount_percent, special_data,
          sku, inventory_quantity, inventory_available, is_below_threshold, status,
          thc_percentage, cbd_percentage, cannabinoids,
          weight_mg, net_weight_value, net_weight_unit, options, raw_options,
          image_url, additional_images,
          is_featured, medical_only, rec_only,
          source_created_at, source_updated_at,
          description, raw_data,
          dutchie_url, last_seen_at, updated_at
        )
        VALUES (
          $1, $2, $3, $4, $5,
          $6, $7, $8,
          $9, $10, $11,
          $12, $13, $14, $15, $16,
          $17, $18, $19, $20,
          $21, $22, $23, $24, $25,
          $26, $27, $28,
          $29, $30, $31, $32, $33,
          $34, $35,
          $36, $37, $38,
          $39, $40,
          $41, $42,
          '', NOW(), NOW()
        )
        ON CONFLICT (store_id, slug) DO UPDATE SET
          name = EXCLUDED.name,
          enterprise_product_id = EXCLUDED.enterprise_product_id,
          brand = EXCLUDED.brand,
          brand_external_id = EXCLUDED.brand_external_id,
          brand_logo_url = EXCLUDED.brand_logo_url,
          subcategory = EXCLUDED.subcategory,
          strain_type = EXCLUDED.strain_type,
          canonical_category = EXCLUDED.canonical_category,
          price = EXCLUDED.price,
          rec_price = EXCLUDED.rec_price,
          med_price = EXCLUDED.med_price,
          rec_special_price = EXCLUDED.rec_special_price,
          med_special_price = EXCLUDED.med_special_price,
          is_on_special = EXCLUDED.is_on_special,
          special_name = EXCLUDED.special_name,
          discount_percent = EXCLUDED.discount_percent,
          special_data = EXCLUDED.special_data,
          sku = EXCLUDED.sku,
          inventory_quantity = EXCLUDED.inventory_quantity,
          inventory_available = EXCLUDED.inventory_available,
          is_below_threshold = EXCLUDED.is_below_threshold,
          status = EXCLUDED.status,
          thc_percentage = EXCLUDED.thc_percentage,
          cbd_percentage = EXCLUDED.cbd_percentage,
          cannabinoids = EXCLUDED.cannabinoids,
          weight_mg = EXCLUDED.weight_mg,
          net_weight_value = EXCLUDED.net_weight_value,
          net_weight_unit = EXCLUDED.net_weight_unit,
          options = EXCLUDED.options,
          raw_options = EXCLUDED.raw_options,
          image_url = EXCLUDED.image_url,
          additional_images = EXCLUDED.additional_images,
          is_featured = EXCLUDED.is_featured,
          medical_only = EXCLUDED.medical_only,
          rec_only = EXCLUDED.rec_only,
          source_created_at = EXCLUDED.source_created_at,
          source_updated_at = EXCLUDED.source_updated_at,
          description = EXCLUDED.description,
          raw_data = EXCLUDED.raw_data,
          last_seen_at = NOW(),
          updated_at = NOW()
        RETURNING (xmax = 0) AS was_inserted
        `,
        [
          storeId,
          product.external_id,
          product.slug,
          product.name,
          product.enterprise_product_id,
          product.brand,
          product.brand_external_id,
          product.brand_logo_url,
          product.subcategory,
          product.strain_type,
          product.canonical_category,
          product.price,
          product.rec_price,
          product.med_price,
          product.rec_special_price,
          product.med_special_price,
          product.is_on_special,
          product.special_name,
          product.discount_percent,
          product.special_data ? JSON.stringify(product.special_data) : null,
          product.sku,
          product.inventory_quantity,
          product.inventory_available,
          product.is_below_threshold,
          product.status,
          product.thc_percentage,
          product.cbd_percentage,
          product.cannabinoids ? JSON.stringify(product.cannabinoids) : null,
          product.weight_mg,
          product.net_weight_value,
          product.net_weight_unit,
          product.options,
          product.raw_options,
          product.image_url,
          product.additional_images,
          product.is_featured,
          product.medical_only,
          product.rec_only,
          product.source_created_at,
          product.source_updated_at,
          product.description,
          product.raw_data ? JSON.stringify(product.raw_data) : null,
        ]
      );

      if (result.rows[0]?.was_inserted) {
        inserted++;
      } else {
        updated++;
      }
    }

    await client.query('COMMIT');
    return { inserted, updated };
  } catch (error) {
    await client.query('ROLLBACK');
    throw error;
  } finally {
    client.release();
  }
}

/**
 * @deprecated DEPRECATED - Use src/dutchie-az/services/product-crawler.ts instead.
 * This function is disabled and will throw an error if called.
 * Main entry point - scrape all products including out-of-stock
 */
export async function scrapeAllDutchieProducts(
  pool: Pool,
  storeId: number,
  menuUrl: string
): Promise<{
  success: boolean;
  totalProducts: number;
  activeCount: number;
  inactiveCount: number;
  inserted: number;
  updated: number;
  error?: string;
}> {
  // DEPRECATED: Throw error to prevent accidental use
  throw new Error(
    'DEPRECATED: scrapeAllDutchieProducts() is deprecated. ' +
    'Use src/dutchie-az/services/product-crawler.ts instead. ' +
    'This scraper writes to the legacy products table.'
  );

  // Original code below is unreachable but kept for reference
  try {
    console.log(`[DutchieGraphQL] Scraping ALL products (including out-of-stock): ${menuUrl}`);

    // Fetch all products via direct GraphQL
    const { products, totalProducts, activeCount, inactiveCount } = await fetchAllDutchieProducts(menuUrl, {
      includeOutOfStock: true,
      perPage: 100,
    });

    if (products.length === 0) {
      return {
        success: false,
        totalProducts: 0,
        activeCount: 0,
        inactiveCount: 0,
        inserted: 0,
        updated: 0,
        error: 'No products returned from GraphQL',
      };
    }

    // Normalize products
    const normalized = products.map(normalizeDutchieProduct);

    // Upsert to database
    const { inserted, updated } = await upsertProductsDirect(pool, storeId, normalized);

    console.log(`[DutchieGraphQL] Complete: ${totalProducts} products (${activeCount} active, ${inactiveCount} inactive)`);
    console.log(`[DutchieGraphQL] Database: ${inserted} inserted, ${updated} updated`);

    return {
      success: true,
      totalProducts,
      activeCount,
      inactiveCount,
      inserted,
      updated,
    };
  } catch (error: any) {
    console.error(`[DutchieGraphQL] Error:`, error.message);
    return {
      success: false,
      totalProducts: 0,
      activeCount: 0,
      inactiveCount: 0,
      inserted: 0,
      updated: 0,
      error: error.message,
    };
  }
}
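The deleted scraper above assembles Dutchie's `FilteredProducts` persisted-query URL inside the browser, while its deprecation banner notes the replacement pipeline issues the same query with plain `fetch`. A rough standalone sketch of that URL construction, mirroring the `URLSearchParams` code visible in the deleted file — the helper name and parameter list are hypothetical, not the actual `product-crawler.ts` API:

```typescript
// Builds the GET URL for Dutchie's FilteredProducts persisted query.
// Mirrors the in-page construction from the deleted scraper; illustrative only.
function buildFilteredProductsUrl(
  dispensaryId: string,
  page: number,
  perPage: number,
  includeOutOfStock: boolean,
  hash: string
): string {
  const variables = {
    includeEnterpriseSpecials: false,
    productsFilter: {
      dispensaryId,
      pricingType: 'rec',
      Status: includeOutOfStock ? null : 'Active', // null = include out-of-stock
      types: [],
      useCache: false,
      isDefaultSort: true,
      sortBy: 'popularSortIdx',
      sortDirection: 1,
      bypassOnlineThresholds: true,
      isKioskMenu: false,
      removeProductsBelowOptionThresholds: false,
    },
    page,
    perPage,
  };
  const qs = new URLSearchParams({
    operationName: 'FilteredProducts',
    variables: JSON.stringify(variables),
    extensions: JSON.stringify({
      persistedQuery: { version: 1, sha256Hash: hash },
    }),
  });
  return `https://dutchie.com/graphql?${qs.toString()}`;
}
```

Because the query parameters fully describe the request, a fetch-based crawler can page through a menu by incrementing `page` until an empty product list comes back, with no Puppeteer session involved.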
@@ -1,711 +0,0 @@
// ============================================================================
// DEPRECATED: This scraper writes to the LEGACY products table.
// DO NOT USE - All Dutchie crawling must use the new dutchie-az pipeline.
//
// New pipeline location: src/dutchie-az/services/product-crawler.ts
// - Uses fetch-based GraphQL (no Puppeteer needed)
// - Writes to isolated dutchie_az_* tables with snapshot model
// - Tracks stockStatus, isPresentInFeed, missing_from_feed
//
// The normalizer functions in this file (normalizeDutchieProduct) may still
// be imported for reference, but do NOT call scrapeDutchieMenu() or upsertProducts().
// ============================================================================

/**
 * @deprecated DEPRECATED - Use src/dutchie-az/services/product-crawler.ts instead.
 * This scraper writes to the legacy products table, not the new dutchie_az tables.
 *
 * Fetches product data via Puppeteer interception of Dutchie's GraphQL API.
 * This bypasses Cloudflare by using a real browser to load the menu page.
 *
 * GraphQL Operations:
 * - FilteredProducts: Returns paginated product list with full details
 * - GetAddressBasedDispensaryData: Resolves dispensary cName to dispensaryId
 */

import puppeteer from 'puppeteer-extra';
import type { Browser, Page, HTTPResponse } from 'puppeteer';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Pool } from 'pg';

puppeteer.use(StealthPlugin());

// =====================================================
// TYPE DEFINITIONS (from captured GraphQL schema)
// =====================================================

export interface DutchieProduct {
  _id: string;
  id: string;
  Name: string;
  cName: string; // URL slug
  enterpriseProductId?: string;
  DispensaryID: string;

  // Brand
  brand?: {
    id: string;
    name: string;
    imageUrl?: string;
    description?: string;
  };
  brandId?: string;
  brandName?: string;
  brandLogo?: string;

  // Category
  type?: string; // e.g., "Edible", "Flower"
  subcategory?: string; // e.g., "gummies", "pre-rolls"
  strainType?: string; // "Indica", "Sativa", "Hybrid", "N/A"

  // Pricing (arrays - first element is primary)
  Prices?: number[];
  recPrices?: number[];
  medicalPrices?: number[];
  recSpecialPrices?: number[];
  medicalSpecialPrices?: number[];

  // Specials
  special?: boolean;
  specialData?: {
    saleSpecials?: Array<{
      specialId: string;
      specialName: string;
      discount: number;
      percentDiscount: boolean;
      dollarDiscount: boolean;
      specialType: string;
    }>;
    bogoSpecials?: any;
  };

  // Inventory
  POSMetaData?: {
    canonicalSKU?: string;
    canonicalCategory?: string;
    canonicalName?: string;
    canonicalLabResultUrl?: string;
    children?: Array<{
      option: string;
      price: number;
      quantity: number;
      quantityAvailable: number;
      recPrice?: number;
      medPrice?: number;
    }>;
  };
  Status?: string; // "Active" or "Inactive"
  isBelowThreshold?: boolean;

  // Potency
  THCContent?: {
    unit: string;
    range: number[];
  };
  CBDContent?: {
    unit: string;
    range: number[];
  };
  cannabinoidsV2?: Array<{
    value: number;
    unit: string;
    cannabinoid: {
      name: string;
    };
  }>;

  // Weight/Options
  Options?: string[];
  rawOptions?: string[];
  weight?: number;
  measurements?: {
    netWeight?: {
      unit: string;
      values: number[];
    };
    volume?: any;
  };

  // Images
  Image?: string;
  images?: string[];

  // Flags
  featured?: boolean;
  medicalOnly?: boolean;
  recOnly?: boolean;

  // Timestamps
  createdAt?: string;
  updatedAt?: string;

  // Description
  description?: string;
  effects?: Record<string, any>;
  terpenes?: any[];
}

// Database product row
export interface NormalizedProduct {
  external_id: string;
  slug: string;
  name: string;
  enterprise_product_id?: string;

  // Brand
  brand?: string;
  brand_external_id?: string;
  brand_logo_url?: string;

  // Category
  subcategory?: string;
  strain_type?: string;
  canonical_category?: string;

  // Pricing
  price?: number;
  rec_price?: number;
  med_price?: number;
  rec_special_price?: number;
  med_special_price?: number;

  // Specials
  is_on_special: boolean;
  special_name?: string;
  discount_percent?: number;
  special_data?: any;

  // Inventory
  sku?: string;
  inventory_quantity?: number;
  inventory_available?: number;
  is_below_threshold: boolean;
  status?: string;

  // Potency
  thc_percentage?: number;
  cbd_percentage?: number;
  cannabinoids?: any;

  // Weight/Options
  weight_mg?: number;
  net_weight_value?: number;
  net_weight_unit?: string;
  options?: string[];
  raw_options?: string[];

  // Images
  image_url?: string;
  additional_images?: string[];

  // Flags
  is_featured: boolean;
  medical_only: boolean;
  rec_only: boolean;

  // Timestamps
  source_created_at?: Date;
  source_updated_at?: Date;

  // Raw
  description?: string;
  raw_data?: any;
}

// =====================================================
// NORMALIZER: Dutchie GraphQL → DB Schema
// =====================================================

export function normalizeDutchieProduct(product: DutchieProduct): NormalizedProduct {
  // Extract first special if exists
  const saleSpecial = product.specialData?.saleSpecials?.[0];

  // Calculate inventory from POSMetaData children
  const children = product.POSMetaData?.children || [];
  const totalQuantity = children.reduce((sum, c) => sum + (c.quantity || 0), 0);
  const availableQuantity = children.reduce((sum, c) => sum + (c.quantityAvailable || 0), 0);

  // Parse timestamps
  let sourceCreatedAt: Date | undefined;
  if (product.createdAt) {
    // createdAt is a timestamp string like "1729044510543"
    const ts = parseInt(product.createdAt, 10);
    if (!isNaN(ts)) {
      sourceCreatedAt = new Date(ts);
    }
  }

  let sourceUpdatedAt: Date | undefined;
  if (product.updatedAt) {
    sourceUpdatedAt = new Date(product.updatedAt);
  }

  return {
    // Identity
    external_id: product._id || product.id,
    slug: product.cName,
    name: product.Name,
    enterprise_product_id: product.enterpriseProductId,

    // Brand
    brand: product.brandName || product.brand?.name,
    brand_external_id: product.brandId || product.brand?.id,
    brand_logo_url: product.brandLogo || product.brand?.imageUrl,

    // Category
    subcategory: product.subcategory,
    strain_type: product.strainType,
    canonical_category: product.POSMetaData?.canonicalCategory,

    // Pricing
    price: product.Prices?.[0],
    rec_price: product.recPrices?.[0],
    med_price: product.medicalPrices?.[0],
    rec_special_price: product.recSpecialPrices?.[0],
    med_special_price: product.medicalSpecialPrices?.[0],

    // Specials
    is_on_special: product.special === true,
    special_name: saleSpecial?.specialName,
    discount_percent: saleSpecial?.percentDiscount ? saleSpecial.discount : undefined,
    special_data: product.specialData,

    // Inventory
    sku: product.POSMetaData?.canonicalSKU,
    inventory_quantity: totalQuantity || undefined,
    inventory_available: availableQuantity || undefined,
    is_below_threshold: product.isBelowThreshold === true,
    status: product.Status,

    // Potency
    thc_percentage: product.THCContent?.range?.[0],
    cbd_percentage: product.CBDContent?.range?.[0],
    cannabinoids: product.cannabinoidsV2,

    // Weight/Options
    weight_mg: product.weight,
    net_weight_value: product.measurements?.netWeight?.values?.[0],
    net_weight_unit: product.measurements?.netWeight?.unit,
    options: product.Options,
    raw_options: product.rawOptions,

    // Images
    image_url: product.Image,
    additional_images: product.images?.length ? product.images : undefined,

    // Flags
    is_featured: product.featured === true,
    medical_only: product.medicalOnly === true,
    rec_only: product.recOnly === true,

    // Timestamps
    source_created_at: sourceCreatedAt,
    source_updated_at: sourceUpdatedAt,

    // Description
    description: typeof product.description === 'string' ? product.description : undefined,

    // Raw
    raw_data: product,
  };
}

// =====================================================
// PUPPETEER SCRAPER
// =====================================================

interface CapturedProducts {
  products: DutchieProduct[];
  dispensaryId: string;
  menuUrl: string;
}

export async function fetchDutchieMenuViaPuppeteer(
  menuUrl: string,
  options: {
    headless?: boolean | 'new';
    timeout?: number;
    maxScrolls?: number;
  } = {}
): Promise<CapturedProducts> {
  const {
    headless = 'new',
    timeout = 90000,
    maxScrolls = 30, // Increased for full menu capture
  } = options;

  let browser: Browser | undefined;
  const capturedProducts: DutchieProduct[] = [];
  let dispensaryId = '';

  try {
    browser = await puppeteer.launch({
      headless,
      args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-dev-shm-usage',
        '--disable-blink-features=AutomationControlled',
      ],
    });

    const page = await browser.newPage();

    // Stealth configuration
    await page.setUserAgent(
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    );
    await page.setViewport({ width: 1920, height: 1080 });
    await page.evaluateOnNewDocument(() => {
      Object.defineProperty(navigator, 'webdriver', { get: () => false });
      (window as any).chrome = { runtime: {} };
    });

    // Track seen product IDs to avoid duplicates
    const seenIds = new Set<string>();

    // Intercept GraphQL responses
    page.on('response', async (response: HTTPResponse) => {
      const url = response.url();
      if (!url.includes('graphql')) return;

      try {
        const contentType = response.headers()['content-type'] || '';
        if (!contentType.includes('application/json')) return;

        const data = await response.json();

        // Capture dispensary ID
        if (data?.data?.getAddressBasedDispensaryData?.dispensaryData?.dispensaryId) {
          dispensaryId = data.data.getAddressBasedDispensaryData.dispensaryData.dispensaryId;
        }

        // Capture products from FilteredProducts
        if (data?.data?.filteredProducts?.products) {
          const products = data.data.filteredProducts.products as DutchieProduct[];
          for (const product of products) {
            if (!seenIds.has(product._id)) {
              seenIds.add(product._id);
              capturedProducts.push(product);
            }
          }
        }
      } catch {
        // Ignore parse errors
      }
    });

    // Navigate to menu
    console.log('[DutchieGraphQL] Loading menu page...');
    await page.goto(menuUrl, {
      waitUntil: 'networkidle2',
      timeout,
    });

    // Get dispensary ID from window.reactEnv if not captured
    if (!dispensaryId) {
      dispensaryId = await page.evaluate(() => {
        const env = (window as any).reactEnv;
        return env?.dispensaryId || env?.retailerId || '';
      });
    }

    // Helper function to scroll through a page until no more products load
    async function scrollToLoadAll(maxScrollAttempts: number = maxScrolls): Promise<void> {
      let scrollCount = 0;
      let previousCount = 0;
      let noNewProductsCount = 0;

      while (scrollCount < maxScrollAttempts && noNewProductsCount < 3) {
        await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
        await new Promise((r) => setTimeout(r, 1500));

        const currentCount = seenIds.size;
        if (currentCount === previousCount) {
          noNewProductsCount++;
        } else {
          noNewProductsCount = 0;
        }
        previousCount = currentCount;
        scrollCount++;
      }
    }

    // First, scroll through the main page (all products)
    console.log('[DutchieGraphQL] Scrolling main page...');
    await scrollToLoadAll();
    console.log(`[DutchieGraphQL] After main page: ${seenIds.size} products`);

    // Get category links from the navigation
    const categoryLinks = await page.evaluate(() => {
      const links: string[] = [];
      // Look for category navigation links
      const navLinks = document.querySelectorAll('a[href*="/products/"]');
      navLinks.forEach((link) => {
        const href = (link as HTMLAnchorElement).href;
        if (href && !links.includes(href)) {
          links.push(href);
        }
      });
      return links;
    });

    console.log(`[DutchieGraphQL] Found ${categoryLinks.length} category links`);

    // Visit each category page to capture all products
    for (const categoryUrl of categoryLinks) {
      try {
        console.log(`[DutchieGraphQL] Visiting category: ${categoryUrl.split('/').pop()}`);
        await page.goto(categoryUrl, {
          waitUntil: 'networkidle2',
          timeout: 30000,
        });
        await scrollToLoadAll(15); // Fewer scrolls per category
        console.log(`[DutchieGraphQL] Total products: ${seenIds.size}`);
      } catch (e: any) {
        console.log(`[DutchieGraphQL] Category error: ${e.message}`);
      }
    }

    // Wait for any final responses
    await new Promise((r) => setTimeout(r, 2000));

    return {
      products: capturedProducts,
      dispensaryId,
      menuUrl,
    };
  } finally {
    if (browser) {
      await browser.close();
    }
  }
}

// =====================================================
// DATABASE OPERATIONS
// =====================================================

export async function upsertProducts(
  pool: Pool,
  storeId: number,
  products: NormalizedProduct[]
): Promise<{ inserted: number; updated: number }> {
  const client = await pool.connect();
  let inserted = 0;
  let updated = 0;

  try {
    await client.query('BEGIN');

    for (const product of products) {
      // Upsert product
      const result = await client.query(
        `
        INSERT INTO products (
          store_id, external_id, slug, name, enterprise_product_id,
          brand, brand_external_id, brand_logo_url,
          subcategory, strain_type, canonical_category,
          price, rec_price, med_price, rec_special_price, med_special_price,
          is_on_special, special_name, discount_percent, special_data,
          sku, inventory_quantity, inventory_available, is_below_threshold, status,
          thc_percentage, cbd_percentage, cannabinoids,
          weight_mg, net_weight_value, net_weight_unit, options, raw_options,
          image_url, additional_images,
          is_featured, medical_only, rec_only,
          source_created_at, source_updated_at,
          description, raw_data,
          dutchie_url, last_seen_at, updated_at
        )
        VALUES (
          $1, $2, $3, $4, $5,
          $6, $7, $8,
          $9, $10, $11,
          $12, $13, $14, $15, $16,
          $17, $18, $19, $20,
          $21, $22, $23, $24, $25,
          $26, $27, $28,
          $29, $30, $31, $32, $33,
          $34, $35,
          $36, $37, $38,
          $39, $40,
          $41, $42,
          '', NOW(), NOW()
        )
        ON CONFLICT (store_id, slug) DO UPDATE SET
          name = EXCLUDED.name,
          enterprise_product_id = EXCLUDED.enterprise_product_id,
          brand = EXCLUDED.brand,
          brand_external_id = EXCLUDED.brand_external_id,
          brand_logo_url = EXCLUDED.brand_logo_url,
          subcategory = EXCLUDED.subcategory,
          strain_type = EXCLUDED.strain_type,
          canonical_category = EXCLUDED.canonical_category,
          price = EXCLUDED.price,
          rec_price = EXCLUDED.rec_price,
          med_price = EXCLUDED.med_price,
          rec_special_price = EXCLUDED.rec_special_price,
          med_special_price = EXCLUDED.med_special_price,
          is_on_special = EXCLUDED.is_on_special,
          special_name = EXCLUDED.special_name,
          discount_percent = EXCLUDED.discount_percent,
          special_data = EXCLUDED.special_data,
          sku = EXCLUDED.sku,
          inventory_quantity = EXCLUDED.inventory_quantity,
|
||||
inventory_available = EXCLUDED.inventory_available,
|
||||
is_below_threshold = EXCLUDED.is_below_threshold,
|
||||
status = EXCLUDED.status,
|
||||
thc_percentage = EXCLUDED.thc_percentage,
|
||||
cbd_percentage = EXCLUDED.cbd_percentage,
|
||||
cannabinoids = EXCLUDED.cannabinoids,
|
||||
weight_mg = EXCLUDED.weight_mg,
|
||||
net_weight_value = EXCLUDED.net_weight_value,
|
||||
net_weight_unit = EXCLUDED.net_weight_unit,
|
||||
options = EXCLUDED.options,
|
||||
raw_options = EXCLUDED.raw_options,
|
||||
image_url = EXCLUDED.image_url,
|
||||
additional_images = EXCLUDED.additional_images,
|
||||
is_featured = EXCLUDED.is_featured,
|
||||
medical_only = EXCLUDED.medical_only,
|
||||
rec_only = EXCLUDED.rec_only,
|
||||
source_created_at = EXCLUDED.source_created_at,
|
||||
source_updated_at = EXCLUDED.source_updated_at,
|
||||
description = EXCLUDED.description,
|
||||
raw_data = EXCLUDED.raw_data,
|
||||
last_seen_at = NOW(),
|
||||
updated_at = NOW()
|
||||
RETURNING (xmax = 0) AS was_inserted
|
||||
`,
|
||||
[
|
||||
storeId,
|
||||
product.external_id,
|
||||
product.slug,
|
||||
product.name,
|
||||
product.enterprise_product_id,
|
||||
product.brand,
|
||||
product.brand_external_id,
|
||||
product.brand_logo_url,
|
||||
product.subcategory,
|
||||
product.strain_type,
|
||||
product.canonical_category,
|
||||
product.price,
|
||||
product.rec_price,
|
||||
product.med_price,
|
||||
product.rec_special_price,
|
||||
product.med_special_price,
|
||||
product.is_on_special,
|
||||
product.special_name,
|
||||
product.discount_percent,
|
||||
product.special_data ? JSON.stringify(product.special_data) : null,
|
||||
product.sku,
|
||||
product.inventory_quantity,
|
||||
product.inventory_available,
|
||||
product.is_below_threshold,
|
||||
product.status,
|
||||
product.thc_percentage,
|
||||
product.cbd_percentage,
|
||||
product.cannabinoids ? JSON.stringify(product.cannabinoids) : null,
|
||||
product.weight_mg,
|
||||
product.net_weight_value,
|
||||
product.net_weight_unit,
|
||||
product.options,
|
||||
product.raw_options,
|
||||
product.image_url,
|
||||
product.additional_images,
|
||||
product.is_featured,
|
||||
product.medical_only,
|
||||
product.rec_only,
|
||||
product.source_created_at,
|
||||
product.source_updated_at,
|
||||
product.description,
|
||||
product.raw_data ? JSON.stringify(product.raw_data) : null,
|
||||
]
|
||||
);
|
||||
|
||||
if (result.rows[0]?.was_inserted) {
|
||||
inserted++;
|
||||
} else {
|
||||
updated++;
|
||||
}
|
||||
}
|
||||
|
||||
await client.query('COMMIT');
|
||||
return { inserted, updated };
|
||||
} catch (error) {
|
||||
await client.query('ROLLBACK');
|
||||
throw error;
|
||||
} finally {
|
||||
client.release();
|
||||
}
|
||||
}
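The `RETURNING (xmax = 0) AS was_inserted` clause reports in the same round trip whether `ON CONFLICT` inserted a new row or updated an existing one (in Postgres, a freshly inserted row has `xmax = 0`). A minimal standalone sketch of the per-row tally above; `UpsertRow` is a hypothetical type mirroring the shape of `result.rows[0]`:

```typescript
// Minimal sketch of the inserted/updated tally; UpsertRow mirrors result.rows[0].
type UpsertRow = { was_inserted: boolean };

function tallyUpserts(rows: UpsertRow[]): { inserted: number; updated: number } {
  let inserted = 0;
  let updated = 0;
  for (const row of rows) {
    if (row.was_inserted) inserted++;
    else updated++;
  }
  return { inserted, updated };
}

console.log(tallyUpserts([
  { was_inserted: true },
  { was_inserted: false },
  { was_inserted: true },
]));
// → { inserted: 2, updated: 1 }
```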

// =====================================================
// MAIN ENTRY POINT
// =====================================================

/**
 * @deprecated DEPRECATED - Use src/dutchie-az/services/product-crawler.ts instead.
 * This function is disabled and will throw an error if called.
 */
export async function scrapeDutchieMenu(
  pool: Pool,
  storeId: number,
  menuUrl: string
): Promise<{
  success: boolean;
  productsFound: number;
  inserted: number;
  updated: number;
  error?: string;
}> {
  // DEPRECATED: Throw error to prevent accidental use
  throw new Error(
    'DEPRECATED: scrapeDutchieMenu() is deprecated. ' +
    'Use src/dutchie-az/services/product-crawler.ts instead. ' +
    'This scraper writes to the legacy products table.'
  );

  // Original code below is unreachable but kept for reference
  try {
    console.log(`[DutchieGraphQL] Scraping: ${menuUrl}`);

    // Fetch products via Puppeteer
    const { products, dispensaryId } = await fetchDutchieMenuViaPuppeteer(menuUrl);

    console.log(`[DutchieGraphQL] Captured ${products.length} products, dispensaryId: ${dispensaryId}`);

    if (products.length === 0) {
      return {
        success: false,
        productsFound: 0,
        inserted: 0,
        updated: 0,
        error: 'No products captured from GraphQL responses',
      };
    }

    // Normalize products
    const normalized = products.map(normalizeDutchieProduct);

    // Upsert to database
    const { inserted, updated } = await upsertProducts(pool, storeId, normalized);

    console.log(`[DutchieGraphQL] Upsert complete: ${inserted} inserted, ${updated} updated`);

    return {
      success: true,
      productsFound: products.length,
      inserted,
      updated,
    };
  } catch (error: any) {
    console.error(`[DutchieGraphQL] Error:`, error.message);
    return {
      success: false,
      productsFound: 0,
      inserted: 0,
      updated: 0,
      error: error.message,
    };
  }
}

@@ -1,102 +0,0 @@
// ============================================================================
// DEPRECATED: Dutchie now crawled via GraphQL only (see dutchie-az pipeline)
// DO NOT USE - This HTML scraper is unreliable and targets the legacy products table.
// All Dutchie crawling must go through: src/dutchie-az/services/product-crawler.ts
// ============================================================================

import { Page } from 'playwright';
import { logger } from '../../services/logger';

export interface ScraperTemplate {
  name: string;
  urlPattern: RegExp;
  buildCategoryUrl: (baseUrl: string, category: string) => string;
  extractProducts: (page: Page) => Promise<any[]>;
}

/**
 * @deprecated DEPRECATED - Dutchie HTML scraping is no longer supported.
 * Use the dutchie-az GraphQL pipeline instead: src/dutchie-az/services/product-crawler.ts
 * This template relied on unstable DOM selectors and wrote to legacy tables.
 */
export const dutchieTemplate: ScraperTemplate = {
  name: 'Dutchie Marketplace',
  urlPattern: /dutchie\.com\/dispensary\//,

  buildCategoryUrl: (baseUrl: string, category: string) => {
    // Remove trailing slash
    const base = baseUrl.replace(/\/$/, '');
    // Convert category name to URL-friendly slug
    const categorySlug = category.toLowerCase().replace(/\s+/g, '-');
    return `${base}/products/${categorySlug}`;
  },

  extractProducts: async (page: Page) => {
    const products: any[] = [];

    try {
      // Wait for product cards to load
      await page.waitForSelector('a[data-testid="card-link"]', { timeout: 10000 }).catch(() => {
        logger.warn('scraper', 'No product cards found with data-testid="card-link"');
      });

      // Get all product card links
      const productCards = await page.locator('a[href*="/product/"][data-testid="card-link"]').all();

      logger.info('scraper', `Found ${productCards.length} Dutchie product cards`);

      for (const card of productCards) {
        try {
          // Extract all data at once using evaluate for speed
          const cardData = await card.evaluate((el) => {
            const href = el.getAttribute('href') || '';
            const img = el.querySelector('img');
            const imageUrl = img ? img.getAttribute('src') || '' : '';

            // Get all text nodes in order
            const textElements = Array.from(el.querySelectorAll('*'))
              .filter(el => el.textContent && el.children.length === 0)
              .map(el => (el.textContent || '').trim())
              .filter(text => text.length > 0);

            const name = textElements[0] || '';
            const brand = textElements[1] || '';

            // Look for price
            const priceMatch = el.textContent?.match(/\$(\d+(?:\.\d{2})?)/);
            const price = priceMatch ? parseFloat(priceMatch[1]) : undefined;

            return { href, imageUrl, name, brand, price };
          });

          if (cardData.name && cardData.href) {
            products.push({
              name: cardData.name,
              brand: cardData.brand || undefined,
              product_url: cardData.href.startsWith('http') ? cardData.href : `https://dutchie.com${cardData.href}`,
              image_url: cardData.imageUrl || undefined,
              price: cardData.price,
              in_stock: true,
            });
          }
        } catch (err) {
          logger.warn('scraper', `Error extracting Dutchie product card: ${err}`);
        }
      }
    } catch (err) {
      logger.error('scraper', `Error in Dutchie product extraction: ${err}`);
    }

    return products;
  },
};

/**
 * Get the appropriate scraper template based on URL
 */
export function getTemplateForUrl(url: string): ScraperTemplate | null {
  if (dutchieTemplate.urlPattern.test(url)) {
    return dutchieTemplate;
  }
  return null;
}
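For illustration, a standalone sketch of how the template above maps a menu URL to a category URL; the dispensary URL here is a made-up example:

```typescript
// Mirrors urlPattern and buildCategoryUrl from dutchieTemplate above.
const urlPattern = /dutchie\.com\/dispensary\//;

function buildCategoryUrl(baseUrl: string, category: string): string {
  const base = baseUrl.replace(/\/$/, '');                          // strip trailing slash
  const categorySlug = category.toLowerCase().replace(/\s+/g, '-'); // "Pre Rolls" -> "pre-rolls"
  return `${base}/products/${categorySlug}`;
}

const menuUrl = 'https://dutchie.com/dispensary/example-store/'; // hypothetical store URL
if (urlPattern.test(menuUrl)) {
  console.log(buildCategoryUrl(menuUrl, 'Pre Rolls'));
  // → https://dutchie.com/dispensary/example-store/products/pre-rolls
}
```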

@@ -905,12 +905,13 @@ async function backfillProducts(
 
     let crawlRunId = crawlRunCache.get(dayKey);
     if (!crawlRunId && !options.dryRun) {
-      crawlRunId = await getOrCreateBackfillCrawlRun(
+      const newCrawlRunId = await getOrCreateBackfillCrawlRun(
         pool,
         product.dispensary_id,
         capturedAt,
         options.dryRun
       );
+      crawlRunId = newCrawlRunId ?? undefined;
       if (crawlRunId) {
         crawlRunCache.set(dayKey, crawlRunId);
         stats.crawlRunsCreated++;

@@ -212,7 +212,7 @@ EXAMPLES:
 
   try {
     // Fetch all stores without a dispensary_id
-    const storesResult = await pool.query<Store>(`
+    const storesResult = await pool.query(`
       SELECT id, name, slug, dispensary_id
       FROM stores
       WHERE dispensary_id IS NULL
@@ -221,7 +221,7 @@ EXAMPLES:
     const unmappedStores = storesResult.rows;
 
     // Fetch all already-mapped stores for context
-    const mappedResult = await pool.query<Store>(`
+    const mappedResult = await pool.query(`
       SELECT id, name, slug, dispensary_id
       FROM stores
       WHERE dispensary_id IS NOT NULL
@@ -230,7 +230,7 @@ EXAMPLES:
     const mappedStores = mappedResult.rows;
 
     // Fetch all dispensaries
-    const dispResult = await pool.query<Dispensary>(`
+    const dispResult = await pool.query(`
       SELECT id, name, company_name, city, address, slug
       FROM dispensaries
       ORDER BY name

@@ -1,388 +0,0 @@
#!/usr/bin/env npx tsx
/**
 * Bootstrap Discovery Script
 *
 * One-time (but reusable) bootstrap command that:
 * 1. Ensures every Dispensary has a dispensary_crawl_schedule entry (4h default)
 * 2. Optionally runs RunDispensaryOrchestrator for each dispensary
 *
 * Usage:
 *   npx tsx src/scripts/bootstrap-discovery.ts                   # Create schedules only
 *   npx tsx src/scripts/bootstrap-discovery.ts --run             # Create schedules + run orchestrator
 *   npx tsx src/scripts/bootstrap-discovery.ts --run --limit=10  # Run for first 10 dispensaries
 *   npx tsx src/scripts/bootstrap-discovery.ts --dry-run         # Preview what would happen
 *   npx tsx src/scripts/bootstrap-discovery.ts --status          # Show current status only
 */

import { pool } from '../db/pool';
import {
  ensureAllDispensariesHaveSchedules,
  runDispensaryOrchestrator,
  runBatchDispensaryOrchestrator,
  getDispensariesDueForOrchestration,
} from '../services/dispensary-orchestrator';

// Parse command line args
const args = process.argv.slice(2);
const flags = {
  run: args.includes('--run'),
  dryRun: args.includes('--dry-run'),
  status: args.includes('--status'),
  help: args.includes('--help') || args.includes('-h'),
  limit: parseInt(args.find(a => a.startsWith('--limit='))?.split('=')[1] || '0'),
  concurrency: parseInt(args.find(a => a.startsWith('--concurrency='))?.split('=')[1] || '3'),
  interval: parseInt(args.find(a => a.startsWith('--interval='))?.split('=')[1] || '240'),
  detectionOnly: args.includes('--detection-only'),
  productionOnly: args.includes('--production-only'),
  sandboxOnly: args.includes('--sandbox-only'),
};
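The `--flag=value` parsing above can be sketched as a small helper; `parseFlag` is a hypothetical name, and unlike the inline `parseInt(... || '0')` pattern it also falls back to the default when the value is not numeric:

```typescript
// Hypothetical helper mirroring the --limit=/--concurrency=/--interval= parsing above.
function parseFlag(args: string[], name: string, fallback: number): number {
  const raw = args.find(a => a.startsWith(`--${name}=`))?.split('=')[1];
  const n = parseInt(raw ?? '', 10);
  return Number.isNaN(n) ? fallback : n;
}

console.log(parseFlag(['--run', '--limit=10'], 'limit', 0)); // → 10
console.log(parseFlag(['--run'], 'concurrency', 3));         // → 3
```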

async function showHelp() {
  console.log(`
Bootstrap Discovery - Initialize Dispensary Crawl System

USAGE:
  npx tsx src/scripts/bootstrap-discovery.ts [OPTIONS]

OPTIONS:
  --run              After creating schedules, run the orchestrator for each dispensary
  --dry-run          Show what would happen without making changes
  --status           Show current status and exit
  --limit=N          Limit how many dispensaries to process (0 = all, default: 0)
  --concurrency=N    How many dispensaries to process in parallel (default: 3)
  --interval=M       Default interval in minutes for new schedules (default: 240 = 4 hours)
  --detection-only   Only run detection, don't crawl
  --production-only  Only run dispensaries in production mode
  --sandbox-only     Only run dispensaries in sandbox mode
  --help, -h         Show this help message

EXAMPLES:
  # Create schedule entries for all dispensaries (no crawling)
  npx tsx src/scripts/bootstrap-discovery.ts

  # Create schedules and run orchestrator for all dispensaries
  npx tsx src/scripts/bootstrap-discovery.ts --run

  # Run orchestrator for first 10 dispensaries
  npx tsx src/scripts/bootstrap-discovery.ts --run --limit=10

  # Run with higher concurrency
  npx tsx src/scripts/bootstrap-discovery.ts --run --concurrency=5

  # Show current status
  npx tsx src/scripts/bootstrap-discovery.ts --status

WHAT IT DOES:
  1. Creates dispensary_crawl_schedule entries for all dispensaries that don't have one
  2. If --run: For each dispensary, runs the orchestrator which:
     a. Checks if provider detection is needed (null/unknown/stale/low confidence)
     b. Runs detection if needed
     c. If Dutchie + production mode: runs production crawl
     d. Otherwise: runs sandbox crawl
  3. Updates schedule status and job records
`);
}

async function showStatus() {
  console.log('\n📊 Current Dispensary Crawl Status\n');
  console.log('═'.repeat(70));

  // Get dispensary counts by provider
  const providerStats = await pool.query(`
    SELECT
      COALESCE(product_provider, 'undetected') as provider,
      COUNT(*) as count,
      COUNT(*) FILTER (WHERE product_crawler_mode = 'production') as production,
      COUNT(*) FILTER (WHERE product_crawler_mode = 'sandbox') as sandbox,
      COUNT(*) FILTER (WHERE product_crawler_mode IS NULL) as no_mode
    FROM dispensaries
    GROUP BY COALESCE(product_provider, 'undetected')
    ORDER BY count DESC
  `);

  console.log('\nProvider Distribution:');
  console.log('-'.repeat(60));
  console.log(
    'Provider'.padEnd(20) +
    'Total'.padStart(8) +
    'Production'.padStart(12) +
    'Sandbox'.padStart(10) +
    'No Mode'.padStart(10)
  );
  console.log('-'.repeat(60));

  for (const row of providerStats.rows) {
    console.log(
      row.provider.padEnd(20) +
      row.count.toString().padStart(8) +
      row.production.toString().padStart(12) +
      row.sandbox.toString().padStart(10) +
      row.no_mode.toString().padStart(10)
    );
  }

  // Get schedule stats
  const scheduleStats = await pool.query(`
    SELECT
      COUNT(DISTINCT d.id) as total_dispensaries,
      COUNT(DISTINCT dcs.id) as with_schedule,
      COUNT(DISTINCT d.id) - COUNT(DISTINCT dcs.id) as without_schedule,
      COUNT(*) FILTER (WHERE dcs.is_active = TRUE) as active_schedules,
      COUNT(*) FILTER (WHERE dcs.last_status = 'success') as last_success,
      COUNT(*) FILTER (WHERE dcs.last_status = 'error') as last_error,
      COUNT(*) FILTER (WHERE dcs.last_status = 'sandbox_only') as last_sandbox,
      COUNT(*) FILTER (WHERE dcs.last_status = 'detection_only') as last_detection,
      COUNT(*) FILTER (WHERE dcs.next_run_at <= NOW()) as due_now,
      AVG(dcs.interval_minutes)::INTEGER as avg_interval
    FROM dispensaries d
    LEFT JOIN dispensary_crawl_schedule dcs ON dcs.dispensary_id = d.id
  `);

  const s = scheduleStats.rows[0];
  console.log('\n\nSchedule Status:');
  console.log('-'.repeat(60));
  console.log(`  Total Dispensaries: ${s.total_dispensaries}`);
  console.log(`  With Schedule:      ${s.with_schedule}`);
  console.log(`  Without Schedule:   ${s.without_schedule}`);
  console.log(`  Active Schedules:   ${s.active_schedules || 0}`);
  console.log(`  Average Interval:   ${s.avg_interval || 240} minutes`);

  console.log('\n  Last Run Status:');
  console.log(`    - Success:        ${s.last_success || 0}`);
  console.log(`    - Error:          ${s.last_error || 0}`);
  console.log(`    - Sandbox Only:   ${s.last_sandbox || 0}`);
  console.log(`    - Detection Only: ${s.last_detection || 0}`);
  console.log(`    - Due Now:        ${s.due_now || 0}`);

  // Get recent job stats
  const jobStats = await pool.query(`
    SELECT
      COUNT(*) as total,
      COUNT(*) FILTER (WHERE status = 'completed') as completed,
      COUNT(*) FILTER (WHERE status = 'failed') as failed,
      COUNT(*) FILTER (WHERE status = 'running') as running,
      COUNT(*) FILTER (WHERE status = 'pending') as pending,
      COUNT(*) FILTER (WHERE detection_ran = TRUE) as with_detection,
      COUNT(*) FILTER (WHERE crawl_ran = TRUE) as with_crawl,
      COUNT(*) FILTER (WHERE crawl_type = 'production') as production_crawls,
      COUNT(*) FILTER (WHERE crawl_type = 'sandbox') as sandbox_crawls,
      SUM(products_found) as total_products_found
    FROM dispensary_crawl_jobs
    WHERE created_at > NOW() - INTERVAL '24 hours'
  `);

  const j = jobStats.rows[0];
  console.log('\n\nJobs (Last 24 Hours):');
  console.log('-'.repeat(60));
  console.log(`  Total Jobs:     ${j.total || 0}`);
  console.log(`  Completed:      ${j.completed || 0}`);
  console.log(`  Failed:         ${j.failed || 0}`);
  console.log(`  Running:        ${j.running || 0}`);
  console.log(`  Pending:        ${j.pending || 0}`);
  console.log(`  With Detection: ${j.with_detection || 0}`);
  console.log(`  With Crawl:     ${j.with_crawl || 0}`);
  console.log(`    - Production: ${j.production_crawls || 0}`);
  console.log(`    - Sandbox:    ${j.sandbox_crawls || 0}`);
  console.log(`  Products Found: ${j.total_products_found || 0}`);

  console.log('\n' + '═'.repeat(70) + '\n');
}

async function createSchedules(): Promise<{ created: number; existing: number }> {
  console.log('\n📅 Creating Dispensary Schedules...\n');

  if (flags.dryRun) {
    // Count how many would be created
    const result = await pool.query(`
      SELECT COUNT(*) as count
      FROM dispensaries d
      WHERE NOT EXISTS (
        SELECT 1 FROM dispensary_crawl_schedule dcs WHERE dcs.dispensary_id = d.id
      )
    `);

    const wouldCreate = parseInt(result.rows[0].count);
    console.log(`  Would create ${wouldCreate} new schedule entries (${flags.interval} minute interval)`);

    return { created: wouldCreate, existing: 0 };
  }

  const result = await ensureAllDispensariesHaveSchedules(flags.interval);

  console.log(`  ✓ Created ${result.created} new schedule entries`);
  console.log(`  ✓ ${result.existing} dispensaries already had schedules`);

  return result;
}

async function getDispensariesToProcess(): Promise<number[]> {
  // Build query based on filters
  let whereClause = 'TRUE';

  if (flags.productionOnly) {
    whereClause += ` AND d.product_crawler_mode = 'production'`;
  } else if (flags.sandboxOnly) {
    whereClause += ` AND d.product_crawler_mode = 'sandbox'`;
  }

  if (flags.detectionOnly) {
    whereClause += ` AND (d.product_provider IS NULL OR d.product_provider = 'unknown' OR d.product_confidence < 50)`;
  }

  const limitClause = flags.limit > 0 ? `LIMIT ${flags.limit}` : '';

  const query = `
    SELECT d.id, d.name, d.product_provider, d.product_crawler_mode
    FROM dispensaries d
    LEFT JOIN dispensary_crawl_schedule dcs ON dcs.dispensary_id = d.id
    WHERE ${whereClause}
    ORDER BY
      COALESCE(dcs.priority, 0) DESC,
      dcs.last_run_at ASC NULLS FIRST,
      d.id ASC
    ${limitClause}
  `;

  const result = await pool.query(query);
  return result.rows.map(row => row.id);
}
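The filter-to-SQL logic above can be sketched standalone (`buildWhereClause` is a hypothetical name; the column names are the ones used in the query above):

```typescript
// Mirrors the whereClause building in getDispensariesToProcess above.
interface Filters {
  productionOnly?: boolean;
  sandboxOnly?: boolean;
  detectionOnly?: boolean;
}

function buildWhereClause(f: Filters): string {
  let whereClause = 'TRUE';
  if (f.productionOnly) {
    whereClause += ` AND d.product_crawler_mode = 'production'`;
  } else if (f.sandboxOnly) {
    whereClause += ` AND d.product_crawler_mode = 'sandbox'`;
  }
  if (f.detectionOnly) {
    whereClause += ` AND (d.product_provider IS NULL OR d.product_provider = 'unknown' OR d.product_confidence < 50)`;
  }
  return whereClause;
}

console.log(buildWhereClause({ productionOnly: true }));
// → TRUE AND d.product_crawler_mode = 'production'
```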

async function runOrchestrator() {
  console.log('\n🚀 Running Dispensary Orchestrator...\n');

  const dispensaryIds = await getDispensariesToProcess();

  if (dispensaryIds.length === 0) {
    console.log('  No dispensaries to process.');
    return;
  }

  console.log(`  Found ${dispensaryIds.length} dispensaries to process`);
  console.log(`  Concurrency: ${flags.concurrency}`);

  if (flags.dryRun) {
    console.log('\n  Would process these dispensaries:');

    const details = await pool.query(
      `SELECT id, name, product_provider, product_crawler_mode
       FROM dispensaries WHERE id = ANY($1) ORDER BY id`,
      [dispensaryIds]
    );

    for (const row of details.rows.slice(0, 20)) {
      console.log(`    - [${row.id}] ${row.name} (${row.product_provider || 'undetected'}, ${row.product_crawler_mode || 'no mode'})`);
    }

    if (details.rows.length > 20) {
      console.log(`    ... and ${details.rows.length - 20} more`);
    }

    return;
  }

  console.log('\n  Starting batch processing...\n');

  const results = await runBatchDispensaryOrchestrator(dispensaryIds, flags.concurrency);

  // Summarize results
  const summary = {
    total: results.length,
    success: results.filter(r => r.status === 'success').length,
    sandboxOnly: results.filter(r => r.status === 'sandbox_only').length,
    detectionOnly: results.filter(r => r.status === 'detection_only').length,
    error: results.filter(r => r.status === 'error').length,
    detectionsRan: results.filter(r => r.detectionRan).length,
    crawlsRan: results.filter(r => r.crawlRan).length,
    productionCrawls: results.filter(r => r.crawlType === 'production').length,
    sandboxCrawls: results.filter(r => r.crawlType === 'sandbox').length,
    totalProducts: results.reduce((sum, r) => sum + (r.productsFound || 0), 0),
    totalDuration: results.reduce((sum, r) => sum + r.durationMs, 0),
  };

  console.log('\n' + '═'.repeat(70));
  console.log('  Orchestrator Results');
  console.log('═'.repeat(70));
  console.log(`
  Total Processed: ${summary.total}

  Status:
    - Success:        ${summary.success}
    - Sandbox Only:   ${summary.sandboxOnly}
    - Detection Only: ${summary.detectionOnly}
    - Error:          ${summary.error}

  Operations:
    - Detections Ran: ${summary.detectionsRan}
    - Crawls Ran:     ${summary.crawlsRan}
      - Production:   ${summary.productionCrawls}
      - Sandbox:      ${summary.sandboxCrawls}

  Results:
    - Products Found:     ${summary.totalProducts}
    - Total Duration:     ${(summary.totalDuration / 1000).toFixed(1)}s
    - Avg per Dispensary: ${(summary.totalDuration / summary.total / 1000).toFixed(1)}s
`);
  console.log('═'.repeat(70) + '\n');

  // Show errors if any
  const errors = results.filter(r => r.status === 'error');
  if (errors.length > 0) {
    console.log('\n⚠️  Errors encountered:');
    for (const err of errors.slice(0, 10)) {
      console.log(`  - [${err.dispensaryId}] ${err.dispensaryName}: ${err.error}`);
    }
    if (errors.length > 10) {
      console.log(`  ... and ${errors.length - 10} more errors`);
    }
  }
}

async function main() {
  if (flags.help) {
    await showHelp();
    process.exit(0);
  }

  console.log('\n' + '═'.repeat(70));
  console.log('  Dispensary Crawl Bootstrap Discovery');
  console.log('═'.repeat(70));

  if (flags.dryRun) {
    console.log('\n🔍 DRY RUN MODE - No changes will be made');
  }

  try {
    // Always show status first
    await showStatus();

    if (flags.status) {
      // Status-only mode, we're done
      await pool.end();
      process.exit(0);
    }

    // Step 1: Create schedule entries
    await createSchedules();

    // Step 2: Optionally run orchestrator
    if (flags.run) {
      await runOrchestrator();
    } else {
      console.log('\n💡 Tip: Use --run to also run the orchestrator for each dispensary');
    }

    // Show final status
    if (!flags.dryRun) {
      await showStatus();
    }

  } catch (error: any) {
    console.error('\n❌ Fatal error:', error.message);
    console.error(error.stack);
    process.exit(1);
  } finally {
    await pool.end();
  }
}

main();
@@ -1,101 +0,0 @@
/**
 * LOCAL-ONLY Admin Bootstrap Script
 *
 * Creates or resets a local admin user for development.
 * This script is ONLY for local development - never use in production.
 *
 * Usage:
 *   cd backend
 *   npx tsx src/scripts/bootstrap-local-admin.ts
 *
 * Default credentials:
 *   Email:    admin@local.test
 *   Password: admin123
 */

import bcrypt from 'bcrypt';
import { query, closePool } from '../dutchie-az/db/connection';

// Local admin credentials - deterministic for dev
const LOCAL_ADMIN_EMAIL = 'admin@local.test';
const LOCAL_ADMIN_PASSWORD = 'admin123';
const LOCAL_ADMIN_ROLE = 'admin'; // Match existing schema (admin, not superadmin)

async function bootstrapLocalAdmin(): Promise<void> {
  console.log('='.repeat(60));
  console.log('LOCAL ADMIN BOOTSTRAP');
  console.log('='.repeat(60));
  console.log('');
  console.log('This script creates/resets a local admin user for development.');
  console.log('');

  try {
    // Hash the password with bcrypt (10 rounds, matching existing code)
    const passwordHash = await bcrypt.hash(LOCAL_ADMIN_PASSWORD, 10);

    // Check if user exists
    const existing = await query<{ id: number; email: string }>(
      'SELECT id, email FROM users WHERE email = $1',
      [LOCAL_ADMIN_EMAIL]
    );

    if (existing.rows.length > 0) {
      // User exists - update password and role
      console.log(`User "${LOCAL_ADMIN_EMAIL}" already exists (id=${existing.rows[0].id})`);
      console.log('Resetting password and ensuring admin role...');

      await query(
        `UPDATE users
         SET password_hash = $1,
             role = $2,
             updated_at = NOW()
         WHERE email = $3`,
        [passwordHash, LOCAL_ADMIN_ROLE, LOCAL_ADMIN_EMAIL]
      );

      console.log('User updated successfully.');
    } else {
      // User doesn't exist - create new
      console.log(`Creating new admin user: ${LOCAL_ADMIN_EMAIL}`);

      const result = await query<{ id: number }>(
        `INSERT INTO users (email, password_hash, role, created_at, updated_at)
         VALUES ($1, $2, $3, NOW(), NOW())
         RETURNING id`,
        [LOCAL_ADMIN_EMAIL, passwordHash, LOCAL_ADMIN_ROLE]
      );

      console.log(`User created successfully (id=${result.rows[0].id})`);
    }

    console.log('');
    console.log('='.repeat(60));
    console.log('LOCAL ADMIN READY');
    console.log('='.repeat(60));
    console.log('');
    console.log('Login credentials:');
    console.log(`  Email:    ${LOCAL_ADMIN_EMAIL}`);
    console.log(`  Password: ${LOCAL_ADMIN_PASSWORD}`);
    console.log('');
    console.log('Admin UI: http://localhost:8080/admin');
    console.log('');

  } catch (error: any) {
    console.error('');
    console.error('ERROR: Failed to bootstrap local admin');
    console.error(error.message);

    if (error.message.includes('relation "users" does not exist')) {
      console.error('');
      console.error('The "users" table does not exist.');
      console.error('Run migrations first: npm run migrate');
    }

    process.exit(1);
  } finally {
    await closePool();
  }
}

// Run the bootstrap
bootstrapLocalAdmin();
@@ -1,66 +0,0 @@
/**
 * Seed crawl: trigger dutchie crawls for all dispensaries with menu_type='dutchie'
 * and a resolved platform_dispensary_id. This uses the AZ orchestrator endpoint logic.
 *
 * Usage (local):
 *   node dist/scripts/crawl-all-dutchie.js
 *
 * Requires:
 *   - DATABASE_URL/CRAWLSY_DATABASE_URL pointing to the consolidated DB
 *   - Dispensaries table populated with menu_type and platform_dispensary_id
 */

import { query } from '../dutchie-az/db/connection';
import { runDispensaryOrchestrator } from '../services/dispensary-orchestrator';

async function main() {
  const { rows } = await query<{
    id: number;
    name: string;
    slug: string;
    platform_dispensary_id: string | null;
  }>(`
    SELECT id, name, slug, platform_dispensary_id
    FROM dispensaries
    WHERE menu_type = 'dutchie'
      AND platform_dispensary_id IS NOT NULL
    ORDER BY id
  `);

  if (!rows.length) {
    console.log('No dutchie dispensaries with resolved platform_dispensary_id found.');
    process.exit(0);
  }

  console.log(`Found ${rows.length} dutchie dispensaries with resolved IDs. Triggering crawls...`);

  let success = 0;
  let failed = 0;

  for (const row of rows) {
    try {
      console.log(`Crawling ${row.id} (${row.name})...`);
      const result = await runDispensaryOrchestrator(row.id);
      const ok =
        result.status === 'success' ||
        result.status === 'sandbox_only' ||
        result.status === 'detection_only';
      if (ok) {
        success++;
      } else {
        failed++;
        console.warn(`Crawl returned status ${result.status} for ${row.id} (${row.name})`);
      }
    } catch (err: any) {
      failed++;
      console.error(`Failed crawl for ${row.id} (${row.name}): ${err.message}`);
    }
  }

  console.log(`Completed. Success: ${success}, Failed: ${failed}`);
}

main().catch((err) => {
  console.error('Fatal:', err);
  process.exit(1);
});
|
||||
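The per-row success check above can be expressed as a pure predicate over the three statuses the loop treats as "ok" (the helper name `isCrawlOk` is illustrative, not part of the script):

```typescript
// Statuses the seed-crawl loop counts as success.
function isCrawlOk(status: string): boolean {
  return (
    status === 'success' ||
    status === 'sandbox_only' ||
    status === 'detection_only'
  );
}
```

Anything else (e.g. an error or unknown status) is counted toward `failed` and logged with a warning.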
@@ -1,50 +0,0 @@
import { runDispensaryOrchestrator } from '../services/dispensary-orchestrator';

// All 57 dutchie stores with platform_dispensary_id (as of 2024-12)
const ALL_DISPENSARY_IDS = [
  72, 74, 75, 76, 77, 78, 81, 82, 85, 87, 91, 92, 97, 101, 106, 108, 110, 112,
  115, 120, 123, 125, 128, 131, 135, 139, 140, 143, 144, 145, 152, 153, 161,
  168, 176, 177, 180, 181, 189, 195, 196, 199, 200, 201, 205, 206, 207, 213,
  214, 224, 225, 227, 232, 235, 248, 252, 281
];

const BATCH_SIZE = 5;

async function run() {
  const totalBatches = Math.ceil(ALL_DISPENSARY_IDS.length / BATCH_SIZE);
  console.log(`Starting crawl of ${ALL_DISPENSARY_IDS.length} stores in ${totalBatches} batches of ${BATCH_SIZE}...`);

  let successCount = 0;
  let errorCount = 0;

  for (let i = 0; i < ALL_DISPENSARY_IDS.length; i += BATCH_SIZE) {
    const batch = ALL_DISPENSARY_IDS.slice(i, i + BATCH_SIZE);
    const batchNum = Math.floor(i / BATCH_SIZE) + 1;
    console.log(`\n========== BATCH ${batchNum}/${totalBatches} (IDs: ${batch.join(', ')}) ==========`);

    for (const id of batch) {
      console.log(`\n--- Crawling dispensary ${id} ---`);
      try {
        const result = await runDispensaryOrchestrator(id);
        console.log(`  Status: ${result.status}`);
        console.log(`  Summary: ${result.summary}`);
        if (result.productsFound) {
          console.log(`  Products: ${result.productsFound} found, ${result.productsNew} new, ${result.productsUpdated} updated`);
        }
        successCount++;
      } catch (e: any) {
        console.log(`  ERROR: ${e.message}`);
        errorCount++;
      }
    }

    console.log(`\n--- Batch ${batchNum} complete. Progress: ${Math.min(i + BATCH_SIZE, ALL_DISPENSARY_IDS.length)}/${ALL_DISPENSARY_IDS.length} ---`);
  }

  console.log('\n========================================');
  console.log('=== ALL CRAWLS COMPLETE ===');
  console.log(`Success: ${successCount}, Errors: ${errorCount}`);
  console.log('========================================');
}

run().catch((e) => {
  console.error('Fatal:', e.message);
  process.exit(1);
});
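The batching done inline with `slice(i, i + BATCH_SIZE)` above can be factored into a small generic helper (a sketch; `chunk` is an illustrative name, not part of the script):

```typescript
// Split a list into fixed-size batches, as the loop above does
// with ALL_DISPENSARY_IDS.slice(i, i + BATCH_SIZE).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

With 57 IDs and `BATCH_SIZE = 5`, this yields 12 batches: 11 full batches of 5 and a final batch of 2.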
114  backend/src/scripts/debug-dutchie-page.ts  Normal file
@@ -0,0 +1,114 @@
/**
 * Debug Dutchie city page to see what data is available
 */

import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

puppeteer.use(StealthPlugin());

async function main() {
  const cityUrl = process.argv[2] || 'https://dutchie.com/us/dispensaries/wa-bellevue';

  console.log(`Debugging page: ${cityUrl}`);

  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'],
  });

  try {
    const page = await browser.newPage();
    await page.setUserAgent(
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    );

    console.log('Navigating...');
    await page.goto(cityUrl, {
      waitUntil: 'networkidle2',
      timeout: 60000,
    });

    await new Promise((r) => setTimeout(r, 5000));

    // Get page title
    const title = await page.title();
    console.log(`\nPage title: ${title}`);

    // Check for Cloudflare challenge
    const isCFChallenge = await page.evaluate(() => {
      return document.title.includes('Just a moment') ||
        document.body.textContent?.includes('Enable JavaScript');
    });

    if (isCFChallenge) {
      console.log('\n⚠️ CLOUDFLARE CHALLENGE DETECTED - waiting longer...');
      await new Promise((r) => setTimeout(r, 10000));
    }

    // Check for __NEXT_DATA__
    const nextData = await page.evaluate(() => {
      const script = document.querySelector('script#__NEXT_DATA__');
      if (script) {
        try {
          return JSON.parse(script.textContent || '{}');
        } catch {
          return { error: 'Failed to parse __NEXT_DATA__' };
        }
      }
      return null;
    });

    if (nextData) {
      console.log('\n✅ __NEXT_DATA__ found!');
      console.log('Keys:', Object.keys(nextData));
      if (nextData.props?.pageProps) {
        console.log('pageProps keys:', Object.keys(nextData.props.pageProps));
        if (nextData.props.pageProps.dispensaries) {
          console.log('Dispensaries count:', nextData.props.pageProps.dispensaries.length);
          // Show first dispensary structure
          const first = nextData.props.pageProps.dispensaries[0];
          if (first) {
            console.log('\nFirst dispensary keys:', Object.keys(first));
            console.log('First dispensary sample:', JSON.stringify(first, null, 2).slice(0, 1000));
          }
        }
      }
    } else {
      console.log('\n❌ No __NEXT_DATA__ found');

      // Check what scripts are on the page
      const scripts = await page.evaluate(() => {
        return Array.from(document.querySelectorAll('script[id]')).map(s => ({
          id: s.id,
          src: (s as HTMLScriptElement).src?.slice(0, 100),
        }));
      });
      console.log('Scripts with IDs:', scripts);

      // Try to find dispensary data in window object
      const windowData = await page.evaluate(() => {
        const w = window as any;
        const keys = ['__NEXT_DATA__', '__PRELOADED_STATE__', '__INITIAL_STATE__',
          'dispensaries', '__data', 'pageData', '__remixContext'];
        const found: Record<string, any> = {};
        for (const key of keys) {
          if (w[key]) {
            found[key] = typeof w[key] === 'object' ? Object.keys(w[key]) : typeof w[key];
          }
        }
        return found;
      });
      console.log('Window data:', windowData);

      // Get some page content
      const bodyText = await page.evaluate(() => document.body.innerText.slice(0, 500));
      console.log('\nPage text preview:', bodyText);
    }

  } finally {
    await browser.close();
  }
}

main().catch(console.error);
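The script above reads `__NEXT_DATA__` from a live page via Puppeteer's `page.evaluate`. When the HTML is already in hand as a string, the same payload can often be pulled out with a regex. A sketch (this is not the project's `extractNextData` helper, just an illustration of the idea):

```typescript
// Pull the JSON payload out of
// <script id="__NEXT_DATA__" type="application/json">...</script>
// from a static HTML string; returns null if absent or unparseable.
function parseNextDataFromHtml(html: string): unknown | null {
  const m = html.match(/<script[^>]*id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/);
  if (!m) return null;
  try {
    return JSON.parse(m[1]);
  } catch {
    return null;
  }
}
```

This only works when the page was server-rendered with the data inline; pages behind a Cloudflare challenge return interstitial HTML with no `__NEXT_DATA__`, which is why the script falls back to inspecting `window` keys.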
258  backend/src/scripts/discover-and-import-store.ts  Normal file
@@ -0,0 +1,258 @@
/**
 * Discover and Import Store Script
 *
 * Discovers a store from Dutchie by city+state and imports it into the dispensaries table.
 * Uses the local API endpoints - does NOT make direct GraphQL calls.
 *
 * Usage:
 *   npx tsx src/scripts/discover-and-import-store.ts --city "Adelanto" --state "CA"
 *   npx tsx src/scripts/discover-and-import-store.ts --city "Phoenix" --state "AZ" --dry-run
 *   npx tsx src/scripts/discover-and-import-store.ts --city "Los Angeles" --state "CA" --all
 */

const API_BASE = process.env.API_BASE || 'http://localhost:3010';

interface DiscoveryResult {
  cityId: string;
  citySlug: string;
  locationsFound: number;
  locationsUpserted: number;
  locationsNew: number;
  locationsUpdated: number;
  errors: string[];
  durationMs: number;
}

interface DiscoveryLocation {
  id: number;
  name: string;
  city: string;
  stateCode: string;
  platformSlug: string;
  platformLocationId: string;
  platformMenuUrl: string;
  status: string;
}

interface Store {
  id: number;
  name: string;
  slug: string;
  city: string;
  state: string;
  menu_url: string;
  platform_dispensary_id: string;
}

async function discoverCity(city: string, state: string): Promise<DiscoveryResult | null> {
  const citySlug = city.toLowerCase().replace(/\s+/g, '-');

  console.log(`\n[1/3] Discovering stores in ${city}, ${state}...`);

  const response = await fetch(`${API_BASE}/api/discovery/admin/discover-city`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      citySlug,
      stateCode: state,
      countryCode: 'US'
    })
  });

  if (!response.ok) {
    const error = await response.text();
    console.error(`Discovery failed: ${error}`);
    return null;
  }

  const data = await response.json();

  if (!data.success) {
    console.error(`Discovery failed: ${JSON.stringify(data)}`);
    return null;
  }

  console.log(`  Found ${data.result.locationsFound} location(s)`);
  console.log(`  New: ${data.result.locationsNew}, Updated: ${data.result.locationsUpdated}`);

  return data.result;
}

async function getDiscoveredLocations(state: string, city?: string): Promise<DiscoveryLocation[]> {
  console.log(`\n[2/3] Fetching discovered locations for ${city || 'all cities'}, ${state}...`);

  // Query the discovery_locations table via SQL since the API has a bug
  // For now, return empty and let caller handle via direct DB query
  // TODO: Fix the /api/discovery/locations endpoint

  return [];
}

async function createStore(location: {
  name: string;
  slug: string;
  city: string;
  state: string;
  menuUrl: string;
  platformId: string;
}): Promise<Store | null> {
  console.log(`\n[3/3] Creating store: ${location.name}...`);

  const response = await fetch(`${API_BASE}/api/stores`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: location.name,
      slug: location.slug,
      city: location.city,
      state: location.state,
      menu_url: location.menuUrl,
      menu_type: 'dutchie',
      platform: 'dutchie',
      platform_dispensary_id: location.platformId
    })
  });

  if (!response.ok) {
    const error = await response.json();
    if (error.error?.includes('already exists')) {
      console.log(`  Store already exists (slug: ${location.slug})`);
      return null;
    }
    console.error(`  Failed to create store: ${JSON.stringify(error)}`);
    return null;
  }

  const store = await response.json();
  console.log(`  Created store ID: ${store.id}`);
  return store;
}

async function verifyStoreExists(city: string, state: string): Promise<Store[]> {
  const response = await fetch(`${API_BASE}/api/stores?city=${encodeURIComponent(city)}&state=${state}`);

  if (!response.ok) {
    return [];
  }

  const data = await response.json();
  return data.stores || [];
}

async function main() {
  const args = process.argv.slice(2);

  // Parse arguments
  let city = '';
  let state = '';
  let dryRun = false;
  let importAll = false;

  for (let i = 0; i < args.length; i++) {
    if (args[i] === '--city' && args[i + 1]) {
      city = args[i + 1];
      i++;
    } else if (args[i] === '--state' && args[i + 1]) {
      state = args[i + 1].toUpperCase();
      i++;
    } else if (args[i] === '--dry-run') {
      dryRun = true;
    } else if (args[i] === '--all') {
      importAll = true;
    }
  }

  if (!city || !state) {
    console.log(`
Usage: npx tsx src/scripts/discover-and-import-store.ts --city "City Name" --state "ST"

Options:
  --city     City name (required)
  --state    State code, e.g., CA, AZ (required)
  --dry-run  Discover only, don't import
  --all      Import all discovered locations (default: first one only)

Examples:
  npx tsx src/scripts/discover-and-import-store.ts --city "Adelanto" --state "CA"
  npx tsx src/scripts/discover-and-import-store.ts --city "Phoenix" --state "AZ" --all
`);
    process.exit(1);
  }

  console.log('='.repeat(60));
  console.log('STORE DISCOVERY & IMPORT');
  console.log(`City: ${city}, State: ${state}`);
  console.log(`Mode: ${dryRun ? 'DRY RUN' : 'IMPORT'}`);
  console.log('='.repeat(60));

  // Step 1: Check if stores already exist
  const existingStores = await verifyStoreExists(city, state);
  if (existingStores.length > 0) {
    console.log(`\nFound ${existingStores.length} existing store(s) in ${city}, ${state}:`);
    existingStores.forEach(s => console.log(`  - ${s.name} (ID: ${s.id})`));

    if (!importAll) {
      console.log('\nUse --all to discover and import additional stores.');
    }
  }

  // Step 2: Discover from Dutchie
  const discovery = await discoverCity(city, state);

  if (!discovery) {
    console.error('\nDiscovery failed. Exiting.');
    process.exit(1);
  }

  if (discovery.locationsFound === 0) {
    console.log('\nNo stores found in this city on Dutchie.');
    process.exit(0);
  }

  if (dryRun) {
    console.log('\n[DRY RUN] Would import stores. Run without --dry-run to import.');
    process.exit(0);
  }

  // Step 3: The discovery endpoint already saved to dutchie_discovery_locations
  // Now we need to query that table and create dispensary records
  // Since the API has bugs, we'll provide instructions for manual import

  console.log(`
Next steps to complete import:

1. Query the discovery location:
   psql -c "SELECT id, name, platform_slug, platform_location_id, platform_menu_url
            FROM dutchie_discovery_locations
            WHERE name ILIKE '%${city}%'
            ORDER BY id DESC LIMIT 5;"

2. Create the store via API:
   curl -X POST ${API_BASE}/api/stores \\
     -H "Content-Type: application/json" \\
     -d '{
       "name": "<NAME>",
       "slug": "<PLATFORM_SLUG>",
       "city": "${city}",
       "state": "${state}",
       "menu_url": "<PLATFORM_MENU_URL>",
       "menu_type": "dutchie",
       "platform": "dutchie",
       "platform_dispensary_id": "<PLATFORM_LOCATION_ID>"
     }'

3. Verify:
   curl "${API_BASE}/api/stores?city=${encodeURIComponent(city)}&state=${state}"
`);

  // Final verification
  const finalStores = await verifyStoreExists(city, state);
  console.log('\n' + '='.repeat(60));
  console.log(`RESULT: ${finalStores.length} store(s) now in ${city}, ${state}`);
  finalStores.forEach(s => console.log(`  - ${s.name} (ID: ${s.id})`));
  console.log('='.repeat(60));
}

main().catch(console.error);

export {};
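`discoverCity` derives the city slug with a lowercase-and-hyphenate transform. As a standalone helper (the name `toCitySlug` is illustrative; the script does this inline):

```typescript
// Same transform discoverCity applies: lowercase, runs of
// whitespace collapsed to a single hyphen.
function toCitySlug(city: string): string {
  return city.toLowerCase().replace(/\s+/g, '-');
}
```

So `"Los Angeles"` becomes `los-angeles`, matching the slug format the discovery endpoint expects.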
88  backend/src/scripts/discover-az-dutchie.ts  Normal file
@@ -0,0 +1,88 @@
/**
 * Discover all Arizona dispensaries from Dutchie
 * Uses the state/city HTML pages which contain __NEXT_DATA__ with full dispensary list
 */
import { fetchPage, extractNextData } from '../platforms/dutchie/client';

interface DutchieDispensary {
  platform_dispensary_id: string;
  name: string;
  slug: string;
  city: string;
  state: string;
  address: string;
  zip: string;
}

async function discoverAZDispensaries() {
  console.log('Discovering Arizona dispensaries from Dutchie...\n');

  const allDispensaries: Map<string, DutchieDispensary> = new Map();

  // Fetch the Arizona state page
  console.log('Fetching /dispensaries/arizona...');
  const stateResult = await fetchPage('/dispensaries/arizona');

  if (!stateResult) {
    console.error('Failed to fetch Arizona page');
    return;
  }

  console.log(`Got ${stateResult.status} response, ${stateResult.html.length} bytes`);

  const nextData = extractNextData(stateResult.html);
  if (!nextData) {
    console.error('Failed to extract __NEXT_DATA__');
    // Try to find dispensary links in HTML
    const links = stateResult.html.match(/\/dispensary\/([a-z0-9-]+)/gi) || [];
    console.log(`Found ${links.length} dispensary links in HTML`);
    const uniqueSlugs = [...new Set(links.map(l => l.replace('/dispensary/', '')))];
    console.log('Unique slugs:', uniqueSlugs.slice(0, 20));
    return;
  }

  console.log('Extracted __NEXT_DATA__');
  console.log('Keys:', Object.keys(nextData));

  // The dispensary data is usually in props.pageProps
  const pageProps = nextData?.props?.pageProps;
  if (pageProps) {
    console.log('pageProps keys:', Object.keys(pageProps));

    // Try various possible locations
    const dispensaries = pageProps.dispensaries ||
      pageProps.nearbyDispensaries ||
      pageProps.filteredDispensaries ||
      pageProps.allDispensaries ||
      [];

    console.log(`Found ${dispensaries.length} dispensaries in pageProps`);

    if (dispensaries.length > 0) {
      console.log('Sample:', JSON.stringify(dispensaries[0], null, 2));
    }
  }

  // Also look for dehydratedState (Apollo cache)
  const dehydratedState = nextData?.props?.pageProps?.__APOLLO_STATE__;
  if (dehydratedState) {
    console.log('Found Apollo state');
    const dispensaryKeys = Object.keys(dehydratedState).filter(k =>
      k.startsWith('Dispensary:') || k.includes('dispensary')
    );
    console.log(`Found ${dispensaryKeys.length} dispensary entries`);
    if (dispensaryKeys.length > 0) {
      console.log('Sample key:', dispensaryKeys[0]);
      console.log('Sample value:', JSON.stringify(dehydratedState[dispensaryKeys[0]], null, 2).slice(0, 500));
    }
  }

  // Output the raw pageProps for analysis
  if (pageProps) {
    const fs = await import('fs');
    fs.writeFileSync('/tmp/az-pageprops.json', JSON.stringify(pageProps, null, 2));
    console.log('\nWrote pageProps to /tmp/az-pageprops.json');
  }
}

discoverAZDispensaries().catch(console.error);
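The Apollo-cache branch above filters the dehydrated state's keys down to dispensary entries. The same selection as a pure, testable function (the name `dispensaryKeys` is illustrative; in the script this is inline):

```typescript
// Filter Apollo cache keys down to dispensary entries, as the
// dehydratedState branch does. The match is case-sensitive:
// either the canonical 'Dispensary:<id>' prefix or a key that
// literally contains 'dispensary'.
function dispensaryKeys(apolloState: Record<string, unknown>): string[] {
  return Object.keys(apolloState).filter(
    (k) => k.startsWith('Dispensary:') || k.includes('dispensary')
  );
}
```

Note that because `includes('dispensary')` is case-sensitive, keys like `filteredDispensaries` would not match the second test; only the `Dispensary:` prefix catches the canonical cache entries.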
@@ -1,86 +0,0 @@
#!/usr/bin/env npx tsx
/**
 * Dutchie City Discovery CLI Runner
 *
 * Discovers cities from Dutchie's /cities page and upserts to dutchie_discovery_cities.
 *
 * Usage:
 *   npm run discovery:dutchie:cities
 *   npx tsx src/scripts/discovery-dutchie-cities.ts
 *
 * Environment:
 *   DATABASE_URL - PostgreSQL connection string (required)
 */

import { Pool } from 'pg';
import { DutchieCityDiscovery } from '../dutchie-az/discovery/DutchieCityDiscovery';

async function main() {
  console.log('='.repeat(60));
  console.log('DUTCHIE CITY DISCOVERY');
  console.log('='.repeat(60));

  // Get database URL from environment
  const connectionString = process.env.DATABASE_URL;
  if (!connectionString) {
    console.error('ERROR: DATABASE_URL environment variable is required');
    console.error('');
    console.error('Usage:');
    console.error('  DATABASE_URL="postgresql://..." npm run discovery:dutchie:cities');
    process.exit(1);
  }

  // Create pool
  const pool = new Pool({ connectionString });

  try {
    // Test connection
    await pool.query('SELECT 1');
    console.log('[CLI] Database connection established');

    // Run discovery
    const discovery = new DutchieCityDiscovery(pool);
    const result = await discovery.run();

    // Print summary
    console.log('');
    console.log('='.repeat(60));
    console.log('DISCOVERY COMPLETE');
    console.log('='.repeat(60));
    console.log(`Cities found: ${result.citiesFound}`);
    console.log(`Cities inserted: ${result.citiesInserted}`);
    console.log(`Cities updated: ${result.citiesUpdated}`);
    console.log(`Errors: ${result.errors.length}`);
    console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`);

    if (result.errors.length > 0) {
      console.log('');
      console.log('Errors:');
      result.errors.forEach((e) => console.log(`  - ${e}`));
    }

    // Show stats
    console.log('');
    console.log('Current Statistics:');
    const stats = await discovery.getStats();
    console.log(`  Total cities: ${stats.total}`);
    console.log(`  Crawl enabled: ${stats.crawlEnabled}`);
    console.log(`  Never crawled: ${stats.neverCrawled}`);
    console.log('');
    console.log('By Country:');
    stats.byCountry.forEach((c) => console.log(`  ${c.countryCode}: ${c.count}`));
    console.log('');
    console.log('By State (top 10):');
    stats.byState.slice(0, 10).forEach((s) => console.log(`  ${s.stateCode} (${s.countryCode}): ${s.count}`));

    process.exit(result.errors.length > 0 ? 1 : 0);
  } catch (error: any) {
    console.error('FATAL ERROR:', error.message);
    console.error(error.stack);
    process.exit(1);
  } finally {
    await pool.end();
  }
}

main();
@@ -1,189 +0,0 @@
#!/usr/bin/env npx tsx
/**
 * Dutchie Location Discovery CLI Runner
 *
 * Discovers store locations for cities and upserts to dutchie_discovery_locations.
 *
 * Usage:
 *   npm run discovery:dutchie:locations -- --all-enabled
 *   npm run discovery:dutchie:locations -- --city-slug=phoenix
 *   npm run discovery:dutchie:locations -- --all-enabled --limit=10
 *
 *   npx tsx src/scripts/discovery-dutchie-locations.ts --all-enabled
 *   npx tsx src/scripts/discovery-dutchie-locations.ts --city-slug=phoenix
 *
 * Options:
 *   --city-slug=<slug>  Run for a single city by its slug
 *   --all-enabled       Run for all cities where crawl_enabled = TRUE
 *   --limit=<n>         Limit the number of cities to process
 *   --delay=<ms>        Delay between cities in ms (default: 2000)
 *
 * Environment:
 *   DATABASE_URL - PostgreSQL connection string (required)
 */

import { Pool } from 'pg';
import { DutchieLocationDiscovery } from '../dutchie-az/discovery/DutchieLocationDiscovery';

// Parse command line arguments
function parseArgs(): {
  citySlug: string | null;
  allEnabled: boolean;
  limit: number | undefined;
  delay: number;
} {
  const args = process.argv.slice(2);
  let citySlug: string | null = null;
  let allEnabled = false;
  let limit: number | undefined = undefined;
  let delay = 2000;

  for (const arg of args) {
    if (arg.startsWith('--city-slug=')) {
      citySlug = arg.split('=')[1];
    } else if (arg === '--all-enabled') {
      allEnabled = true;
    } else if (arg.startsWith('--limit=')) {
      limit = parseInt(arg.split('=')[1], 10);
    } else if (arg.startsWith('--delay=')) {
      delay = parseInt(arg.split('=')[1], 10);
    }
  }

  return { citySlug, allEnabled, limit, delay };
}

function printUsage() {
  console.log(`
Dutchie Location Discovery CLI

Usage:
  npx tsx src/scripts/discovery-dutchie-locations.ts [options]

Options:
  --city-slug=<slug>  Run for a single city by its slug
  --all-enabled       Run for all cities where crawl_enabled = TRUE
  --limit=<n>         Limit the number of cities to process
  --delay=<ms>        Delay between cities in ms (default: 2000)

Examples:
  npx tsx src/scripts/discovery-dutchie-locations.ts --all-enabled
  npx tsx src/scripts/discovery-dutchie-locations.ts --city-slug=phoenix
  npx tsx src/scripts/discovery-dutchie-locations.ts --all-enabled --limit=5

Environment:
  DATABASE_URL - PostgreSQL connection string (required)
`);
}

async function main() {
  const { citySlug, allEnabled, limit, delay } = parseArgs();

  if (!citySlug && !allEnabled) {
    console.error('ERROR: Must specify either --city-slug=<slug> or --all-enabled');
    printUsage();
    process.exit(1);
  }

  console.log('='.repeat(60));
  console.log('DUTCHIE LOCATION DISCOVERY');
  console.log('='.repeat(60));

  if (citySlug) {
    console.log(`Mode: Single city (${citySlug})`);
  } else {
    console.log(`Mode: All enabled cities${limit ? ` (limit: ${limit})` : ''}`);
  }
  console.log(`Delay between cities: ${delay}ms`);
  console.log('');

  // Get database URL from environment
  const connectionString = process.env.DATABASE_URL;
  if (!connectionString) {
    console.error('ERROR: DATABASE_URL environment variable is required');
    console.error('');
    console.error('Usage:');
    console.error('  DATABASE_URL="postgresql://..." npx tsx src/scripts/discovery-dutchie-locations.ts --all-enabled');
    process.exit(1);
  }

  // Create pool
  const pool = new Pool({ connectionString });

  try {
    // Test connection
    await pool.query('SELECT 1');
    console.log('[CLI] Database connection established');

    const discovery = new DutchieLocationDiscovery(pool);

    if (citySlug) {
      // Single city mode
      const city = await discovery.getCityBySlug(citySlug);
      if (!city) {
        console.error(`ERROR: City not found: ${citySlug}`);
        console.error('');
        console.error('Make sure you have run city discovery first:');
        console.error('  npm run discovery:dutchie:cities');
        process.exit(1);
      }

      const result = await discovery.discoverForCity(city);

      console.log('');
      console.log('='.repeat(60));
      console.log('DISCOVERY COMPLETE');
      console.log('='.repeat(60));
      console.log(`City: ${city.cityName}, ${city.stateCode}`);
      console.log(`Locations found: ${result.locationsFound}`);
      console.log(`Inserted: ${result.locationsInserted}`);
      console.log(`Updated: ${result.locationsUpdated}`);
      console.log(`Skipped (protected): ${result.locationsSkipped}`);
      console.log(`Errors: ${result.errors.length}`);
      console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`);

      if (result.errors.length > 0) {
        console.log('');
        console.log('Errors:');
        result.errors.forEach((e) => console.log(`  - ${e}`));
      }

      process.exit(result.errors.length > 0 ? 1 : 0);
    } else {
      // All enabled cities mode
      const result = await discovery.discoverAllEnabled({ limit, delayMs: delay });

      console.log('');
      console.log('='.repeat(60));
      console.log('DISCOVERY COMPLETE');
      console.log('='.repeat(60));
      console.log(`Total cities processed: ${result.totalCities}`);
      console.log(`Total locations found: ${result.totalLocationsFound}`);
      console.log(`Total inserted: ${result.totalInserted}`);
      console.log(`Total updated: ${result.totalUpdated}`);
      console.log(`Total skipped: ${result.totalSkipped}`);
      console.log(`Total errors: ${result.errors.length}`);
      console.log(`Duration: ${(result.durationMs / 1000).toFixed(1)}s`);

      if (result.errors.length > 0 && result.errors.length <= 20) {
        console.log('');
        console.log('Errors:');
        result.errors.forEach((e) => console.log(`  - ${e}`));
      } else if (result.errors.length > 20) {
        console.log('');
        console.log(`First 20 of ${result.errors.length} errors:`);
        result.errors.slice(0, 20).forEach((e) => console.log(`  - ${e}`));
      }

      process.exit(result.errors.length > 0 ? 1 : 0);
    }
  } catch (error: any) {
    console.error('FATAL ERROR:', error.message);
    console.error(error.stack);
    process.exit(1);
  } finally {
    await pool.end();
  }
}

main();
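The `--key=value` parsing that `parseArgs` does per-flag can be generalized into one small lookup helper (a sketch; `parseFlag` is an illustrative name and not part of the CLI):

```typescript
// Extract the value of a --name=value flag from an argv slice,
// or null if the flag is absent. Mirrors the startsWith/split
// pattern parseArgs uses for --city-slug=, --limit=, and --delay=.
function parseFlag(args: string[], name: string): string | null {
  const prefix = `--${name}=`;
  for (const arg of args) {
    if (arg.startsWith(prefix)) {
      return arg.slice(prefix.length);
    }
  }
  return null;
}
```

Numeric flags like `--limit` and `--delay` would still need a `parseInt(value, 10)` on the returned string, as the CLI does.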
@@ -1,749 +0,0 @@
/**
|
||||
* Legacy Data Import ETL Script
|
||||
*
|
||||
* DEPRECATED: This script assumed a two-database architecture.
|
||||
*
|
||||
* CURRENT ARCHITECTURE (Single Database):
|
||||
* - All data lives in ONE database: cannaiq (cannaiq-postgres container)
|
||||
* - Legacy tables exist INSIDE this same database with namespaced prefixes (e.g., legacy_*)
|
||||
* - The only database is: cannaiq (in cannaiq-postgres container)
|
||||
*
|
||||
* If you need to import legacy data:
|
||||
* 1. Import into namespaced tables (legacy_dispensaries, legacy_products, etc.)
|
||||
* inside the main cannaiq database
|
||||
* 2. Use the canonical connection from src/dutchie-az/db/connection.ts
|
||||
*
|
||||
* SAFETY RULES:
|
||||
* - INSERT-ONLY: No UPDATE, no DELETE, no TRUNCATE
|
||||
* - ON CONFLICT DO NOTHING: Skip duplicates, never overwrite
|
||||
* - Batch Processing: 500-1000 rows per batch
|
||||
* - Manual Invocation Only: Requires explicit user execution
|
||||
*/
|
||||
|
||||
import { Pool, PoolClient } from 'pg';
|
||||
|
||||
// ============================================================
|
||||
// CONFIGURATION
|
||||
// ============================================================
|
||||
|
||||
const BATCH_SIZE = 500;
|
||||
|
||||
interface ETLConfig {
|
||||
dryRun: boolean;
|
||||
tables: string[];
|
||||
}
|
||||
|
||||
interface ETLStats {
|
||||
table: string;
|
||||
read: number;
|
||||
inserted: number;
|
||||
skipped: number;
|
||||
errors: number;
|
||||
durationMs: number;
|
||||
}
|
||||
|
||||
// Parse command line arguments
|
||||
function parseArgs(): ETLConfig {
|
||||
const args = process.argv.slice(2);
|
||||
const config: ETLConfig = {
|
||||
dryRun: false,
|
||||
tables: ['dispensaries', 'products', 'dutchie_products', 'dutchie_product_snapshots'],
|
||||
};
|
||||
|
||||
for (const arg of args) {
|
||||
if (arg === '--dry-run') {
|
||||
config.dryRun = true;
|
||||
} else if (arg.startsWith('--tables=')) {
|
||||
config.tables = arg.replace('--tables=', '').split(',');
|
||||
}
|
||||
}
|
||||
|
||||
return config;
|
||||
}
|
||||
|
||||
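The flag handling in `parseArgs()` above can be exercised as a pure function by passing argv explicitly instead of reading `process.argv`. A minimal sketch (the name `parseArgsFrom` is mine, not from the script):

```typescript
interface ETLConfig {
  dryRun: boolean;
  tables: string[];
}

// Pure variant of parseArgs(): same defaults and flag rules, argv passed in.
function parseArgsFrom(args: string[]): ETLConfig {
  const config: ETLConfig = {
    dryRun: false,
    tables: ['dispensaries', 'products', 'dutchie_products', 'dutchie_product_snapshots'],
  };
  for (const arg of args) {
    if (arg === '--dry-run') {
      config.dryRun = true;
    } else if (arg.startsWith('--tables=')) {
      // --tables=a,b,c replaces the default table list entirely
      config.tables = arg.replace('--tables=', '').split(',');
    }
  }
  return config;
}

console.log(parseArgsFrom(['--dry-run', '--tables=dispensaries,products']));
```

Note that `--tables=` replaces, rather than extends, the default list, so a partial import runs only the named steps.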
// ============================================================
// DATABASE CONNECTIONS
// ============================================================

// DEPRECATED: Both pools point to the same database (cannaiq).
// Legacy tables exist inside the main database with namespaced prefixes.
function createLegacyPool(): Pool {
  return new Pool({
    host: process.env.CANNAIQ_DB_HOST || 'localhost',
    port: parseInt(process.env.CANNAIQ_DB_PORT || '54320'),
    user: process.env.CANNAIQ_DB_USER || 'dutchie',
    password: process.env.CANNAIQ_DB_PASS || 'dutchie_local_pass',
    database: process.env.CANNAIQ_DB_NAME || 'cannaiq',
    max: 5,
  });
}

function createCannaiqPool(): Pool {
  return new Pool({
    host: process.env.CANNAIQ_DB_HOST || 'localhost',
    port: parseInt(process.env.CANNAIQ_DB_PORT || '54320'),
    user: process.env.CANNAIQ_DB_USER || 'dutchie',
    password: process.env.CANNAIQ_DB_PASS || 'dutchie_local_pass',
    database: process.env.CANNAIQ_DB_NAME || 'cannaiq',
    max: 5,
  });
}

// ============================================================
// STAGING TABLE CREATION
// ============================================================

const STAGING_TABLES_SQL = `
  -- Staging table for legacy dispensaries
  CREATE TABLE IF NOT EXISTS dispensaries_from_legacy (
    id SERIAL PRIMARY KEY,
    legacy_id INTEGER NOT NULL,
    name VARCHAR(255) NOT NULL,
    slug VARCHAR(255) NOT NULL,
    city VARCHAR(100) NOT NULL,
    state VARCHAR(10) NOT NULL,
    postal_code VARCHAR(20),
    address TEXT,
    latitude DECIMAL(10,7),
    longitude DECIMAL(10,7),
    menu_url TEXT,
    website TEXT,
    legacy_metadata JSONB,
    imported_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(legacy_id)
  );

  -- Staging table for legacy products
  CREATE TABLE IF NOT EXISTS products_from_legacy (
    id SERIAL PRIMARY KEY,
    legacy_product_id INTEGER NOT NULL,
    legacy_dispensary_id INTEGER,
    external_product_id VARCHAR(255),
    name VARCHAR(500) NOT NULL,
    brand_name VARCHAR(255),
    type VARCHAR(100),
    subcategory VARCHAR(100),
    strain_type VARCHAR(50),
    thc DECIMAL(10,4),
    cbd DECIMAL(10,4),
    price_cents INTEGER,
    original_price_cents INTEGER,
    stock_status VARCHAR(20),
    weight VARCHAR(100),
    primary_image_url TEXT,
    first_seen_at TIMESTAMPTZ,
    last_seen_at TIMESTAMPTZ,
    legacy_raw_payload JSONB,
    imported_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(legacy_product_id)
  );

  -- Staging table for legacy price history
  CREATE TABLE IF NOT EXISTS price_history_legacy (
    id SERIAL PRIMARY KEY,
    legacy_product_id INTEGER NOT NULL,
    price_cents INTEGER,
    recorded_at TIMESTAMPTZ,
    imported_at TIMESTAMPTZ DEFAULT NOW()
  );

  -- Indexes for efficient lookups
  CREATE INDEX IF NOT EXISTS idx_disp_legacy_slug ON dispensaries_from_legacy(slug, city, state);
  CREATE INDEX IF NOT EXISTS idx_prod_legacy_ext_id ON products_from_legacy(external_product_id);
`;
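The INSERT-ONLY safety rule hinges on the `UNIQUE(legacy_id)` / `UNIQUE(legacy_product_id)` constraints: re-running the import inserts nothing for rows already staged and never overwrites them. A minimal in-memory analogue of that `ON CONFLICT DO NOTHING` behavior (helper name and shapes are mine):

```typescript
// Mimics INSERT ... ON CONFLICT (key) DO NOTHING RETURNING id:
// returns true if the row was inserted, false if the key already existed.
function insertIfAbsent<K, V>(table: Map<K, V>, key: K, row: V): boolean {
  if (table.has(key)) return false; // conflict: skip, never overwrite
  table.set(key, row);
  return true;
}

const staging = new Map<number, { name: string }>();
insertIfAbsent(staging, 42, { name: 'Downtown Dispensary' }); // first run: inserted
insertIfAbsent(staging, 42, { name: 'Renamed Dispensary' });  // re-run: skipped
console.log(staging.get(42)); // first write wins
```

This is what makes the per-table `inserted` vs `skipped` counters meaningful: a second run of the script should report everything as skipped.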
async function createStagingTables(cannaiqPool: Pool, dryRun: boolean): Promise<void> {
  console.log('[ETL] Creating staging tables...');

  if (dryRun) {
    console.log('[ETL] DRY RUN: Would create staging tables');
    return;
  }

  const client = await cannaiqPool.connect();
  try {
    await client.query(STAGING_TABLES_SQL);
    console.log('[ETL] Staging tables created successfully');
  } finally {
    client.release();
  }
}

// ============================================================
// ETL FUNCTIONS
// ============================================================
async function importDispensaries(
  legacyPool: Pool,
  cannaiqPool: Pool,
  dryRun: boolean
): Promise<ETLStats> {
  const startTime = Date.now();
  const stats: ETLStats = {
    table: 'dispensaries',
    read: 0,
    inserted: 0,
    skipped: 0,
    errors: 0,
    durationMs: 0,
  };

  console.log('[ETL] Importing dispensaries...');

  const legacyClient = await legacyPool.connect();
  const cannaiqClient = await cannaiqPool.connect();

  try {
    // Count total rows
    const countResult = await legacyClient.query('SELECT COUNT(*) FROM dispensaries');
    const totalRows = parseInt(countResult.rows[0].count);
    console.log(`[ETL] Found ${totalRows} dispensaries in legacy database`);

    // Process in batches
    let offset = 0;
    while (offset < totalRows) {
      const batchResult = await legacyClient.query(`
        SELECT
          id, name, slug, city, state, zip, address,
          latitude, longitude, menu_url, website, dba_name,
          menu_provider, product_provider, provider_detection_data
        FROM dispensaries
        ORDER BY id
        LIMIT $1 OFFSET $2
      `, [BATCH_SIZE, offset]);

      stats.read += batchResult.rows.length;

      if (dryRun) {
        console.log(`[ETL] DRY RUN: Would insert batch of ${batchResult.rows.length} dispensaries`);
        stats.inserted += batchResult.rows.length;
      } else {
        for (const row of batchResult.rows) {
          try {
            const legacyMetadata = {
              dba_name: row.dba_name,
              menu_provider: row.menu_provider,
              product_provider: row.product_provider,
              provider_detection_data: row.provider_detection_data,
            };

            const insertResult = await cannaiqClient.query(`
              INSERT INTO dispensaries_from_legacy
                (legacy_id, name, slug, city, state, postal_code, address,
                 latitude, longitude, menu_url, website, legacy_metadata)
              VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
              ON CONFLICT (legacy_id) DO NOTHING
              RETURNING id
            `, [
              row.id,
              row.name,
              row.slug,
              row.city,
              row.state,
              row.zip,
              row.address,
              row.latitude,
              row.longitude,
              row.menu_url,
              row.website,
              JSON.stringify(legacyMetadata),
            ]);

            if (insertResult.rowCount > 0) {
              stats.inserted++;
            } else {
              stats.skipped++;
            }
          } catch (err: any) {
            stats.errors++;
            console.error(`[ETL] Error inserting dispensary ${row.id}:`, err.message);
          }
        }
      }

      offset += BATCH_SIZE;
      console.log(`[ETL] Processed ${Math.min(offset, totalRows)}/${totalRows} dispensaries`);
    }
  } finally {
    legacyClient.release();
    cannaiqClient.release();
  }

  stats.durationMs = Date.now() - startTime;
  return stats;
}
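The batch loop above pages through the legacy table with `LIMIT`/`OFFSET` against a stable `ORDER BY id`. The windows it visits can be computed up front, which makes the loop shape easy to check (the helper name is mine):

```typescript
// Enumerate the (limit, offset) windows the ETL batch loop walks through.
function batchWindows(totalRows: number, batchSize: number): Array<{ limit: number; offset: number }> {
  const windows: Array<{ limit: number; offset: number }> = [];
  for (let offset = 0; offset < totalRows; offset += batchSize) {
    windows.push({ limit: batchSize, offset });
  }
  return windows;
}

// 1200 rows at BATCH_SIZE 500 visit offsets 0, 500, 1000
console.log(batchWindows(1200, 500));
```

One caveat worth noting: `OFFSET` pagination scans past all skipped rows on each batch, so on the multi-million-row snapshots table keyset pagination (`WHERE id > $lastId ORDER BY id LIMIT $n`) would scale better; the script keeps `OFFSET` for simplicity.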
async function importProducts(
  legacyPool: Pool,
  cannaiqPool: Pool,
  dryRun: boolean
): Promise<ETLStats> {
  const startTime = Date.now();
  const stats: ETLStats = {
    table: 'products',
    read: 0,
    inserted: 0,
    skipped: 0,
    errors: 0,
    durationMs: 0,
  };

  console.log('[ETL] Importing legacy products...');

  const legacyClient = await legacyPool.connect();
  const cannaiqClient = await cannaiqPool.connect();

  try {
    const countResult = await legacyClient.query('SELECT COUNT(*) FROM products');
    const totalRows = parseInt(countResult.rows[0].count);
    console.log(`[ETL] Found ${totalRows} products in legacy database`);

    let offset = 0;
    while (offset < totalRows) {
      const batchResult = await legacyClient.query(`
        SELECT
          id, dispensary_id, dutchie_product_id, name, brand,
          subcategory, strain_type, thc_percentage, cbd_percentage,
          price, original_price, in_stock, weight, image_url,
          first_seen_at, last_seen_at, raw_data
        FROM products
        ORDER BY id
        LIMIT $1 OFFSET $2
      `, [BATCH_SIZE, offset]);

      stats.read += batchResult.rows.length;

      if (dryRun) {
        console.log(`[ETL] DRY RUN: Would insert batch of ${batchResult.rows.length} products`);
        stats.inserted += batchResult.rows.length;
      } else {
        for (const row of batchResult.rows) {
          try {
            const stockStatus = row.in_stock === true ? 'in_stock' :
              row.in_stock === false ? 'out_of_stock' : 'unknown';
            const priceCents = row.price ? Math.round(parseFloat(row.price) * 100) : null;
            const originalPriceCents = row.original_price ? Math.round(parseFloat(row.original_price) * 100) : null;

            const insertResult = await cannaiqClient.query(`
              INSERT INTO products_from_legacy
                (legacy_product_id, legacy_dispensary_id, external_product_id,
                 name, brand_name, subcategory, strain_type, thc, cbd,
                 price_cents, original_price_cents, stock_status, weight,
                 primary_image_url, first_seen_at, last_seen_at, legacy_raw_payload)
              VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
              ON CONFLICT (legacy_product_id) DO NOTHING
              RETURNING id
            `, [
              row.id,
              row.dispensary_id,
              row.dutchie_product_id,
              row.name,
              row.brand,
              row.subcategory,
              row.strain_type,
              row.thc_percentage,
              row.cbd_percentage,
              priceCents,
              originalPriceCents,
              stockStatus,
              row.weight,
              row.image_url,
              row.first_seen_at,
              row.last_seen_at,
              row.raw_data ? JSON.stringify(row.raw_data) : null,
            ]);

            if (insertResult.rowCount > 0) {
              stats.inserted++;
            } else {
              stats.skipped++;
            }
          } catch (err: any) {
            stats.errors++;
            console.error(`[ETL] Error inserting product ${row.id}:`, err.message);
          }
        }
      }

      offset += BATCH_SIZE;
      console.log(`[ETL] Processed ${Math.min(offset, totalRows)}/${totalRows} products`);
    }
  } finally {
    legacyClient.release();
    cannaiqClient.release();
  }

  stats.durationMs = Date.now() - startTime;
  return stats;
}
async function importDutchieProducts(
  legacyPool: Pool,
  cannaiqPool: Pool,
  dryRun: boolean
): Promise<ETLStats> {
  const startTime = Date.now();
  const stats: ETLStats = {
    table: 'dutchie_products',
    read: 0,
    inserted: 0,
    skipped: 0,
    errors: 0,
    durationMs: 0,
  };

  console.log('[ETL] Importing dutchie_products...');

  const legacyClient = await legacyPool.connect();
  const cannaiqClient = await cannaiqPool.connect();

  try {
    const countResult = await legacyClient.query('SELECT COUNT(*) FROM dutchie_products');
    const totalRows = parseInt(countResult.rows[0].count);
    console.log(`[ETL] Found ${totalRows} dutchie_products in legacy database`);

    // Note: For dutchie_products, we need to map dispensary_id to the canonical dispensary.
    // This requires the dispensaries to be imported first.
    // For now, we insert directly since the schema is nearly identical.

    let offset = 0;
    while (offset < totalRows) {
      const batchResult = await legacyClient.query(`
        SELECT *
        FROM dutchie_products
        ORDER BY id
        LIMIT $1 OFFSET $2
      `, [BATCH_SIZE, offset]);

      stats.read += batchResult.rows.length;

      if (dryRun) {
        console.log(`[ETL] DRY RUN: Would insert batch of ${batchResult.rows.length} dutchie_products`);
        stats.inserted += batchResult.rows.length;
      } else {
        // For each row, attempt insert with ON CONFLICT DO NOTHING
        for (const row of batchResult.rows) {
          try {
            // Check if dispensary exists in canonical table
            const dispCheck = await cannaiqClient.query(`
              SELECT id FROM dispensaries WHERE id = $1
            `, [row.dispensary_id]);

            if (dispCheck.rows.length === 0) {
              stats.skipped++;
              continue; // Skip products for dispensaries not yet imported
            }

            const insertResult = await cannaiqClient.query(`
              INSERT INTO dutchie_products
                (dispensary_id, platform, external_product_id, platform_dispensary_id,
                 c_name, name, brand_name, brand_id, brand_logo_url,
                 type, subcategory, strain_type, provider,
                 thc, thc_content, cbd, cbd_content, cannabinoids_v2, effects,
                 status, medical_only, rec_only, featured, coming_soon,
                 certificate_of_analysis_enabled,
                 is_below_threshold, is_below_kiosk_threshold,
                 options_below_threshold, options_below_kiosk_threshold,
                 stock_status, total_quantity_available,
                 primary_image_url, images, measurements, weight, past_c_names,
                 created_at_dutchie, updated_at_dutchie, latest_raw_payload)
              VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38, $39)
              ON CONFLICT (dispensary_id, external_product_id) DO NOTHING
              RETURNING id
            `, [
              row.dispensary_id,
              row.platform || 'dutchie',
              row.external_product_id,
              row.platform_dispensary_id,
              row.c_name,
              row.name,
              row.brand_name,
              row.brand_id,
              row.brand_logo_url,
              row.type,
              row.subcategory,
              row.strain_type,
              row.provider,
              row.thc,
              row.thc_content,
              row.cbd,
              row.cbd_content,
              row.cannabinoids_v2,
              row.effects,
              row.status,
              row.medical_only,
              row.rec_only,
              row.featured,
              row.coming_soon,
              row.certificate_of_analysis_enabled,
              row.is_below_threshold,
              row.is_below_kiosk_threshold,
              row.options_below_threshold,
              row.options_below_kiosk_threshold,
              row.stock_status,
              row.total_quantity_available,
              row.primary_image_url,
              row.images,
              row.measurements,
              row.weight,
              row.past_c_names,
              row.created_at_dutchie,
              row.updated_at_dutchie,
              row.latest_raw_payload,
            ]);

            if (insertResult.rowCount > 0) {
              stats.inserted++;
            } else {
              stats.skipped++;
            }
          } catch (err: any) {
            stats.errors++;
            if (stats.errors <= 5) {
              console.error(`[ETL] Error inserting dutchie_product ${row.id}:`, err.message);
            }
          }
        }
      }

      offset += BATCH_SIZE;
      console.log(`[ETL] Processed ${Math.min(offset, totalRows)}/${totalRows} dutchie_products`);
    }
  } finally {
    legacyClient.release();
    cannaiqClient.release();
  }

  stats.durationMs = Date.now() - startTime;
  return stats;
}
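The per-row dispensary existence check above issues one `SELECT` per product. The same guard can be expressed over a prefetched id set, splitting each batch into importable and skipped rows in memory (a sketch; the helper name is mine, not from the script):

```typescript
// Given a set of dispensary ids known to exist in the canonical table,
// split a batch into rows that can be imported and rows to skip.
function partitionByKnownDispensary<T extends { dispensary_id: number }>(
  rows: T[],
  knownDispensaryIds: Set<number>
): { importable: T[]; skipped: T[] } {
  const importable: T[] = [];
  const skipped: T[] = [];
  for (const row of rows) {
    (knownDispensaryIds.has(row.dispensary_id) ? importable : skipped).push(row);
  }
  return { importable, skipped };
}

const known = new Set([1, 2]);
const { importable, skipped } = partitionByKnownDispensary(
  [{ dispensary_id: 1 }, { dispensary_id: 9 }],
  known
);
console.log(importable.length, skipped.length); // 1 1
```

Trading one up-front `SELECT id FROM dispensaries` for N per-row round trips is usually a clear win at batch sizes of 500.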
async function importDutchieSnapshots(
  legacyPool: Pool,
  cannaiqPool: Pool,
  dryRun: boolean
): Promise<ETLStats> {
  const startTime = Date.now();
  const stats: ETLStats = {
    table: 'dutchie_product_snapshots',
    read: 0,
    inserted: 0,
    skipped: 0,
    errors: 0,
    durationMs: 0,
  };

  console.log('[ETL] Importing dutchie_product_snapshots...');

  const legacyClient = await legacyPool.connect();
  const cannaiqClient = await cannaiqPool.connect();

  try {
    const countResult = await legacyClient.query('SELECT COUNT(*) FROM dutchie_product_snapshots');
    const totalRows = parseInt(countResult.rows[0].count);
    console.log(`[ETL] Found ${totalRows} dutchie_product_snapshots in legacy database`);

    // Build mapping of legacy product IDs to canonical product IDs
    console.log('[ETL] Building product ID mapping...');
    const mappingResult = await cannaiqClient.query(`
      SELECT id, external_product_id, dispensary_id FROM dutchie_products
    `);
    // Create a key from dispensary_id + external_product_id
    const productByKey = new Map<string, number>();
    for (const row of mappingResult.rows) {
      const key = `${row.dispensary_id}:${row.external_product_id}`;
      productByKey.set(key, row.id);
    }

    let offset = 0;
    while (offset < totalRows) {
      const batchResult = await legacyClient.query(`
        SELECT *
        FROM dutchie_product_snapshots
        ORDER BY id
        LIMIT $1 OFFSET $2
      `, [BATCH_SIZE, offset]);

      stats.read += batchResult.rows.length;

      if (dryRun) {
        console.log(`[ETL] DRY RUN: Would insert batch of ${batchResult.rows.length} snapshots`);
        stats.inserted += batchResult.rows.length;
      } else {
        for (const row of batchResult.rows) {
          try {
            // Map legacy product ID to canonical product ID
            const key = `${row.dispensary_id}:${row.external_product_id}`;
            const canonicalProductId = productByKey.get(key);

            if (!canonicalProductId) {
              stats.skipped++;
              continue; // Skip snapshots for products not yet imported
            }

            // Insert snapshot (no conflict handling - all snapshots are historical)
            await cannaiqClient.query(`
              INSERT INTO dutchie_product_snapshots
                (dutchie_product_id, dispensary_id, platform_dispensary_id,
                 external_product_id, pricing_type, crawl_mode,
                 status, featured, special, medical_only, rec_only,
                 is_present_in_feed, stock_status,
                 rec_min_price_cents, rec_max_price_cents, rec_min_special_price_cents,
                 med_min_price_cents, med_max_price_cents, med_min_special_price_cents,
                 wholesale_min_price_cents,
                 total_quantity_available, total_kiosk_quantity_available,
                 manual_inventory, is_below_threshold, is_below_kiosk_threshold,
                 options, raw_payload, crawled_at)
              VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28)
            `, [
              canonicalProductId,
              row.dispensary_id,
              row.platform_dispensary_id,
              row.external_product_id,
              row.pricing_type,
              row.crawl_mode,
              row.status,
              row.featured,
              row.special,
              row.medical_only,
              row.rec_only,
              row.is_present_in_feed,
              row.stock_status,
              row.rec_min_price_cents,
              row.rec_max_price_cents,
              row.rec_min_special_price_cents,
              row.med_min_price_cents,
              row.med_max_price_cents,
              row.med_min_special_price_cents,
              row.wholesale_min_price_cents,
              row.total_quantity_available,
              row.total_kiosk_quantity_available,
              row.manual_inventory,
              row.is_below_threshold,
              row.is_below_kiosk_threshold,
              row.options,
              row.raw_payload,
              row.crawled_at,
            ]);

            stats.inserted++;
          } catch (err: any) {
            stats.errors++;
            if (stats.errors <= 5) {
              console.error(`[ETL] Error inserting snapshot ${row.id}:`, err.message);
            }
          }
        }
      }

      offset += BATCH_SIZE;
      console.log(`[ETL] Processed ${Math.min(offset, totalRows)}/${totalRows} snapshots`);
    }
  } finally {
    legacyClient.release();
    cannaiqClient.release();
  }

  stats.durationMs = Date.now() - startTime;
  return stats;
}
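Snapshot rows are joined to canonical products through a `dispensary_id:external_product_id` composite string key rather than the legacy primary key. The index built above reduces to (interface and function names are mine):

```typescript
interface ProductRow {
  id: number;
  dispensary_id: number;
  external_product_id: string;
}

// Build the composite-key index used to remap legacy snapshots
// onto canonical dutchie_products rows.
function buildProductIndex(rows: ProductRow[]): Map<string, number> {
  const byKey = new Map<string, number>();
  for (const row of rows) {
    byKey.set(`${row.dispensary_id}:${row.external_product_id}`, row.id);
  }
  return byKey;
}

const index = buildProductIndex([
  { id: 7, dispensary_id: 3, external_product_id: 'abc' },
]);
// A snapshot for (3, 'abc') remaps to canonical product 7; any miss is skipped.
console.log(index.get('3:abc')); // 7
```

The `:` separator is safe here only because `dispensary_id` is numeric; with two free-form string components a delimiter collision could alias two distinct keys.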
// ============================================================
// MAIN
// ============================================================

async function main(): Promise<void> {
  console.log('='.repeat(60));
  console.log('LEGACY DATA IMPORT ETL');
  console.log('='.repeat(60));

  const config = parseArgs();

  console.log(`Mode: ${config.dryRun ? 'DRY RUN' : 'LIVE'}`);
  console.log(`Tables: ${config.tables.join(', ')}`);
  console.log('');

  // Create connection pools
  const legacyPool = createLegacyPool();
  const cannaiqPool = createCannaiqPool();

  try {
    // Test connections
    console.log('[ETL] Testing database connections...');
    await legacyPool.query('SELECT 1');
    console.log('[ETL] Legacy database connected');
    await cannaiqPool.query('SELECT 1');
    console.log('[ETL] CannaiQ database connected');
    console.log('');

    // Create staging tables
    await createStagingTables(cannaiqPool, config.dryRun);
    console.log('');

    // Run imports
    const allStats: ETLStats[] = [];

    if (config.tables.includes('dispensaries')) {
      const stats = await importDispensaries(legacyPool, cannaiqPool, config.dryRun);
      allStats.push(stats);
      console.log('');
    }

    if (config.tables.includes('products')) {
      const stats = await importProducts(legacyPool, cannaiqPool, config.dryRun);
      allStats.push(stats);
      console.log('');
    }

    if (config.tables.includes('dutchie_products')) {
      const stats = await importDutchieProducts(legacyPool, cannaiqPool, config.dryRun);
      allStats.push(stats);
      console.log('');
    }

    if (config.tables.includes('dutchie_product_snapshots')) {
      const stats = await importDutchieSnapshots(legacyPool, cannaiqPool, config.dryRun);
      allStats.push(stats);
      console.log('');
    }

    // Print summary
    console.log('='.repeat(60));
    console.log('IMPORT SUMMARY');
    console.log('='.repeat(60));
    console.log('');
    console.log('| Table                      | Read     | Inserted | Skipped  | Errors   | Duration |');
    console.log('|----------------------------|----------|----------|----------|----------|----------|');
    for (const s of allStats) {
      console.log(`| ${s.table.padEnd(26)} | ${String(s.read).padStart(8)} | ${String(s.inserted).padStart(8)} | ${String(s.skipped).padStart(8)} | ${String(s.errors).padStart(8)} | ${(s.durationMs / 1000).toFixed(1).padStart(7)}s |`);
    }
    console.log('');

    const totalInserted = allStats.reduce((sum, s) => sum + s.inserted, 0);
    const totalErrors = allStats.reduce((sum, s) => sum + s.errors, 0);
    console.log(`Total inserted: ${totalInserted}`);
    console.log(`Total errors: ${totalErrors}`);

    if (config.dryRun) {
      console.log('');
      console.log('DRY RUN COMPLETE - No data was written');
      console.log('Run without --dry-run to perform actual import');
    }

  } catch (error: any) {
    console.error('[ETL] Fatal error:', error.message);
    process.exit(1);
  } finally {
    await legacyPool.end();
    await cannaiqPool.end();
  }

  console.log('');
  console.log('ETL complete');
}

main().catch((err) => {
  console.error('Unhandled error:', err);
  process.exit(1);
});
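The summary block at the end of `main()` is a plain reduce over the per-table stats. Isolated for clarity (the `ETLStats` shape is from the script; the helper name and demo data are mine):

```typescript
interface ETLStats {
  table: string;
  read: number;
  inserted: number;
  skipped: number;
  errors: number;
  durationMs: number;
}

// Sum a single numeric field across all per-table stats.
function totalOf(stats: ETLStats[], field: 'inserted' | 'errors'): number {
  return stats.reduce((sum, s) => sum + s[field], 0);
}

const demo: ETLStats[] = [
  { table: 'dispensaries', read: 10, inserted: 8, skipped: 2, errors: 0, durationMs: 120 },
  { table: 'products', read: 20, inserted: 15, skipped: 4, errors: 1, durationMs: 340 },
];
console.log(totalOf(demo, 'inserted'), totalOf(demo, 'errors')); // 23 1
```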
397  backend/src/scripts/harmonize-az-dispensaries.ts  Normal file
@@ -0,0 +1,397 @@
/**
 * Harmonize AZ Dispensaries with Dutchie Source of Truth
 *
 * This script:
 *   1. Queries Dutchie ConsumerDispensaries API for all AZ cities
 *   2. Matches our dispensaries by platform_dispensary_id
 *   3. Updates existing records with full Dutchie data
 *   4. Creates new records for dispensaries in Dutchie but not in our DB
 *   5. Disables dispensaries not found in Dutchie
 *
 * Usage:
 *   npx tsx src/scripts/harmonize-az-dispensaries.ts
 *   npx tsx src/scripts/harmonize-az-dispensaries.ts --dry-run
 *   npx tsx src/scripts/harmonize-az-dispensaries.ts --state CA
 */
import { Pool } from 'pg';
import { executeGraphQL, GRAPHQL_HASHES } from '../platforms/dutchie/client';

const pool = new Pool({
  host: process.env.CANNAIQ_DB_HOST || 'localhost',
  port: parseInt(process.env.CANNAIQ_DB_PORT || '54320'),
  database: process.env.CANNAIQ_DB_NAME || 'dutchie_menus',
  user: process.env.CANNAIQ_DB_USER || 'dutchie',
  password: process.env.CANNAIQ_DB_PASS || 'dutchie_local_pass',
});

interface Dispensary {
  id: number;
  name: string;
  slug: string;
  city: string;
  state: string;
  platform_dispensary_id: string | null;
  dutchie_verified: boolean;
  crawl_enabled: boolean;
}

interface DutchieDispensary {
  id: string;      // Platform ID like "deHiuKKmBHGJKXzuj"
  cName: string;   // Slug like "the-downtown-dispensary"
  name: string;
  phone: string | null;
  address: string;
  description: string | null;
  status: string;
  chain: string | null;
  timezone: string;
  location: {
    ln1: string;
    ln2: string;
    city: string;
    state: string;
    country: string;
    zipcode: string;
    geometry: {
      coordinates: [number, number];
    };
  };
  deliveryHours: any;
  pickupHours: any;
  offerDelivery: boolean;
  offerPickup: boolean;
  offerCurbsidePickup: boolean;
  isMedical: boolean;
  isRecreational: boolean;
}

interface HarmonizationResult {
  updated: number;
  created: number;
  disabled: number;
  skipped: number;
  errors: string[];
}

// Cities to query for AZ (from statesWithDispensaries)
const AZ_CITIES = [
  'Apache Junction', 'Bisbee', 'Bullhead City', 'Casa Grande', 'Chandler',
  'Cottonwood', 'El Mirage', 'Flagstaff', 'Florence', 'Gilbert', 'Glendale',
  'Globe', 'Goodyear', 'Kingman', 'Lake Havasu City', 'Maricopa', 'Mesa',
  'Peoria', 'Phoenix', 'Prescott', 'Prescott Valley', 'Queen Creek',
  'Scottsdale', 'Show Low', 'Sierra Vista', 'Snowflake', 'Sun City',
  'Surprise', 'Tempe', 'Tolleson', 'Tucson', 'Yuma'
];
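Note that `location.geometry.coordinates` follows the GeoJSON convention of `[longitude, latitude]`, which is why the update and insert queries later in this file read index 1 for latitude and index 0 for longitude. A small guard that makes the order explicit (the helper name is mine):

```typescript
// GeoJSON order is [lng, lat]; destructure explicitly so the two
// never get swapped when writing latitude/longitude columns.
function toLatLng(
  coordinates: [number, number] | undefined
): { latitude: number | null; longitude: number | null } {
  if (!coordinates) return { latitude: null, longitude: null };
  const [longitude, latitude] = coordinates;
  return { latitude, longitude };
}

console.log(toLatLng([-110.97, 32.22])); // Tucson-ish: lat 32.22, lng -110.97
```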
async function getDispensaries(state: string): Promise<Dispensary[]> {
  const result = await pool.query<Dispensary>(
    `SELECT id, name, slug, city, state, platform_dispensary_id,
            COALESCE(dutchie_verified, false) as dutchie_verified,
            COALESCE(crawl_enabled, true) as crawl_enabled
     FROM dispensaries
     WHERE state = $1
     ORDER BY id`,
    [state]
  );
  return result.rows;
}
async function fetchDutchieDispensariesByCity(
  city: string,
  state: string
): Promise<DutchieDispensary[]> {
  const allDispensaries: DutchieDispensary[] = [];
  let page = 0;
  const perPage = 100;

  while (true) {
    const variables = {
      dispensaryFilter: {
        activeOnly: true,
        city,
        state,
      },
      page,
      perPage,
    };

    const result = await executeGraphQL(
      'ConsumerDispensaries',
      variables,
      GRAPHQL_HASHES.ConsumerDispensaries,
      { cName: `${city.toLowerCase().replace(/\s+/g, '-')}-${state.toLowerCase()}`, maxRetries: 2, retryOn403: true }
    );

    const dispensaries = result?.data?.filteredDispensaries || [];
    allDispensaries.push(...dispensaries);

    if (dispensaries.length < perPage) break;
    page++;

    // Rate limit
    await new Promise(resolve => setTimeout(resolve, 200));
  }

  return allDispensaries;
}
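The city fetch above pages until a page comes back shorter than `perPage`. The loop shape, isolated from the GraphQL call (made synchronous here so it is easy to test; the real code awaits each page and sleeps between requests — names below are mine):

```typescript
// Generic "page until a short page" loop, mirroring fetchDutchieDispensariesByCity.
function fetchAllPages<T>(fetchPage: (page: number, perPage: number) => T[], perPage = 100): T[] {
  const all: T[] = [];
  let page = 0;
  while (true) {
    const items = fetchPage(page, perPage);
    all.push(...items);
    if (items.length < perPage) break; // a short page means we've reached the end
    page++;
  }
  return all;
}

// Stand-in fetcher serving 250 numbered items: pages of 100, 100, 50.
const fakeFetch = (page: number, perPage: number): number[] =>
  Array.from(
    { length: Math.max(0, Math.min(perPage, 250 - page * perPage)) },
    (_, i) => page * perPage + i
  );

console.log(fetchAllPages(fakeFetch).length); // 250
```

One edge worth knowing: when the total is an exact multiple of `perPage`, the loop fetches one extra empty page before terminating, which is harmless here.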
async function fetchAllDutchieDispensaries(state: string): Promise<Map<string, DutchieDispensary>> {
  const cities = state === 'AZ' ? AZ_CITIES : [];
  const dispensaryMap = new Map<string, DutchieDispensary>();

  console.log(`Fetching dispensaries from ${cities.length} cities...`);

  for (const city of cities) {
    const dispensaries = await fetchDutchieDispensariesByCity(city, state);
    console.log(`  ${city}: ${dispensaries.length} dispensaries`);

    for (const d of dispensaries) {
      // Index by platform ID
      if (d.id && !dispensaryMap.has(d.id)) {
        dispensaryMap.set(d.id, d);
      }
    }

    // Rate limit between cities
    await new Promise(resolve => setTimeout(resolve, 300));
  }

  console.log(`Total unique dispensaries from Dutchie: ${dispensaryMap.size}\n`);
  return dispensaryMap;
}
async function updateDispensary(
  dispensaryId: number,
  dutchie: DutchieDispensary,
  dryRun: boolean
): Promise<void> {
  if (dryRun) return;

  const menuUrl = `https://dutchie.com/dispensary/${dutchie.cName}`;

  await pool.query(
    `UPDATE dispensaries
     SET name = $2,
         slug = $3,
         address = $4,
         city = $5,
         postal_code = $6,
         phone = $7,
         latitude = $8,
         longitude = $9,
         menu_url = $10,
         menu_type = 'dutchie',
         platform = 'dutchie',
         is_delivery = $11,
         is_pickup = $12,
         dutchie_verified = true,
         dutchie_verified_at = NOW(),
         crawl_enabled = true,
         updated_at = NOW()
     WHERE id = $1`,
    [
      dispensaryId,
      dutchie.name.trim(),
      dutchie.cName,
      dutchie.location?.ln1 || dutchie.address,
      dutchie.location?.city || '',
      dutchie.location?.zipcode || '',
      dutchie.phone,
      dutchie.location?.geometry?.coordinates?.[1] || null, // GeoJSON order is [lng, lat]
      dutchie.location?.geometry?.coordinates?.[0] || null,
      menuUrl,
      dutchie.offerDelivery ?? false,
      dutchie.offerPickup ?? true,
    ]
  );
}
async function createDispensary(
  dutchie: DutchieDispensary,
  state: string,
  dryRun: boolean
): Promise<number | null> {
  if (dryRun) return null;

  const menuUrl = `https://dutchie.com/dispensary/${dutchie.cName}`;

  const result = await pool.query<{ id: number }>(
    `INSERT INTO dispensaries (
       name, slug, city, state, platform, platform_dispensary_id,
       menu_url, menu_type, address, postal_code, latitude, longitude,
       is_delivery, is_pickup, phone,
       dutchie_verified, dutchie_verified_at,
       crawl_enabled,
       created_at, updated_at
     ) VALUES (
       $1, $2, $3, $4, 'dutchie', $5,
       $6, 'dutchie', $7, $8, $9, $10,
       $11, $12, $13,
       true, NOW(),
       true,
       NOW(), NOW()
     )
     ON CONFLICT (slug) DO UPDATE SET
       platform_dispensary_id = EXCLUDED.platform_dispensary_id,
       name = EXCLUDED.name,
       menu_url = EXCLUDED.menu_url,
       address = EXCLUDED.address,
       postal_code = EXCLUDED.postal_code,
       latitude = EXCLUDED.latitude,
       longitude = EXCLUDED.longitude,
       is_delivery = EXCLUDED.is_delivery,
       is_pickup = EXCLUDED.is_pickup,
       phone = EXCLUDED.phone,
       dutchie_verified = true,
       dutchie_verified_at = NOW(),
       crawl_enabled = true,
       updated_at = NOW()
     RETURNING id`,
    [
      dutchie.name.trim(),
      dutchie.cName,
      dutchie.location?.city || '',
      state,
      dutchie.id,
      menuUrl,
      dutchie.location?.ln1 || dutchie.address,
      dutchie.location?.zipcode || '',
      dutchie.location?.geometry?.coordinates?.[1] || null,
      dutchie.location?.geometry?.coordinates?.[0] || null,
      dutchie.offerDelivery ?? false,
      dutchie.offerPickup ?? true,
      dutchie.phone,
    ]
  );

  return result.rows[0]?.id || null;
}
async function disableDispensary(dispensaryId: number, reason: string, dryRun: boolean): Promise<void> {
  if (dryRun) return;

  await pool.query(
    `UPDATE dispensaries
     SET crawl_enabled = false,
         failure_notes = $2,
         updated_at = NOW()
     WHERE id = $1`,
    [dispensaryId, reason]
  );
}

async function harmonizeDispensaries(
  state: string,
  dryRun: boolean = false
): Promise<HarmonizationResult> {
  console.log(`\n${'='.repeat(60)}`);
  console.log(`HARMONIZING ${state} DISPENSARIES${dryRun ? ' (DRY RUN)' : ''}`);
  console.log(`${'='.repeat(60)}\n`);

  const result: HarmonizationResult = {
    updated: 0,
    created: 0,
    disabled: 0,
    skipped: 0,
    errors: [],
  };

  // Fetch all dispensaries from Dutchie (source of truth)
  const dutchieMap = await fetchAllDutchieDispensaries(state);

  // Get our current dispensaries
  const dispensaries = await getDispensaries(state);
  console.log(`Found ${dispensaries.length} dispensaries in our DB\n`);

  // Track which Dutchie dispensaries we've matched
  const matchedDutchieIds = new Set<string>();

  // Step 1: Match our dispensaries to Dutchie by platform_dispensary_id
  console.log('[Step 1/3] Matching existing dispensaries to Dutchie...');
  for (const disp of dispensaries) {
    if (disp.platform_dispensary_id && dutchieMap.has(disp.platform_dispensary_id)) {
      // Found match - update with Dutchie data
      const dutchie = dutchieMap.get(disp.platform_dispensary_id)!;

      try {
        await updateDispensary(disp.id, dutchie, dryRun);
        console.log(`  [UPDATED] ${disp.name} -> ${dutchie.name} (${dutchie.cName})`);
        result.updated++;
        matchedDutchieIds.add(disp.platform_dispensary_id);
      } catch (error: any) {
        console.error(`  [ERROR] ${disp.name}: ${error.message}`);
        result.errors.push(`Update ${disp.name}: ${error.message}`);
      }
    } else if (disp.platform_dispensary_id) {
      // Has platform ID but not found in Dutchie - maybe closed?
      console.log(`  [NOT FOUND] ${disp.name} (${disp.platform_dispensary_id}) - not in Dutchie`);
      await disableDispensary(disp.id, 'Platform ID not found in Dutchie - may be closed', dryRun);
      result.disabled++;
    } else {
      // No platform ID - disable
      console.log(`  [NO ID] ${disp.name} - no platform_dispensary_id`);
      await disableDispensary(disp.id, 'No platform_dispensary_id', dryRun);
      result.disabled++;
    }
  }

  // Step 2: Create new dispensaries for Dutchie records we don't have
  console.log(`\n[Step 2/3] Creating new dispensaries from Dutchie...`);
  for (const [platformId, dutchie] of dutchieMap) {
    if (matchedDutchieIds.has(platformId)) {
      continue; // Already matched
    }

    try {
      const newId = await createDispensary(dutchie, state, dryRun);
      console.log(`  [CREATED] ${dutchie.name} (${dutchie.cName}) -> ID ${newId || '(dry-run)'}`);
      result.created++;
    } catch (error: any) {
      console.error(`  [ERROR] ${dutchie.name}: ${error.message}`);
      result.errors.push(`Create ${dutchie.name}: ${error.message}`);
    }
  }

  // Summary
  console.log(`\n${'='.repeat(60)}`);
  console.log('HARMONIZATION SUMMARY');
  console.log(`${'='.repeat(60)}`);
  console.log(`  Updated (matched to Dutchie): ${result.updated}`);
  console.log(`  Created (new from Dutchie):   ${result.created}`);
  console.log(`  Disabled (not in Dutchie):    ${result.disabled}`);
  console.log(`  Errors: ${result.errors.length}`);

  if (result.errors.length > 0) {
    console.log(`\nErrors:`);
    result.errors.slice(0, 20).forEach(e => console.log(`  - ${e}`));
    if (result.errors.length > 20) {
      console.log(`  ... and ${result.errors.length - 20} more`);
    }
  }

  return result;
}

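The match/create/disable partitioning inside `harmonizeDispensaries` can be sketched as a pure function (hypothetical `Disp` shape and `partition` name; the real code also performs the DB writes per record):

```typescript
// Minimal sketch of the Step 1/Step 2 partitioning logic, assuming a
// simplified dispensary shape. Hypothetical helper, not the production code.
interface Disp {
  id: number;
  platformId: string | null;
}

function partition(ours: Disp[], dutchieIds: Set<string>) {
  const matched: number[] = [];   // ours with a platform ID present in Dutchie
  const disabled: number[] = [];  // ours with no ID, or an ID Dutchie no longer has
  const seen = new Set<string>();

  for (const d of ours) {
    if (d.platformId && dutchieIds.has(d.platformId)) {
      matched.push(d.id);
      seen.add(d.platformId);
    } else {
      disabled.push(d.id);
    }
  }

  // Dutchie records we have no row for yet -> candidates for createDispensary
  const toCreate = [...dutchieIds].filter(id => !seen.has(id));
  return { matched, disabled, toCreate };
}

console.log(JSON.stringify(partition(
  [{ id: 1, platformId: 'a' }, { id: 2, platformId: null }],
  new Set(['a', 'b'])
)));
```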
async function main() {
  const args = process.argv.slice(2);
  let state = 'AZ';
  let dryRun = false;

  for (let i = 0; i < args.length; i++) {
    if (args[i] === '--state' && args[i + 1]) {
      state = args[i + 1].toUpperCase();
      i++;
    } else if (args[i] === '--dry-run') {
      dryRun = true;
    }
  }

  try {
    await harmonizeDispensaries(state, dryRun);
  } finally {
    await pool.end();
  }
}

main().catch(console.error);
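The `--state`/`--dry-run` loop in `main()` above can be isolated as a small pure helper for testing (hypothetical `parseArgs` name; a sketch, not part of the script):

```typescript
// Standalone sketch of the CLI parsing used by the harmonize script.
// Hypothetical helper: same semantics as the loop in main().
function parseArgs(args: string[]): { state: string; dryRun: boolean } {
  let state = 'AZ';       // default state, as in the script
  let dryRun = false;

  for (let i = 0; i < args.length; i++) {
    if (args[i] === '--state' && args[i + 1]) {
      state = args[i + 1].toUpperCase(); // consume the value token
      i++;
    } else if (args[i] === '--dry-run') {
      dryRun = true;
    }
  }
  return { state, dryRun };
}

console.log(JSON.stringify(parseArgs(['--state', 'nm', '--dry-run'])));
```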
@@ -1,583 +0,0 @@
#!/usr/bin/env npx tsx
/**
 * Queue Intelligence Script
 *
 * Orchestrates the multi-category intelligence crawler system:
 * 1. Queue dispensaries that need provider detection (all 4 categories)
 * 2. Queue per-category production crawls (Dutchie products only for now)
 * 3. Queue per-category sandbox crawls (all providers)
 *
 * Each category (product, specials, brand, metadata) is handled independently.
 * A failure in one category does NOT affect other categories.
 *
 * Usage:
 *   npx tsx src/scripts/queue-intelligence.ts [--detection] [--production] [--sandbox] [--all]
 *   npx tsx src/scripts/queue-intelligence.ts --category=product --sandbox
 *   npx tsx src/scripts/queue-intelligence.ts --process --category=product
 *   npx tsx src/scripts/queue-intelligence.ts --dry-run
 */

import { pool } from '../db/pool';
import { logger } from '../services/logger';
import {
  detectMultiCategoryProviders,
  updateAllCategoryProviders,
  IntelligenceCategory,
} from '../services/intelligence-detector';
import {
  runCrawlProductsJob,
  runCrawlSpecialsJob,
  runCrawlBrandIntelligenceJob,
  runCrawlMetadataJob,
  runSandboxProductsJob,
  runSandboxSpecialsJob,
  runSandboxBrandJob,
  runSandboxMetadataJob,
  runAllCategoryProductionCrawls,
  runAllCategorySandboxCrawls,
  processCategorySandboxJobs,
} from '../services/category-crawler-jobs';

// Parse command line args
const args = process.argv.slice(2);
const flags = {
  detection: args.includes('--detection') || args.includes('--all'),
  production: args.includes('--production') || args.includes('--all'),
  sandbox: args.includes('--sandbox') || args.includes('--all'),
  dryRun: args.includes('--dry-run'),
  process: args.includes('--process'),
  help: args.includes('--help') || args.includes('-h'),
  limit: parseInt(args.find(a => a.startsWith('--limit='))?.split('=')[1] || '10'),
  category: args.find(a => a.startsWith('--category='))?.split('=')[1] as IntelligenceCategory | undefined,
  dispensary: parseInt(args.find(a => a.startsWith('--dispensary='))?.split('=')[1] || '0'),
};

// If no specific flags, default to all
if (!flags.detection && !flags.production && !flags.sandbox && !flags.process) {
  flags.detection = true;
  flags.production = true;
  flags.sandbox = true;
}

const CATEGORIES: IntelligenceCategory[] = ['product', 'specials', 'brand', 'metadata'];

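The `--limit=`/`--category=` flags above are parsed with a `find`/`split` pattern; the same idea as a standalone helper (hypothetical `flagValue` name; a sketch under those assumptions):

```typescript
// Sketch of the --key=value flag parsing used by the flags object above.
// Hypothetical helper, shown for illustration only.
function flagValue(args: string[], name: string, fallback: string): string {
  const hit = args.find(a => a.startsWith(`--${name}=`));
  // split('=')[1] takes everything between the first '=' and the next '='
  return hit ? hit.split('=')[1] : fallback;
}

console.log(flagValue(['--limit=25', '--category=product'], 'limit', '10'));
```

Note that `split('=')[1]` truncates values containing `=`; for these numeric and enum flags that is not a concern.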
async function showHelp() {
  console.log(`
Queue Intelligence - Multi-Category Crawler Orchestration

USAGE:
  npx tsx src/scripts/queue-intelligence.ts [OPTIONS]

OPTIONS:
  --detection          Queue dispensaries that need multi-category detection
  --production         Queue per-category production crawls
  --sandbox            Queue per-category sandbox crawls
  --all                Queue all job types (default if no specific flag)
  --process            Process queued jobs instead of just queuing
  --category=CATEGORY  Filter to specific category (product|specials|brand|metadata)
  --dispensary=ID      Process only a specific dispensary
  --dry-run            Show what would be queued without making changes
  --limit=N            Maximum dispensaries to queue per type (default: 10)
  --help, -h           Show this help message

CATEGORIES:
  product  - Product/menu data (Dutchie=production, others=sandbox)
  specials - Deals and specials (all sandbox for now)
  brand    - Brand intelligence (all sandbox for now)
  metadata - Categories/taxonomy (all sandbox for now)

EXAMPLES:
  # Queue all dispensaries for appropriate jobs
  npx tsx src/scripts/queue-intelligence.ts

  # Only queue product detection jobs
  npx tsx src/scripts/queue-intelligence.ts --detection --category=product

  # Process sandbox jobs for specials category
  npx tsx src/scripts/queue-intelligence.ts --process --category=specials --limit=5

  # Run full detection for a specific dispensary
  npx tsx src/scripts/queue-intelligence.ts --process --detection --dispensary=123

  # Dry run to see what would be queued
  npx tsx src/scripts/queue-intelligence.ts --dry-run
`);
}

async function queueMultiCategoryDetection(): Promise<number> {
  console.log('\n📡 Queueing Multi-Category Detection Jobs...');

  // Find dispensaries that need provider detection for any category:
  // - Any *_provider is null OR
  // - Any *_confidence < 70
  // - has a website URL
  const query = `
    SELECT id, name, website, menu_url,
           product_provider, product_confidence, product_crawler_mode,
           specials_provider, specials_confidence, specials_crawler_mode,
           brand_provider, brand_confidence, brand_crawler_mode,
           metadata_provider, metadata_confidence, metadata_crawler_mode
    FROM dispensaries
    WHERE (website IS NOT NULL OR menu_url IS NOT NULL)
      AND (
        product_provider IS NULL OR product_confidence < 70 OR
        specials_provider IS NULL OR specials_confidence < 70 OR
        brand_provider IS NULL OR brand_confidence < 70 OR
        metadata_provider IS NULL OR metadata_confidence < 70
      )
    ORDER BY
      CASE WHEN product_provider IS NULL THEN 0 ELSE 1 END,
      product_confidence ASC
    LIMIT $1
  `;

  const result = await pool.query(query, [flags.limit]);

  if (flags.dryRun) {
    console.log(`  Would queue ${result.rows.length} dispensaries for multi-category detection:`);
    for (const row of result.rows) {
      const needsDetection: string[] = [];
      if (!row.product_provider || row.product_confidence < 70) needsDetection.push('product');
      if (!row.specials_provider || row.specials_confidence < 70) needsDetection.push('specials');
      if (!row.brand_provider || row.brand_confidence < 70) needsDetection.push('brand');
      if (!row.metadata_provider || row.metadata_confidence < 70) needsDetection.push('metadata');
      console.log(`  - [${row.id}] ${row.name} (needs: ${needsDetection.join(', ')})`);
    }
    return result.rows.length;
  }

  let queued = 0;
  for (const dispensary of result.rows) {
    try {
      // Create detection jobs for each category that needs it
      for (const category of CATEGORIES) {
        const provider = dispensary[`${category}_provider`];
        const confidence = dispensary[`${category}_confidence`];

        if (!provider || confidence < 70) {
          await pool.query(
            `INSERT INTO sandbox_crawl_jobs (dispensary_id, category, job_type, status, priority)
             VALUES ($1, $2, 'detection', 'pending', 10)
             ON CONFLICT DO NOTHING`,
            [dispensary.id, category]
          );
        }
      }

      console.log(`  ✓ Queued detection: [${dispensary.id}] ${dispensary.name}`);
      queued++;
    } catch (error: any) {
      console.error(`  ✗ Failed to queue [${dispensary.id}]: ${error.message}`);
    }
  }

  return queued;
}

async function queueCategoryProductionCrawls(category?: IntelligenceCategory): Promise<number> {
  const categories = category ? [category] : CATEGORIES;
  let totalQueued = 0;

  for (const cat of categories) {
    console.log(`\n🏭 Queueing Production ${cat.toUpperCase()} Crawls...`);

    // For now, only products have production-ready crawlers (Dutchie only)
    if (cat !== 'product') {
      console.log(`  ⏭️ No production crawler for ${cat} yet - skipping`);
      continue;
    }

    // Find dispensaries ready for production crawl
    const query = `
      SELECT id, name, ${cat}_provider as provider, last_${cat}_scan_at as last_scan
      FROM dispensaries
      WHERE ${cat}_provider = 'dutchie'
        AND ${cat}_crawler_mode = 'production'
        AND ${cat}_confidence >= 70
        AND (last_${cat}_scan_at IS NULL OR last_${cat}_scan_at < NOW() - INTERVAL '4 hours')
      ORDER BY
        CASE WHEN last_${cat}_scan_at IS NULL THEN 0 ELSE 1 END,
        last_${cat}_scan_at ASC
      LIMIT $1
    `;

    const result = await pool.query(query, [flags.limit]);

    if (flags.dryRun) {
      console.log(`  Would queue ${result.rows.length} dispensaries for ${cat} production crawl:`);
      for (const row of result.rows) {
        const lastScan = row.last_scan ? new Date(row.last_scan).toISOString() : 'never';
        console.log(`  - [${row.id}] ${row.name} (provider: ${row.provider}, last: ${lastScan})`);
      }
      totalQueued += result.rows.length;
      continue;
    }

    for (const dispensary of result.rows) {
      try {
        // For products, use the existing crawl_jobs table for production
        await pool.query(
          `INSERT INTO crawl_jobs (store_id, job_type, trigger_type, status, priority, metadata)
           SELECT s.id, 'full_crawl', 'scheduled', 'pending', 50,
                  jsonb_build_object('dispensary_id', $1, 'category', $2, 'source', 'queue-intelligence')
           FROM stores s
           JOIN dispensaries d ON (d.menu_url = s.dutchie_url OR d.name ILIKE '%' || s.name || '%')
           WHERE d.id = $1
           LIMIT 1`,
          [dispensary.id, cat]
        );

        console.log(`  ✓ Queued ${cat} production: [${dispensary.id}] ${dispensary.name}`);
        totalQueued++;
      } catch (error: any) {
        console.error(`  ✗ Failed to queue [${dispensary.id}]: ${error.message}`);
      }
    }
  }

  return totalQueued;
}

async function queueCategorySandboxCrawls(category?: IntelligenceCategory): Promise<number> {
  const categories = category ? [category] : CATEGORIES;
  let totalQueued = 0;

  for (const cat of categories) {
    console.log(`\n🧪 Queueing Sandbox ${cat.toUpperCase()} Crawls...`);

    // Find dispensaries in sandbox mode for this category
    const query = `
      SELECT d.id, d.name, d.${cat}_provider as provider, d.${cat}_confidence as confidence,
             d.website, d.menu_url
      FROM dispensaries d
      WHERE d.${cat}_crawler_mode = 'sandbox'
        AND d.${cat}_provider IS NOT NULL
        AND (d.website IS NOT NULL OR d.menu_url IS NOT NULL)
        AND NOT EXISTS (
          SELECT 1 FROM sandbox_crawl_jobs sj
          WHERE sj.dispensary_id = d.id
            AND sj.category = $1
            AND sj.status IN ('pending', 'running')
        )
      ORDER BY d.${cat}_confidence DESC, d.updated_at ASC
      LIMIT $2
    `;

    const result = await pool.query(query, [cat, flags.limit]);

    if (flags.dryRun) {
      console.log(`  Would queue ${result.rows.length} dispensaries for ${cat} sandbox crawl:`);
      for (const row of result.rows) {
        console.log(`  - [${row.id}] ${row.name} (provider: ${row.provider}, confidence: ${row.confidence}%)`);
      }
      totalQueued += result.rows.length;
      continue;
    }

    for (const dispensary of result.rows) {
      try {
        // Create sandbox entry if needed
        const sandboxResult = await pool.query(
          `INSERT INTO crawler_sandboxes (dispensary_id, category, suspected_menu_provider, mode, status)
           VALUES ($1, $2, $3, 'template_learning', 'pending')
           ON CONFLICT (dispensary_id, category) WHERE status NOT IN ('moved_to_production', 'failed')
           DO UPDATE SET updated_at = NOW()
           RETURNING id`,
          [dispensary.id, cat, dispensary.provider]
        );

        const sandboxId = sandboxResult.rows[0]?.id;

        // Create sandbox job
        await pool.query(
          `INSERT INTO sandbox_crawl_jobs (dispensary_id, sandbox_id, category, job_type, status, priority)
           VALUES ($1, $2, $3, 'crawl', 'pending', 5)`,
          [dispensary.id, sandboxId, cat]
        );

        console.log(`  ✓ Queued ${cat} sandbox: [${dispensary.id}] ${dispensary.name} (${dispensary.provider})`);
        totalQueued++;
      } catch (error: any) {
        console.error(`  ✗ Failed to queue [${dispensary.id}]: ${error.message}`);
      }
    }
  }

  return totalQueued;
}

async function processDetectionJobs(): Promise<void> {
  console.log('\n🔍 Processing Detection Jobs...');

  // Get pending detection jobs
  const jobs = await pool.query(
    `SELECT DISTINCT dispensary_id
     FROM sandbox_crawl_jobs
     WHERE job_type = 'detection' AND status = 'pending'
     ${flags.category ? `AND category = $2` : ''}
     ${flags.dispensary ? `AND dispensary_id = $${flags.category ? '3' : '2'}` : ''}
     LIMIT $1`,
    flags.category
      ? (flags.dispensary ? [flags.limit, flags.category, flags.dispensary] : [flags.limit, flags.category])
      : (flags.dispensary ? [flags.limit, flags.dispensary] : [flags.limit])
  );

  for (const job of jobs.rows) {
    console.log(`\nProcessing detection for dispensary ${job.dispensary_id}...`);

    try {
      // Get dispensary info
      const dispResult = await pool.query(
        'SELECT id, name, website, menu_url FROM dispensaries WHERE id = $1',
        [job.dispensary_id]
      );
      const dispensary = dispResult.rows[0];

      if (!dispensary) {
        console.log(`  ✗ Dispensary not found`);
        continue;
      }

      const websiteUrl = dispensary.website || dispensary.menu_url;
      if (!websiteUrl) {
        console.log(`  ✗ No website URL`);
        continue;
      }

      // Mark jobs as running
      await pool.query(
        `UPDATE sandbox_crawl_jobs SET status = 'running', started_at = NOW()
         WHERE dispensary_id = $1 AND job_type = 'detection' AND status = 'pending'`,
        [job.dispensary_id]
      );

      // Run multi-category detection
      console.log(`  Detecting providers for ${dispensary.name}...`);
      const detection = await detectMultiCategoryProviders(websiteUrl, { timeout: 45000 });

      // Update all categories
      await updateAllCategoryProviders(job.dispensary_id, detection);

      // Mark jobs as completed
      await pool.query(
        `UPDATE sandbox_crawl_jobs SET status = 'completed', completed_at = NOW(),
                result_summary = $1
         WHERE dispensary_id = $2 AND job_type = 'detection' AND status = 'running'`,
        [JSON.stringify({
          product: { provider: detection.product.provider, confidence: detection.product.confidence },
          specials: { provider: detection.specials.provider, confidence: detection.specials.confidence },
          brand: { provider: detection.brand.provider, confidence: detection.brand.confidence },
          metadata: { provider: detection.metadata.provider, confidence: detection.metadata.confidence },
        }), job.dispensary_id]
      );

      console.log(`  ✓ Detection complete:`);
      console.log(`    Product:  ${detection.product.provider} (${detection.product.confidence}%) -> ${detection.product.mode}`);
      console.log(`    Specials: ${detection.specials.provider} (${detection.specials.confidence}%) -> ${detection.specials.mode}`);
      console.log(`    Brand:    ${detection.brand.provider} (${detection.brand.confidence}%) -> ${detection.brand.mode}`);
      console.log(`    Metadata: ${detection.metadata.provider} (${detection.metadata.confidence}%) -> ${detection.metadata.mode}`);

    } catch (error: any) {
      console.log(`  ✗ Error: ${error.message}`);
      await pool.query(
        `UPDATE sandbox_crawl_jobs SET status = 'failed', error_message = $1
         WHERE dispensary_id = $2 AND job_type = 'detection' AND status = 'running'`,
        [error.message, job.dispensary_id]
      );
    }
  }
}

async function processCrawlJobs(): Promise<void> {
  const categories = flags.category ? [flags.category] : CATEGORIES;

  for (const cat of categories) {
    console.log(`\n⚙️ Processing ${cat.toUpperCase()} Crawl Jobs...\n`);

    // Process sandbox jobs for this category
    if (flags.sandbox || !flags.production) {
      await processCategorySandboxJobs(cat, flags.limit);
    }

    // Process production jobs for this category
    if (flags.production && cat === 'product') {
      // Get pending production crawls
      const prodJobs = await pool.query(
        `SELECT d.id
         FROM dispensaries d
         WHERE d.product_provider = 'dutchie'
           AND d.product_crawler_mode = 'production'
           AND d.product_confidence >= 70
           ${flags.dispensary ? 'AND d.id = $2' : ''}
         LIMIT $1`,
        flags.dispensary ? [flags.limit, flags.dispensary] : [flags.limit]
      );

      for (const job of prodJobs.rows) {
        console.log(`Processing production ${cat} crawl for dispensary ${job.id}...`);
        const result = await runCrawlProductsJob(job.id);
        console.log(`  ${result.success ? '✓' : '✗'} ${result.message}`);
      }
    }
  }
}

async function processSpecificDispensary(): Promise<void> {
  if (!flags.dispensary) return;

  console.log(`\n🎯 Processing Dispensary ${flags.dispensary}...\n`);

  const dispResult = await pool.query(
    'SELECT * FROM dispensaries WHERE id = $1',
    [flags.dispensary]
  );

  if (dispResult.rows.length === 0) {
    console.log('Dispensary not found');
    return;
  }

  const dispensary = dispResult.rows[0];
  console.log(`Name: ${dispensary.name}`);
  console.log(`Website: ${dispensary.website || dispensary.menu_url || 'none'}`);
  console.log('');

  if (flags.detection) {
    console.log('Running multi-category detection...');
    const websiteUrl = dispensary.website || dispensary.menu_url;
    if (websiteUrl) {
      const detection = await detectMultiCategoryProviders(websiteUrl);
      await updateAllCategoryProviders(flags.dispensary, detection);
      console.log('Detection results:');
      console.log(`  Product:  ${detection.product.provider} (${detection.product.confidence}%) -> ${detection.product.mode}`);
      console.log(`  Specials: ${detection.specials.provider} (${detection.specials.confidence}%) -> ${detection.specials.mode}`);
      console.log(`  Brand:    ${detection.brand.provider} (${detection.brand.confidence}%) -> ${detection.brand.mode}`);
      console.log(`  Metadata: ${detection.metadata.provider} (${detection.metadata.confidence}%) -> ${detection.metadata.mode}`);
    }
  }

  if (flags.production) {
    console.log('\nRunning production crawls...');
    const results = await runAllCategoryProductionCrawls(flags.dispensary);
    console.log(`  ${results.summary}`);
  }

  if (flags.sandbox) {
    console.log('\nRunning sandbox crawls...');
    const results = await runAllCategorySandboxCrawls(flags.dispensary);
    console.log(`  ${results.summary}`);
  }
}

async function showStats(): Promise<void> {
  console.log('\n📊 Multi-Category Intelligence Stats:');

  // Per-category stats
  for (const cat of CATEGORIES) {
    const stats = await pool.query(`
      SELECT
        COUNT(*) as total,
        COUNT(*) FILTER (WHERE ${cat}_provider IS NULL) as no_provider,
        COUNT(*) FILTER (WHERE ${cat}_provider = 'dutchie') as dutchie,
        COUNT(*) FILTER (WHERE ${cat}_provider = 'treez') as treez,
        COUNT(*) FILTER (WHERE ${cat}_provider NOT IN ('dutchie', 'treez', 'unknown') AND ${cat}_provider IS NOT NULL) as other,
        COUNT(*) FILTER (WHERE ${cat}_provider = 'unknown') as unknown,
        COUNT(*) FILTER (WHERE ${cat}_crawler_mode = 'production') as production,
        COUNT(*) FILTER (WHERE ${cat}_crawler_mode = 'sandbox') as sandbox,
        AVG(${cat}_confidence) as avg_confidence
      FROM dispensaries
    `);

    const s = stats.rows[0];
    console.log(`
  ${cat.toUpperCase()}:
    Providers: Dutchie=${s.dutchie}, Treez=${s.treez}, Other=${s.other}, Unknown=${s.unknown}, None=${s.no_provider}
    Modes:     Production=${s.production}, Sandbox=${s.sandbox}
    Avg Confidence: ${Math.round(s.avg_confidence || 0)}%`);
  }

  // Job stats per category
  console.log('\n  Sandbox Jobs by Category:');
  const jobStats = await pool.query(`
    SELECT
      category,
      COUNT(*) FILTER (WHERE status = 'pending') as pending,
      COUNT(*) FILTER (WHERE status = 'running') as running,
      COUNT(*) FILTER (WHERE status = 'completed') as completed,
      COUNT(*) FILTER (WHERE status = 'failed') as failed
    FROM sandbox_crawl_jobs
    GROUP BY category
    ORDER BY category
  `);

  for (const row of jobStats.rows) {
    console.log(`    ${row.category}: pending=${row.pending}, running=${row.running}, completed=${row.completed}, failed=${row.failed}`);
  }
}

async function main() {
  if (flags.help) {
    await showHelp();
    process.exit(0);
  }

  console.log('═══════════════════════════════════════════════════════');
  console.log('  Multi-Category Intelligence Queue Manager');
  console.log('═══════════════════════════════════════════════════════');

  if (flags.dryRun) {
    console.log('\n🔍 DRY RUN MODE - No changes will be made\n');
  }

  if (flags.category) {
    console.log(`\n📌 Filtering to category: ${flags.category}\n`);
  }

  try {
    // Show current stats first
    await showStats();

    // If specific dispensary specified, process it directly
    if (flags.dispensary && flags.process) {
      await processSpecificDispensary();
    } else if (flags.process) {
      // Process mode - run jobs
      if (flags.detection) {
        await processDetectionJobs();
      }
      await processCrawlJobs();
    } else {
      // Queuing mode
      let totalQueued = 0;

      if (flags.detection) {
        totalQueued += await queueMultiCategoryDetection();
      }

      if (flags.production) {
        totalQueued += await queueCategoryProductionCrawls(flags.category);
      }

      if (flags.sandbox) {
        totalQueued += await queueCategorySandboxCrawls(flags.category);
      }

      console.log('\n═══════════════════════════════════════════════════════');
      console.log(`  Total queued: ${totalQueued}`);
      console.log('═══════════════════════════════════════════════════════\n');
    }

    // Show updated stats
    if (!flags.dryRun) {
      await showStats();
    }

  } catch (error) {
    console.error('Fatal error:', error);
    process.exit(1);
  } finally {
    await pool.end();
  }
}

main();
@@ -1,173 +0,0 @@
#!/usr/bin/env npx tsx
/**
 * Dutchie Platform ID Resolver
 *
 * Standalone script to resolve a Dutchie dispensary slug to its platform ID.
 *
 * USAGE:
 *   npx tsx src/scripts/resolve-dutchie-id.ts <slug>
 *   npx tsx src/scripts/resolve-dutchie-id.ts hydroman-dispensary
 *   npx tsx src/scripts/resolve-dutchie-id.ts AZ-Deeply-Rooted
 *
 * RESOLUTION STRATEGY:
 *   1. Navigate to https://dutchie.com/embedded-menu/{slug} via Puppeteer
 *   2. Extract window.reactEnv.dispensaryId (preferred - fastest)
 *   3. If reactEnv fails, call GraphQL GetAddressBasedDispensaryData as fallback
 *
 * OUTPUT:
 *   - dispensaryId: The MongoDB ObjectId (e.g., "6405ef617056e8014d79101b")
 *   - source: "reactEnv" or "graphql"
 *   - httpStatus: HTTP status from embedded menu page
 *   - error: Error message if resolution failed
 */

import { resolveDispensaryIdWithDetails, ResolveDispensaryResult } from '../dutchie-az/services/graphql-client';

async function main() {
  const args = process.argv.slice(2);

  if (args.length === 0 || args.includes('--help') || args.includes('-h')) {
    console.log(`
Dutchie Platform ID Resolver

Usage:
  npx tsx src/scripts/resolve-dutchie-id.ts <slug>

Examples:
  npx tsx src/scripts/resolve-dutchie-id.ts hydroman-dispensary
  npx tsx src/scripts/resolve-dutchie-id.ts AZ-Deeply-Rooted
  npx tsx src/scripts/resolve-dutchie-id.ts mint-cannabis

Resolution Strategy:
  1. Puppeteer navigates to https://dutchie.com/embedded-menu/{slug}
  2. Extracts window.reactEnv.dispensaryId (preferred)
  3. Falls back to GraphQL GetAddressBasedDispensaryData if needed

Output Fields:
  - dispensaryId: MongoDB ObjectId (e.g., "6405ef617056e8014d79101b")
  - source: "reactEnv" (from page) or "graphql" (from API)
  - httpStatus: HTTP status code from page load
  - error: Error message if resolution failed
`);
    process.exit(0);
  }

  const slug = args[0];

  console.log('='.repeat(60));
  console.log('DUTCHIE PLATFORM ID RESOLVER');
  console.log('='.repeat(60));
  console.log(`Slug: ${slug}`);
  console.log(`Embedded Menu URL: https://dutchie.com/embedded-menu/${slug}`);
  console.log('');
  console.log('Resolving...');
  console.log('');

  const startTime = Date.now();

  try {
    const result: ResolveDispensaryResult = await resolveDispensaryIdWithDetails(slug);
    const duration = Date.now() - startTime;

    console.log('='.repeat(60));
    console.log('RESOLUTION RESULT');
    console.log('='.repeat(60));

    if (result.dispensaryId) {
      console.log(`✓ SUCCESS`);
      console.log('');
      console.log(`  Dispensary ID: ${result.dispensaryId}`);
      console.log(`  Source:        ${result.source}`);
      console.log(`  HTTP Status:   ${result.httpStatus || 'N/A'}`);
      console.log(`  Duration:      ${duration}ms`);
      console.log('');

      // Show how to use this ID
      console.log('='.repeat(60));
      console.log('USAGE');
      console.log('='.repeat(60));
      console.log('');
      console.log('Use this ID in GraphQL FilteredProducts query:');
      console.log('');
      console.log('  POST https://dutchie.com/api-3/graphql');
      console.log('');
      console.log('  Body:');
      console.log(`  {
    "operationName": "FilteredProducts",
    "variables": {
      "productsFilter": {
        "dispensaryId": "${result.dispensaryId}",
        "pricingType": "rec",
        "Status": "Active"
      },
      "page": 0,
      "perPage": 100
    },
    "extensions": {
      "persistedQuery": {
        "version": 1,
        "sha256Hash": "ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0"
      }
    }
  }`);
      console.log('');

      // Output for piping/scripting
      console.log('='.repeat(60));
      console.log('JSON OUTPUT');
      console.log('='.repeat(60));
      console.log(JSON.stringify({
        success: true,
        slug,
        dispensaryId: result.dispensaryId,
        source: result.source,
        httpStatus: result.httpStatus,
        durationMs: duration,
      }, null, 2));

    } else {
      console.log(`✗ FAILED`);
      console.log('');
      console.log(`  Error:       ${result.error || 'Unknown error'}`);
      console.log(`  HTTP Status: ${result.httpStatus || 'N/A'}`);
      console.log(`  Duration:    ${duration}ms`);
      console.log('');

      if (result.httpStatus === 403 || result.httpStatus === 404) {
        console.log('NOTE: This store may be removed or not accessible on Dutchie.');
        console.log('      Mark dispensary as not_crawlable in the database.');
      }

      console.log('');
      console.log('JSON OUTPUT:');
      console.log(JSON.stringify({
        success: false,
        slug,
        error: result.error,
        httpStatus: result.httpStatus,
        durationMs: duration,
      }, null, 2));

      process.exit(1);
    }

  } catch (error: any) {
    const duration = Date.now() - startTime;
    console.error('='.repeat(60));
    console.error('ERROR');
    console.error('='.repeat(60));
    console.error(`Message: ${error.message}`);
    console.error(`Duration: ${duration}ms`);
    console.error('');

    if (error.message.includes('net::ERR_NAME_NOT_RESOLVED')) {
|
||||
console.error('NOTE: DNS resolution failed. This typically happens when running');
|
||||
console.error(' locally due to network restrictions. Try running from the');
|
||||
console.error(' Kubernetes pod or a cloud environment.');
|
||||
}
|
||||
|
||||
process.exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
main();
|
||||
@@ -1,151 +0,0 @@
/**
 * LEGACY SCRIPT - Run Dutchie GraphQL Scrape
 *
 * DEPRECATED: This script creates its own database pool.
 * Future implementations should use the CannaiQ API endpoints instead.
 *
 * This script demonstrates the full pipeline:
 * 1. Puppeteer navigates to Dutchie menu
 * 2. GraphQL responses are intercepted
 * 3. Products are normalized to our schema
 * 4. Products are upserted to database
 * 5. Derived views (brands, categories, specials) are automatically updated
 *
 * DO NOT:
 * - Add this to package.json scripts
 * - Run this in automated jobs
 * - Use DATABASE_URL directly
 */

import { Pool } from 'pg';
import { scrapeDutchieMenu } from '../scrapers/dutchie-graphql';

console.warn('\n⚠️  LEGACY SCRIPT: This script should be replaced with CannaiQ API calls.\n');

// Single database connection (cannaiq in cannaiq-postgres container)
const DATABASE_URL = process.env.CANNAIQ_DB_URL ||
  `postgresql://${process.env.CANNAIQ_DB_USER || 'dutchie'}:${process.env.CANNAIQ_DB_PASS || 'dutchie_local_pass'}@${process.env.CANNAIQ_DB_HOST || 'localhost'}:${process.env.CANNAIQ_DB_PORT || '54320'}/${process.env.CANNAIQ_DB_NAME || 'cannaiq'}`;

async function main() {
  const pool = new Pool({ connectionString: DATABASE_URL });

  try {
    console.log('='.repeat(80));
    console.log('DUTCHIE GRAPHQL SCRAPER - FULL PIPELINE TEST');
    console.log('='.repeat(80));
    console.log(`Database: ${DATABASE_URL.replace(/:[^:@]+@/, ':***@')}`);

    // Configuration
    const storeId = 1; // Deeply Rooted
    const menuUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted';

    console.log(`\nStore ID: ${storeId}`);
    console.log(`Menu URL: ${menuUrl}`);
    console.log('\n' + '-'.repeat(80));

    // Run the scrape
    console.log('\n🚀 Starting scrape...\n');
    const result = await scrapeDutchieMenu(pool, storeId, menuUrl);

    console.log('\n' + '-'.repeat(80));
    console.log('📊 SCRAPE RESULTS:');
    console.log('-'.repeat(80));
    console.log(`  Success: ${result.success}`);
    console.log(`  Products Found: ${result.productsFound}`);
    console.log(`  Inserted: ${result.inserted}`);
    console.log(`  Updated: ${result.updated}`);
    if (result.error) {
      console.log(`  Error: ${result.error}`);
    }

    // Query derived views to show the result
    if (result.success) {
      console.log('\n' + '-'.repeat(80));
      console.log('📈 DERIVED DATA (from products table):');
      console.log('-'.repeat(80));

      // Brands
      const brandsResult = await pool.query(`
        SELECT brand_name, product_count, min_price, max_price
        FROM derived_brands
        WHERE store_id = $1
        ORDER BY product_count DESC
        LIMIT 5
      `, [storeId]);

      console.log('\nTop 5 Brands:');
      brandsResult.rows.forEach(row => {
        console.log(`  - ${row.brand_name}: ${row.product_count} products ($${row.min_price} - $${row.max_price})`);
      });

      // Specials
      const specialsResult = await pool.query(`
        SELECT name, brand, rec_price, rec_special_price, discount_percent
        FROM current_specials
        WHERE store_id = $1
        LIMIT 5
      `, [storeId]);

      console.log('\nTop 5 Specials:');
      if (specialsResult.rows.length === 0) {
        console.log('  (No specials found - is_on_special may not be populated yet)');
      } else {
        specialsResult.rows.forEach(row => {
          console.log(`  - ${row.name} (${row.brand}): $${row.rec_price} → $${row.rec_special_price} (${row.discount_percent}% off)`);
        });
      }

      // Categories
      const categoriesResult = await pool.query(`
        SELECT category_name, product_count
        FROM derived_categories
        WHERE store_id = $1
        ORDER BY product_count DESC
        LIMIT 5
      `, [storeId]);

      console.log('\nTop 5 Categories:');
      if (categoriesResult.rows.length === 0) {
        console.log('  (No categories found - subcategory may not be populated yet)');
      } else {
        categoriesResult.rows.forEach(row => {
          console.log(`  - ${row.category_name}: ${row.product_count} products`);
        });
      }

      // Sample product
      const sampleResult = await pool.query(`
        SELECT name, brand, subcategory, rec_price, rec_special_price, is_on_special, thc_percentage, status
        FROM products
        WHERE store_id = $1 AND subcategory IS NOT NULL
        ORDER BY updated_at DESC
        LIMIT 1
      `, [storeId]);

      if (sampleResult.rows.length > 0) {
        const sample = sampleResult.rows[0];
        console.log('\nSample Product (with new fields):');
        console.log(`  Name: ${sample.name}`);
        console.log(`  Brand: ${sample.brand}`);
        console.log(`  Category: ${sample.subcategory}`);
        console.log(`  Price: $${sample.rec_price}`);
        console.log(`  Sale Price: ${sample.rec_special_price ? `$${sample.rec_special_price}` : 'N/A'}`);
        console.log(`  On Special: ${sample.is_on_special}`);
        console.log(`  THC: ${sample.thc_percentage}%`);
        console.log(`  Status: ${sample.status}`);
      }
    }

    console.log('\n' + '='.repeat(80));
    console.log('✅ SCRAPE COMPLETE');
    console.log('='.repeat(80));

  } catch (error: any) {
    console.error('\n❌ Error:', error.message);
    throw error;
  } finally {
    await pool.end();
  }
}

main().catch(console.error);
@@ -1,225 +0,0 @@
/**
 * Sandbox Crawl Script for Dispensary 101 (Trulieve Scottsdale)
 *
 * Runs a full crawl and captures trace data for observability.
 * NO automatic promotion or status changes.
 */

import { Pool } from 'pg';
import { crawlDispensaryProducts } from '../dutchie-az/services/product-crawler';
import { Dispensary } from '../dutchie-az/types';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function main() {
  console.log('=== SANDBOX CRAWL: Dispensary 101 (Trulieve Scottsdale) ===\n');
  const startTime = Date.now();

  // Load dispensary from database (only columns that exist in local schema)
  const dispResult = await pool.query(`
    SELECT id, name, city, state, menu_type, platform_dispensary_id, menu_url
    FROM dispensaries
    WHERE id = 101
  `);

  if (!dispResult.rows[0]) {
    console.log('ERROR: Dispensary 101 not found');
    await pool.end();
    return;
  }

  const row = dispResult.rows[0];

  // Map to Dispensary interface (snake_case -> camelCase)
  const dispensary: Dispensary = {
    id: row.id,
    platform: 'dutchie',
    name: row.name,
    slug: row.name.toLowerCase().replace(/\s+/g, '-'),
    city: row.city,
    state: row.state,
    platformDispensaryId: row.platform_dispensary_id,
    menuType: row.menu_type,
    menuUrl: row.menu_url,
    createdAt: new Date(),
    updatedAt: new Date(),
  };

  console.log('=== DISPENSARY INFO ===');
  console.log(`Name: ${dispensary.name}`);
  console.log(`Location: ${dispensary.city}, ${dispensary.state}`);
  console.log(`Menu Type: ${dispensary.menuType}`);
  console.log(`Platform ID: ${dispensary.platformDispensaryId}`);
  console.log(`Menu URL: ${dispensary.menuUrl}`);
  console.log('');

  // Get profile info
  const profileResult = await pool.query(`
    SELECT id, profile_key, status, config FROM dispensary_crawler_profiles
    WHERE dispensary_id = 101
  `);

  const profile = profileResult.rows[0];
  if (profile) {
    console.log('=== PROFILE ===');
    console.log(`Profile Key: ${profile.profile_key}`);
    console.log(`Profile Status: ${profile.status}`);
    console.log(`Config: ${JSON.stringify(profile.config, null, 2)}`);
    console.log('');
  } else {
    console.log('=== PROFILE ===');
    console.log('No profile found - will use defaults');
    console.log('');
  }

  // Run the crawl
  console.log('=== STARTING CRAWL ===');
  console.log('Options: useBothModes=true, downloadImages=false (sandbox)');
  console.log('');

  try {
    const result = await crawlDispensaryProducts(dispensary, 'rec', {
      useBothModes: true,
      downloadImages: false, // Skip images in sandbox mode for speed
    });

    console.log('');
    console.log('=== CRAWL RESULT ===');
    console.log(`Success: ${result.success}`);
    console.log(`Products Found: ${result.productsFound}`);
    console.log(`Products Fetched: ${result.productsFetched}`);
    console.log(`Products Upserted: ${result.productsUpserted}`);
    console.log(`Snapshots Created: ${result.snapshotsCreated}`);
    if (result.errorMessage) {
      console.log(`Error: ${result.errorMessage}`);
    }
    console.log(`Duration: ${result.durationMs}ms`);
    console.log('');

    // Show sample products from database
    if (result.productsUpserted > 0) {
      const sampleProducts = await pool.query(`
        SELECT
          id, name, brand_name, type, subcategory, strain_type,
          price_rec, price_rec_original, stock_status, external_product_id
        FROM dutchie_products
        WHERE dispensary_id = 101
        ORDER BY updated_at DESC
        LIMIT 10
      `);

      console.log('=== SAMPLE PRODUCTS (10) ===');
      sampleProducts.rows.forEach((p: any, i: number) => {
        console.log(`${i + 1}. ${p.name}`);
        console.log(`   Brand: ${p.brand_name || 'N/A'}`);
        console.log(`   Type: ${p.type} / ${p.subcategory || 'N/A'}`);
        console.log(`   Strain: ${p.strain_type || 'N/A'}`);
        console.log(`   Price: $${p.price_rec || 'N/A'} (orig: $${p.price_rec_original || 'N/A'})`);
        console.log(`   Stock: ${p.stock_status}`);
        console.log(`   External ID: ${p.external_product_id}`);
        console.log('');
      });

      // Show field coverage stats
      const fieldStats = await pool.query(`
        SELECT
          COUNT(*) as total,
          COUNT(brand_name) as with_brand,
          COUNT(type) as with_type,
          COUNT(strain_type) as with_strain,
          COUNT(price_rec) as with_price,
          COUNT(image_url) as with_image,
          COUNT(description) as with_description,
          COUNT(thc_content) as with_thc,
          COUNT(cbd_content) as with_cbd
        FROM dutchie_products
        WHERE dispensary_id = 101
      `);

      const stats = fieldStats.rows[0];
      console.log('=== FIELD COVERAGE ===');
      console.log(`Total products: ${stats.total}`);
      console.log(`With brand: ${stats.with_brand} (${Math.round(stats.with_brand / stats.total * 100)}%)`);
      console.log(`With type: ${stats.with_type} (${Math.round(stats.with_type / stats.total * 100)}%)`);
      console.log(`With strain_type: ${stats.with_strain} (${Math.round(stats.with_strain / stats.total * 100)}%)`);
      console.log(`With price_rec: ${stats.with_price} (${Math.round(stats.with_price / stats.total * 100)}%)`);
      console.log(`With image_url: ${stats.with_image} (${Math.round(stats.with_image / stats.total * 100)}%)`);
      console.log(`With description: ${stats.with_description} (${Math.round(stats.with_description / stats.total * 100)}%)`);
      console.log(`With THC: ${stats.with_thc} (${Math.round(stats.with_thc / stats.total * 100)}%)`);
      console.log(`With CBD: ${stats.with_cbd} (${Math.round(stats.with_cbd / stats.total * 100)}%)`);
      console.log('');
    }

    // Insert trace record for observability
    const traceData = {
      crawlResult: result,
      dispensaryInfo: {
        id: dispensary.id,
        name: dispensary.name,
        platformDispensaryId: dispensary.platformDispensaryId,
        menuUrl: dispensary.menuUrl,
      },
      profile: profile || null,
      timestamp: new Date().toISOString(),
    };

    await pool.query(`
      INSERT INTO crawl_orchestration_traces
        (dispensary_id, profile_id, profile_key, crawler_module, mode,
         state_at_start, state_at_end, trace, success, products_found,
         duration_ms, started_at, completed_at)
      VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, NOW())
    `, [
      101,
      profile?.id || null,
      profile?.profile_key || null,
      'product-crawler',
      'sandbox',
      profile?.status || 'no_profile',
      profile?.status || 'no_profile', // No status change in sandbox
      JSON.stringify(traceData),
      result.success,
      result.productsFound,
      result.durationMs,
      new Date(startTime),
    ]);

    console.log('=== TRACE RECORDED ===');
    console.log('Trace saved to crawl_orchestration_traces table');

  } catch (error: any) {
    console.error('=== CRAWL ERROR ===');
    console.error('Error:', error.message);
    console.error('Stack:', error.stack);

    // Record error trace
    await pool.query(`
      INSERT INTO crawl_orchestration_traces
        (dispensary_id, profile_id, profile_key, crawler_module, mode,
         state_at_start, state_at_end, trace, success, error_message,
         duration_ms, started_at, completed_at)
      VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, NOW())
    `, [
      101,
      profile?.id || null,
      profile?.profile_key || null,
      'product-crawler',
      'sandbox',
      profile?.status || 'no_profile',
      profile?.status || 'no_profile',
      JSON.stringify({ error: error.message, stack: error.stack }),
      false,
      error.message,
      Date.now() - startTime,
      new Date(startTime),
    ]);
  }

  await pool.end();
  console.log('=== SANDBOX CRAWL COMPLETE ===');
}

main().catch(e => {
  console.error('Fatal error:', e.message);
  process.exit(1);
});
@@ -1,181 +0,0 @@
/**
 * LEGACY SCRIPT - Sandbox Crawl Test
 *
 * DEPRECATED: This script uses direct database connections.
 * Future implementations should use the CannaiQ API endpoints instead.
 *
 * This script runs sandbox crawl for a dispensary and captures the full trace.
 * It is kept for historical reference and manual testing only.
 *
 * DO NOT:
 * - Add this to package.json scripts
 * - Run this in automated jobs
 * - Use DATABASE_URL directly
 *
 * Usage (manual only):
 *   STORAGE_DRIVER=local npx tsx src/scripts/sandbox-test.ts <dispensary_id>
 *
 * LOCAL MODE REQUIREMENTS:
 * - STORAGE_DRIVER=local
 * - STORAGE_BASE_PATH=./storage
 * - Local cannaiq-postgres on port 54320
 * - NO MinIO, NO Kubernetes
 */

import { query, getClient, closePool } from '../dutchie-az/db/connection';
import { runDispensaryOrchestrator } from '../services/dispensary-orchestrator';

// Verify local mode
function verifyLocalMode(): void {
  const storageDriver = process.env.STORAGE_DRIVER || 'local';
  const minioEndpoint = process.env.MINIO_ENDPOINT;

  console.log('=== LOCAL MODE VERIFICATION ===');
  console.log(`STORAGE_DRIVER: ${storageDriver}`);
  console.log(`MINIO_ENDPOINT: ${minioEndpoint || 'NOT SET (good)'}`);
  console.log(`STORAGE_BASE_PATH: ${process.env.STORAGE_BASE_PATH || './storage'}`);
  console.log('DB Connection: Using canonical CannaiQ pool');

  if (storageDriver !== 'local') {
    console.error('ERROR: STORAGE_DRIVER must be "local"');
    process.exit(1);
  }

  if (minioEndpoint) {
    console.error('ERROR: MINIO_ENDPOINT should NOT be set in local mode');
    process.exit(1);
  }

  console.log('✅ Local mode verified\n');
}

async function getDispensaryInfo(dispensaryId: number) {
  const result = await query(`
    SELECT d.id, d.name, d.city, d.menu_type, d.platform_dispensary_id, d.menu_url,
           p.profile_key, p.status as profile_status, p.config
    FROM dispensaries d
    LEFT JOIN dispensary_crawler_profiles p ON p.dispensary_id = d.id
    WHERE d.id = $1
  `, [dispensaryId]);

  return result.rows[0];
}

async function getLatestTrace(dispensaryId: number) {
  const result = await query(`
    SELECT *
    FROM crawl_orchestration_traces
    WHERE dispensary_id = $1
    ORDER BY created_at DESC
    LIMIT 1
  `, [dispensaryId]);

  return result.rows[0];
}

async function main() {
  console.warn('\n⚠️  LEGACY SCRIPT: This script should be replaced with CannaiQ API calls.\n');

  const dispensaryId = parseInt(process.argv[2], 10);

  if (!dispensaryId || isNaN(dispensaryId)) {
    console.error('Usage: npx tsx src/scripts/sandbox-test.ts <dispensary_id>');
    console.error('Example: npx tsx src/scripts/sandbox-test.ts 101');
    process.exit(1);
  }

  // Verify local mode first
  verifyLocalMode();

  try {
    // Get dispensary info
    console.log(`=== DISPENSARY INFO (ID: ${dispensaryId}) ===`);
    const dispensary = await getDispensaryInfo(dispensaryId);

    if (!dispensary) {
      console.error(`Dispensary ${dispensaryId} not found`);
      process.exit(1);
    }

    console.log(`Name: ${dispensary.name}`);
    console.log(`City: ${dispensary.city}`);
    console.log(`Menu Type: ${dispensary.menu_type}`);
    console.log(`Platform Dispensary ID: ${dispensary.platform_dispensary_id || 'NULL'}`);
    console.log(`Menu URL: ${dispensary.menu_url || 'NULL'}`);
    console.log(`Profile Key: ${dispensary.profile_key || 'NONE'}`);
    console.log(`Profile Status: ${dispensary.profile_status || 'N/A'}`);
    console.log(`Profile Config: ${JSON.stringify(dispensary.config, null, 2)}`);
    console.log('');

    // Run sandbox crawl
    console.log('=== RUNNING SANDBOX CRAWL ===');
    console.log(`Starting sandbox crawl for ${dispensary.name}...`);
    const startTime = Date.now();

    const result = await runDispensaryOrchestrator(dispensaryId);

    const duration = Date.now() - startTime;

    console.log('\n=== CRAWL RESULT ===');
    console.log(`Status: ${result.status}`);
    console.log(`Summary: ${result.summary}`);
    console.log(`Run ID: ${result.runId}`);
    console.log(`Duration: ${duration}ms`);
    console.log(`Detection Ran: ${result.detectionRan}`);
    console.log(`Crawl Ran: ${result.crawlRan}`);
    console.log(`Crawl Type: ${result.crawlType || 'N/A'}`);
    console.log(`Products Found: ${result.productsFound || 0}`);
    console.log(`Products New: ${result.productsNew || 0}`);
    console.log(`Products Updated: ${result.productsUpdated || 0}`);

    if (result.error) {
      console.log(`Error: ${result.error}`);
    }

    // Get the trace
    console.log('\n=== ORCHESTRATOR TRACE ===');
    const trace = await getLatestTrace(dispensaryId);

    if (trace) {
      console.log(`Trace ID: ${trace.id}`);
      console.log(`Profile Key: ${trace.profile_key || 'N/A'}`);
      console.log(`Mode: ${trace.mode}`);
      console.log(`Status: ${trace.status}`);
      console.log(`Started At: ${trace.started_at}`);
      console.log(`Completed At: ${trace.completed_at || 'In Progress'}`);

      if (trace.steps && Array.isArray(trace.steps)) {
        console.log(`\nSteps (${trace.steps.length} total):`);
        trace.steps.forEach((step: any, i: number) => {
          const status = step.status === 'completed' ? '✅' : step.status === 'failed' ? '❌' : '⏳';
          console.log(`  ${i + 1}. ${status} ${step.action}: ${step.description}`);
          if (step.output && Object.keys(step.output).length > 0) {
            console.log(`     Output: ${JSON.stringify(step.output)}`);
          }
          if (step.error) {
            console.log(`     Error: ${step.error}`);
          }
        });
      }

      if (trace.result) {
        console.log(`\nResult: ${JSON.stringify(trace.result, null, 2)}`);
      }

      if (trace.error_message) {
        console.log(`\nError Message: ${trace.error_message}`);
      }
    } else {
      console.log('No trace found for this dispensary');
    }

  } catch (error: any) {
    console.error('Error running sandbox test:', error.message);
    console.error(error.stack);
    process.exit(1);
  } finally {
    await closePool();
  }
}

main();
@@ -1,332 +0,0 @@
/**
 * LEGACY SCRIPT - Scrape All Active Products
 *
 * DEPRECATED: This script creates its own database pool.
 * Future implementations should use the CannaiQ API endpoints instead.
 *
 * Scrapes ALL active products via direct GraphQL pagination.
 * This is more reliable than category navigation.
 *
 * DO NOT:
 * - Add this to package.json scripts
 * - Run this in automated jobs
 * - Use DATABASE_URL directly
 */

import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Pool } from 'pg';
import { normalizeDutchieProduct, DutchieProduct } from '../scrapers/dutchie-graphql';

puppeteer.use(StealthPlugin());

console.warn('\n⚠️  LEGACY SCRIPT: This script should be replaced with CannaiQ API calls.\n');

// Single database connection (cannaiq in cannaiq-postgres container)
const DATABASE_URL = process.env.CANNAIQ_DB_URL ||
  `postgresql://${process.env.CANNAIQ_DB_USER || 'dutchie'}:${process.env.CANNAIQ_DB_PASS || 'dutchie_local_pass'}@${process.env.CANNAIQ_DB_HOST || 'localhost'}:${process.env.CANNAIQ_DB_PORT || '54320'}/${process.env.CANNAIQ_DB_NAME || 'cannaiq'}`;
const GRAPHQL_HASH = 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0';

async function scrapeAllProducts(menuUrl: string, storeId: number) {
  const pool = new Pool({ connectionString: DATABASE_URL });

  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });

  try {
    const page = await browser.newPage();
    await page.setUserAgent(
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36'
    );

    console.log('Loading menu to establish session...');
    await page.goto(menuUrl, {
      waitUntil: 'networkidle2',
      timeout: 60000,
    });
    await new Promise((r) => setTimeout(r, 3000));

    const dispensaryId = await page.evaluate(() => (window as any).reactEnv?.dispensaryId);
    console.log('Dispensary ID:', dispensaryId);

    // Paginate through all products
    const allProducts: DutchieProduct[] = [];
    let pageNum = 0;
    const perPage = 100;

    console.log('\nFetching all products via paginated GraphQL...');

    while (true) {
      const result = await page.evaluate(
        async (dispId: string, hash: string, page: number, perPage: number) => {
          const variables = {
            includeEnterpriseSpecials: false,
            productsFilter: {
              dispensaryId: dispId,
              pricingType: 'rec',
              Status: 'Active',
              types: [],
              useCache: false,
              isDefaultSort: true,
              sortBy: 'popularSortIdx',
              sortDirection: 1,
              bypassOnlineThresholds: true,
              isKioskMenu: false,
              removeProductsBelowOptionThresholds: false,
            },
            page,
            perPage,
          };

          const qs = new URLSearchParams({
            operationName: 'FilteredProducts',
            variables: JSON.stringify(variables),
            extensions: JSON.stringify({ persistedQuery: { version: 1, sha256Hash: hash } }),
          });

          const resp = await fetch(`https://dutchie.com/graphql?${qs.toString()}`, {
            method: 'GET',
            headers: {
              'content-type': 'application/json',
              'apollographql-client-name': 'Marketplace (production)',
            },
            credentials: 'include',
          });

          const json = await resp.json();
          return {
            products: json?.data?.filteredProducts?.products || [],
            totalCount: json?.data?.filteredProducts?.queryInfo?.totalCount,
          };
        },
        dispensaryId,
        GRAPHQL_HASH,
        pageNum,
        perPage
      );

      if (result.products.length === 0) {
        break;
      }

      allProducts.push(...result.products);
      console.log(
        `Page ${pageNum}: ${result.products.length} products (total so far: ${allProducts.length}/${result.totalCount})`
      );

      pageNum++;

      // Safety limit
      if (pageNum > 50) {
        console.log('Reached page limit');
        break;
      }
    }

    console.log(`\nTotal products fetched: ${allProducts.length}`);

    // Normalize and upsert
    console.log('\nNormalizing and upserting to database...');
    const normalized = allProducts.map(normalizeDutchieProduct);

    const client = await pool.connect();
    let inserted = 0;
    let updated = 0;

    try {
      await client.query('BEGIN');

      for (const product of normalized) {
        const result = await client.query(
          `
          INSERT INTO products (
            store_id, external_id, slug, name, enterprise_product_id,
            brand, brand_external_id, brand_logo_url,
            subcategory, strain_type, canonical_category,
            price, rec_price, med_price, rec_special_price, med_special_price,
            is_on_special, special_name, discount_percent, special_data,
            sku, inventory_quantity, inventory_available, is_below_threshold, status,
            thc_percentage, cbd_percentage, cannabinoids,
            weight_mg, net_weight_value, net_weight_unit, options, raw_options,
            image_url, additional_images,
            is_featured, medical_only, rec_only,
            source_created_at, source_updated_at,
            description, raw_data,
            dutchie_url, last_seen_at, updated_at
          )
          VALUES (
            $1, $2, $3, $4, $5,
            $6, $7, $8,
            $9, $10, $11,
            $12, $13, $14, $15, $16,
            $17, $18, $19, $20,
            $21, $22, $23, $24, $25,
            $26, $27, $28,
            $29, $30, $31, $32, $33,
            $34, $35,
            $36, $37, $38,
            $39, $40,
            $41, $42,
            '', NOW(), NOW()
          )
          ON CONFLICT (store_id, slug) DO UPDATE SET
            name = EXCLUDED.name,
            enterprise_product_id = EXCLUDED.enterprise_product_id,
            brand = EXCLUDED.brand,
            brand_external_id = EXCLUDED.brand_external_id,
            brand_logo_url = EXCLUDED.brand_logo_url,
            subcategory = EXCLUDED.subcategory,
            strain_type = EXCLUDED.strain_type,
            canonical_category = EXCLUDED.canonical_category,
            price = EXCLUDED.price,
            rec_price = EXCLUDED.rec_price,
            med_price = EXCLUDED.med_price,
            rec_special_price = EXCLUDED.rec_special_price,
            med_special_price = EXCLUDED.med_special_price,
            is_on_special = EXCLUDED.is_on_special,
            special_name = EXCLUDED.special_name,
            discount_percent = EXCLUDED.discount_percent,
            special_data = EXCLUDED.special_data,
            sku = EXCLUDED.sku,
            inventory_quantity = EXCLUDED.inventory_quantity,
            inventory_available = EXCLUDED.inventory_available,
            is_below_threshold = EXCLUDED.is_below_threshold,
            status = EXCLUDED.status,
            thc_percentage = EXCLUDED.thc_percentage,
            cbd_percentage = EXCLUDED.cbd_percentage,
            cannabinoids = EXCLUDED.cannabinoids,
            weight_mg = EXCLUDED.weight_mg,
            net_weight_value = EXCLUDED.net_weight_value,
            net_weight_unit = EXCLUDED.net_weight_unit,
            options = EXCLUDED.options,
            raw_options = EXCLUDED.raw_options,
            image_url = EXCLUDED.image_url,
            additional_images = EXCLUDED.additional_images,
            is_featured = EXCLUDED.is_featured,
            medical_only = EXCLUDED.medical_only,
            rec_only = EXCLUDED.rec_only,
            source_created_at = EXCLUDED.source_created_at,
            source_updated_at = EXCLUDED.source_updated_at,
            description = EXCLUDED.description,
            raw_data = EXCLUDED.raw_data,
            last_seen_at = NOW(),
            updated_at = NOW()
          RETURNING (xmax = 0) AS was_inserted
          `,
          [
            storeId,
            product.external_id,
            product.slug,
            product.name,
            product.enterprise_product_id,
            product.brand,
            product.brand_external_id,
            product.brand_logo_url,
            product.subcategory,
            product.strain_type,
            product.canonical_category,
            product.price,
            product.rec_price,
            product.med_price,
            product.rec_special_price,
            product.med_special_price,
            product.is_on_special,
            product.special_name,
            product.discount_percent,
            product.special_data ? JSON.stringify(product.special_data) : null,
            product.sku,
            product.inventory_quantity,
            product.inventory_available,
            product.is_below_threshold,
            product.status,
            product.thc_percentage,
            product.cbd_percentage,
            product.cannabinoids ? JSON.stringify(product.cannabinoids) : null,
            product.weight_mg,
            product.net_weight_value,
            product.net_weight_unit,
            product.options,
            product.raw_options,
            product.image_url,
            product.additional_images,
            product.is_featured,
            product.medical_only,
            product.rec_only,
            product.source_created_at,
            product.source_updated_at,
            product.description,
            product.raw_data ? JSON.stringify(product.raw_data) : null,
          ]
        );

        if (result.rows[0]?.was_inserted) {
          inserted++;
        } else {
          updated++;
        }
      }

      await client.query('COMMIT');
    } catch (error) {
      await client.query('ROLLBACK');
      throw error;
    } finally {
      client.release();
    }

    console.log(`\nDatabase: ${inserted} inserted, ${updated} updated`);

    // Show summary stats
    const stats = await pool.query(
      `
      SELECT
        COUNT(*) as total,
        COUNT(*) FILTER (WHERE is_on_special) as specials,
        COUNT(DISTINCT brand) as brands,
        COUNT(DISTINCT subcategory) as categories
      FROM products WHERE store_id = $1
      `,
      [storeId]
    );

    console.log('\nStore summary:');
    console.log(`  Total products: ${stats.rows[0].total}`);
    console.log(`  On special: ${stats.rows[0].specials}`);
    console.log(`  Unique brands: ${stats.rows[0].brands}`);
    console.log(`  Categories: ${stats.rows[0].categories}`);

    return {
      success: true,
      totalProducts: allProducts.length,
|
||||
inserted,
|
||||
updated,
|
||||
};
|
||||
} finally {
|
||||
await browser.close();
|
||||
await pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
// Run
|
||||
const menuUrl = process.argv[2] || 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted';
|
||||
const storeId = parseInt(process.argv[3] || '1', 10);
|
||||
|
||||
console.log('='.repeat(60));
|
||||
console.log('DUTCHIE GRAPHQL FULL SCRAPE');
|
||||
console.log('='.repeat(60));
|
||||
console.log(`Menu URL: ${menuUrl}`);
|
||||
console.log(`Store ID: ${storeId}`);
|
||||
console.log('');
|
||||
|
||||
scrapeAllProducts(menuUrl, storeId)
|
||||
.then((result) => {
|
||||
console.log('\n' + '='.repeat(60));
|
||||
console.log('COMPLETE');
|
||||
console.log(JSON.stringify(result, null, 2));
|
||||
})
|
||||
.catch((error) => {
|
||||
console.error('Error:', error.message);
|
||||
process.exit(1);
|
||||
});
|
||||
@@ -1,156 +0,0 @@
/**
 * Test script: End-to-end Dutchie GraphQL → DB → Dashboard flow
 *
 * This demonstrates the complete data pipeline:
 * 1. Fetch one product from Dutchie GraphQL via Puppeteer
 * 2. Normalize it to our schema
 * 3. Show the mapping
 */

import { normalizeDutchieProduct, DutchieProduct, NormalizedProduct } from '../scrapers/dutchie-graphql';
import * as fs from 'fs';

// Load the captured sample product from schema capture
const capturedData = JSON.parse(
  fs.readFileSync('/tmp/dutchie-schema-capture.json', 'utf-8')
);

const sampleProduct: DutchieProduct = capturedData.sampleProduct;

console.log('='.repeat(80));
console.log('DUTCHIE GRAPHQL → DATABASE MAPPING DEMONSTRATION');
console.log('='.repeat(80));

console.log('\n📥 RAW DUTCHIE GRAPHQL PRODUCT:');
console.log('-'.repeat(80));

// Show key fields from raw product
const keyRawFields = {
  '_id': sampleProduct._id,
  'Name': sampleProduct.Name,
  'cName': sampleProduct.cName,
  'brandName': sampleProduct.brandName,
  'brand.id': sampleProduct.brand?.id,
  'type': sampleProduct.type,
  'subcategory': sampleProduct.subcategory,
  'strainType': sampleProduct.strainType,
  'Prices': sampleProduct.Prices,
  'recPrices': sampleProduct.recPrices,
  'recSpecialPrices': sampleProduct.recSpecialPrices,
  'special': sampleProduct.special,
  'specialData.saleSpecials[0].specialName': sampleProduct.specialData?.saleSpecials?.[0]?.specialName,
  'specialData.saleSpecials[0].discount': sampleProduct.specialData?.saleSpecials?.[0]?.discount,
  'THCContent.range[0]': sampleProduct.THCContent?.range?.[0],
  'CBDContent.range[0]': sampleProduct.CBDContent?.range?.[0],
  'Status': sampleProduct.Status,
  'Image': sampleProduct.Image,
  'POSMetaData.canonicalSKU': sampleProduct.POSMetaData?.canonicalSKU,
  'POSMetaData.children[0].quantity': sampleProduct.POSMetaData?.children?.[0]?.quantity,
  'POSMetaData.children[0].quantityAvailable': sampleProduct.POSMetaData?.children?.[0]?.quantityAvailable,
};

Object.entries(keyRawFields).forEach(([key, value]) => {
  console.log(`  ${key}: ${JSON.stringify(value)}`);
});

console.log('\n📤 NORMALIZED DATABASE ROW:');
console.log('-'.repeat(80));

// Normalize the product
const normalized: NormalizedProduct = normalizeDutchieProduct(sampleProduct);

// Show the normalized result (excluding raw_data for readability)
const { raw_data, cannabinoids, special_data, ...displayFields } = normalized;

Object.entries(displayFields).forEach(([key, value]) => {
  if (value !== undefined && value !== null) {
    console.log(`  ${key}: ${JSON.stringify(value)}`);
  }
});

console.log('\n🔗 FIELD MAPPING:');
console.log('-'.repeat(80));

const fieldMappings = [
  ['_id / id', 'external_id', sampleProduct._id, normalized.external_id],
  ['Name', 'name', sampleProduct.Name, normalized.name],
  ['cName', 'slug', sampleProduct.cName, normalized.slug],
  ['brandName', 'brand', sampleProduct.brandName, normalized.brand],
  ['brand.id', 'brand_external_id', sampleProduct.brand?.id, normalized.brand_external_id],
  ['subcategory', 'subcategory', sampleProduct.subcategory, normalized.subcategory],
  ['strainType', 'strain_type', sampleProduct.strainType, normalized.strain_type],
  ['recPrices[0]', 'rec_price', sampleProduct.recPrices?.[0], normalized.rec_price],
  ['recSpecialPrices[0]', 'rec_special_price', sampleProduct.recSpecialPrices?.[0], normalized.rec_special_price],
  ['special', 'is_on_special', sampleProduct.special, normalized.is_on_special],
  ['specialData...specialName', 'special_name', sampleProduct.specialData?.saleSpecials?.[0]?.specialName?.substring(0, 40) + '...', normalized.special_name?.substring(0, 40) + '...'],
  ['THCContent.range[0]', 'thc_percentage', sampleProduct.THCContent?.range?.[0], normalized.thc_percentage],
  ['CBDContent.range[0]', 'cbd_percentage', sampleProduct.CBDContent?.range?.[0], normalized.cbd_percentage],
  ['Status', 'status', sampleProduct.Status, normalized.status],
  ['Image', 'image_url', sampleProduct.Image?.substring(0, 50) + '...', normalized.image_url?.substring(0, 50) + '...'],
  ['POSMetaData.canonicalSKU', 'sku', sampleProduct.POSMetaData?.canonicalSKU, normalized.sku],
];

console.log('  GraphQL Field → DB Column | Value');
console.log('  ' + '-'.repeat(75));

fieldMappings.forEach(([gqlField, dbCol, gqlVal, dbVal]) => {
  const gqlStr = String(gqlField).padEnd(30);
  const dbStr = String(dbCol).padEnd(20);
  console.log(`  ${gqlStr} → ${dbStr} | ${JSON.stringify(dbVal)}`);
});

console.log('\n📊 SQL INSERT STATEMENT:');
console.log('-'.repeat(80));

// Generate example SQL
const sqlExample = `
INSERT INTO products (
  store_id, external_id, slug, name,
  brand, brand_external_id,
  subcategory, strain_type,
  rec_price, rec_special_price,
  is_on_special, special_name, discount_percent,
  thc_percentage, cbd_percentage,
  status, image_url, sku
) VALUES (
  1, -- store_id (Deeply Rooted)
  '${normalized.external_id}', -- external_id
  '${normalized.slug}', -- slug
  '${normalized.name}', -- name
  '${normalized.brand}', -- brand
  '${normalized.brand_external_id}', -- brand_external_id
  '${normalized.subcategory}', -- subcategory
  '${normalized.strain_type}', -- strain_type
  ${normalized.rec_price}, -- rec_price
  ${normalized.rec_special_price}, -- rec_special_price
  ${normalized.is_on_special}, -- is_on_special
  '${normalized.special_name?.substring(0, 50)}...', -- special_name
  ${normalized.discount_percent || 'NULL'}, -- discount_percent
  ${normalized.thc_percentage}, -- thc_percentage
  ${normalized.cbd_percentage}, -- cbd_percentage
  '${normalized.status}', -- status
  '${normalized.image_url}', -- image_url
  '${normalized.sku}' -- sku
)
ON CONFLICT (store_id, slug) DO UPDATE SET ...;
`;

console.log(sqlExample);

console.log('\n✅ SUMMARY:');
console.log('-'.repeat(80));
console.log(`  Product: ${normalized.name}`);
console.log(`  Brand: ${normalized.brand}`);
console.log(`  Category: ${normalized.subcategory}`);
console.log(`  Price: $${normalized.rec_price} → $${normalized.rec_special_price} (${normalized.discount_percent}% off)`);
console.log(`  THC: ${normalized.thc_percentage}%`);
console.log(`  Status: ${normalized.status}`);
console.log(`  On Special: ${normalized.is_on_special}`);
console.log(`  SKU: ${normalized.sku}`);

console.log('\n🎯 DERIVED VIEWS (computed from products table):');
console.log('-'.repeat(80));
console.log('  - current_specials: Products where is_on_special = true');
console.log('  - derived_brands: Aggregated by brand name with counts/prices');
console.log('  - derived_categories: Aggregated by subcategory');
console.log('\nAll views are computed from the single products table - no separate tables needed!');