fix: Public API column mappings and OOS detection
- Fix store_products column references (name_raw, brand_name_raw, category_raw) - Fix v_product_snapshots column references (crawled_at, *_cents pricing) - Fix dispensaries column references (zipcode, logo_image, remove hours/amenities) - Add services and license_type to dispensary API response - Add consecutive_misses OOS tracking to product-resync handler - Add migration 075 for consecutive_misses column - Add CRAWL_PIPELINE.md documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
308
backend/docs/CRAWL_PIPELINE.md
Normal file
308
backend/docs/CRAWL_PIPELINE.md
Normal file
@@ -0,0 +1,308 @@
|
|||||||
|
# Crawl Pipeline Documentation
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
The crawl pipeline fetches product data from Dutchie dispensary menus and stores it in the canonical database. This document covers the complete flow from task scheduling to data storage.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Pipeline Stages
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────────┐
|
||||||
|
│ store_discovery │ Find new dispensaries
|
||||||
|
└─────────┬───────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────┐
|
||||||
|
│ entry_point_discovery│ Resolve slug → platform_dispensary_id
|
||||||
|
└─────────┬───────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────┐
|
||||||
|
│ product_discovery │ Initial product crawl
|
||||||
|
└─────────┬───────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────┐
|
||||||
|
│ product_resync │ Recurring crawl (every 4 hours)
|
||||||
|
└─────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Stage Details
|
||||||
|
|
||||||
|
### 1. Store Discovery
|
||||||
|
**Purpose:** Find new dispensaries to crawl
|
||||||
|
|
||||||
|
**Handler:** `src/tasks/handlers/store-discovery.ts`
|
||||||
|
|
||||||
|
**Flow:**
|
||||||
|
1. Query Dutchie `ConsumerDispensaries` GraphQL for cities/states
|
||||||
|
2. Extract dispensary info (name, address, menu_url)
|
||||||
|
3. Insert into `dutchie_discovery_locations`
|
||||||
|
4. Queue `entry_point_discovery` for each new location
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 2. Entry Point Discovery
|
||||||
|
**Purpose:** Resolve menu URL slug to platform_dispensary_id (MongoDB ObjectId)
|
||||||
|
|
||||||
|
**Handler:** `src/tasks/handlers/entry-point-discovery.ts`
|
||||||
|
|
||||||
|
**Flow:**
|
||||||
|
1. Load dispensary from database
|
||||||
|
2. Extract slug from `menu_url`:
|
||||||
|
- `/embedded-menu/<slug>` or `/dispensary/<slug>`
|
||||||
|
3. Start stealth session (fingerprint + proxy)
|
||||||
|
4. Query `resolveDispensaryIdWithDetails(slug)` via GraphQL
|
||||||
|
5. Update dispensary with `platform_dispensary_id`
|
||||||
|
6. Queue `product_discovery` task
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
```
|
||||||
|
menu_url: https://dutchie.com/embedded-menu/deeply-rooted
|
||||||
|
slug: deeply-rooted
|
||||||
|
platform_dispensary_id: 6405ef617056e8014d79101b
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 3. Product Discovery
|
||||||
|
**Purpose:** Initial crawl of a new dispensary
|
||||||
|
|
||||||
|
**Handler:** `src/tasks/handlers/product-discovery.ts`
|
||||||
|
|
||||||
|
Same as product_resync but for first-time crawls.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 4. Product Resync
|
||||||
|
**Purpose:** Recurring crawl to capture price/stock changes
|
||||||
|
|
||||||
|
**Handler:** `src/tasks/handlers/product-resync.ts`
|
||||||
|
|
||||||
|
**Flow:**
|
||||||
|
|
||||||
|
#### Step 1: Load Dispensary Info
|
||||||
|
```sql
|
||||||
|
SELECT id, name, platform_dispensary_id, menu_url, state
|
||||||
|
FROM dispensaries
|
||||||
|
WHERE id = $1 AND crawl_enabled = true
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Step 2: Start Stealth Session
|
||||||
|
- Generate random browser fingerprint
|
||||||
|
- Set locale/timezone matching state
|
||||||
|
- Optional proxy rotation
|
||||||
|
|
||||||
|
#### Step 3: Fetch Products via GraphQL
|
||||||
|
**Endpoint:** `https://dutchie.com/api-3/graphql`
|
||||||
|
|
||||||
|
**Variables:**
|
||||||
|
```javascript
|
||||||
|
{
|
||||||
|
includeEnterpriseSpecials: false,
|
||||||
|
productsFilter: {
|
||||||
|
dispensaryId: "<platform_dispensary_id>",
|
||||||
|
pricingType: "rec",
|
||||||
|
Status: "All",
|
||||||
|
types: [],
|
||||||
|
useCache: false,
|
||||||
|
isDefaultSort: true,
|
||||||
|
sortBy: "popularSortIdx",
|
||||||
|
sortDirection: 1,
|
||||||
|
bypassOnlineThresholds: true,
|
||||||
|
isKioskMenu: false,
|
||||||
|
removeProductsBelowOptionThresholds: false
|
||||||
|
},
|
||||||
|
page: 0,
|
||||||
|
perPage: 100
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Key Notes:**
|
||||||
|
- `Status: "All"` returns all products (Active returns same count)
|
||||||
|
- `Status: null` returns 0 products (broken)
|
||||||
|
- `pricingType: "rec"` returns BOTH rec and med prices
|
||||||
|
- Paginate until `products.length < perPage` or `allProducts.length >= totalCount`
|
||||||
|
|
||||||
|
#### Step 4: Normalize Data
|
||||||
|
Transform raw Dutchie payload to canonical format via `DutchieNormalizer`.
|
||||||
|
|
||||||
|
#### Step 5: Upsert Products
|
||||||
|
Insert/update `store_products` table with normalized data.
|
||||||
|
|
||||||
|
#### Step 6: Create Snapshots
|
||||||
|
Insert point-in-time record to `store_product_snapshots`.
|
||||||
|
|
||||||
|
#### Step 7: Track Missing Products (OOS Detection)
|
||||||
|
```sql
|
||||||
|
-- Reset consecutive_misses for products IN the feed
|
||||||
|
UPDATE store_products
|
||||||
|
SET consecutive_misses = 0, last_seen_at = NOW()
|
||||||
|
WHERE dispensary_id = $1
|
||||||
|
AND provider = 'dutchie'
|
||||||
|
AND provider_product_id = ANY($2)
|
||||||
|
|
||||||
|
-- Increment for products NOT in feed
|
||||||
|
UPDATE store_products
|
||||||
|
SET consecutive_misses = consecutive_misses + 1
|
||||||
|
WHERE dispensary_id = $1
|
||||||
|
AND provider = 'dutchie'
|
||||||
|
AND provider_product_id NOT IN (...)
|
||||||
|
AND consecutive_misses < 3
|
||||||
|
|
||||||
|
-- Mark OOS at 3 consecutive misses
|
||||||
|
UPDATE store_products
|
||||||
|
SET stock_status = 'oos', is_in_stock = false
|
||||||
|
WHERE dispensary_id = $1
|
||||||
|
AND consecutive_misses >= 3
|
||||||
|
AND stock_status != 'oos'
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Step 8: Download Images
|
||||||
|
For new products, download and store images locally.
|
||||||
|
|
||||||
|
#### Step 9: Update Dispensary
|
||||||
|
```sql
|
||||||
|
UPDATE dispensaries SET last_crawl_at = NOW() WHERE id = $1
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## GraphQL Payload Structure
|
||||||
|
|
||||||
|
### Product Fields (from filteredProducts.products[])
|
||||||
|
|
||||||
|
| Field | Type | Description |
|
||||||
|
|-------|------|-------------|
|
||||||
|
| `_id` / `id` | string | MongoDB ObjectId (24 hex chars) |
|
||||||
|
| `Name` | string | Product display name |
|
||||||
|
| `brandName` | string | Brand name |
|
||||||
|
| `brand.name` | string | Brand name (nested) |
|
||||||
|
| `brand.description` | string | Brand description |
|
||||||
|
| `type` | string | Category (Flower, Edible, Concentrate, etc.) |
|
||||||
|
| `subcategory` | string | Subcategory |
|
||||||
|
| `strainType` | string | Hybrid, Indica, Sativa, N/A |
|
||||||
|
| `Status` | string | Always "Active" in feed |
|
||||||
|
| `Image` | string | Primary image URL |
|
||||||
|
| `images[]` | array | All product images |
|
||||||
|
|
||||||
|
### Pricing Fields
|
||||||
|
|
||||||
|
| Field | Type | Description |
|
||||||
|
|-------|------|-------------|
|
||||||
|
| `Prices[]` | number[] | Rec prices per option |
|
||||||
|
| `recPrices[]` | number[] | Rec prices |
|
||||||
|
| `medicalPrices[]` | number[] | Medical prices |
|
||||||
|
| `recSpecialPrices[]` | number[] | Rec sale prices |
|
||||||
|
| `medicalSpecialPrices[]` | number[] | Medical sale prices |
|
||||||
|
| `Options[]` | string[] | Size options ("1/8oz", "1g", etc.) |
|
||||||
|
| `rawOptions[]` | string[] | Raw weight options ("3.5g") |
|
||||||
|
|
||||||
|
### Inventory Fields (POSMetaData.children[])
|
||||||
|
|
||||||
|
| Field | Type | Description |
|
||||||
|
|-------|------|-------------|
|
||||||
|
| `quantity` | number | Total inventory count |
|
||||||
|
| `quantityAvailable` | number | Available for online orders |
|
||||||
|
| `kioskQuantityAvailable` | number | Available for kiosk orders |
|
||||||
|
| `option` | string | Which size option this is for |
|
||||||
|
|
||||||
|
### Potency Fields
|
||||||
|
|
||||||
|
| Field | Type | Description |
|
||||||
|
|-------|------|-------------|
|
||||||
|
| `THCContent.range[]` | number[] | THC percentage |
|
||||||
|
| `CBDContent.range[]` | number[] | CBD percentage |
|
||||||
|
| `cannabinoidsV2[]` | array | Detailed cannabinoid breakdown |
|
||||||
|
|
||||||
|
### Specials (specialData.bogoSpecials[])
|
||||||
|
|
||||||
|
| Field | Type | Description |
|
||||||
|
|-------|------|-------------|
|
||||||
|
| `specialName` | string | Deal name |
|
||||||
|
| `specialType` | string | "bogo", "sale", etc. |
|
||||||
|
| `itemsForAPrice.value` | string | Bundle price |
|
||||||
|
| `bogoRewards[].totalQuantity.quantity` | number | Required quantity |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## OOS Detection Logic
|
||||||
|
|
||||||
|
Products disappear from the Dutchie feed when they go out of stock. We track this via `consecutive_misses`:
|
||||||
|
|
||||||
|
| Scenario | Action |
|
||||||
|
|----------|--------|
|
||||||
|
| Product in feed | `consecutive_misses = 0` |
|
||||||
|
| Product missing 1st time | `consecutive_misses = 1` |
|
||||||
|
| Product missing 2nd time | `consecutive_misses = 2` |
|
||||||
|
| Product missing 3rd time | `consecutive_misses = 3`, mark `stock_status = 'oos'` |
|
||||||
|
| Product returns to feed | `consecutive_misses = 0`, update stock_status |
|
||||||
|
|
||||||
|
**Why 3 misses?**
|
||||||
|
- Protects against false positives from crawl failures
|
||||||
|
- Single bad crawl doesn't trigger mass OOS alerts
|
||||||
|
- Balances detection speed vs accuracy
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Database Tables
|
||||||
|
|
||||||
|
### store_products
|
||||||
|
Current state of each product:
|
||||||
|
- `provider_product_id` - Dutchie's MongoDB ObjectId
|
||||||
|
- `name_raw`, `brand_name_raw` - Raw values from feed
|
||||||
|
- `price_rec`, `price_med` - Current prices
|
||||||
|
- `is_in_stock`, `stock_status` - Availability
|
||||||
|
- `consecutive_misses` - OOS detection counter
|
||||||
|
- `last_seen_at` - Last time product was in feed
|
||||||
|
|
||||||
|
### store_product_snapshots
|
||||||
|
Point-in-time records for historical analysis:
|
||||||
|
- One row per product per crawl
|
||||||
|
- Captures price, stock, potency at that moment
|
||||||
|
- Used for price history, analytics
|
||||||
|
|
||||||
|
### dispensaries
|
||||||
|
Store metadata:
|
||||||
|
- `platform_dispensary_id` - MongoDB ObjectId for GraphQL
|
||||||
|
- `menu_url` - Source URL
|
||||||
|
- `last_crawl_at` - Last successful crawl
|
||||||
|
- `crawl_enabled` - Whether to crawl
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Scheduling
|
||||||
|
|
||||||
|
Crawls are scheduled via `worker_tasks` table:
|
||||||
|
|
||||||
|
| Role | Frequency | Description |
|
||||||
|
|------|-----------|-------------|
|
||||||
|
| `product_resync` | Every 4 hours | Regular product refresh |
|
||||||
|
| `entry_point_discovery` | On-demand | New store setup |
|
||||||
|
| `store_discovery` | Daily | Find new stores |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Error Handling
|
||||||
|
|
||||||
|
- **GraphQL errors:** Logged, task marked failed, retried later
|
||||||
|
- **Normalization errors:** Logged as warnings, continue with valid products
|
||||||
|
- **Image download errors:** Non-fatal, logged, continue
|
||||||
|
- **Database errors:** Task fails, will be retried
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Files
|
||||||
|
|
||||||
|
| File | Purpose |
|
||||||
|
|------|---------|
|
||||||
|
| `src/tasks/handlers/product-resync.ts` | Main crawl handler |
|
||||||
|
| `src/tasks/handlers/entry-point-discovery.ts` | Slug → ID resolution |
|
||||||
|
| `src/platforms/dutchie/index.ts` | GraphQL client, session management |
|
||||||
|
| `src/hydration/normalizers/dutchie.ts` | Payload normalization |
|
||||||
|
| `src/hydration/canonical-upsert.ts` | Database upsert logic |
|
||||||
|
| `migrations/075_consecutive_misses.sql` | OOS tracking column |
|
||||||
13
backend/migrations/075_consecutive_misses.sql
Normal file
13
backend/migrations/075_consecutive_misses.sql
Normal file
@@ -0,0 +1,13 @@
|
|||||||
|
-- Migration 075: Add consecutive_misses column to store_products
|
||||||
|
-- Used to track how many consecutive crawls a product has been missing from the feed
|
||||||
|
-- After 3 consecutive misses, product is marked as OOS
|
||||||
|
|
||||||
|
ALTER TABLE store_products
|
||||||
|
ADD COLUMN IF NOT EXISTS consecutive_misses INTEGER NOT NULL DEFAULT 0;
|
||||||
|
|
||||||
|
-- Index for finding products that need OOS check
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_store_products_consecutive_misses
|
||||||
|
ON store_products (dispensary_id, consecutive_misses)
|
||||||
|
WHERE consecutive_misses > 0;
|
||||||
|
|
||||||
|
COMMENT ON COLUMN store_products.consecutive_misses IS 'Number of consecutive crawls where product was not in feed. Reset to 0 when seen. At 3, mark OOS.';
|
||||||
@@ -430,14 +430,14 @@ router.get('/products', async (req: PublicApiRequest, res: Response) => {
|
|||||||
|
|
||||||
// Filter by category
|
// Filter by category
|
||||||
if (category) {
|
if (category) {
|
||||||
whereClause += ` AND LOWER(p.category) = LOWER($${paramIndex})`;
|
whereClause += ` AND LOWER(p.category_raw) = LOWER($${paramIndex})`;
|
||||||
params.push(category);
|
params.push(category);
|
||||||
paramIndex++;
|
paramIndex++;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Filter by brand
|
// Filter by brand
|
||||||
if (brand) {
|
if (brand) {
|
||||||
whereClause += ` AND LOWER(p.brand_name) LIKE LOWER($${paramIndex})`;
|
whereClause += ` AND LOWER(p.brand_name_raw) LIKE LOWER($${paramIndex})`;
|
||||||
params.push(`%${brand}%`);
|
params.push(`%${brand}%`);
|
||||||
paramIndex++;
|
paramIndex++;
|
||||||
}
|
}
|
||||||
@@ -468,7 +468,7 @@ router.get('/products', async (req: PublicApiRequest, res: Response) => {
|
|||||||
|
|
||||||
// Search by name or brand
|
// Search by name or brand
|
||||||
if (search) {
|
if (search) {
|
||||||
whereClause += ` AND (LOWER(p.name) LIKE LOWER($${paramIndex}) OR LOWER(p.brand_name) LIKE LOWER($${paramIndex}))`;
|
whereClause += ` AND (LOWER(p.name_raw) LIKE LOWER($${paramIndex}) OR LOWER(p.brand_name_raw) LIKE LOWER($${paramIndex}))`;
|
||||||
params.push(`%${search}%`);
|
params.push(`%${search}%`);
|
||||||
paramIndex++;
|
paramIndex++;
|
||||||
}
|
}
|
||||||
@@ -479,10 +479,11 @@ router.get('/products', async (req: PublicApiRequest, res: Response) => {
|
|||||||
|
|
||||||
// Build ORDER BY clause (use pricing_type for price sorting)
|
// Build ORDER BY clause (use pricing_type for price sorting)
|
||||||
const sortDirection = sort_dir === 'desc' ? 'DESC' : 'ASC';
|
const sortDirection = sort_dir === 'desc' ? 'DESC' : 'ASC';
|
||||||
let orderBy = 'p.name ASC';
|
let orderBy = 'p.name_raw ASC';
|
||||||
switch (sort_by) {
|
switch (sort_by) {
|
||||||
case 'price':
|
case 'price':
|
||||||
const sortPriceCol = pricing_type === 'med' ? 's.price_med' : 's.price_rec';
|
// View uses *_cents columns, but we SELECT as price_rec/price_med
|
||||||
|
const sortPriceCol = pricing_type === 'med' ? 's.med_min_price_cents' : 's.rec_min_price_cents';
|
||||||
orderBy = `${sortPriceCol} ${sortDirection} NULLS LAST`;
|
orderBy = `${sortPriceCol} ${sortDirection} NULLS LAST`;
|
||||||
break;
|
break;
|
||||||
case 'thc':
|
case 'thc':
|
||||||
@@ -493,13 +494,14 @@ router.get('/products', async (req: PublicApiRequest, res: Response) => {
|
|||||||
break;
|
break;
|
||||||
case 'name':
|
case 'name':
|
||||||
default:
|
default:
|
||||||
orderBy = `p.name ${sortDirection}`;
|
orderBy = `p.name_raw ${sortDirection}`;
|
||||||
}
|
}
|
||||||
|
|
||||||
params.push(limitNum, offsetNum);
|
params.push(limitNum, offsetNum);
|
||||||
|
|
||||||
// Determine which price column to use for filtering based on pricing_type
|
// Determine which price column to use for filtering based on pricing_type
|
||||||
const priceColumn = pricing_type === 'med' ? 's.price_med' : 's.price_rec';
|
// View uses *_cents columns, divide by 100 for dollar comparison
|
||||||
|
const priceColumn = pricing_type === 'med' ? 's.med_min_price_cents / 100.0' : 's.rec_min_price_cents / 100.0';
|
||||||
|
|
||||||
// Query products with latest snapshot data
|
// Query products with latest snapshot data
|
||||||
// Uses store_products + v_product_snapshots (canonical tables with raw_data)
|
// Uses store_products + v_product_snapshots (canonical tables with raw_data)
|
||||||
@@ -508,10 +510,10 @@ router.get('/products', async (req: PublicApiRequest, res: Response) => {
|
|||||||
p.id,
|
p.id,
|
||||||
p.dispensary_id,
|
p.dispensary_id,
|
||||||
p.provider_product_id as dutchie_id,
|
p.provider_product_id as dutchie_id,
|
||||||
p.name,
|
p.name_raw as name,
|
||||||
p.brand_name as brand,
|
p.brand_name_raw as brand,
|
||||||
p.category,
|
p.category_raw as category,
|
||||||
p.subcategory,
|
p.subcategory_raw as subcategory,
|
||||||
p.strain_type,
|
p.strain_type,
|
||||||
p.stock_status,
|
p.stock_status,
|
||||||
p.thc_percent as thc,
|
p.thc_percent as thc,
|
||||||
@@ -519,19 +521,19 @@ router.get('/products', async (req: PublicApiRequest, res: Response) => {
|
|||||||
p.image_url,
|
p.image_url,
|
||||||
p.created_at,
|
p.created_at,
|
||||||
p.updated_at,
|
p.updated_at,
|
||||||
s.price_rec,
|
s.rec_min_price_cents / 100.0 as price_rec,
|
||||||
s.price_med,
|
s.med_min_price_cents / 100.0 as price_med,
|
||||||
s.price_rec_special,
|
s.rec_min_special_price_cents / 100.0 as price_rec_special,
|
||||||
s.price_med_special,
|
s.med_min_special_price_cents / 100.0 as price_med_special,
|
||||||
s.stock_quantity as total_quantity_available,
|
s.stock_quantity as total_quantity_available,
|
||||||
s.is_on_special as special,
|
s.special,
|
||||||
s.captured_at as snapshot_at,
|
s.crawled_at as snapshot_at,
|
||||||
${include_variants === 'true' || include_variants === '1' ? "s.raw_data->'POSMetaData'->'children' as variants_raw" : 'NULL as variants_raw'}
|
${include_variants === 'true' || include_variants === '1' ? "s.raw_data->'POSMetaData'->'children' as variants_raw" : 'NULL as variants_raw'}
|
||||||
FROM store_products p
|
FROM store_products p
|
||||||
LEFT JOIN LATERAL (
|
LEFT JOIN LATERAL (
|
||||||
SELECT * FROM v_product_snapshots
|
SELECT * FROM v_product_snapshots
|
||||||
WHERE store_product_id = p.id
|
WHERE store_product_id = p.id
|
||||||
ORDER BY captured_at DESC
|
ORDER BY crawled_at DESC
|
||||||
LIMIT 1
|
LIMIT 1
|
||||||
) s ON true
|
) s ON true
|
||||||
${whereClause}
|
${whereClause}
|
||||||
@@ -545,9 +547,9 @@ router.get('/products', async (req: PublicApiRequest, res: Response) => {
|
|||||||
const { rows: countRows } = await pool.query(`
|
const { rows: countRows } = await pool.query(`
|
||||||
SELECT COUNT(*) as total FROM store_products p
|
SELECT COUNT(*) as total FROM store_products p
|
||||||
LEFT JOIN LATERAL (
|
LEFT JOIN LATERAL (
|
||||||
SELECT price_rec, price_med, is_on_special FROM v_product_snapshots
|
SELECT rec_min_price_cents / 100.0 as price_rec, med_min_price_cents / 100.0 as price_med, special as is_on_special FROM v_product_snapshots
|
||||||
WHERE store_product_id = p.id
|
WHERE store_product_id = p.id
|
||||||
ORDER BY captured_at DESC
|
ORDER BY crawled_at DESC
|
||||||
LIMIT 1
|
LIMIT 1
|
||||||
) s ON true
|
) s ON true
|
||||||
${whereClause}
|
${whereClause}
|
||||||
@@ -1002,22 +1004,27 @@ router.get('/dispensaries', async (req: PublicApiRequest, res: Response) => {
|
|||||||
SELECT
|
SELECT
|
||||||
d.id,
|
d.id,
|
||||||
d.name,
|
d.name,
|
||||||
d.address,
|
d.address1,
|
||||||
|
d.address2,
|
||||||
d.city,
|
d.city,
|
||||||
d.state,
|
d.state,
|
||||||
d.zip,
|
d.zipcode as zip,
|
||||||
d.phone,
|
d.phone,
|
||||||
|
d.email,
|
||||||
d.website,
|
d.website,
|
||||||
d.latitude,
|
d.latitude,
|
||||||
d.longitude,
|
d.longitude,
|
||||||
d.menu_type as platform,
|
d.menu_type as platform,
|
||||||
d.menu_url,
|
d.menu_url,
|
||||||
d.hours,
|
|
||||||
d.amenities,
|
|
||||||
d.description,
|
d.description,
|
||||||
d.image_url,
|
d.logo_image as image_url,
|
||||||
d.google_rating,
|
d.google_rating,
|
||||||
d.google_review_count,
|
d.google_review_count,
|
||||||
|
d.offer_pickup,
|
||||||
|
d.offer_delivery,
|
||||||
|
d.offer_curbside_pickup,
|
||||||
|
d.is_medical,
|
||||||
|
d.is_recreational,
|
||||||
COALESCE(pc.product_count, 0) as product_count,
|
COALESCE(pc.product_count, 0) as product_count,
|
||||||
COALESCE(pc.in_stock_count, 0) as in_stock_count,
|
COALESCE(pc.in_stock_count, 0) as in_stock_count,
|
||||||
pc.last_updated
|
pc.last_updated
|
||||||
@@ -1051,11 +1058,13 @@ router.get('/dispensaries', async (req: PublicApiRequest, res: Response) => {
|
|||||||
dispensaries: [{
|
dispensaries: [{
|
||||||
id: d.id,
|
id: d.id,
|
||||||
name: d.name,
|
name: d.name,
|
||||||
address: d.address,
|
address1: d.address1,
|
||||||
|
address2: d.address2,
|
||||||
city: d.city,
|
city: d.city,
|
||||||
state: d.state,
|
state: d.state,
|
||||||
zip: d.zip,
|
zip: d.zip,
|
||||||
phone: d.phone,
|
phone: d.phone,
|
||||||
|
email: d.email,
|
||||||
website: d.website,
|
website: d.website,
|
||||||
menu_url: d.menu_url,
|
menu_url: d.menu_url,
|
||||||
location: d.latitude && d.longitude ? {
|
location: d.latitude && d.longitude ? {
|
||||||
@@ -1063,10 +1072,17 @@ router.get('/dispensaries', async (req: PublicApiRequest, res: Response) => {
|
|||||||
longitude: parseFloat(d.longitude)
|
longitude: parseFloat(d.longitude)
|
||||||
} : null,
|
} : null,
|
||||||
platform: d.platform,
|
platform: d.platform,
|
||||||
hours: d.hours || null,
|
|
||||||
amenities: d.amenities || [],
|
|
||||||
description: d.description || null,
|
description: d.description || null,
|
||||||
image_url: d.image_url || null,
|
image_url: d.image_url || null,
|
||||||
|
services: {
|
||||||
|
pickup: d.offer_pickup || false,
|
||||||
|
delivery: d.offer_delivery || false,
|
||||||
|
curbside: d.offer_curbside_pickup || false
|
||||||
|
},
|
||||||
|
license_type: {
|
||||||
|
medical: d.is_medical || false,
|
||||||
|
recreational: d.is_recreational || false
|
||||||
|
},
|
||||||
rating: d.google_rating ? parseFloat(d.google_rating) : null,
|
rating: d.google_rating ? parseFloat(d.google_rating) : null,
|
||||||
review_count: d.google_review_count ? parseInt(d.google_review_count, 10) : null,
|
review_count: d.google_review_count ? parseInt(d.google_review_count, 10) : null,
|
||||||
product_count: parseInt(d.product_count || '0', 10),
|
product_count: parseInt(d.product_count || '0', 10),
|
||||||
@@ -1109,22 +1125,27 @@ router.get('/dispensaries', async (req: PublicApiRequest, res: Response) => {
|
|||||||
SELECT
|
SELECT
|
||||||
d.id,
|
d.id,
|
||||||
d.name,
|
d.name,
|
||||||
d.address,
|
d.address1,
|
||||||
|
d.address2,
|
||||||
d.city,
|
d.city,
|
||||||
d.state,
|
d.state,
|
||||||
d.zip,
|
d.zipcode as zip,
|
||||||
d.phone,
|
d.phone,
|
||||||
|
d.email,
|
||||||
d.website,
|
d.website,
|
||||||
d.latitude,
|
d.latitude,
|
||||||
d.longitude,
|
d.longitude,
|
||||||
d.menu_type as platform,
|
d.menu_type as platform,
|
||||||
d.menu_url,
|
d.menu_url,
|
||||||
d.hours,
|
|
||||||
d.amenities,
|
|
||||||
d.description,
|
d.description,
|
||||||
d.image_url,
|
d.logo_image as image_url,
|
||||||
d.google_rating,
|
d.google_rating,
|
||||||
d.google_review_count,
|
d.google_review_count,
|
||||||
|
d.offer_pickup,
|
||||||
|
d.offer_delivery,
|
||||||
|
d.offer_curbside_pickup,
|
||||||
|
d.is_medical,
|
||||||
|
d.is_recreational,
|
||||||
COALESCE(pc.product_count, 0) as product_count,
|
COALESCE(pc.product_count, 0) as product_count,
|
||||||
COALESCE(pc.in_stock_count, 0) as in_stock_count,
|
COALESCE(pc.in_stock_count, 0) as in_stock_count,
|
||||||
pc.last_updated
|
pc.last_updated
|
||||||
@@ -1158,11 +1179,13 @@ router.get('/dispensaries', async (req: PublicApiRequest, res: Response) => {
|
|||||||
const transformedDispensaries = dispensaries.map((d) => ({
|
const transformedDispensaries = dispensaries.map((d) => ({
|
||||||
id: d.id,
|
id: d.id,
|
||||||
name: d.name,
|
name: d.name,
|
||||||
address: d.address,
|
address1: d.address1,
|
||||||
|
address2: d.address2,
|
||||||
city: d.city,
|
city: d.city,
|
||||||
state: d.state,
|
state: d.state,
|
||||||
zip: d.zip,
|
zip: d.zip,
|
||||||
phone: d.phone,
|
phone: d.phone,
|
||||||
|
email: d.email,
|
||||||
website: d.website,
|
website: d.website,
|
||||||
menu_url: d.menu_url,
|
menu_url: d.menu_url,
|
||||||
location: d.latitude && d.longitude ? {
|
location: d.latitude && d.longitude ? {
|
||||||
@@ -1170,10 +1193,17 @@ router.get('/dispensaries', async (req: PublicApiRequest, res: Response) => {
|
|||||||
longitude: parseFloat(d.longitude)
|
longitude: parseFloat(d.longitude)
|
||||||
} : null,
|
} : null,
|
||||||
platform: d.platform,
|
platform: d.platform,
|
||||||
hours: d.hours || null,
|
|
||||||
amenities: d.amenities || [],
|
|
||||||
description: d.description || null,
|
description: d.description || null,
|
||||||
image_url: d.image_url || null,
|
image_url: d.image_url || null,
|
||||||
|
services: {
|
||||||
|
pickup: d.offer_pickup || false,
|
||||||
|
delivery: d.offer_delivery || false,
|
||||||
|
curbside: d.offer_curbside_pickup || false
|
||||||
|
},
|
||||||
|
license_type: {
|
||||||
|
medical: d.is_medical || false,
|
||||||
|
recreational: d.is_recreational || false
|
||||||
|
},
|
||||||
rating: d.google_rating ? parseFloat(d.google_rating) : null,
|
rating: d.google_rating ? parseFloat(d.google_rating) : null,
|
||||||
review_count: d.google_review_count ? parseInt(d.google_review_count, 10) : null,
|
review_count: d.google_review_count ? parseInt(d.google_review_count, 10) : null,
|
||||||
product_count: parseInt(d.product_count || '0', 10),
|
product_count: parseInt(d.product_count || '0', 10),
|
||||||
@@ -1415,8 +1445,8 @@ router.get('/stores/:id/metrics', async (req: PublicApiRequest, res: Response) =
|
|||||||
COUNT(*) as total_products,
|
COUNT(*) as total_products,
|
||||||
COUNT(*) FILTER (WHERE stock_status = 'in_stock') as in_stock,
|
COUNT(*) FILTER (WHERE stock_status = 'in_stock') as in_stock,
|
||||||
COUNT(*) FILTER (WHERE stock_status = 'out_of_stock') as out_of_stock,
|
COUNT(*) FILTER (WHERE stock_status = 'out_of_stock') as out_of_stock,
|
||||||
COUNT(DISTINCT brand_name) FILTER (WHERE brand_name IS NOT NULL) as unique_brands,
|
COUNT(DISTINCT brand_name_raw) FILTER (WHERE brand_name_raw IS NOT NULL) as unique_brands,
|
||||||
COUNT(DISTINCT category) FILTER (WHERE category IS NOT NULL) as unique_categories
|
COUNT(DISTINCT category_raw) FILTER (WHERE category_raw IS NOT NULL) as unique_categories
|
||||||
FROM store_products
|
FROM store_products
|
||||||
WHERE dispensary_id = $1
|
WHERE dispensary_id = $1
|
||||||
`, [storeId]);
|
`, [storeId]);
|
||||||
@@ -1441,12 +1471,12 @@ router.get('/stores/:id/metrics', async (req: PublicApiRequest, res: Response) =
|
|||||||
// Get category breakdown
|
// Get category breakdown
|
||||||
const { rows: categoryBreakdown } = await pool.query(`
|
const { rows: categoryBreakdown } = await pool.query(`
|
||||||
SELECT
|
SELECT
|
||||||
COALESCE(category, 'Uncategorized') as category,
|
COALESCE(category_raw, 'Uncategorized') as category,
|
||||||
COUNT(*) as count,
|
COUNT(*) as count,
|
||||||
COUNT(*) FILTER (WHERE stock_status = 'in_stock') as in_stock
|
COUNT(*) FILTER (WHERE stock_status = 'in_stock') as in_stock
|
||||||
FROM store_products
|
FROM store_products
|
||||||
WHERE dispensary_id = $1
|
WHERE dispensary_id = $1
|
||||||
GROUP BY category
|
GROUP BY category_raw
|
||||||
ORDER BY count DESC
|
ORDER BY count DESC
|
||||||
LIMIT 10
|
LIMIT 10
|
||||||
`, [storeId]);
|
`, [storeId]);
|
||||||
@@ -1584,9 +1614,9 @@ router.get('/stores/:id/product-metrics', async (req: PublicApiRequest, res: Res
|
|||||||
)
|
)
|
||||||
SELECT
|
SELECT
|
||||||
sp.id,
|
sp.id,
|
||||||
sp.name,
|
sp.name_raw as name,
|
||||||
sp.brand_name,
|
sp.brand_name_raw as brand_name,
|
||||||
sp.category,
|
sp.category_raw as category,
|
||||||
sp.stock_status,
|
sp.stock_status,
|
||||||
ls.current_price,
|
ls.current_price,
|
||||||
ls.current_special_price,
|
ls.current_special_price,
|
||||||
@@ -1606,7 +1636,7 @@ router.get('/stores/:id/product-metrics', async (req: PublicApiRequest, res: Res
|
|||||||
${whereClause}
|
${whereClause}
|
||||||
ORDER BY
|
ORDER BY
|
||||||
${sort_by === 'price' ? 'ls.current_price DESC NULLS LAST' :
|
${sort_by === 'price' ? 'ls.current_price DESC NULLS LAST' :
|
||||||
sort_by === 'stock_status' ? "CASE sp.stock_status WHEN 'out_of_stock' THEN 0 ELSE 1 END, sp.name" :
|
sort_by === 'stock_status' ? "CASE sp.stock_status WHEN 'out_of_stock' THEN 0 ELSE 1 END, sp.name_raw" :
|
||||||
'ABS(COALESCE(price_change_percent, 0)) DESC'}
|
'ABS(COALESCE(price_change_percent, 0)) DESC'}
|
||||||
LIMIT $${paramIndex}
|
LIMIT $${paramIndex}
|
||||||
`, params);
|
`, params);
|
||||||
@@ -1719,7 +1749,7 @@ router.get('/stores/:id/competitor-snapshot', async (req: PublicApiRequest, res:
|
|||||||
// Get this store's average prices by category
|
// Get this store's average prices by category
|
||||||
const { rows: storePrices } = await pool.query(`
|
const { rows: storePrices } = await pool.query(`
|
||||||
SELECT
|
SELECT
|
||||||
sp.category,
|
sp.category_raw as category,
|
||||||
ROUND(AVG(sps.price_rec)::numeric, 2) as avg_price,
|
ROUND(AVG(sps.price_rec)::numeric, 2) as avg_price,
|
||||||
COUNT(*) as product_count
|
COUNT(*) as product_count
|
||||||
FROM store_products sp
|
FROM store_products sp
|
||||||
@@ -1729,8 +1759,8 @@ router.get('/stores/:id/competitor-snapshot', async (req: PublicApiRequest, res:
|
|||||||
WHERE dispensary_id = $1
|
WHERE dispensary_id = $1
|
||||||
ORDER BY store_product_id, captured_at DESC
|
ORDER BY store_product_id, captured_at DESC
|
||||||
) sps ON sp.id = sps.store_product_id
|
) sps ON sp.id = sps.store_product_id
|
||||||
WHERE sp.dispensary_id = $1 AND sp.category IS NOT NULL AND sps.price_rec > 0
|
WHERE sp.dispensary_id = $1 AND sp.category_raw IS NOT NULL AND sps.price_rec > 0
|
||||||
GROUP BY sp.category
|
GROUP BY sp.category_raw
|
||||||
`, [storeId]);
|
`, [storeId]);
|
||||||
|
|
||||||
// Get market average prices by category (all competitors)
|
// Get market average prices by category (all competitors)
|
||||||
@@ -1740,7 +1770,7 @@ router.get('/stores/:id/competitor-snapshot', async (req: PublicApiRequest, res:
|
|||||||
if (competitorIds.length > 0) {
|
if (competitorIds.length > 0) {
|
||||||
const { rows } = await pool.query(`
|
const { rows } = await pool.query(`
|
||||||
SELECT
|
SELECT
|
||||||
sp.category,
|
sp.category_raw as category,
|
||||||
ROUND(AVG(sps.price_rec)::numeric, 2) as market_avg_price,
|
ROUND(AVG(sps.price_rec)::numeric, 2) as market_avg_price,
|
||||||
COUNT(DISTINCT sp.dispensary_id) as store_count
|
COUNT(DISTINCT sp.dispensary_id) as store_count
|
||||||
FROM store_products sp
|
FROM store_products sp
|
||||||
@@ -1750,17 +1780,17 @@ router.get('/stores/:id/competitor-snapshot', async (req: PublicApiRequest, res:
|
|||||||
WHERE dispensary_id = ANY($1)
|
WHERE dispensary_id = ANY($1)
|
||||||
ORDER BY store_product_id, captured_at DESC
|
ORDER BY store_product_id, captured_at DESC
|
||||||
) sps ON sp.id = sps.store_product_id
|
) sps ON sp.id = sps.store_product_id
|
||||||
WHERE sp.dispensary_id = ANY($1) AND sp.category IS NOT NULL AND sps.price_rec > 0
|
WHERE sp.dispensary_id = ANY($1) AND sp.category_raw IS NOT NULL AND sps.price_rec > 0
|
||||||
GROUP BY sp.category
|
GROUP BY sp.category_raw
|
||||||
`, [competitorIds]);
|
`, [competitorIds]);
|
||||||
marketPrices = rows;
|
marketPrices = rows;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Get this store's brands
|
// Get this store's brands
|
||||||
const { rows: storeBrands } = await pool.query(`
|
const { rows: storeBrands } = await pool.query(`
|
||||||
SELECT DISTINCT brand_name
|
SELECT DISTINCT brand_name_raw as brand_name
|
||||||
FROM store_products
|
FROM store_products
|
||||||
WHERE dispensary_id = $1 AND brand_name IS NOT NULL
|
WHERE dispensary_id = $1 AND brand_name_raw IS NOT NULL
|
||||||
`, [storeId]);
|
`, [storeId]);
|
||||||
|
|
||||||
const storeBrandSet = new Set(storeBrands.map(b => b.brand_name.toLowerCase()));
|
const storeBrandSet = new Set(storeBrands.map(b => b.brand_name.toLowerCase()));
|
||||||
@@ -1772,13 +1802,13 @@ router.get('/stores/:id/competitor-snapshot', async (req: PublicApiRequest, res:
|
|||||||
SELECT
|
SELECT
|
||||||
d.id as competitor_id,
|
d.id as competitor_id,
|
||||||
d.name as competitor_name,
|
d.name as competitor_name,
|
||||||
COUNT(DISTINCT sp.brand_name) as total_brands,
|
COUNT(DISTINCT sp.brand_name_raw) as total_brands,
|
||||||
COUNT(DISTINCT sp.brand_name) FILTER (
|
COUNT(DISTINCT sp.brand_name_raw) FILTER (
|
||||||
WHERE LOWER(sp.brand_name) = ANY($2)
|
WHERE LOWER(sp.brand_name_raw) = ANY($2)
|
||||||
) as shared_brands
|
) as shared_brands
|
||||||
FROM dispensaries d
|
FROM dispensaries d
|
||||||
INNER JOIN store_products sp ON sp.dispensary_id = d.id
|
INNER JOIN store_products sp ON sp.dispensary_id = d.id
|
||||||
WHERE d.id = ANY($1) AND sp.brand_name IS NOT NULL
|
WHERE d.id = ANY($1) AND sp.brand_name_raw IS NOT NULL
|
||||||
GROUP BY d.id, d.name
|
GROUP BY d.id, d.name
|
||||||
`, [competitorIds, Array.from(storeBrandSet)]);
|
`, [competitorIds, Array.from(storeBrandSet)]);
|
||||||
brandOverlap = rows;
|
brandOverlap = rows;
|
||||||
@@ -1835,6 +1865,39 @@ router.get('/stores/:id/competitor-snapshot', async (req: PublicApiRequest, res:
|
|||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
|
/**
|
||||||
|
* GET /api/v1/stats
|
||||||
|
* Get aggregate stats for consumer sites (product count, brand count, dispensary count)
|
||||||
|
*/
|
||||||
|
router.get('/stats', async (req: PublicApiRequest, res: Response) => {
|
||||||
|
try {
|
||||||
|
// Get aggregate stats across all data
|
||||||
|
const { rows: stats } = await pool.query(`
|
||||||
|
SELECT
|
||||||
|
(SELECT COUNT(*) FROM store_products) as product_count,
|
||||||
|
(SELECT COUNT(DISTINCT brand_name_raw) FROM store_products WHERE brand_name_raw IS NOT NULL) as brand_count,
|
||||||
|
(SELECT COUNT(*) FROM dispensaries WHERE crawl_enabled = true AND product_count > 0) as dispensary_count
|
||||||
|
`);
|
||||||
|
|
||||||
|
const s = stats[0] || {};
|
||||||
|
|
||||||
|
res.json({
|
||||||
|
success: true,
|
||||||
|
stats: {
|
||||||
|
products: parseInt(s.product_count || '0', 10),
|
||||||
|
brands: parseInt(s.brand_count || '0', 10),
|
||||||
|
dispensaries: parseInt(s.dispensary_count || '0', 10)
|
||||||
|
}
|
||||||
|
});
|
||||||
|
} catch (error: any) {
|
||||||
|
console.error('Public API stats error:', error);
|
||||||
|
res.status(500).json({
|
||||||
|
error: 'Failed to fetch stats',
|
||||||
|
message: error.message
|
||||||
|
});
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* GET /api/v1/menu
|
* GET /api/v1/menu
|
||||||
* Get complete menu summary for the authenticated dispensary
|
* Get complete menu summary for the authenticated dispensary
|
||||||
|
|||||||
@@ -1,12 +1,35 @@
|
|||||||
/**
|
/**
|
||||||
* Product Resync Handler
|
* Product Resync Handler
|
||||||
*
|
*
|
||||||
* Re-crawls a store that already has products to capture price/stock changes.
|
* Re-crawls a store to capture price/stock changes using the GraphQL pipeline.
|
||||||
* Uses the scraper-v2 engine for crawling.
|
*
|
||||||
|
* Flow:
|
||||||
|
* 1. Load dispensary info from database
|
||||||
|
* 2. Start stealth session (fingerprint + optional proxy)
|
||||||
|
* 3. Fetch products via GraphQL (Status: 'All')
|
||||||
|
* 4. Normalize data via DutchieNormalizer
|
||||||
|
* 5. Upsert to store_products and store_product_snapshots
|
||||||
|
* 6. Track missing products (increment consecutive_misses, mark OOS at 3)
|
||||||
|
* 7. Download new product images
|
||||||
|
* 8. End session
|
||||||
*/
|
*/
|
||||||
|
|
||||||
import { TaskContext, TaskResult } from '../task-worker';
|
import { TaskContext, TaskResult } from '../task-worker';
|
||||||
import { scrapeStore } from '../../scraper-v2';
|
import {
|
||||||
|
executeGraphQL,
|
||||||
|
startSession,
|
||||||
|
endSession,
|
||||||
|
GRAPHQL_HASHES,
|
||||||
|
DUTCHIE_CONFIG,
|
||||||
|
} from '../../platforms/dutchie';
|
||||||
|
import { DutchieNormalizer } from '../../hydration/normalizers/dutchie';
|
||||||
|
import {
|
||||||
|
upsertStoreProducts,
|
||||||
|
createStoreProductSnapshots,
|
||||||
|
downloadProductImages,
|
||||||
|
} from '../../hydration/canonical-upsert';
|
||||||
|
|
||||||
|
const normalizer = new DutchieNormalizer();
|
||||||
|
|
||||||
export async function handleProductResync(ctx: TaskContext): Promise<TaskResult> {
|
export async function handleProductResync(ctx: TaskContext): Promise<TaskResult> {
|
||||||
const { pool, task } = ctx;
|
const { pool, task } = ctx;
|
||||||
@@ -17,9 +40,12 @@ export async function handleProductResync(ctx: TaskContext): Promise<TaskResult>
|
|||||||
}
|
}
|
||||||
|
|
||||||
try {
|
try {
|
||||||
// Get dispensary info
|
// ============================================================
|
||||||
|
// STEP 1: Load dispensary info
|
||||||
|
// ============================================================
|
||||||
const dispResult = await pool.query(`
|
const dispResult = await pool.query(`
|
||||||
SELECT id, name, platform_dispensary_id, menu_url
|
SELECT
|
||||||
|
id, name, platform_dispensary_id, menu_url, menu_type, city, state
|
||||||
FROM dispensaries
|
FROM dispensaries
|
||||||
WHERE id = $1 AND crawl_enabled = true
|
WHERE id = $1 AND crawl_enabled = true
|
||||||
`, [dispensaryId]);
|
`, [dispensaryId]);
|
||||||
@@ -35,21 +61,264 @@ export async function handleProductResync(ctx: TaskContext): Promise<TaskResult>
|
|||||||
return { success: false, error: `Dispensary ${dispensaryId} has no platform_dispensary_id` };
|
return { success: false, error: `Dispensary ${dispensaryId} has no platform_dispensary_id` };
|
||||||
}
|
}
|
||||||
|
|
||||||
console.log(`[ProductResync] Crawling ${dispensary.name} (${dispensaryId})`);
|
// Extract cName from menu_url
|
||||||
|
const cNameMatch = dispensary.menu_url?.match(/\/(?:embedded-menu|dispensary)\/([^/?]+)/);
|
||||||
|
const cName = cNameMatch ? cNameMatch[1] : 'dispensary';
|
||||||
|
|
||||||
|
console.log(`[ProductResync] Starting crawl for ${dispensary.name} (ID: ${dispensaryId})`);
|
||||||
|
console.log(`[ProductResync] Platform ID: ${platformId}, cName: ${cName}`);
|
||||||
|
|
||||||
|
// ============================================================
|
||||||
|
// STEP 2: Start stealth session
|
||||||
|
// ============================================================
|
||||||
|
const session = startSession(dispensary.state || 'AZ', 'America/Phoenix');
|
||||||
|
console.log(`[ProductResync] Session started: ${session.sessionId}`);
|
||||||
|
|
||||||
// Send heartbeat before long operation
|
|
||||||
await ctx.heartbeat();
|
await ctx.heartbeat();
|
||||||
|
|
||||||
// Use scraper-v2 scrapeStore function
|
// ============================================================
|
||||||
await scrapeStore(dispensaryId);
|
// STEP 3: Fetch products via GraphQL (Status: 'All')
|
||||||
|
// ============================================================
|
||||||
|
const allProducts: any[] = [];
|
||||||
|
let page = 0;
|
||||||
|
let totalCount = 0;
|
||||||
|
const perPage = DUTCHIE_CONFIG.perPage;
|
||||||
|
const maxPages = DUTCHIE_CONFIG.maxPages;
|
||||||
|
|
||||||
|
try {
|
||||||
|
while (page < maxPages) {
|
||||||
|
const variables = {
|
||||||
|
includeEnterpriseSpecials: false,
|
||||||
|
productsFilter: {
|
||||||
|
dispensaryId: platformId,
|
||||||
|
pricingType: 'rec',
|
||||||
|
Status: 'All',
|
||||||
|
types: [],
|
||||||
|
useCache: false,
|
||||||
|
isDefaultSort: true,
|
||||||
|
sortBy: 'popularSortIdx',
|
||||||
|
sortDirection: 1,
|
||||||
|
bypassOnlineThresholds: true,
|
||||||
|
isKioskMenu: false,
|
||||||
|
removeProductsBelowOptionThresholds: false,
|
||||||
|
},
|
||||||
|
page,
|
||||||
|
perPage,
|
||||||
|
};
|
||||||
|
|
||||||
|
console.log(`[ProductResync] Fetching page ${page + 1}...`);
|
||||||
|
|
||||||
|
const result = await executeGraphQL(
|
||||||
|
'FilteredProducts',
|
||||||
|
variables,
|
||||||
|
GRAPHQL_HASHES.FilteredProducts,
|
||||||
|
{ cName, maxRetries: 3 }
|
||||||
|
);
|
||||||
|
|
||||||
|
const data = result?.data?.filteredProducts;
|
||||||
|
if (!data || !data.products) {
|
||||||
|
if (page === 0) {
|
||||||
|
throw new Error('No product data returned from GraphQL');
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
const products = data.products;
|
||||||
|
allProducts.push(...products);
|
||||||
|
|
||||||
|
if (page === 0) {
|
||||||
|
totalCount = data.queryInfo?.totalCount || products.length;
|
||||||
|
console.log(`[ProductResync] Total products reported: ${totalCount}`);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (allProducts.length >= totalCount || products.length < perPage) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
page++;
|
||||||
|
|
||||||
|
if (page < maxPages) {
|
||||||
|
await new Promise(r => setTimeout(r, DUTCHIE_CONFIG.pageDelayMs));
|
||||||
|
}
|
||||||
|
|
||||||
|
if (page % 5 === 0) {
|
||||||
|
await ctx.heartbeat();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
console.log(`[ProductResync] Fetched ${allProducts.length} products in ${page + 1} pages`);
|
||||||
|
|
||||||
|
} finally {
|
||||||
|
endSession();
|
||||||
|
}
|
||||||
|
|
||||||
|
if (allProducts.length === 0) {
|
||||||
|
return {
|
||||||
|
success: false,
|
||||||
|
error: 'No products returned from GraphQL',
|
||||||
|
productsProcessed: 0,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
// Heartbeat again
|
|
||||||
await ctx.heartbeat();
|
await ctx.heartbeat();
|
||||||
|
|
||||||
// Update dispensary last_crawled_at
|
// ============================================================
|
||||||
|
// STEP 4: Normalize data
|
||||||
|
// ============================================================
|
||||||
|
console.log(`[ProductResync] Normalizing ${allProducts.length} products...`);
|
||||||
|
|
||||||
|
// Build RawPayload for the normalizer
|
||||||
|
const rawPayload = {
|
||||||
|
id: `resync-${dispensaryId}-${Date.now()}`,
|
||||||
|
dispensary_id: dispensaryId,
|
||||||
|
crawl_run_id: null,
|
||||||
|
platform: 'dutchie',
|
||||||
|
payload_version: 1,
|
||||||
|
raw_json: { data: { filteredProducts: { products: allProducts } } },
|
||||||
|
product_count: allProducts.length,
|
||||||
|
pricing_type: 'dual',
|
||||||
|
crawl_mode: 'dual_mode',
|
||||||
|
fetched_at: new Date(),
|
||||||
|
processed: false,
|
||||||
|
normalized_at: null,
|
||||||
|
hydration_error: null,
|
||||||
|
hydration_attempts: 0,
|
||||||
|
created_at: new Date(),
|
||||||
|
};
|
||||||
|
|
||||||
|
const normalizationResult = normalizer.normalize(rawPayload);
|
||||||
|
|
||||||
|
if (normalizationResult.errors.length > 0) {
|
||||||
|
console.warn(`[ProductResync] Normalization warnings: ${normalizationResult.errors.map(e => e.message).join(', ')}`);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (normalizationResult.products.length === 0) {
|
||||||
|
return {
|
||||||
|
success: false,
|
||||||
|
error: 'Normalization produced no products',
|
||||||
|
productsProcessed: 0,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
console.log(`[ProductResync] Normalized ${normalizationResult.products.length} products`);
|
||||||
|
|
||||||
|
await ctx.heartbeat();
|
||||||
|
|
||||||
|
// ============================================================
|
||||||
|
// STEP 5: Upsert to canonical tables
|
||||||
|
// ============================================================
|
||||||
|
console.log(`[ProductResync] Upserting to store_products...`);
|
||||||
|
|
||||||
|
const upsertResult = await upsertStoreProducts(
|
||||||
|
pool,
|
||||||
|
normalizationResult.products,
|
||||||
|
normalizationResult.pricing,
|
||||||
|
normalizationResult.availability
|
||||||
|
);
|
||||||
|
|
||||||
|
console.log(`[ProductResync] Upserted: ${upsertResult.upserted} (${upsertResult.new} new, ${upsertResult.updated} updated)`);
|
||||||
|
|
||||||
|
await ctx.heartbeat();
|
||||||
|
|
||||||
|
// Create snapshots
|
||||||
|
console.log(`[ProductResync] Creating snapshots...`);
|
||||||
|
|
||||||
|
const snapshotsResult = await createStoreProductSnapshots(
|
||||||
|
pool,
|
||||||
|
dispensaryId,
|
||||||
|
normalizationResult.products,
|
||||||
|
normalizationResult.pricing,
|
||||||
|
normalizationResult.availability,
|
||||||
|
null // No crawl_run_id in new system
|
||||||
|
);
|
||||||
|
|
||||||
|
console.log(`[ProductResync] Created ${snapshotsResult.created} snapshots`);
|
||||||
|
|
||||||
|
await ctx.heartbeat();
|
||||||
|
|
||||||
|
// ============================================================
|
||||||
|
// STEP 6: Track missing products (consecutive_misses logic)
|
||||||
|
// - Products in feed: reset consecutive_misses to 0
|
||||||
|
// - Products not in feed: increment consecutive_misses
|
||||||
|
// - At 3 consecutive misses: mark as OOS
|
||||||
|
// ============================================================
|
||||||
|
const currentProductIds = allProducts
|
||||||
|
.map((p: any) => p._id || p.id)
|
||||||
|
.filter(Boolean);
|
||||||
|
|
||||||
|
// Reset consecutive_misses for products that ARE in the feed
|
||||||
|
if (currentProductIds.length > 0) {
|
||||||
|
await pool.query(`
|
||||||
|
UPDATE store_products
|
||||||
|
SET consecutive_misses = 0, last_seen_at = NOW()
|
||||||
|
WHERE dispensary_id = $1
|
||||||
|
AND provider = 'dutchie'
|
||||||
|
AND provider_product_id = ANY($2)
|
||||||
|
`, [dispensaryId, currentProductIds]);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Increment consecutive_misses for products NOT in the feed
|
||||||
|
const incrementResult = await pool.query(`
|
||||||
|
UPDATE store_products
|
||||||
|
SET consecutive_misses = consecutive_misses + 1
|
||||||
|
WHERE dispensary_id = $1
|
||||||
|
AND provider = 'dutchie'
|
||||||
|
AND provider_product_id NOT IN (SELECT unnest($2::text[]))
|
||||||
|
AND consecutive_misses < 3
|
||||||
|
RETURNING id
|
||||||
|
`, [dispensaryId, currentProductIds]);
|
||||||
|
|
||||||
|
const incrementedCount = incrementResult.rowCount || 0;
|
||||||
|
if (incrementedCount > 0) {
|
||||||
|
console.log(`[ProductResync] Incremented consecutive_misses for ${incrementedCount} products`);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Mark as OOS any products that hit 3 consecutive misses
|
||||||
|
const oosResult = await pool.query(`
|
||||||
|
UPDATE store_products
|
||||||
|
SET stock_status = 'oos', is_in_stock = false
|
||||||
|
WHERE dispensary_id = $1
|
||||||
|
AND provider = 'dutchie'
|
||||||
|
AND consecutive_misses >= 3
|
||||||
|
AND stock_status != 'oos'
|
||||||
|
RETURNING id
|
||||||
|
`, [dispensaryId]);
|
||||||
|
|
||||||
|
const markedOosCount = oosResult.rowCount || 0;
|
||||||
|
if (markedOosCount > 0) {
|
||||||
|
console.log(`[ProductResync] Marked ${markedOosCount} products as OOS (3+ consecutive misses)`);
|
||||||
|
}
|
||||||
|
|
||||||
|
await ctx.heartbeat();
|
||||||
|
|
||||||
|
// ============================================================
|
||||||
|
// STEP 7: Download images for new products
|
||||||
|
// ============================================================
|
||||||
|
if (upsertResult.productsNeedingImages.length > 0) {
|
||||||
|
console.log(`[ProductResync] Downloading images for ${upsertResult.productsNeedingImages.length} products...`);
|
||||||
|
|
||||||
|
try {
|
||||||
|
const dispensaryContext = {
|
||||||
|
stateCode: dispensary.state || 'AZ',
|
||||||
|
storeSlug: cName,
|
||||||
|
};
|
||||||
|
await downloadProductImages(
|
||||||
|
pool,
|
||||||
|
upsertResult.productsNeedingImages,
|
||||||
|
dispensaryContext
|
||||||
|
);
|
||||||
|
} catch (imgError: any) {
|
||||||
|
// Image download errors shouldn't fail the whole task
|
||||||
|
console.warn(`[ProductResync] Image download error (non-fatal): ${imgError.message}`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ============================================================
|
||||||
|
// STEP 8: Update dispensary last_crawl_at
|
||||||
|
// ============================================================
|
||||||
await pool.query(`
|
await pool.query(`
|
||||||
UPDATE dispensaries
|
UPDATE dispensaries
|
||||||
SET last_crawled_at = NOW()
|
SET last_crawl_at = NOW()
|
||||||
WHERE id = $1
|
WHERE id = $1
|
||||||
`, [dispensaryId]);
|
`, [dispensaryId]);
|
||||||
|
|
||||||
@@ -57,7 +326,13 @@ export async function handleProductResync(ctx: TaskContext): Promise<TaskResult>
|
|||||||
|
|
||||||
return {
|
return {
|
||||||
success: true,
|
success: true,
|
||||||
|
productsProcessed: normalizationResult.products.length,
|
||||||
|
snapshotsCreated: snapshotsResult.created,
|
||||||
|
newProducts: upsertResult.new,
|
||||||
|
updatedProducts: upsertResult.updated,
|
||||||
|
markedOos: markedOosCount,
|
||||||
};
|
};
|
||||||
|
|
||||||
} catch (error: unknown) {
|
} catch (error: unknown) {
|
||||||
const errorMessage = error instanceof Error ? error.message : 'Unknown error';
|
const errorMessage = error instanceof Error ? error.message : 'Unknown error';
|
||||||
console.error(`[ProductResync] Error for dispensary ${dispensaryId}:`, errorMessage);
|
console.error(`[ProductResync] Error for dispensary ${dispensaryId}:`, errorMessage);
|
||||||
|
|||||||
Reference in New Issue
Block a user