feat: SEO template library, discovery pipeline, and orchestrator enhancements

## SEO Template Library
- Add complete template library with 7 page types (state, city, category, brand, product, search, regeneration)
- Add Template Library tab in SEO Orchestrator with accordion-based editors
- Add template preview, validation, and variable injection engine
- Add API endpoints: /api/seo/templates, preview, validate, generate, regenerate

## Discovery Pipeline
- Add promotion.ts for discovery location validation and promotion
- Add discover-all-states.ts script for multi-state discovery
- Add promotion log migration (067)
- Enhance discovery routes and types

## Orchestrator & Admin
- Add crawl_enabled filter to stores page
- Add API permissions page
- Add job queue management
- Add price analytics routes
- Add markets and intelligence routes
- Enhance dashboard and worker monitoring

## Infrastructure
- Add migrations for worker definitions, SEO settings, field alignment
- Add canonical pipeline for scraper v2
- Update hydration and sync orchestrator
- Enhance multi-state query service

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
671
docs/DUTCHIE_CRAWL_WORKFLOW.md
Normal file
@@ -0,0 +1,671 @@

# Dutchie Crawl Workflow

Complete end-to-end documentation for the Dutchie GraphQL crawl pipeline, from store discovery to product management.

---

## Table of Contents

1. [Architecture Overview](#1-architecture-overview)
2. [Store Discovery](#2-store-discovery)
3. [Platform ID Resolution](#3-platform-id-resolution)
4. [Product Crawling](#4-product-crawling)
5. [Normalization Pipeline](#5-normalization-pipeline)
6. [Canonical Data Model](#6-canonical-data-model)
7. [Hydration (Writing to DB)](#7-hydration-writing-to-db)
8. [Key Files Reference](#8-key-files-reference)
9. [Common Issues & Solutions](#9-common-issues--solutions)
10. [Running Crawls](#10-running-crawls)

---

## 1. Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           DUTCHIE CRAWL PIPELINE                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐  │
│  │  Discovery   │ -> │  Resolution  │ -> │    Crawl     │ -> │  Hydrate  │  │
│  │ (find URLs)  │    │  (get IDs)   │    │ (fetch data) │    │  (to DB)  │  │
│  └──────────────┘    └──────────────┘    └──────────────┘    └───────────┘  │
│         │                   │                   │                  │        │
│         v                   v                   v                  v        │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐  │
│  │ dispensaries │    │ dispensaries │    │   Raw JSON   │    │  store_   │  │
│  │  .menu_url   │    │  .platform_  │    │   Products   │    │ products  │  │
│  │              │    │ dispensary_id│    │              │    │ snapshots │  │
│  └──────────────┘    └──────────────┘    └──────────────┘    │ variants  │  │
│                                                              └───────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Key Principles

1. **GraphQL Only**: All Dutchie data comes from `https://dutchie.com/api-3/graphql`
2. **Curl-Based HTTP**: Uses curl via child_process to bypass TLS fingerprinting
3. **No Puppeteer**: The old DOM-based scraper is deprecated - DO NOT USE `scraper-v2/engine.ts` for Dutchie
4. **Historical Data**: Never delete products/snapshots - always append

---

## 2. Store Discovery

### How Stores Get Into the System

Stores are added to the `dispensaries` table with a `menu_url` pointing to their Dutchie menu.

**Menu URL Formats:**
```
https://dutchie.com/dispensary/<slug>
https://dutchie.com/embedded-menu/<slug>
https://<custom-domain>.com/menu  (redirects to Dutchie)
```

### Required Fields for Crawling

| Field | Required | Description |
|-------|----------|-------------|
| `menu_url` | Yes | URL to the Dutchie menu |
| `menu_type` | Yes | Must be `'dutchie'` |
| `platform_dispensary_id` | Yes | MongoDB ObjectId from Dutchie |

**A store CANNOT be crawled until `platform_dispensary_id` is resolved.**
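
The three required fields can be checked before a crawl is queued. The following is a minimal sketch, not code from the repository: the `DispensaryRow` shape and the `crawlBlockers` helper are hypothetical names, and the 24-character hex check assumes the ID is a standard MongoDB ObjectId (12 bytes written as hexadecimal). Returning a list of reasons rather than a boolean makes it easier to log why a store was skipped.

```typescript
// Hypothetical row shape; field names follow the "Required Fields" table.
interface DispensaryRow {
  menu_url: string | null;
  menu_type: string | null;
  platform_dispensary_id: string | null;
}

// A MongoDB ObjectId is 12 bytes, written as 24 hexadecimal characters.
const OBJECT_ID_PATTERN = /^[0-9a-f]{24}$/i;

// Returns the reasons a store cannot be crawled yet (empty array = ready).
function crawlBlockers(row: DispensaryRow): string[] {
  const blockers: string[] = [];
  if (!row.menu_url) blockers.push('menu_url is missing');
  if (row.menu_type !== 'dutchie') blockers.push("menu_type is not 'dutchie'");
  if (!row.platform_dispensary_id) {
    blockers.push('platform_dispensary_id is not resolved');
  } else if (!OBJECT_ID_PATTERN.test(row.platform_dispensary_id)) {
    blockers.push('platform_dispensary_id is not a 24-character hex ObjectId');
  }
  return blockers;
}
```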

---

## 3. Platform ID Resolution

### What is `platform_dispensary_id`?

Dutchie uses MongoDB ObjectIds internally (e.g., `6405ef617056e8014d79101b`). This ID is required for all GraphQL product queries.

### Resolution Process

```typescript
// File: src/platforms/dutchie/queries.ts

import { resolveDispensaryId } from '../platforms/dutchie';

// Extract slug from menu_url (undefined for custom-domain URLs, which contain no slug)
const slug = menuUrl.match(/\/(?:embedded-menu|dispensary)\/([^/?]+)/)?.[1];

// Resolve to platform ID via GraphQL
const platformId = await resolveDispensaryId(slug);
// Returns: "6405ef617056e8014d79101b" or null
```
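
The slug extraction can be isolated into a small function that makes the no-match case explicit. This is a sketch under the assumption that only the two `dutchie.com` URL formats listed in Section 2 carry a slug; `extractDutchieSlug` is a hypothetical helper name, and the regex additionally stops at `#` so URL fragments are not captured.

```typescript
// Returns the Dutchie slug from a menu URL, or null when the URL carries no slug
// (for example, a custom-domain URL that only redirects to Dutchie).
function extractDutchieSlug(menuUrl: string): string | null {
  const match = menuUrl.match(/\/(?:embedded-menu|dispensary)\/([^/?#]+)/);
  return match?.[1] ?? null;
}
```

A `null` result signals that the URL must be handled another way (for instance by following the redirect first) before `resolveDispensaryId` can be called.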

### GraphQL Query Used

```graphql
query GetAddressBasedDispensaryData($dispensaryFilter: dispensaryFilter!) {
  dispensary(filter: $dispensaryFilter) {
    id    # <-- This is the platform_dispensary_id
    name
    cName
    ...
  }
}
```

**Variables:**
```json
{
  "dispensaryFilter": {
    "cNameOrID": "AZ-Deeply-Rooted"
  }
}
```

### Persisted Query Hash

```typescript
GRAPHQL_HASHES.GetAddressBasedDispensaryData = '13461f73abf7268770dfd05fe7e10c523084b2bb916a929c08efe3d87531977b'
```

---

## 4. Product Crawling

### GraphQL Query: FilteredProducts

This is the main query for fetching products from a dispensary.

**Endpoint:** `https://dutchie.com/api-3/graphql`

**Method:** POST (via curl)

**Persisted Query Hash:**
```typescript
GRAPHQL_HASHES.FilteredProducts = 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0'
```
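
To illustrate how a persisted-query hash ends up in a request, here is a minimal sketch that builds a request body and a curl argument list. The body shape follows the Apollo Automatic Persisted Queries convention (`extensions.persistedQuery` with `version` and `sha256Hash`); that Dutchie accepts exactly this JSON body over POST is an assumption, and `buildPersistedQueryBody` and `buildCurlArgs` are hypothetical helpers, not the API of the locked `client.ts`. The real client also sets fingerprint-related headers that are omitted here.

```typescript
interface PersistedQueryBody {
  operationName: string;
  variables: Record<string, unknown>;
  extensions: { persistedQuery: { version: 1; sha256Hash: string } };
}

// Assumed body shape (Apollo persisted-query convention).
function buildPersistedQueryBody(
  operationName: string,
  variables: Record<string, unknown>,
  sha256Hash: string,
): PersistedQueryBody {
  return {
    operationName,
    variables,
    extensions: { persistedQuery: { version: 1, sha256Hash } },
  };
}

// Argument vector suitable for child_process.execFile('curl', args).
// Passing an array avoids shell quoting problems with the JSON payload.
function buildCurlArgs(url: string, body: PersistedQueryBody): string[] {
  return [
    '--silent',
    '--request', 'POST',
    '--header', 'Content-Type: application/json',
    '--data-raw', JSON.stringify(body),
    url,
  ];
}
```

Usage would look like `execFile('curl', buildCurlArgs(endpoint, body), callback)` from Node's `child_process` module.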

### Query Variables

```typescript
const variables = {
  includeEnterpriseSpecials: false,
  productsFilter: {
    dispensaryId: '6405ef617056e8014d79101b',  // platform_dispensary_id
    pricingType: 'rec',                        // 'rec' or 'med'
    Status: 'Active',                          // CRITICAL: Use 'Active', NOT null
    types: [],                                 // empty = all categories
    useCache: true,
    isDefaultSort: true,
    sortBy: 'popularSortIdx',
    sortDirection: 1,
    bypassOnlineThresholds: true,
    isKioskMenu: false,
    removeProductsBelowOptionThresholds: false,
  },
  page: 0,       // 0-indexed pagination
  perPage: 100,  // max 100 per page
};
```

### CRITICAL: Status Parameter

| Value | Result |
|-------|--------|
| `'Active'` | Returns in-stock products WITH pricing |
| `null` | Returns 0 products (broken) |
| `'Inactive'` | Returns out-of-stock products only |

**Always use `Status: 'Active'` for product crawls.**

### Response Structure

```json
{
  "data": {
    "filteredProducts": {
      "products": [
        {
          "_id": "product-mongo-id",
          "Name": "Product Name",
          "brandName": "Brand Name",
          "type": "Flower",
          "subcategory": "Indica",
          "Status": "Active",
          "recPrices": [45.00, 90.00],
          "recSpecialPrices": [],
          "THCContent": { "unit": "PERCENTAGE", "range": [28.24] },
          "CBDContent": { "unit": "PERCENTAGE", "range": [0] },
          "Image": "https://images.dutchie.com/...",
          "POSMetaData": {
            "children": [
              {
                "option": "1/8oz",
                "recPrice": 45.00,
                "quantityAvailable": 10
              },
              {
                "option": "1/4oz",
                "recPrice": 90.00,
                "quantityAvailable": 5
              }
            ]
          }
        }
      ],
      "queryInfo": {
        "totalCount": 1009,
        "totalPages": 11
      }
    }
  }
}
```

### Pagination

```typescript
const DUTCHIE_CONFIG = {
  perPage: 100,      // Products per page
  maxPages: 200,     // Safety limit
  pageDelayMs: 500,  // Delay between pages
};

// Fetch all pages
const allProducts = [];
let page = 0;
let totalPages = 1;

// Stop at maxPages even if the API reports more pages
while (page < totalPages && page < DUTCHIE_CONFIG.maxPages) {
  const result = await executeGraphQL('FilteredProducts', { ...variables, page });
  const data = result.data.filteredProducts;

  totalPages = Math.ceil(data.queryInfo.totalCount / DUTCHIE_CONFIG.perPage);
  allProducts.push(...data.products);

  page++;
  if (page < totalPages) await sleep(DUTCHIE_CONFIG.pageDelayMs);  // Rate limiting
}
```
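
The page arithmetic is worth seeing in isolation: with `totalCount: 1009` and `perPage: 100`, `Math.ceil(1009 / 100)` gives 11 pages (indices 0 through 10), which matches the `totalPages: 11` in the sample response. The sketch below uses a hypothetical `planPages` helper to compute the page indices up front and apply the `maxPages` safety limit.

```typescript
// Returns the 0-based page indices needed to fetch `totalCount` products,
// capped at `maxPages` as a safety limit.
function planPages(totalCount: number, perPage: number, maxPages: number): number[] {
  if (perPage <= 0) throw new RangeError('perPage must be positive');
  const needed = Math.ceil(Math.max(0, totalCount) / perPage);
  const count = Math.min(needed, maxPages);
  return Array.from({ length: count }, (_, i) => i);
}
```

Computing the plan from the first response also makes it easy to log progress as "Page N/total", as the test script output in Section 10 does.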

---

## 5. Normalization Pipeline

### Purpose

Convert raw Dutchie JSON into a standardized format before database insertion.

### Key File: `src/hydration/normalizers/dutchie.ts`

```typescript
import { DutchieNormalizer } from '../hydration';

const normalizer = new DutchieNormalizer();

// Build RawPayload structure
const rawPayload = {
  id: 'unique-id',
  dispensary_id: 112,
  crawl_run_id: null,
  platform: 'dutchie',
  payload_version: 1,
  raw_json: { products: rawProducts },  // <-- Products go here
  product_count: rawProducts.length,
  pricing_type: 'rec',
  crawl_mode: 'active',
  fetched_at: new Date(),
  processed: false,
  normalized_at: null,
  hydration_error: null,
  hydration_attempts: 0,
  created_at: new Date(),
};

// Normalize
const result = normalizer.normalize(rawPayload);

// Result contains:
// - result.products: NormalizedProduct[]
// - result.pricing: Map<externalId, NormalizedPricing>
// - result.availability: Map<externalId, NormalizedAvailability>
// - result.brands: NormalizedBrand[]
```

### Field Mappings

| Dutchie Field | Normalized Field |
|---------------|------------------|
| `_id` / `id` | `externalProductId` |
| `Name` | `name` |
| `brandName` | `brandName` |
| `type` | `category` |
| `subcategory` | `subcategory` |
| `Status` | `status`, `isActive` |
| `THCContent.range[0]` | `thcPercent` |
| `CBDContent.range[0]` | `cbdPercent` |
| `Image` | `primaryImageUrl` |
| `recPrices[0]` | `priceRec` (in cents) |
| `recSpecialPrices[0]` | `priceRecSpecial` (in cents) |

### Data Validation

The normalizer handles edge cases:

```typescript
// THC/CBD values > 100 are milligrams, not percentages - skip them
if (thcPercent > 100) thcPercent = null;

// Products without IDs are skipped
if (!externalId) return null;

// Products without names are skipped
if (!name) return null;
```
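
The mapping table and validation rules can be combined into one function for illustration. This is a simplified sketch, not the actual `DutchieNormalizer`: it flattens pricing into the product object (the real normalizer returns pricing in a separate map), `normalizeProductSketch` and both interfaces are hypothetical names, and treating `Status === 'Active'` as `isActive` is an assumption. Prices are converted from dollars to integer cents with `Math.round`, which avoids floating-point artifacts such as `19.99 * 100`.

```typescript
// Subset of the raw Dutchie product fields listed in the mapping table.
interface RawDutchieProduct {
  _id?: string;
  id?: string;
  Name?: string;
  brandName?: string;
  type?: string;
  subcategory?: string;
  Status?: string;
  THCContent?: { unit: string; range: number[] };
  CBDContent?: { unit: string; range: number[] };
  Image?: string;
  recPrices?: number[];
  recSpecialPrices?: number[];
}

// Hypothetical flattened result shape, used for illustration only.
interface NormalizedSketch {
  externalProductId: string;
  name: string;
  brandName: string | null;
  category: string | null;
  subcategory: string | null;
  status: string | null;
  isActive: boolean;
  thcPercent: number | null;
  cbdPercent: number | null;
  primaryImageUrl: string | null;
  priceRec: number | null;        // cents
  priceRecSpecial: number | null; // cents
}

// Dollars to integer cents.
const toCents = (dollars?: number): number | null =>
  dollars === undefined ? null : Math.round(dollars * 100);

// Values above 100 are treated as milligrams and dropped.
const toPercent = (value?: number): number | null =>
  value === undefined || value > 100 ? null : value;

function normalizeProductSketch(raw: RawDutchieProduct): NormalizedSketch | null {
  const externalProductId = raw._id ?? raw.id;
  if (!externalProductId || !raw.Name) return null; // skip products without ID or name
  return {
    externalProductId,
    name: raw.Name,
    brandName: raw.brandName ?? null,
    category: raw.type ?? null,
    subcategory: raw.subcategory ?? null,
    status: raw.Status ?? null,
    isActive: raw.Status === 'Active',
    thcPercent: toPercent(raw.THCContent?.range[0]),
    cbdPercent: toPercent(raw.CBDContent?.range[0]),
    primaryImageUrl: raw.Image ?? null,
    priceRec: toCents(raw.recPrices?.[0]),
    priceRecSpecial: toCents(raw.recSpecialPrices?.[0]),
  };
}
```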

---

## 6. Canonical Data Model

### Tables

#### `store_products` - Current product state per store

```sql
CREATE TABLE store_products (
  id SERIAL PRIMARY KEY,
  dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id),
  provider VARCHAR(50) NOT NULL DEFAULT 'dutchie',
  provider_product_id VARCHAR(100),  -- Dutchie's _id

  name_raw VARCHAR(500) NOT NULL,
  brand_name_raw VARCHAR(255),
  category_raw VARCHAR(100),
  subcategory_raw VARCHAR(100),

  price_rec NUMERIC(10,2),
  price_med NUMERIC(10,2),
  price_rec_special NUMERIC(10,2),
  price_med_special NUMERIC(10,2),
  is_on_special BOOLEAN DEFAULT false,
  discount_percent NUMERIC(5,2),

  is_in_stock BOOLEAN DEFAULT true,
  stock_quantity INTEGER,
  stock_status VARCHAR(50) DEFAULT 'in_stock',

  thc_percent NUMERIC(5,2),  -- Column max is 999.99; values > 100 are nulled before insert
  cbd_percent NUMERIC(5,2),  -- Column max is 999.99; values > 100 are nulled before insert

  image_url TEXT,

  first_seen_at TIMESTAMPTZ DEFAULT NOW(),
  last_seen_at TIMESTAMPTZ DEFAULT NOW(),

  UNIQUE(dispensary_id, provider, provider_product_id)
);
```

#### `store_product_snapshots` - Historical price/stock records

```sql
CREATE TABLE store_product_snapshots (
  id SERIAL PRIMARY KEY,
  dispensary_id INTEGER NOT NULL,
  store_product_id INTEGER REFERENCES store_products(id),
  provider VARCHAR(50) NOT NULL,
  provider_product_id VARCHAR(100),
  crawl_run_id INTEGER,
  captured_at TIMESTAMPTZ NOT NULL,

  name_raw VARCHAR(500),
  brand_name_raw VARCHAR(255),
  category_raw VARCHAR(100),

  price_rec NUMERIC(10,2),
  price_med NUMERIC(10,2),
  price_rec_special NUMERIC(10,2),
  price_med_special NUMERIC(10,2),
  is_on_special BOOLEAN,

  is_in_stock BOOLEAN,
  stock_quantity INTEGER,
  stock_status VARCHAR(50),

  thc_percent NUMERIC(5,2),
  cbd_percent NUMERIC(5,2),

  raw_data JSONB  -- Full raw product for debugging
);
```

#### `product_variants` - Per-weight pricing options

```sql
CREATE TABLE product_variants (
  id SERIAL PRIMARY KEY,
  store_product_id INTEGER NOT NULL REFERENCES store_products(id),
  dispensary_id INTEGER NOT NULL,

  option VARCHAR(50) NOT NULL,  -- "1/8oz", "1g", "100mg"

  price_rec NUMERIC(10,2),
  price_med NUMERIC(10,2),
  price_rec_special NUMERIC(10,2),
  price_med_special NUMERIC(10,2),

  quantity INTEGER,
  in_stock BOOLEAN,

  weight_value NUMERIC(10,4),  -- Parsed: 3.5
  weight_unit VARCHAR(10),     -- Parsed: "g"

  UNIQUE(store_product_id, option)
);
```
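
The `weight_value` and `weight_unit` columns are parsed from the variant's `option` string. How the repository does this is not shown here, so the following is only a sketch: `parseVariantOption` is a hypothetical helper, and the ounce-to-gram table uses the common retail convention (1/8 oz sold as 3.5 g, consistent with the "Parsed: 3.5" comment in the schema) rather than exact conversion. Options that match neither pattern return `null` so the columns can stay empty.

```typescript
interface ParsedWeight {
  weightValue: number;
  weightUnit: string;
}

// Assumed retail gram equivalents for ounce fractions (not exact conversions).
const OUNCE_FRACTIONS_IN_GRAMS: Record<string, number> = {
  '1/8': 3.5,
  '1/4': 7,
  '1/2': 14,
  '1': 28,
};

function parseVariantOption(option: string): ParsedWeight | null {
  const text = option.trim().toLowerCase();

  // Ounce-based options such as "1/8oz" or "1oz" are stored in grams.
  const ounces = text.match(/^(\d+(?:\/\d+)?)\s*oz$/);
  if (ounces) {
    const grams = OUNCE_FRACTIONS_IN_GRAMS[ounces[1] ?? ''];
    return grams === undefined ? null : { weightValue: grams, weightUnit: 'g' };
  }

  // Metric options such as "1g", "3.5g", or "100mg" keep their own unit.
  const metric = text.match(/^(\d+(?:\.\d+)?)\s*(mg|g)$/);
  if (metric) {
    return { weightValue: Number(metric[1]), weightUnit: metric[2] ?? '' };
  }

  return null; // unrecognized format
}
```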

---

## 7. Hydration (Writing to DB)

### Key File: `src/hydration/canonical-upsert.ts`

### Function: `hydrateToCanonical()`

```typescript
import { hydrateToCanonical } from '../hydration';

const result = await hydrateToCanonical(
  pool,          // pg Pool
  dispensaryId,  // number
  normResult,    // NormalizationResult from normalizer
  crawlRunId     // number | null
);

// Result:
// {
//   productsUpserted: 1009,
//   productsNew: 50,
//   snapshotsCreated: 1009,
//   variantsUpserted: 1011,
//   brandsUpserted: 102,
// }
```

### Upsert Logic

**Products:** `ON CONFLICT (dispensary_id, provider, provider_product_id) DO UPDATE`
- Updates: name, prices, stock, THC/CBD, timestamps
- Preserves: `first_seen_at`, `id`

**Snapshots:** Always INSERT (append-only history)
- One snapshot per product per crawl
- Contains full state at capture time

**Variants:** `ON CONFLICT (store_product_id, option) DO UPDATE`
- Updates: prices, stock, quantity
- Tracks: `last_price_change_at`, `last_stock_change_at`
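
A reduced version of the product upsert shows how `first_seen_at` is preserved: the column is simply left out of the `DO UPDATE SET` list, so it is only written by its `DEFAULT NOW()` on first insert. This is a sketch covering a handful of columns from the `store_products` schema in Section 6, not the statement used in `canonical-upsert.ts`; `UPSERT_STORE_PRODUCT_SQL`, `StoreProductUpsert`, and `upsertParams` are hypothetical names.

```typescript
// Hypothetical input shape covering only a few store_products columns.
interface StoreProductUpsert {
  dispensaryId: number;
  providerProductId: string;
  nameRaw: string;
  priceRec: number | null; // dollars
  isInStock: boolean;
}

// first_seen_at is intentionally absent from DO UPDATE SET, so the original
// value survives every later crawl; last_seen_at is refreshed each time.
const UPSERT_STORE_PRODUCT_SQL = `
  INSERT INTO store_products
    (dispensary_id, provider, provider_product_id, name_raw, price_rec, is_in_stock)
  VALUES ($1, $2, $3, $4, $5, $6)
  ON CONFLICT (dispensary_id, provider, provider_product_id) DO UPDATE SET
    name_raw     = EXCLUDED.name_raw,
    price_rec    = EXCLUDED.price_rec,
    is_in_stock  = EXCLUDED.is_in_stock,
    last_seen_at = NOW()
  RETURNING id`;

// Positional parameters in the same order as the $1..$6 placeholders.
function upsertParams(p: StoreProductUpsert): Array<string | number | boolean | null> {
  return [p.dispensaryId, 'dutchie', p.providerProductId, p.nameRaw, p.priceRec, p.isInStock];
}
```

With a `pg` pool this would be executed as `pool.query(UPSERT_STORE_PRODUCT_SQL, upsertParams(product))`; the returned `id` can then be used as `store_product_id` for the snapshot insert.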

### Data Transformations

```typescript
// Prices: cents -> dollars
priceRec: productPricing.priceRec / 100

// THC/CBD: values above 100 are not percentages; store null instead
thcPercent: product.thcPercent <= 100 ? product.thcPercent : null

// Stock status mapping
stockStatus: availability.stockStatus || 'unknown'
```

---

## 8. Key Files Reference

### HTTP Client

| File | Purpose |
|------|---------|
| `src/platforms/dutchie/client.ts` | Curl-based HTTP client (LOCKED) |
| `src/platforms/dutchie/queries.ts` | GraphQL query wrappers |
| `src/platforms/dutchie/index.ts` | Public exports |

### Normalization

| File | Purpose |
|------|---------|
| `src/hydration/normalizers/dutchie.ts` | Dutchie-specific normalization |
| `src/hydration/normalizers/base.ts` | Base normalizer class |
| `src/hydration/types.ts` | Type definitions |

### Database

| File | Purpose |
|------|---------|
| `src/hydration/canonical-upsert.ts` | Upsert functions for canonical tables |
| `src/hydration/index.ts` | Public exports |
| `src/db/pool.ts` | Database connection pool |

### Scripts

| File | Purpose |
|------|---------|
| `src/scripts/test-crawl-to-canonical.ts` | Test script for single dispensary |

---

## 9. Common Issues & Solutions

### Issue: GraphQL Returns 0 Products

**Cause:** Using `Status: null` instead of `Status: 'Active'`

**Solution:**
```typescript
productsFilter: {
  Status: 'Active',  // NOT null
  ...
}
```

### Issue: Numeric Field Overflow

**Cause:** THC/CBD values in milligrams (e.g., 1400mg) stored in a percentage field, which overflows `NUMERIC(5,2)`

**Solution:** Replace values > 100 with null:
```typescript
thcPercent: value <= 100 ? value : null
```

### Issue: Column "name" Does Not Exist

**Cause:** Code uses `name` but table has `name_raw`

**Column Mapping:**

| Code | Database |
|------|----------|
| `name` | `name_raw` |
| `brand_name` | `brand_name_raw` |
| `category` | `category_raw` |
| `subcategory` | `subcategory_raw` |

### Issue: 403 Forbidden

**Cause:** TLS fingerprinting or rate limiting

**Solution:** The curl-based client handles this with:
- Browser fingerprint rotation
- Proper headers (Origin, Referer, User-Agent)
- Retry with exponential backoff
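
As an illustration of the retry behaviour, the sketch below computes an exponential backoff schedule. The actual delays used by `client.ts` are not documented here: the 1000 ms base and 30000 ms cap are assumed values, `backoffDelayMs` and `retrySchedule` are hypothetical helpers, and the four-attempt example is inferred only from the `attempt 1/4` log line in Section 10.

```typescript
// Delay to wait after a failed attempt (1-based): base, 2x base, 4x base, ...
// capped at maxDelayMs. Default values are illustrative assumptions.
function backoffDelayMs(attempt: number, baseDelayMs = 1000, maxDelayMs = 30000): number {
  return Math.min(maxDelayMs, baseDelayMs * Math.pow(2, attempt - 1));
}

// Delays between attempts; there is one fewer delay than there are attempts.
function retrySchedule(maxAttempts: number, baseDelayMs = 1000): number[] {
  const waits = Math.max(0, maxAttempts - 1);
  return Array.from({ length: waits }, (_, i) => backoffDelayMs(i + 1, baseDelayMs));
}
```

Under these assumptions, four attempts wait 1 s, 2 s, and 4 s between tries. Adding random jitter to each delay is a common refinement when many workers retry at once.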

### Issue: Normalizer Returns 0 Products

**Cause:** Wrong payload structure passed to `normalize()`

**Solution:** Use `RawPayload` structure:
```typescript
const rawPayload = {
  raw_json: { products: [...] },  // Products in raw_json
  dispensary_id: 112,             // Required
  // ... other fields
};
normalizer.normalize(rawPayload);  // NOT (payload, id)
```

---

## 10. Running Crawls

### Test Script (Single Dispensary)

```bash
cd backend

DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" \
  npx tsx src/scripts/test-crawl-to-canonical.ts <dispensaryId>

# Example:
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" \
  npx tsx src/scripts/test-crawl-to-canonical.ts 112
```

### Expected Output

```
============================================================
Test Crawl to Canonical - Dispensary 112
============================================================

[Step 1] Getting dispensary info...
Name: Deeply Rooted Boutique Cannabis Company
Platform ID: 6405ef617056e8014d79101b
Menu URL: https://azdeeplyrooted.com/home
cName: dispensary

[Step 2] Fetching products from Dutchie GraphQL...
[Fetch] Starting fetch for 6405ef617056e8014d79101b (cName: dispensary)
[Dutchie Client] curl POST FilteredProducts (attempt 1/4)
[Dutchie Client] Response status: 200
[Fetch] Page 1/11: 100 products (total so far: 100)
...
Total products fetched: 1009

[Step 3] Normalizing products...
Validation: PASS
Normalized products: 1009
Brands extracted: 102

[Step 4] Writing to canonical tables via hydrateToCanonical...
Products upserted: 1009
Variants upserted: 1011

[Step 5] Verifying data in canonical tables...
store_products count: 1060
product_variants count: 1011
store_product_snapshots count: 4315

============================================================
SUCCESS - Crawl and hydration complete!
============================================================
```

### Verification Queries

```sql
-- Check products for a dispensary
SELECT id, name_raw, brand_name_raw, price_rec, is_in_stock
FROM store_products
WHERE dispensary_id = 112
ORDER BY last_seen_at DESC
LIMIT 10;

-- Check variants
SELECT pv.option, pv.price_rec, pv.in_stock, sp.name_raw
FROM product_variants pv
JOIN store_products sp ON sp.id = pv.store_product_id
WHERE pv.dispensary_id = 112
LIMIT 10;

-- Check snapshot history
SELECT COUNT(*) as total, MAX(captured_at) as latest
FROM store_product_snapshots
WHERE dispensary_id = 112;
```

---

## Appendix: GraphQL Hashes

All Dutchie GraphQL queries use persisted queries with SHA256 hashes:

```typescript
export const GRAPHQL_HASHES = {
  FilteredProducts: 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0',
  GetAddressBasedDispensaryData: '13461f73abf7268770dfd05fe7e10c523084b2bb916a929c08efe3d87531977b',
  ConsumerDispensaries: '0a5bfa6ca1d64ae47bcccb7c8077c87147cbc4e6982c17ceec97a2a4948b311b',
  DispensaryInfo: '13461f73abf7268770dfd05fe7e10c523084b2bb916a929c08efe3d87531977b',
  GetAllCitiesByState: 'ae547a0466ace5a48f91e55bf6699eacd87e3a42841560f0c0eabed5a0a920e6',
};
```

These hashes are fixed and tied to Dutchie's API version. If Dutchie changes their API, these may need updating.
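
When a hash is updated by hand, a truncated or mistyped value is easy to miss. A SHA-256 digest in hexadecimal is always 64 characters, so a format check can catch copy errors early. The sketch below is a hypothetical helper (`malformedHashes`), not part of the repository, and it only validates the format: it cannot tell whether Dutchie still accepts a given hash.

```typescript
// A SHA-256 digest written in hexadecimal is exactly 64 characters.
const SHA256_HEX_PATTERN = /^[0-9a-f]{64}$/;

// Returns the operation names whose hash is not a well-formed SHA-256 hex digest.
function malformedHashes(hashes: Record<string, string>): string[] {
  return Object.keys(hashes).filter((name) => !SHA256_HEX_PATTERN.test(hashes[name] ?? ''));
}
```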

---

*Last updated: December 2024*