- GET /api/az/admin/dutchie-stores - Lists all Dutchie stores with crawl status
- POST /api/az/admin/crawl-all - Enqueues product crawl jobs for all ready stores
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Menu detection now always crawls websites to find actual embedded menu
providers instead of marking stores as proprietary based on domain alone.
This fixes detection for stores like Curaleaf that may use Dutchie embeds.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add extractFromMenuUrl() to discovery.ts that extracts either cName or platformId directly
from Dutchie URLs (handles /api/v2/embedded-menu/<id>.js pattern)
- Add isObjectId() helper to identify MongoDB ObjectIds in URLs
- Update menu-detection.ts to skip GraphQL resolution when URL contains platformId directly
- For proprietary domains (curaleaf, sol), crawl website to find actual menu provider
instead of blindly marking as not_crawlable
- If website crawl finds Dutchie embedded menu, set menu_type='dutchie' and resolve platform ID
- Tested successfully with consumeaz.com which discovers Dutchie embedded menu JS URL
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When onlyUnknown=true and onlyMissingPlatformId=true, the query now
uses OR logic instead of AND to include:
- Stores with unknown menu_type (new stores needing detection)
- Stores with menu_type='dutchie' but no platform_dispensary_id
This allows re-detection to:
1. Resolve platform IDs for actual Dutchie stores
2. Reclassify stores that migrated away from Dutchie (e.g. to Curaleaf)
Also changed default for onlyMissingPlatformId to true so scheduled
detection jobs always attempt to resolve missing platform IDs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add dba_name to DISPENSARY_COLUMNS for the /api/az/stores list endpoint
and getDispensaryById for the single store endpoint. This allows the
frontend to display DBA names (trade names) when available, falling
back to the legal entity name.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The UPDATE query was trying to set a column that doesn't exist in the database
schema, causing platform ID resolution to fail silently. Now stores the
resolved_at timestamp in provider_detection_data JSONB instead.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The job_run_logs table tracks scheduled job orchestration, not individual
worker jobs. Worker info (worker_id, worker_hostname) belongs on
dispensary_crawl_jobs, not job_run_logs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Product crawl job now only targets ready dispensaries (menu_type=dutchie
AND platform_dispensary_id IS NOT NULL). Detection is handled by the
separate menu_detection schedule.
This ensures:
- Single path for menu discovery (menu_detection job)
- Product crawl only processes ready dutchie stores
- All 5 workers claim jobs via queue/locking
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Reorder processProducts() to create snapshots BEFORE attempting image downloads.
Previously, if image downloads hung or failed, the process would be killed before
snapshots were created, resulting in 0 snapshots despite successful product upserts.
Changes:
- Move Step 3 (snapshot creation) before Step 4 (image downloads)
- Ensures core crawl data (products + snapshots) is persisted even if images fail
- Adds chunked batch processing for improved memory management
Tested locally: 771 snapshots created for dispensary 112 with quantity data populated.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Changed missing_from_feed=true to stock_status='missing_from_feed'
- Changed products_inserted/snapshots_created to products_new/products_updated
- Changed crawl_jobs table reference to dispensary_crawl_jobs
- Fixed product query to use actual snapshot columns (price in cents, etc.)
- Added explicit column list for dispensaries to avoid SELECT * issues
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add 'curaleaf' to MenuProvider type enum
- Add Curaleaf URL patterns BEFORE Dutchie in PROVIDER_URL_PATTERNS for proper precedence
- Add isCuraleafUrl() and extractCuraleafStoreUrl() helper functions
- Check website field for Curaleaf pattern before any Dutchie resolution
- Clear stale Dutchie menu_url when store is identified as Curaleaf
- Mark Curaleaf stores as not_crawlable with reason until crawler is built
- This prevents 60s Dutchie timeouts for stores that migrated to Curaleaf
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add extractCName() helper to parse cName from dispensary.menuUrl
- Handles /embedded-menu/<cName> and /dispensary/<cName> URL patterns
- Falls back to dispensary.slug if menuUrl extraction fails
- Pass cName to fetchAllProductsBothModes and fetchAllProducts
- Make cName required parameter (no hardcoded defaults)
- Add normBool and normDate helpers for API data normalization
- Refactor graphql-client to use server-side fetch with Puppeteer session cookies
Previously all stores were using AZ-Deeply-Rooted cName, causing 0 products
for other dispensaries like Sol Flower.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Switch from /api to /api/v1 endpoints
- Use X-API-Key header instead of Bearer token
- Update test connection to use /menu endpoint
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add dutchie-az module with GraphQL product crawler, scheduler, and admin UI
- Add public API v1 endpoints (/api/v1/products, /categories, /brands, /specials, /menu)
- API key auth maps dispensary to dutchie_az store for per-dispensary data access
- Add frontend pages for Dutchie AZ stores, store details, and schedule management
- Update Layout with Dutchie AZ navigation section
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Renamed plugin to avoid WordPress.org naming conflict causing false update notifications
- Added /downloads static route to serve plugin zip file
- Updated all CSS classes from dutchie- to crawlsy- prefix
- Added plugin zip to backend/public/downloads for hosting
- Plugin available at: https://dispos.crawlsy.com/downloads/crawlsy-menus-1.3.0.zip🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add store_id and store_name columns to wp_dutchie_api_permissions
- Backend: Add /stores endpoint, require store_id when creating permissions
- Frontend: Add store selector dropdown to API Permissions form
- WordPress plugin v1.3.0: Remove store_id from shortcodes (store is tied to token)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Dutchie pages can take 30-40s to load, and when running 3 parallel browsers
they compete for resources, causing some to hit the 60s timeout. Increased
to 90s to accommodate slow-loading React SPAs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Dutchie blocks all our datacenter proxy IPs, returning empty/different
content. Direct connection from pod IP works fine (100 products found).
Added PROXY_SKIP_DOMAINS list for sites that block datacenter IPs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Change waitUntil from 'domcontentloaded' to 'networkidle2' for SPAs
- Add waitForSelector to wait for product elements before parsing
- WordPress plugin: update API endpoints to use hardcoded URL
The scraper was returning 0 products because it wasn't waiting for
React to render the product list. Now it properly waits for either
the product list items or an empty state indicator.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Migration 029: Creates dispensary records for stores with dutchie_url
that don't have a dispensary_id yet, then links them.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add migration 026 to update dispensary_crawl_status view with new fields
- Update dashboard API to use dispensaries table (not stores)
- Show current inventory counts (products seen in last 7 days)
- Update ScraperSchedule UI to show provider_type correctly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix column name from s.dutchie_plus_url to s.dutchie_url
- Add availability tracking and product freshness APIs
- Add crawl script for sequential dispensary processing
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Migration 025: Dispensaries ARE stores - add crawl metadata fields
(menu_url, provider_type, scrape_enabled, crawl_status, etc.)
directly to dispensaries table instead of maintaining separate stores table.
- Copies menu_url from 22 existing stores to their dispensaries
- Migrates products from store_id to dispensary_id
- Detects provider_type from menu_url domain
- Adds indexes for crawl scheduling
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add migrations 021-023 for dispensary_crawl_schedule tables and views
- Add dispensary-orchestrator service and bootstrap-discovery script
- Update schedule routes with dispensary endpoints (/api/schedule/dispensaries)
- Switch frontend scheduler to use canonical dispensaries table (182 AZDHS entries)
- Add dispensary schedule API methods to frontend api.ts
- Remove "Unmapped" badge logic - all dispensaries now linked properly
- Add proper URL linking to dispensary detail pages (/dispensaries/:state/:city/:slug)
- Update Jobs table to show dispensary_name
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add /api/version endpoint that returns build info from env vars
- Add version footer to Layout.tsx showing build version, git SHA, and image tag
- Update Dockerfile to accept build args for version info (APP_BUILD_VERSION, APP_GIT_SHA, APP_BUILD_TIME, CONTAINER_IMAGE_TAG)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Make products_store_id_slug_unique constraint creation idempotent
by checking if it exists before attempting to add it.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add scheduler UI with store schedules, job queue, and global settings
- Add store crawl orchestrator for intelligent crawl workflow
- Add multi-category intelligence detection (product, specials, brands, metadata)
- Add CrawlerLogger for structured JSON logging
- Add migrations for scheduler tables and dispensary linking
- Add dispensary → scheduler navigation link
- Support production/sandbox crawler modes per provider
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>