Commit Graph

91 Commits

Author SHA1 Message Date
Kelly
e234dc2947 feat(frontend): rewire dashboard to use AZ data endpoint
Main dashboard now uses /api/az/dashboard for dispensary, product, and
brand counts instead of the legacy /api/dashboard/stats endpoint. This
ensures the dashboard displays consistent data with the /az pages.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 17:28:22 -07:00
Kelly
cac414dafd Fix snapshot creation order - run before image downloads
Reorder processProducts() to create snapshots BEFORE attempting image downloads.
Previously, if image downloads hung or failed, the process would be killed before
snapshots were created, resulting in 0 snapshots despite successful product upserts.

Changes:
- Move Step 3 (snapshot creation) before Step 4 (image downloads)
- Ensures core crawl data (products + snapshots) is persisted even if images fail
- Adds chunked batch processing for improved memory management

Tested locally: 771 snapshots created for dispensary 112 with quantity data populated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 17:27:44 -07:00
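Note on cac414dafd: the reordering can be sketched as below. This is a minimal illustration, not the project's code. The `CrawlSteps` interface, the `chunk` helper, and the default batch size are assumptions; only the order (snapshots before image downloads) and the chunked batching come from the commit message.

```typescript
// Hypothetical shapes; the real crawler's types are not shown in the log.
type Product = { id: number; imageUrl?: string };

interface CrawlSteps {
  upsertProducts(batch: Product[]): Promise<void>;
  createSnapshots(batch: Product[]): Promise<number>;
  downloadImages(products: Product[]): Promise<void>;
}

// Split a list into fixed-size chunks so each batch stays small in memory.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function processProducts(
  products: Product[],
  steps: CrawlSteps,
  batchSize: number = 100, // assumed default
): Promise<{ snapshots: number; imagesOk: boolean }> {
  let snapshots = 0;
  // Core crawl data first: products and snapshots are written batch by batch.
  for (const batch of chunk(products, batchSize)) {
    await steps.upsertProducts(batch);
    snapshots += await steps.createSnapshots(batch);
  }
  // Images last: if this step throws, the snapshots are already persisted.
  // A hang is not caught here; the point is only that core data was written earlier.
  let imagesOk = true;
  try {
    await steps.downloadImages(products);
  } catch {
    imagesOk = false;
  }
  return { snapshots, imagesOk };
}
```

With this ordering, an image failure still leaves the snapshot count intact in the returned result.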
Kelly
e67849bb3a Fix SQL queries to use correct column names in store summary endpoints
- Changed missing_from_feed=true to stock_status='missing_from_feed'
- Changed products_inserted/snapshots_created to products_new/products_updated
- Changed crawl_jobs table reference to dispensary_crawl_jobs
- Fixed product query to use actual snapshot columns (price in cents, etc.)
- Added explicit column list for dispensaries to avoid SELECT * issues

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 15:23:50 -07:00
Kelly
45209c3518 Add Curaleaf provider detection to menu detection service
- Add 'curaleaf' to MenuProvider type enum
- Add Curaleaf URL patterns BEFORE Dutchie in PROVIDER_URL_PATTERNS for proper precedence
- Add isCuraleafUrl() and extractCuraleafStoreUrl() helper functions
- Check website field for Curaleaf pattern before any Dutchie resolution
- Clear stale Dutchie menu_url when store is identified as Curaleaf
- Mark Curaleaf stores as not_crawlable with reason until crawler is built
- This prevents 60s Dutchie timeouts for stores that migrated to Curaleaf

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 15:12:22 -07:00
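Note on 45209c3518: a rough sketch of the precedence described above, in which the Curaleaf check runs before any Dutchie resolution and a stale Dutchie menu URL is cleared. The domain names, the `StoreInput` and `Detection` shapes, and the reason strings are assumptions for illustration; `isCuraleafUrl` is named in the commit, but its real signature is not shown.

```typescript
type MenuProvider = "curaleaf" | "dutchie" | "unknown";

interface StoreInput { website: string | null; menuUrl: string | null }
interface Detection {
  provider: MenuProvider;
  menuUrl: string | null;
  crawlable: boolean;
  reason?: string;
}

// Extract a lowercase hostname from an http(s) URL, or null if it does not parse.
function hostnameOf(rawUrl: string | null): string | null {
  if (!rawUrl) return null;
  const m = /^https?:\/\/([^\/?#:]+)/i.exec(rawUrl.trim());
  return m ? m[1].toLowerCase() : null;
}

// True when the URL's host is the domain itself or one of its subdomains.
function hostMatches(rawUrl: string | null, domain: string): boolean {
  const host = hostnameOf(rawUrl);
  if (host === null) return false;
  return host === domain || host.slice(-(domain.length + 1)) === "." + domain;
}

function isCuraleafUrl(rawUrl: string | null): boolean {
  return hostMatches(rawUrl, "curaleaf.com"); // assumed domain pattern
}

function detectProvider(store: StoreInput): Detection {
  // Curaleaf is checked first, so a stale Dutchie menu_url cannot win.
  if (isCuraleafUrl(store.website) || isCuraleafUrl(store.menuUrl)) {
    return {
      provider: "curaleaf",
      menuUrl: null, // clear the stale Dutchie URL
      crawlable: false,
      reason: "curaleaf crawler not built yet",
    };
  }
  if (hostMatches(store.menuUrl, "dutchie.com")) {
    return { provider: "dutchie", menuUrl: store.menuUrl, crawlable: true };
  }
  return { provider: "unknown", menuUrl: store.menuUrl, crawlable: false, reason: "no known provider" };
}
```

A store whose website points at Curaleaf is marked not crawlable without a Dutchie request being made, which is how the commit avoids the 60s timeouts.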
Kelly
c10710d6a7 Fix cName bug: extract cName from menuUrl per dispensary
- Add extractCName() helper to parse cName from dispensary.menuUrl
- Handles /embedded-menu/<cName> and /dispensary/<cName> URL patterns
- Falls back to dispensary.slug if menuUrl extraction fails
- Pass cName to fetchAllProductsBothModes and fetchAllProducts
- Make cName required parameter (no hardcoded defaults)
- Add normBool and normDate helpers for API data normalization
- Refactor graphql-client to use server-side fetch with Puppeteer session cookies

Previously all stores were using AZ-Deeply-Rooted cName, causing 0 products
for other dispensaries like Sol Flower.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 20:53:28 -07:00
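Note on c10710d6a7: the extraction rule reads roughly as follows. This is a sketch under assumptions. The `DispensaryRef` shape and the exact regular expression are illustrative, while the two URL patterns and the slug fallback come from the commit message.

```typescript
// Hypothetical minimal shape of a dispensary record.
interface DispensaryRef { slug: string; menuUrl?: string | null }

// Pull the cName out of /embedded-menu/<cName> or /dispensary/<cName>;
// fall back to the dispensary slug when the URL is missing or has no match.
function extractCName(dispensary: DispensaryRef): string {
  const url = dispensary.menuUrl ?? "";
  const match = /\/(?:embedded-menu|dispensary)\/([^\/?#]+)/.exec(url);
  return match ? match[1] : dispensary.slug;
}
```

Making the value a required argument downstream, as the commit does, means a missing cName fails loudly instead of silently reusing one store's identifier for every dispensary.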
Kelly
9caa52fd5b Update CLAUDE guidelines: local image storage, no MinIO 2025-12-02 13:32:57 -07:00
Kelly
04b5c3bd09 Add CLAUDE guidelines for consolidated pipeline 2025-12-02 13:28:23 -07:00
Kelly
9219d8a77a Add brand history view to dutchie GraphQL migration 2025-12-02 11:34:01 -07:00
Kelly
acecb2b8ee Add dispensary_crawl_status view to consolidated schema 2025-12-02 11:31:39 -07:00
Kelly
ca80f30c8a Fix dashboard views for consolidated schema 2025-12-02 11:28:46 -07:00
Kelly
08b1994d0e Add AZ dashboard views to consolidated schema 2025-12-02 11:15:33 -07:00
Kelly
16819f756b Prefer CRAWLSY_DATABASE_URL for AZ pipeline DB 2025-12-02 10:42:43 -07:00
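Note on 16819f756b: "prefer" suggests a lookup order along these lines. The fallback variable `DATABASE_URL` and the error message are assumptions, since the log names only `CRAWLSY_DATABASE_URL`.

```typescript
// Resolve the AZ pipeline connection string from an environment-like map.
// CRAWLSY_DATABASE_URL wins when set; DATABASE_URL is an assumed fallback.
function resolveAzDatabaseUrl(env: Record<string, string | undefined>): string {
  const url = env["CRAWLSY_DATABASE_URL"] || env["DATABASE_URL"];
  if (!url) {
    throw new Error("No database URL configured for the AZ pipeline");
  }
  return url;
}
```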
Kelly
7a76a29acb Add AZ schema migration script and health check 2025-12-02 10:40:06 -07:00
Kelly
8c83bd451a Point main dashboard stats to AZ pipeline DB 2025-12-02 10:36:58 -07:00
Kelly
53a0918233 Expose AZ data under neutral /api/az and update frontend routes 2025-12-02 10:19:44 -07:00
Kelly
62ecbed076 Add neutral AZ API alias and slug resolver 2025-12-02 10:15:27 -07:00
Kelly
9e5fa437ff Update WordPress plugin to v1.4.0 with new public API v1
- Switch from /api to /api/v1 endpoints
- Use X-API-Key header instead of Bearer token
- Update test connection to use /menu endpoint

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 09:47:12 -07:00
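Note on 9e5fa437ff: the plugin itself is PHP, but the request shape it moves to can be sketched from a TypeScript client's point of view. The helper name `buildApiRequest` and the `Accept` header are assumptions; the `/api/v1` prefix, the `/menu` endpoint, and the `X-API-Key` header are taken from the commit message.

```typescript
interface ApiRequest { url: string; headers: Record<string, string> }

// Build a v1 request: the key travels in X-API-Key, not in an Authorization: Bearer header.
function buildApiRequest(baseUrl: string, path: string, apiKey: string): ApiRequest {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const suffix = path.replace(/^\/+/, ""); // drop leading slashes
  return {
    url: root + "/api/v1/" + suffix,
    headers: { "X-API-Key": apiKey, Accept: "application/json" },
  };
}
```

A connection test in this style would call `buildApiRequest(base, "menu", key)` and pass the result to whatever HTTP client is in use.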
Kelly
917e91297e Add Dutchie AZ data pipeline and public API v1
- Add dutchie-az module with GraphQL product crawler, scheduler, and admin UI
- Add public API v1 endpoints (/api/v1/products, /categories, /brands, /specials, /menu)
- API key auth maps dispensary to dutchie_az store for per-dispensary data access
- Add frontend pages for Dutchie AZ stores, store details, and schedule management
- Update Layout with Dutchie AZ navigation section

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 09:43:26 -07:00
Kelly
511629b4e6 Rename WordPress plugin from Dutchie Menus to Crawlsy Menus v1.3.0
- Renamed plugin to avoid WordPress.org naming conflict causing false update notifications
- Added /downloads static route to serve plugin zip file
- Updated all CSS classes from dutchie- to crawlsy- prefix
- Added plugin zip to backend/public/downloads for hosting
- Plugin available at: https://dispos.crawlsy.com/downloads/crawlsy-menus-1.3.0.zip

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 16:29:21 -07:00
Kelly
e345707db2 Add store selection to API permissions
- Add store_id and store_name columns to wp_dutchie_api_permissions
- Backend: Add /stores endpoint, require store_id when creating permissions
- Frontend: Add store selector dropdown to API Permissions form
- WordPress plugin v1.3.0: Remove store_id from shortcodes (store is tied to token)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 13:59:01 -07:00
Kelly
d2635ed123 Rename WP plugin zip to include version in filename
menus.zip -> menus-v1.2.0.zip

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 13:43:43 -07:00
Kelly
457f426628 Add no-cache headers for WordPress plugin downloads
Prevents Cloudflare from caching zip file responses.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 13:38:48 -07:00
Kelly
c701d7b5e2 Update WordPress plugin to v1.2.0 and add downloadable zip
- Bump plugin version from 1.1.0 to 1.2.0
- Add menus.zip for download at /wordpress/menus.zip

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 13:22:27 -07:00
Kelly
d614979569 Increase navigation timeout to 90s for parallel browser crawling
Dutchie pages can take 30-40s to load, and when running 3 parallel browsers
they compete for resources, causing some to hit the 60s timeout. Increased
to 90s to accommodate slow-loading React SPAs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 10:37:52 -07:00
Kelly
81606447d2 Skip proxies for Dutchie - datacenter IPs are blocked
Dutchie blocks all our datacenter proxy IPs, returning empty/different
content. Direct connection from pod IP works fine (100 products found).
Added PROXY_SKIP_DOMAINS list for sites that block datacenter IPs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 09:53:55 -07:00
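Note on 81606447d2: a skip list of this kind might be consulted as sketched below. Only the name `PROXY_SKIP_DOMAINS` comes from the commit. The list contents, the `shouldSkipProxy` helper, and the subdomain matching rule are assumptions.

```typescript
// Domains that block datacenter proxy IPs and should be fetched directly.
const PROXY_SKIP_DOMAINS: string[] = ["dutchie.com"]; // assumed contents

// True when the URL's host is a listed domain or one of its subdomains.
function shouldSkipProxy(rawUrl: string, skipDomains: string[] = PROXY_SKIP_DOMAINS): boolean {
  const m = /^https?:\/\/([^\/?#:]+)/i.exec(rawUrl);
  if (!m) return false;
  const host = m[1].toLowerCase();
  for (const domain of skipDomains) {
    if (host === domain || host.slice(-(domain.length + 1)) === "." + domain) {
      return true;
    }
  }
  return false;
}
```

Matching on the leading dot keeps lookalike hosts from matching a listed domain by accident.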
Kelly
e518bb8169 Fix Dutchie scraper to wait for React content to load
- Change waitUntil from 'domcontentloaded' to 'networkidle2' for SPAs
- Add waitForSelector to wait for product elements before parsing
- WordPress plugin: update API endpoints to use hardcoded URL

The scraper was returning 0 products because it wasn't waiting for
React to render the product list. Now it properly waits for either
the product list items or an empty state indicator.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 09:33:32 -07:00
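Note on e518bb8169: the wait strategy can be sketched against a small page interface. `PageLike` is a hypothetical stand-in that a Puppeteer `Page` is expected to fit, and the selector values are left to the caller because the log does not give them. Only the `networkidle2` setting and the "product items or empty state" rule come from the commit message.

```typescript
// Minimal stand-in for the two Puppeteer Page methods this sketch uses.
interface PageLike {
  goto(url: string, options: { waitUntil: "networkidle2"; timeout: number }): Promise<unknown>;
  waitForSelector(selector: string, options: { timeout: number }): Promise<unknown>;
}

interface MenuSelectors { product: string; emptyState: string }

async function loadMenuPage(
  page: PageLike,
  url: string,
  selectors: MenuSelectors,
  timeoutMs: number,
): Promise<void> {
  // Wait for the network to settle rather than for the initial HTML only.
  await page.goto(url, { waitUntil: "networkidle2", timeout: timeoutMs });
  // A comma-separated CSS selector resolves as soon as either element exists,
  // so an empty menu does not have to wait out the full timeout.
  await page.waitForSelector(selectors.product + ", " + selectors.emptyState, { timeout: timeoutMs });
}
```

Parsing would start only after `loadMenuPage` resolves, so a zero-product result then reflects an empty menu and not an unrendered page.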
Kelly
199b6a8a23 Remove incorrect migration 029, add snapshot architecture, improve scraper
- Delete migration 029 that was incorrectly creating duplicate dispensaries
- Add migration 028 for snapshot architecture
- Improve downloader with proxy/UA rotation
- Update scraper monitor and tools pages
- Various scraper improvements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 08:52:54 -07:00
Kelly
e5b88b093c Add Getting Started guide for development and deployment
Includes:
- Architecture overview with k8s diagram
- Local development setup
- Database information (local vs remote)
- K8s deployment instructions
- Running crawlers remotely
- Proxy system documentation
- Common tasks and troubleshooting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 08:41:29 -07:00
Kelly
855b9bfd16 Add migration to link Dutchie stores to dispensaries
Migration 029: Creates dispensary records for stores with dutchie_url
that don't have a dispensary_id yet, then links them.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 08:22:37 -07:00
Kelly
c9947eb1de Add brand history tracking for new/dropped brands detection
- Add first_seen_at/last_seen_at columns to brands table
- Create brand_history table for event tracking (added, dropped, returned)
- Add dispensary_brand_stats view for dashboard

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 00:18:47 -07:00
Kelly
9de0d709b2 Update admin panel to use unified dispensaries table
- Add migration 026 to update dispensary_crawl_status view with new fields
- Update dashboard API to use dispensaries table (not stores)
- Show current inventory counts (products seen in last 7 days)
- Update ScraperSchedule UI to show provider_type correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 00:13:41 -07:00
Kelly
9d8972aa86 Fix category-crawler-jobs store lookup query
- Fix column name from s.dutchie_plus_url to s.dutchie_url
- Add availability tracking and product freshness APIs
- Add crawl script for sequential dispensary processing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 00:07:00 -07:00
Kelly
20a7b69537 Fix ScraperMonitor to show all dispensaries instead of hardcoded ID 112
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 00:02:42 -07:00
Kelly
1fb8f84929 Add crawl fields directly to dispensaries table
Migration 025: Dispensaries ARE stores - add crawl metadata fields
(menu_url, provider_type, scrape_enabled, crawl_status, etc.)
directly to dispensaries table instead of maintaining separate stores table.

- Copies menu_url from 22 existing stores to their dispensaries
- Migrates products from store_id to dispensary_id
- Detects provider_type from menu_url domain
- Adds indexes for crawl scheduling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 23:52:48 -07:00
Kelly
5306c3f4ca Switch scheduler UI to dispensary-based API
- Add migrations 021-023 for dispensary_crawl_schedule tables and views
- Add dispensary-orchestrator service and bootstrap-discovery script
- Update schedule routes with dispensary endpoints (/api/schedule/dispensaries)
- Switch frontend scheduler to use canonical dispensaries table (182 AZDHS entries)
- Add dispensary schedule API methods to frontend api.ts
- Remove "Unmapped" badge logic - all dispensaries now linked properly
- Add proper URL linking to dispensary detail pages (/dispensaries/:state/:city/:slug)
- Update Jobs table to show dispensary_name

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 18:54:52 -07:00
Kelly
0083f6a510 Add version display in admin sidebar footer
- Add /api/version endpoint that returns build info from env vars
- Add version footer to Layout.tsx showing build version, git SHA, and image tag
- Update Dockerfile to accept build args for version info (APP_BUILD_VERSION, APP_GIT_SHA, APP_BUILD_TIME, CONTAINER_IMAGE_TAG)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 10:01:10 -07:00
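Note on 0083f6a510: the payload behind `/api/version` could be assembled along these lines. The four environment variable names are from the commit message. The response field names and the "unknown" fallback are assumptions.

```typescript
interface VersionInfo {
  buildVersion: string;
  gitSha: string;
  buildTime: string;
  imageTag: string;
}

// Read build metadata from an environment-like map, e.g. process.env in Node.
function readVersionInfo(env: Record<string, string | undefined>): VersionInfo {
  const get = (name: string): string => env[name] || "unknown";
  return {
    buildVersion: get("APP_BUILD_VERSION"),
    gitSha: get("APP_GIT_SHA"),
    buildTime: get("APP_BUILD_TIME"),
    imageTag: get("CONTAINER_IMAGE_TAG"),
  };
}
```

Taking the environment as a parameter keeps the function testable without touching the real process environment.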
Kelly
afc71e4225 Fix migration idempotency for constraint creation
Make products_store_id_slug_unique constraint creation idempotent
by checking if it exists before attempting to add it.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 09:55:51 -07:00
Kelly
3861a31a3b Add crawler scheduler, orchestrator, and multi-category intelligence
- Add scheduler UI with store schedules, job queue, and global settings
- Add store crawl orchestrator for intelligent crawl workflow
- Add multi-category intelligence detection (product, specials, brands, metadata)
- Add CrawlerLogger for structured JSON logging
- Add migrations for scheduler tables and dispensary linking
- Add dispensary → scheduler navigation link
- Support production/sandbox crawler modes per provider

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 09:29:15 -07:00
Kelly
8b4292fbb2 Add local product detail page with Dutchie comparison
- Add ProductDetail page for viewing products locally
- Add Dutchie and Details buttons to product cards in Products and StoreDetail pages
- Add Last Updated display showing data freshness
- Add parallel scrape scripts and routes
- Add K8s deployment configurations
- Add frontend Dockerfile with nginx

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 06:34:38 -07:00
Kelly
6e597f15ca Fix Dockerfile for Puppeteer/Chromium support 2025-11-28 20:08:57 -07:00
Kelly
5757a8e9bd Initial commit - Dutchie dispensary scraper 2025-11-28 19:45:44 -07:00