Commit Graph

187 Commits

Author SHA1 Message Date
Kelly
8b3ae40089 feat: Remove Run Now, add source tracking, optimize dashboard
- Remove /run-now endpoint (use task priority instead)
- Add source tracking to worker_tasks (source, source_schedule_id, source_metadata)
- Parallelize dashboard API calls (Promise.all)
- Add 1-5 min caching to /markets/dashboard and /national/summary
- Add performance indexes for dashboard queries

Migrations:
- 104: Task source tracking columns
- 105: Dashboard performance indexes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 13:23:35 -07:00
Kelly
a8fec97bcb feat: Support per-dispensary schedules (not just per-state)
- Add dispensary_id column to task_schedules table
- Update scheduler to handle single-dispensary schedules
- Update run-now endpoint to handle single-dispensary schedules
- Update frontend modal to pass dispensary_id when 1 store selected
- Fix existing "Deeply Rooted Hourly" schedule with dispensary_id=112

Now when you select ONE store and check "Make recurring", it creates
a schedule that runs for that specific store every interval.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 12:03:08 -07:00
Kelly
c969c7385b fix: Handle product_refresh and payload_fetch in run-now endpoint
The run-now endpoint only fanned out to stores for product_discovery
schedules, not product_refresh or payload_fetch. This caused single
tasks to be created without dispensary_id, which then failed.

Now all crawl roles (product_discovery, product_refresh, payload_fetch)
with state_code properly fan out to individual store tasks.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 03:49:10 -07:00
Kelly
5084cb1a85 fix: Block images/fonts/media in Puppeteer to save bandwidth
Add request interception to all Puppeteer handlers to block unnecessary
resources (images, fonts, media, stylesheets). We only need HTML/JS for
the session cookie, then the GraphQL JSON response.

This was causing 2.4GB of bandwidth from assets2.dutchie.com - every
page visit downloaded all product thumbnails, logos, etc.

Files updated:
- product-discovery-http.ts
- entry-point-discovery.ts
- store-discovery-http.ts
- store-discovery-state.ts
- puppeteer-preflight.ts

Note: Product images from payload are still downloaded once to MinIO
via image-storage.ts - this only blocks browser-rendered page images.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 03:28:12 -07:00
Kelly
ec6843dfd6 feat: Add working hours for natural traffic patterns
Workers check their timezone (from preflight IP geolocation) and current
hour's weight probability to determine availability. This creates natural
traffic patterns - more workers active during peak hours, fewer during
off-peak. Tasks queue up at night and drain during the day.

Migrations:
- 099: working_hours table with hourly weights by profile
- 100: Add timezone column to worker_registry
- 101: Store timezone from preflight IP geolocation
- 102: check_working_hours() function with probability roll

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 03:24:42 -07:00
Kelly
268429b86c feat: Use MinIO for permanent product image storage
- Rewrite image-storage.ts to use MinIO instead of ephemeral local filesystem
- Images downloaded ONCE from Dutchie CDN, stored permanently in MinIO
- Check MinIO before downloading (skipIfExists) to avoid re-downloads
- Convert images to webp before storage
- Storage path: images/products/<state>/<store>/<brand>/<product>/image-<hash>.webp
- Public URL: https://cdn.cannabrands.app/cannaiq/images/...

This fixes the 2.4GB bandwidth issue from repeatedly downloading images
that were lost when K8s pods restarted.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 03:24:22 -07:00
Kelly
5c08135007 feat(plugin): Add Elementor dynamic tags and product loop widget v1.7.0
WordPress Plugin:
- Add dynamic tags for all product payload fields (name, brand, price, THC, effects, etc.)
- Add Product Loop widget with filtering, sorting, and layout options
- Register CannaIQ widget category in Elementor
- Update build script to auto-upload to MinIO CDN
- Remove legacy dutchie references
- Bump version to 1.7.0

Backend:
- Redirect /downloads/* to CDN instead of serving from local filesystem

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 03:10:01 -07:00
Kelly
9f0d68d4c9 Revert "feat: Store full Dutchie payload in latest_raw_payload"
This reverts commit e11400566e.
2025-12-13 02:33:30 -07:00
Kelly
e11400566e feat: Store full Dutchie payload in latest_raw_payload
Now stores the complete raw product JSON from Dutchie on every
product refresh. This enables querying any Dutchie field
(terpenes, effects, description, etc.) without schema changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 02:31:00 -07:00
Kelly
e50f54e621 revert: Remove brand aliasing migrations
Dutchie already provides unique brand UUIDs (provider_brand_id).
No need for separate brand normalization/aliasing logic.
Use provider_brand_id for brand grouping instead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 02:16:05 -07:00
Kelly
983cd71fc2 feat: Performance optimizations and preflight improvements
- Add missing /api/analytics/national/summary endpoint
- Optimize dashboard activity queries (subquery vs JOIN+GROUP BY)
- Add PreflightSummary component to Workers page with gold qualified badge
- Add preflight retry logic - workers retry every 30s until qualified
- Run stale task cleanup on ALL workers (not just worker-0)
- Add preflight fields to worker-registry API (ip, fingerprint, is_qualified)

Database indexes added:
- idx_store_products_created_at (for recent products)
- idx_dispensaries_last_crawl_at (for recent scrapes)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 02:06:33 -07:00
Kelly
7849ee0256 feat: Add POST /api/tasks/fix-null-methods endpoint
Updates null method tasks to 'http' for proper worker qualification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:33:09 -07:00
Kelly
432842f442 fix: Ensure all crawl tasks use method='http' transport
- product_discovery → product_refresh now sets method: 'http'
- product_refresh → entry_point_discovery now sets method: 'http'
- All crawl tasks now require HTTP preflight to claim

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:31:22 -07:00
Kelly
94ebbb2497 fix: State dropdown and locked platform in schedule modal
- State Code → State dropdown with available states
- Platform field locked to 'dutchie' (read-only)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:26:58 -07:00
Kelly
e826a4dd3e fix: Add consecutive_failures column to migration
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:20:59 -07:00
Kelly
b7e96359ef feat: Auto-retry failed tasks with exponential backoff
- Hard failures now auto-retry up to 3 times
- Exponential backoff: 5, 10, 20 minutes
- Only permanently fails after max retries exceeded
- Soft failures still requeue immediately

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:19:48 -07:00
Kelly
b1c1955082 feat: Add POST /api/tasks/retry-failed endpoint
Resets failed tasks back to pending for retry.
Options: role (filter), max_age_hours (default 24), limit (default 100)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:18:07 -07:00
Kelly
95c23fcdff fix: Add consecutive_successes column to dispensaries table
Required for stage checkpoint tracking in task handlers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:15:59 -07:00
Kelly
7067db68fc feat: Add server-side brand search to Intelligence page
- Backend: Add 'search' param to /api/admin/intelligence/brands
- Frontend: Debounced search triggers server-side query
- Now searches ALL brands, not just top 500

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:14:02 -07:00
Kelly
271faf0f00 perf: Optimize dashboard queries for faster load times
- Use pg_stat for approximate product count (instant vs full scan)
- LIMIT on DISTINCT queries for brand/category counts
- Single combined query (reduces round trips)
- Add index on store_product_snapshots.captured_at
- Add index on worker_tasks.worker_id and created_at

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:09:02 -07:00
Kelly
291a8279bd fix(entry-point-discovery): Self-healing duplicate detection
When resolving platform_dispensary_id, check if it already exists on
another dispensary. If so, mark current dispensary as duplicate instead
of failing with unique constraint violation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 00:46:10 -07:00
Kelly
b69d03c02f feat: Add stage checkpoints to task handlers and fix worker name display
Stage checkpoints (observational, non-blocking):
- product_refresh: success → 'production', failure tracking → 'failing' after 3
- product_discovery: success → 'hydrating', failure tracking
- entry_point_discovery: success → 'promoted', failure tracking

Worker name fix:
- Join worker_registry in tasks query to get friendly_name directly
- Update TasksDashboard to use worker_name from joined query
- Fallback to registry lookup then pod ID suffix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 00:43:00 -07:00
Kelly
54f59c6082 fix(analytics): Fix market-summary store count and add search indexes
- market-summary now counts from store_products table (not product_variants)
- Added trigram indexes for fast ILIKE product searches

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 00:35:17 -07:00
Kelly
24dd301d84 fix: State stores endpoint returns only Dutchie stores with products
- Filter by menu_type = 'dutchie'
- Use INNER JOIN + HAVING to only return stores with products
- Stores without product discovery are excluded

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 00:15:23 -07:00
Kelly
1d6211db19 perf: Add store_intelligence_cache for fast /intelligence/stores
- Remove costly correlated subquery (snapshot_count) from /stores endpoint
- Add migration 092 for store_intelligence_cache table
- Update analytics_refresh to populate cache with pre-computed metrics
- Add /intelligence/stores/cached endpoint using cache table

Performance: O(n*m) → O(1) for snapshot counts, ~10x faster response

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-12-13 00:13:41 -07:00
Kelly
e62f927218 feat: Auto-retry failed proxies after cooldown period
- Add last_failed_at column to track failure time
- Failed proxies auto-retry after 4 hours (configurable)
- Proxies permanently failed after 10 failures
- Add /retry-stats and /reenable-failed API endpoints
- markProxySuccess() re-enables recovered proxies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 00:08:44 -07:00
Kelly
675f42841e feat: Import 500 Evomi residential proxies
- Update unique constraint to include username/password for session-based proxies
- All proxies imported as inactive (run Test All to verify and activate)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 00:05:21 -07:00
Kelly
2d489e068b fix: Correct mv_state_metrics to use brand_name_raw
- Changed unique_brands from COUNT(brand_id) to COUNT(brand_name_raw)
- brand_id is often NULL, brand_name_raw has actual data
- AZ now correctly shows 462 brands (was 144)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 23:50:53 -07:00
Kelly
470097eb19 fix: Intelligence stores endpoint and UI consistency
- Fix stores endpoint to only show stores with actual products (INNER JOIN + HAVING)
- Update badge colors to match Workers/Tasks dashboard style
- Use emerald/amber/red/gray color scheme consistently
- Chain badge now uses purple (bg-purple-100)
- Add migration 092 to fix Trulieve store URLs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 23:37:28 -07:00
Kelly
5af86edf83 feat: Update last_payload_at and last_store_discovery_at timestamps
- payload-storage.ts: Update dispensaries.last_payload_at when saving payload
- promotion.ts: Update dispensaries.last_store_discovery_at on INSERT/UPDATE

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-12-12 23:30:57 -07:00
Kelly
55b26e9153 feat: Auto-healing entry_point_discovery with browser-first transport
- Rewrote entry_point_discovery with auto-healing scheme:
  1. Check dutchie_discovery_locations for existing platform_location_id
  2. Browser-based GraphQL with 5x network retries
  3. Mark as needs_investigation on hard failure
- Browser (Puppeteer) is now DEFAULT transport - curl only when explicit
- Added migration 091 for tracking columns:
  - last_store_discovery_at: When store_discovery updated record
  - last_payload_at: When last product payload was saved
- Updated CODEBASE_MAP.md with transport rules documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-12-12 22:55:21 -07:00
Kelly
a6f09ee6e3 fix: Calculate stale task count from heartbeat age 2025-12-12 22:15:16 -07:00
Kelly
c62f8cbf06 feat: Parallelized store discovery, modification tracking, and task deduplication
Store Discovery Parallelization:
- Add store_discovery_state handler for per-state parallel discovery
- Add POST /api/tasks/batch/store-discovery endpoint
- 8 workers can now process states in parallel (~30-45 min vs 3+ hours)

Modification Tracking (Migration 090):
- Add last_modified_at, last_modified_by_task, last_modified_task_id to dispensaries
- Add same columns to store_products
- Update all handlers to set tracking info on modifications

Stale Task Recovery:
- Add periodic stale cleanup every 10 minutes (worker-0 only)
- Prevents orphaned tasks from blocking queue after worker crashes

Task Deduplication:
- createStaggeredTasks now skips if pending/active task exists for same role
- Skips if same role completed within last 4 hours
- API responses include skipped count

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-12-12 22:15:04 -07:00
kelly
e4e8438d8b Merge pull request 'feat: Worker improvements and Run Now duplicate prevention' (#64) from feat/minio-payload-storage into master 2025-12-13 03:35:48 +00:00
Kelly
822d2b0609 feat: Idempotent entry_point_discovery with bulk endpoint
- Track id_resolution_status, attempts, and errors in handler
- Add POST /api/tasks/batch/entry-point-discovery endpoint
- Skip already-resolved stores, retry failed with force flag
2025-12-12 20:27:36 -07:00
Kelly
4ea7139ed5 feat: Add step reporting to all task handlers
Added updateStep() calls to:
- payload-fetch-curl: loading → preflight → fetching → saving
- product-refresh: loading → normalizing → upserting
- store-discovery-http: starting → preflight → navigating → fetching

This enables real-time visibility of worker progress in the dashboard.
2025-12-12 20:14:00 -07:00
Kelly
63023a4061 feat: Worker improvements and Run Now duplicate prevention
- Fix Run Now to prevent duplicate task creation
- Add loading state to Run Now button in UI
- Return early when no stores need refresh
- Worker dashboard improvements
- Browser pooling architecture updates
- K8s worker config updates (8 replicas, 3 concurrent tasks)
2025-12-12 20:11:31 -07:00
kelly
13a80e893e Merge pull request 'feat: Add MinIO/S3 support for payload storage' (#63) from feat/minio-payload-storage into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/63
2025-12-12 19:00:29 +00:00
Kelly
c98c409f59 feat: Add MinIO/S3 support for payload storage
- Update payload-storage.ts to use MinIO when configured
- Payloads stored at: cannaiq/payloads/{year}/{month}/{day}/store_{id}_{ts}.json.gz
- Falls back to local filesystem when MINIO_* env vars not set
- Enables shared storage across all worker pods
- Fixes ephemeral storage issue where payloads were lost on pod restart

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 11:30:57 -07:00
kelly
6c8993f7bd Merge pull request 'fix(workers): Increase max concurrent tasks to 15' (#62) from feat/proxy-reload-and-bulk-import into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/62
2025-12-12 18:19:04 +00:00
Kelly
92f88fdcd6 fix(workers): Increase max concurrent tasks to 15 and add K8s permission rule
- Change MAX_CONCURRENT_TASKS default from 3 to 15
- Add CLAUDE.md rule requiring explicit permission before kubectl commands

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 10:54:33 -07:00
kelly
fd4a9b1434 Merge pull request 'feat(scheduler): Immutable schedules and HTTP-only pipeline' (#61) from feat/proxy-reload-and-bulk-import into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/61
2025-12-12 16:37:16 +00:00
Kelly
832ef1cf83 feat(scheduler): Immutable schedules and HTTP-only pipeline
## Changes
- **Migration 089**: Add is_immutable and method columns to task_schedules
  - Per-state product_discovery schedules (4h default)
  - Store discovery weekly (168h)
  - All schedules use HTTP transport (Puppeteer/browser)
- **Task Scheduler**: HTTP-only product discovery with per-state scheduling
  - Each state has its own immutable schedule
  - Schedules can be edited (interval/priority) but not deleted
- **TasksDashboard UI**: Full immutability support
  - Lock icon for immutable schedules
  - State and Method columns in schedules table
  - Disabled delete for immutable, restricted edit fields
- **Store Discovery HTTP**: Auto-queue product_discovery for new stores
- **Migration 088**: Discovery payloads storage schema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 09:24:08 -07:00
kelly
b05eaceaf0 Merge pull request 'feat(tasks): Dual transport handlers and self-healing product_refresh' (#60) from feat/proxy-reload-and-bulk-import into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/60
2025-12-12 10:33:13 +00:00
kelly
909470d3dc Merge pull request 'fix(proxy): Convert non-standard proxy URL format and simplify preflight' (#59) from feat/proxy-reload-and-bulk-import into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/59
2025-12-12 10:03:14 +00:00
Kelly
9a24b4896c feat(tasks): Dual transport handlers and self-healing product_refresh
- Rename product-discovery.ts to product-discovery-curl.ts (axios-based)
- Rename payload-fetch.ts to payload-fetch-curl.ts
- Add product-discovery-http.ts (Puppeteer browser-based handler)
- Add method field to CreateTaskParams for transport selection
- Update task-service to insert method column on task creation
- Update task-worker with getHandlerForTask() for dual transport routing
- product_refresh now queues upstream tasks when no payload exists:
  - Has platform_dispensary_id → queues product_discovery (http)
  - No platform_dispensary_id → queues entry_point_discovery

This enables HTTP workers to pick up browser-based tasks while curl
workers handle axios-based tasks, and prevents product_refresh from
failing repeatedly when no crawl has been performed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 03:02:56 -07:00
Kelly
dd8fce6e35 fix(proxy): Convert non-standard proxy URL format and simplify preflight
- CrawlRotator.getProxyUrl() now converts non-standard format (http://host:port:user:pass) to standard format (http://user:pass@host:port)
- Simplify puppeteer preflight to only use ipify.org for IP verification (much lighter than fingerprint.com)
- Remove heavy anti-detect site tests from preflight - not needed, trust stealth plugin
- Fixes 503 errors when using session-based residential proxies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 02:13:51 -07:00
kelly
65b96d9cb9 Merge pull request 'feat(workers): Add proxy reload, staggered tasks, and bulk proxy import' (#58) from feat/proxy-reload-and-bulk-import into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/58
2025-12-12 09:11:23 +00:00
Kelly
f82eed4dc3 feat(workers): Add proxy reload, staggered tasks, and bulk proxy import
- Periodic proxy reload: Workers now reload proxies every 60s to pick up changes
- Staggered task scheduling: New API endpoints for creating tasks with delays
- Bulk proxy import: Script supports multiple URL formats including host:port:user:pass
- Proxy URL column: Migration 086 adds proxy_url for non-standard formats

Key changes:
- crawl-rotator.ts: Added reloadIfStale(), isStale(), setReloadInterval()
- task-worker.ts: Calls reloadIfStale() in main loop
- task-service.ts: Added createStaggeredTasks() and createAZStoreTasks()
- tasks.ts: Added POST /batch/staggered and /batch/az-stores endpoints
- import-proxies.ts: New script for bulk proxy import
- CLAUDE.md: Documented staggered task workflow

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 01:53:15 -07:00
kelly
d997ec51a2 Merge pull request 'feat(tasks): Consolidate schedule management into task_schedules' (#57) from feat/task-schedules-consolidation into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/57
2025-12-12 08:31:29 +00:00