Commit Graph

4 Commits

Author SHA1 Message Date
Kelly
459ad7d9c9 fix(tasks): Fix missing column errors in task queries
- Change 'active' to 'is_active' in states table query (store-discovery.ts)
- Remove non-existent 'active' column check from worker_tasks query (task-service.ts)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:54:28 -07:00
Kelly
4949b22457 feat(tasks): Refactor task workflow with payload/refresh separation
Major changes:
- Split crawl into payload_fetch (API → disk) and product_refresh (disk → DB)
- Add task chaining: store_discovery → product_discovery → payload_fetch → product_refresh
- Add payload storage utilities for gzipped JSON on filesystem
- Add /api/payloads endpoints for payload access and diffing
- Add DB-driven TaskScheduler with schedule persistence
- Track newDispensaryIds through discovery promotion for chaining
- Add stealth improvements: HTTP fingerprinting, proxy rotation enhancements
- Add Workers dashboard K8s scaling controls

New files:
- src/tasks/handlers/payload-fetch.ts - Fetches from API, saves to disk
- src/services/task-scheduler.ts - DB-driven schedule management
- src/utils/payload-storage.ts - Payload save/load utilities
- src/routes/payloads.ts - Payload API endpoints
- src/services/http-fingerprint.ts - Browser fingerprint generation
- docs/TASK_WORKFLOW_2024-12-10.md - Complete workflow documentation

Migrations:
- 078: Proxy consecutive 403 tracking
- 079: task_schedules table
- 080: raw_crawl_payloads table
- 081: payload column and last_fetch_at

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 22:15:35 -07:00
Kelly
9199db3927 fix: Remove legacy imports from task handlers
- Remove non-existent DutchieClient import from product-resync and entry-point-discovery
- Remove non-existent DiscoveryCrawler import from store-discovery
- Use scrapeStore from scraper-v2 for product resync
- Use discoverState from discovery module for store discovery
- Fix Pool type by using getPool() instead of pool wrapper
- Update FullDiscoveryResult property access to use correct field names

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 17:25:19 -07:00
Kelly
89c262ee20 feat(tasks): Add unified task-based worker architecture
Replace fragmented job systems (job_schedules, dispensary_crawl_jobs, SyncOrchestrator)
with a single unified task queue:

- Add worker_tasks table with atomic task claiming via SELECT FOR UPDATE SKIP LOCKED
- Add TaskService for CRUD, claiming, and capacity metrics
- Add TaskWorker with role-based handlers (resync, discovery, analytics)
- Add /api/tasks endpoints for management and migration from legacy systems
- Add TasksDashboard UI and integrate task counts into main dashboard
- Add comprehensive documentation

Task roles: store_discovery, entry_point_discovery, product_discovery, product_resync, analytics_refresh

Run workers with: WORKER_ROLE=product_resync npx tsx src/tasks/task-worker.ts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 16:27:03 -07:00