Commit Graph

7 Commits

Author SHA1 Message Date
Kelly
8a09691e91 feat(tasks): Add task pool start/stop toggle
- Add task-pool-state.ts for shared pause/resume state
- Add /api/tasks/pool/status, pause, resume endpoints
- Add Start/Stop Pool toggle button to TasksDashboard
- Spinner stops when pool is closed
- Fix is_active column name in store-discovery.ts
- Fix missing active column in task-service.ts claimTask
- Auto-refresh every 15 seconds

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 00:07:14 -07:00
Kelly
824d48fd85 fix: Add curl to Docker, add active flag to worker_tasks
- Install curl in Docker container for Dutchie HTTP requests
- Add 'active' column to worker_tasks (default false) to prevent
  accidental task execution on startup
- Update task-service to only claim tasks where active=true

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:12:09 -07:00
Kelly
4949b22457 feat(tasks): Refactor task workflow with payload/refresh separation
Major changes:
- Split crawl into payload_fetch (API → disk) and product_refresh (disk → DB)
- Add task chaining: store_discovery → product_discovery → payload_fetch → product_refresh
- Add payload storage utilities for gzipped JSON on filesystem
- Add /api/payloads endpoints for payload access and diffing
- Add DB-driven TaskScheduler with schedule persistence
- Track newDispensaryIds through discovery promotion for chaining
- Add stealth improvements: HTTP fingerprinting, proxy rotation enhancements
- Add Workers dashboard K8s scaling controls

New files:
- src/tasks/handlers/payload-fetch.ts - Fetches from API, saves to disk
- src/services/task-scheduler.ts - DB-driven schedule management
- src/utils/payload-storage.ts - Payload save/load utilities
- src/routes/payloads.ts - Payload API endpoints
- src/services/http-fingerprint.ts - Browser fingerprint generation
- docs/TASK_WORKFLOW_2024-12-10.md - Complete workflow documentation

Migrations:
- 078: Proxy consecutive 403 tracking
- 079: task_schedules table
- 080: raw_crawl_payloads table
- 081: payload column and last_fetch_at

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 22:15:35 -07:00
Kelly
249d3c1b7f fix: Build args format for version info + schema-tolerant routes
CI/CD:
- Fix build_args format in woodpecker CI (comma-separated, not YAML list)
- This fixes "unknown" SHA/version showing on remote deployments

Backend schema-tolerant fixes (graceful fallbacks when tables missing):
- users.ts: Check which columns exist before querying
- worker-registry.ts: Return empty result if table doesn't exist
- task-service.ts: Add tableExists() helper, handle missing tables/views
- proxies.ts: Return totalProxies in test-all response

Frontend fixes:
- Proxies: Use total from response for accurate progress display
- SEO PagesTab: Dim Generate button when no AI provider active

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 09:53:21 -07:00
Kelly
4e84f30f8b feat: Auto-retry tasks, 403 proxy rotation, task deletion
- Fix 403 handler to rotate BOTH proxy and fingerprint (was only fingerprint)
- Add auto-retry logic to task service (retry up to max_retries before failing)
- Add error tooltip on task status badge showing retry count and error message
- Add DELETE /api/tasks/:id endpoint (only for non-running tasks)
- Add delete button to JobQueue task table

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 08:41:14 -07:00
Kelly
56cc171287 feat: Stealth worker system with mandatory proxy rotation
## Worker System
- Role-agnostic workers that can handle any task type
- Pod-based architecture with StatefulSet (5-15 pods, 5 workers each)
- Custom pod names (Aethelgard, Xylos, Kryll, etc.)
- Worker registry with friendly names and resource monitoring
- Hub-and-spoke visualization on JobQueue page

## Stealth & Anti-Detection (REQUIRED)
- Proxies are MANDATORY - workers fail to start without active proxies
- CrawlRotator initializes on worker startup
- Loads proxies from `proxies` table
- Auto-rotates proxy + fingerprint on 403 errors
- 12 browser fingerprints (Chrome, Firefox, Safari, Edge)
- Locale/timezone matching for geographic consistency

## Task System
- Renamed product_resync → product_refresh
- Task chaining: store_discovery → entry_point → product_discovery
- Priority-based claiming with FOR UPDATE SKIP LOCKED
- Heartbeat and stale task recovery

## UI Updates
- JobQueue: Pod visualization, resource monitoring on hover
- WorkersDashboard: Simplified worker list
- Removed unused filters from task list

## Other
- IP2Location service for visitor analytics
- Findagram consumer features scaffolding
- Documentation updates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 00:44:59 -07:00
Kelly
89c262ee20 feat(tasks): Add unified task-based worker architecture
Replace fragmented job systems (job_schedules, dispensary_crawl_jobs, SyncOrchestrator)
with a single unified task queue:

- Add worker_tasks table with atomic task claiming via SELECT FOR UPDATE SKIP LOCKED
- Add TaskService for CRUD, claiming, and capacity metrics
- Add TaskWorker with role-based handlers (resync, discovery, analytics)
- Add /api/tasks endpoints for management and migration from legacy systems
- Add TasksDashboard UI and integrate task counts into main dashboard
- Add comprehensive documentation

Task roles: store_discovery, entry_point_discovery, product_discovery, product_resync, analytics_refresh

Run workers with: WORKER_ROLE=product_resync npx tsx src/tasks/task-worker.ts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 16:27:03 -07:00