Worker Task Architecture
This document describes the unified task-based worker system that replaces the legacy fragmented job systems.
Overview
The task worker architecture provides a single, unified system for managing all background work in CannaiQ:
- Store discovery - Find new dispensaries on platforms
- Entry point discovery - Resolve platform IDs from menu URLs
- Product discovery - Initial product fetch for new stores
- Product resync - Regular price/stock updates for existing stores
- Analytics refresh - Refresh materialized views and analytics
Architecture
Database Tables
`worker_tasks` - Central task queue:

```sql
CREATE TABLE worker_tasks (
  id SERIAL PRIMARY KEY,
  role task_role NOT NULL,           -- What type of work
  dispensary_id INTEGER,             -- Which store (if applicable)
  platform VARCHAR(50),              -- Which platform (dutchie, etc.)
  status task_status DEFAULT 'pending',
  priority INTEGER DEFAULT 0,        -- Higher = process first
  scheduled_for TIMESTAMP,           -- Don't process before this time
  worker_id VARCHAR(100),            -- Which worker claimed it
  claimed_at TIMESTAMP,
  started_at TIMESTAMP,
  completed_at TIMESTAMP,
  last_heartbeat_at TIMESTAMP,       -- For stale detection
  result JSONB,                      -- Output from handler
  error_message TEXT,
  retry_count INTEGER DEFAULT 0,
  max_retries INTEGER DEFAULT 3,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);
```
Key indexes:
- `idx_worker_tasks_pending_priority` - For efficient task claiming
- `idx_worker_tasks_active_dispensary` - Prevents concurrent tasks per store (partial unique index)
Task Roles
| Role | Purpose | Per-Store | Scheduled |
|---|---|---|---|
| `store_discovery` | Find new stores on a platform | No | Daily |
| `entry_point_discovery` | Resolve platform IDs | Yes | On-demand |
| `product_discovery` | Initial product fetch | Yes | After entry_point |
| `product_resync` | Price/stock updates | Yes | Every 4 hours |
| `analytics_refresh` | Refresh MVs | No | Daily |
Task Lifecycle
```
pending → claimed → running → completed
                       ↓
                    failed
```
- `pending` - Task is waiting to be picked up
- `claimed` - Worker has claimed it (atomic via `SELECT FOR UPDATE SKIP LOCKED`)
- `running` - Worker is actively processing
- `completed` - Task finished successfully
- `failed` - Task encountered an error
- `stale` - Task lost its worker (recovered automatically)
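The atomic claim can be sketched as a single `UPDATE` wrapping a `SELECT ... FOR UPDATE SKIP LOCKED`. The real query lives in `src/tasks/task-service.ts`; the version below is an illustrative reconstruction based on the schema above, not the actual implementation:

```typescript
// Illustrative shape of the atomic claim query ($1 = role, $2 = worker_id).
// SKIP LOCKED makes competing workers skip rows another worker is mid-claim on
// instead of blocking, so claims never contend.
function buildClaimQuery(): string {
  return `
    UPDATE worker_tasks
    SET status = 'claimed', worker_id = $2, claimed_at = NOW()
    WHERE id = (
      SELECT id FROM worker_tasks
      WHERE role = $1
        AND status = 'pending'
        AND (scheduled_for IS NULL OR scheduled_for <= NOW())
      ORDER BY priority DESC, created_at ASC
      LIMIT 1
      FOR UPDATE SKIP LOCKED
    )
    RETURNING *`;
}
```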
Files
Core Files
| File | Purpose |
|---|---|
| `src/tasks/task-service.ts` | `TaskService` - CRUD, claiming, capacity metrics |
| `src/tasks/task-worker.ts` | `TaskWorker` - Main worker loop |
| `src/tasks/index.ts` | Module exports |
| `src/routes/tasks.ts` | API endpoints |
| `migrations/074_worker_task_queue.sql` | Database schema |
Task Handlers
| File | Role |
|---|---|
| `src/tasks/handlers/store-discovery.ts` | `store_discovery` |
| `src/tasks/handlers/entry-point-discovery.ts` | `entry_point_discovery` |
| `src/tasks/handlers/product-discovery.ts` | `product_discovery` |
| `src/tasks/handlers/product-resync.ts` | `product_resync` |
| `src/tasks/handlers/analytics-refresh.ts` | `analytics_refresh` |
Running Workers
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `WORKER_ROLE` | (required) | Which task role to process |
| `WORKER_ID` | auto-generated | Custom worker identifier |
| `POLL_INTERVAL_MS` | 5000 | How often to check for tasks |
| `HEARTBEAT_INTERVAL_MS` | 30000 | How often to update heartbeat |
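Reading these variables reduces to a few defaults; a minimal sketch (the `WORKER_ID` fallback format here is illustrative, not the worker's actual ID generator):

```typescript
interface WorkerConfig {
  role: string;
  workerId: string;
  pollIntervalMs: number;
  heartbeatIntervalMs: number;
}

// Parse worker settings from the environment, applying the documented defaults.
function loadConfig(env: Record<string, string | undefined> = process.env): WorkerConfig {
  const role = env.WORKER_ROLE;
  if (!role) throw new Error('WORKER_ROLE is required');
  return {
    role,
    // Hypothetical fallback; the real auto-generated ID may differ.
    workerId: env.WORKER_ID ?? `worker-${role}-${Math.random().toString(36).slice(2, 10)}`,
    pollIntervalMs: Number(env.POLL_INTERVAL_MS ?? 5000),
    heartbeatIntervalMs: Number(env.HEARTBEAT_INTERVAL_MS ?? 30000),
  };
}
```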
Starting a Worker
```bash
# Start a product resync worker
WORKER_ROLE=product_resync npx tsx src/tasks/task-worker.ts

# Start with custom ID
WORKER_ROLE=product_resync WORKER_ID=resync-1 npx tsx src/tasks/task-worker.ts

# Start multiple workers for different roles
WORKER_ROLE=store_discovery npx tsx src/tasks/task-worker.ts &
WORKER_ROLE=product_resync npx tsx src/tasks/task-worker.ts &
```
Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: task-worker-resync
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: worker
          image: code.cannabrands.app/creationshop/dispensary-scraper:latest
          command: ["npx", "tsx", "src/tasks/task-worker.ts"]
          env:
            - name: WORKER_ROLE
              value: "product_resync"
```
API Endpoints
Task Management
| Endpoint | Method | Description |
|---|---|---|
| `/api/tasks` | GET | List tasks with filters |
| `/api/tasks` | POST | Create a new task |
| `/api/tasks/:id` | GET | Get task by ID |
| `/api/tasks/counts` | GET | Get counts by status |
| `/api/tasks/capacity` | GET | Get capacity metrics |
| `/api/tasks/capacity/:role` | GET | Get role-specific capacity |
| `/api/tasks/recover-stale` | POST | Recover tasks from dead workers |
Task Generation
| Endpoint | Method | Description |
|---|---|---|
| `/api/tasks/generate/resync` | POST | Generate daily resync tasks |
| `/api/tasks/generate/discovery` | POST | Create store discovery task |
Migration (from legacy systems)
| Endpoint | Method | Description |
|---|---|---|
| `/api/tasks/migration/status` | GET | Compare old vs new systems |
| `/api/tasks/migration/disable-old-schedules` | POST | Disable job_schedules |
| `/api/tasks/migration/cancel-pending-crawl-jobs` | POST | Cancel old crawl jobs |
| `/api/tasks/migration/create-resync-tasks` | POST | Create tasks for all stores |
| `/api/tasks/migration/full-migrate` | POST | One-click migration |
Role-Specific Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/api/tasks/role/:role/last-completion` | GET | Last completion time |
| `/api/tasks/role/:role/recent` | GET | Recent completions |
| `/api/tasks/store/:id/active` | GET | Check if store has active task |
Capacity Planning
The `v_worker_capacity` view provides real-time metrics:

```sql
SELECT * FROM v_worker_capacity;
```

Returns:

- `pending_tasks` - Tasks waiting to be claimed
- `ready_tasks` - Tasks ready now (scheduled_for is null or past)
- `claimed_tasks` - Tasks claimed but not started
- `running_tasks` - Tasks actively processing
- `completed_last_hour` - Recent completions
- `failed_last_hour` - Recent failures
- `active_workers` - Workers with recent heartbeats
- `avg_duration_sec` - Average task duration
- `tasks_per_worker_hour` - Throughput estimate
- `estimated_hours_to_drain` - Time to clear queue
Scaling Recommendations
```jsonc
// API: GET /api/tasks/capacity/:role
{
  "role": "product_resync",
  "pending_tasks": 500,
  "active_workers": 3,
  "workers_needed": {
    "for_1_hour": 10,
    "for_4_hours": 3,
    "for_8_hours": 2
  }
}
```
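The `workers_needed` figures are consistent with a ceiling division over per-worker throughput. A sketch of that assumed formula, where `tasksPerWorkerHour` would come from `v_worker_capacity`:

```typescript
// Assumed relationship: workers = ceil(pending / (throughput * hours)).
function workersNeeded(pendingTasks: number, tasksPerWorkerHour: number, hours: number): number {
  return Math.ceil(pendingTasks / (tasksPerWorkerHour * hours));
}
```

With 500 pending tasks and roughly 50 tasks per worker-hour, this reproduces the 10 / 3 / 2 figures in the example response above.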
Task Chaining
Tasks can automatically create follow-up tasks:
```
store_discovery → entry_point_discovery → product_discovery
                                                  ↓
                                (store has platform_dispensary_id)
                                                  ↓
                                         Daily resync tasks
```
The `chainNextTask()` method handles this automatically.
Stale Task Recovery
Tasks are considered stale if `last_heartbeat_at` is older than the threshold (default 10 minutes).

```sql
SELECT recover_stale_tasks(10); -- 10 minute threshold
```
Or via API:
```bash
curl -X POST /api/tasks/recover-stale \
  -H 'Content-Type: application/json' \
  -d '{"threshold_minutes": 10}'
```
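The staleness rule itself is a simple timestamp comparison; an equivalent check in TypeScript (the authoritative logic is the SQL `recover_stale_tasks()` function):

```typescript
// A task is stale when its last heartbeat is older than the threshold.
function isStale(lastHeartbeatAt: Date, thresholdMinutes = 10, now: Date = new Date()): boolean {
  return now.getTime() - lastHeartbeatAt.getTime() > thresholdMinutes * 60_000;
}
```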
Migration from Legacy Systems
Legacy Systems Replaced
- `job_schedules` + `job_run_logs` - Scheduled job definitions
- `dispensary_crawl_jobs` - Per-dispensary crawl queue
- `SyncOrchestrator` + `HydrationWorker` - Raw payload processing
Migration Steps
Option 1: One-Click Migration
```bash
curl -X POST /api/tasks/migration/full-migrate
```
This will:
- Disable all job_schedules
- Cancel pending dispensary_crawl_jobs
- Generate resync tasks for all stores
- Create discovery and analytics tasks
Option 2: Manual Migration
```bash
# 1. Check current status
curl /api/tasks/migration/status

# 2. Disable old schedules
curl -X POST /api/tasks/migration/disable-old-schedules

# 3. Cancel pending crawl jobs
curl -X POST /api/tasks/migration/cancel-pending-crawl-jobs

# 4. Create resync tasks
curl -X POST /api/tasks/migration/create-resync-tasks \
  -H 'Content-Type: application/json' \
  -d '{"state_code": "AZ"}'

# 5. Generate daily resync schedule
curl -X POST /api/tasks/generate/resync \
  -H 'Content-Type: application/json' \
  -d '{"batches_per_day": 6}'
```
Per-Store Locking
The system prevents concurrent tasks for the same store using a partial unique index:
```sql
CREATE UNIQUE INDEX idx_worker_tasks_active_dispensary
  ON worker_tasks (dispensary_id)
  WHERE dispensary_id IS NOT NULL
    AND status IN ('claimed', 'running');
```
This ensures only one task can be active per store at any time.
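When two workers race to create or claim a task for the same store, Postgres rejects the loser with a `unique_violation` (SQLSTATE `23505`). A hypothetical helper for recognizing that case; the `constraint` field follows node-postgres error shape, and treating the conflict this way is an assumption, not the codebase's actual handling:

```typescript
// Detect the per-store lock conflict from a Postgres error object.
// '23505' is the SQLSTATE code for unique_violation.
function isDuplicateActiveTask(err: { code?: string; constraint?: string }): boolean {
  return err.code === '23505' && err.constraint === 'idx_worker_tasks_active_dispensary';
}
```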
Task Priority
Tasks are claimed in priority order (higher first), then by creation time:
```sql
ORDER BY priority DESC, created_at ASC
```
Default priorities:
- `store_discovery`: 0
- `entry_point_discovery`: 10 (high - new stores)
- `product_discovery`: 10 (high - new stores)
- `product_resync`: 0
- `analytics_refresh`: 0
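The same ordering can be reproduced in application code with a comparator; an illustrative sketch (the type and function names are not from the codebase):

```typescript
interface QueuedTask {
  priority: number;
  createdAt: Date;
}

// Mirror of ORDER BY priority DESC, created_at ASC: higher priority first,
// ties broken by oldest creation time.
function byClaimOrder(a: QueuedTask, b: QueuedTask): number {
  return b.priority - a.priority || a.createdAt.getTime() - b.createdAt.getTime();
}
```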
Scheduled Tasks
Tasks can be scheduled for future execution:
```typescript
await taskService.createTask({
  role: 'product_resync',
  dispensary_id: 123,
  scheduled_for: new Date('2025-01-10T06:00:00Z'),
});
```
The `generate_resync_tasks()` function creates staggered tasks throughout the day:

```sql
SELECT generate_resync_tasks(6, '2025-01-10'); -- 6 batches = every 4 hours
```
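The batch arithmetic behind the staggering is simple; an illustrative sketch (a standalone reconstruction, not the SQL function's implementation):

```typescript
// Split the day into equal batches: 6 batches => one start every 4 hours.
function batchStartHours(batchesPerDay: number): number[] {
  const interval = 24 / batchesPerDay;
  return Array.from({ length: batchesPerDay }, (_, i) => i * interval);
}
```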
Dashboard Integration
The admin dashboard shows task queue status in the main overview:
```
Task Queue Summary
------------------
Pending: 45
Running: 3
Completed: 1,234
Failed: 12
```
Full task management is available at `/admin/tasks`.
Error Handling
Failed tasks include the error message in `error_message` and can be retried:

```sql
-- View failed tasks
SELECT id, role, dispensary_id, error_message, retry_count
FROM worker_tasks
WHERE status = 'failed'
ORDER BY completed_at DESC
LIMIT 20;

-- Retry failed tasks
UPDATE worker_tasks
SET status = 'pending', retry_count = retry_count + 1
WHERE status = 'failed' AND retry_count < max_retries;
```
Concurrent Task Processing (Added 2024-12)
Workers can now process multiple tasks concurrently within a single worker instance. This improves throughput by utilizing async I/O efficiently.
Architecture
```
Pod (K8s)
└── TaskWorker
    ├── Task 1 ─┐
    ├── Task 2  ├─ run concurrently
    ├── Task 3 ─┘
    └── Resource Monitor
        ├── Memory: 65% (threshold: 85%)
        ├── CPU: 45% (threshold: 90%)
        └── Status: Normal
```
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `MAX_CONCURRENT_TASKS` | 3 | Maximum tasks a worker will run concurrently |
| `MEMORY_BACKOFF_THRESHOLD` | 0.85 | Back off when heap memory exceeds 85% |
| `CPU_BACKOFF_THRESHOLD` | 0.90 | Back off when CPU exceeds 90% |
| `BACKOFF_DURATION_MS` | 10000 | How long to wait when backing off (10s) |
How It Works
1. Main Loop: Worker continuously tries to fill up to `MAX_CONCURRENT_TASKS`
2. Resource Monitoring: Before claiming a new task, worker checks memory and CPU
3. Backoff: If resources exceed thresholds, worker pauses and stops claiming new tasks
4. Concurrent Execution: Tasks run in parallel as Promises; they don't block each other
5. Graceful Shutdown: On SIGTERM/decommission, worker stops claiming but waits for active tasks
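The claim-until-full step can be sketched as follows. This is a simplified illustration; `claimTask` and `runTask` stand in for the real `TaskService` and handler calls in `src/tasks/task-worker.ts`:

```typescript
// Keep claiming tasks until the in-flight set reaches maxConcurrent.
// Each task removes itself from the set when it settles, freeing a slot
// for the next pass of the main loop.
async function fillSlots(
  claimTask: () => Promise<number | null>,
  runTask: (taskId: number) => Promise<void>,
  maxConcurrent: number,
  inFlight: Set<Promise<void>>,
): Promise<void> {
  while (inFlight.size < maxConcurrent) {
    const taskId = await claimTask();
    if (taskId === null) break; // nothing pending right now
    const task = runTask(taskId).finally(() => inFlight.delete(task));
    inFlight.add(task);
  }
}
```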
Resource Monitoring
```typescript
// ResourceStats interface
interface ResourceStats {
  memoryPercent: number;  // Current heap usage as decimal (0.0-1.0)
  memoryMb: number;       // Current heap used in MB
  memoryTotalMb: number;  // Total heap available in MB
  cpuPercent: number;     // CPU usage as percentage (0-100)
  isBackingOff: boolean;  // True if worker is in backoff state
  backoffReason: string;  // Why the worker is backing off
}
```
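Memory figures like these can be sampled with Node's built-in `process.memoryUsage()`. A minimal sketch; CPU percent requires two `process.cpuUsage()` samples over an interval, so it is omitted here:

```typescript
// Sample current heap usage in the units ResourceStats expects.
function sampleMemory(): { memoryPercent: number; memoryMb: number; memoryTotalMb: number } {
  const { heapUsed, heapTotal } = process.memoryUsage();
  return {
    memoryPercent: heapUsed / heapTotal, // decimal 0.0-1.0
    memoryMb: Math.round(heapUsed / 1024 / 1024),
    memoryTotalMb: Math.round(heapTotal / 1024 / 1024),
  };
}
```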
Heartbeat Data
Workers report the following in their heartbeat:
```json
{
  "worker_id": "worker-abc123",
  "current_task_id": 456,
  "current_task_ids": [456, 457, 458],
  "active_task_count": 3,
  "max_concurrent_tasks": 3,
  "status": "active",
  "resources": {
    "memory_mb": 256,
    "memory_total_mb": 512,
    "memory_rss_mb": 320,
    "memory_percent": 50,
    "cpu_user_ms": 12500,
    "cpu_system_ms": 3200,
    "cpu_percent": 45,
    "is_backing_off": false,
    "backoff_reason": null
  }
}
```
Backoff Behavior
When resources exceed thresholds:
1. Worker logs the backoff reason:
   ```
   [TaskWorker] MyWorker backing off: Memory at 87.3% (threshold: 85%)
   ```
2. Worker stops claiming new tasks but continues existing tasks
3. After `BACKOFF_DURATION_MS`, worker rechecks resources
4. When resources return to normal:
   ```
   [TaskWorker] MyWorker resuming normal operation
   ```
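The threshold check reduces to a pure function. This sketch mirrors the log format above but is illustrative; the actual logic is `shouldBackOff()` in `src/tasks/task-worker.ts`:

```typescript
// Returns a human-readable backoff reason, or null when resources are healthy.
function backoffReason(
  memoryPercent: number, // decimal 0.0-1.0
  cpuPercent: number,    // 0-100
  memThreshold = 0.85,
  cpuThreshold = 0.9,
): string | null {
  if (memoryPercent > memThreshold) {
    return `Memory at ${(memoryPercent * 100).toFixed(1)}% (threshold: ${memThreshold * 100}%)`;
  }
  if (cpuPercent / 100 > cpuThreshold) {
    return `CPU at ${cpuPercent.toFixed(1)}% (threshold: ${cpuThreshold * 100}%)`;
  }
  return null;
}
```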
UI Display
The Workers Dashboard shows:
- Tasks Column: `2/3 tasks` (active/max concurrent)
- Resources Column: Memory % and CPU % with color coding
  - Green: < 50%
  - Yellow: 50-74%
  - Amber: 75-89%
  - Red: 90%+
- Backing Off: Orange warning badge when worker is in backoff state
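The thresholds above map directly onto a small helper; an illustrative sketch (the actual component logic lives in `WorkersDashboard.tsx`):

```typescript
type BadgeColor = 'green' | 'yellow' | 'amber' | 'red';

// Map a resource percentage (0-100) to its badge color per the thresholds above.
function badgeColor(percent: number): BadgeColor {
  if (percent >= 90) return 'red';
  if (percent >= 75) return 'amber';
  if (percent >= 50) return 'yellow';
  return 'green';
}
```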
Task Count Badge Details
```
┌─────────────────────────────────────────────┐
│ Worker: "MyWorker"                          │
│ Tasks: 2/3 tasks  #456, #457                │
│ Resources: 🧠 65%  💻 45%                   │
│ Status: ● Active                            │
└─────────────────────────────────────────────┘
```
Best Practices
- Start Conservative: Use `MAX_CONCURRENT_TASKS=3` initially
- Monitor Resources: Watch for frequent backoffs in logs
- Tune Per Workload: I/O-bound tasks benefit from higher concurrency
- Scale Horizontally: Add more pods rather than cranking concurrency too high
Code References
| File | Purpose |
|---|---|
| `src/tasks/task-worker.ts:68-71` | Concurrency environment variables |
| `src/tasks/task-worker.ts:104-111` | `ResourceStats` interface |
| `src/tasks/task-worker.ts:149-179` | `getResourceStats()` method |
| `src/tasks/task-worker.ts:184-196` | `shouldBackOff()` method |
| `src/tasks/task-worker.ts:462-516` | `mainLoop()` with concurrent claiming |
| `src/routes/worker-registry.ts:148-195` | Heartbeat endpoint handling |
| `cannaiq/src/pages/WorkersDashboard.tsx:233-305` | UI components for resources |
Monitoring
Logs
Workers log to stdout:
```
[TaskWorker] Starting worker worker-product_resync-a1b2c3d4 for role: product_resync
[TaskWorker] Claimed task 123 (product_resync) for dispensary 456
[TaskWorker] Task 123 completed successfully
```
Health Check
Check if workers are active:
```sql
SELECT worker_id, role, COUNT(*), MAX(last_heartbeat_at)
FROM worker_tasks
WHERE last_heartbeat_at > NOW() - INTERVAL '5 minutes'
GROUP BY worker_id, role;
```
Metrics
```sql
-- Tasks by status
SELECT status, COUNT(*) FROM worker_tasks GROUP BY status;

-- Tasks by role
SELECT role, status, COUNT(*) FROM worker_tasks GROUP BY role, status;

-- Average duration by role
SELECT role, AVG(EXTRACT(EPOCH FROM (completed_at - started_at))) AS avg_seconds
FROM worker_tasks
WHERE status = 'completed' AND completed_at > NOW() - INTERVAL '24 hours'
GROUP BY role;
```