feat: Worker improvements and Run Now duplicate prevention
- Fix Run Now to prevent duplicate task creation
- Add loading state to Run Now button in UI
- Return early when no stores need refresh
- Worker dashboard improvements
- Browser pooling architecture updates
- K8s worker config updates (8 replicas, 3 concurrent tasks)
CLAUDE.md
@@ -25,13 +25,26 @@ Never import `src/db/migrate.ts` at runtime. Use `src/db/pool.ts` for DB access.
- **Worker** = Concurrent task runner INSIDE a pod (controlled by `MAX_CONCURRENT_TASKS` env var)
- Formula: `8 pods × MAX_CONCURRENT_TASKS = total concurrent workers`

**To increase workers:** Change `MAX_CONCURRENT_TASKS` env var, NOT replicas.

```bash
# CORRECT - increase workers per pod
kubectl set env deployment/scraper-worker -n dispensary-scraper MAX_CONCURRENT_TASKS=5

# WRONG - never scale above 8 replicas
kubectl scale deployment/scraper-worker --replicas=20  # NEVER DO THIS
```

**Browser Task Memory Limits:**
- Each Puppeteer/Chrome browser uses ~400 MB RAM
- Pod memory limit is 2 GB
- **MAX_CONCURRENT_TASKS=3** is the safe maximum for browser tasks
- More than 3 concurrent browsers per pod = OOM crash

| Browsers | RAM Used | Status |
|----------|----------|--------|
| 3 | ~1.3 GB | Safe (recommended) |
| 4 | ~1.7 GB | Risky |
| 5+ | >2 GB | OOM crash |

**To increase throughput:** Add more pods (up to 8), NOT more concurrent tasks per pod.

```bash
# CORRECT - scale pods (up to 8)
kubectl scale deployment/scraper-worker -n dispensary-scraper --replicas=8

# WRONG - will cause OOM crashes
kubectl set env deployment/scraper-worker -n dispensary-scraper MAX_CONCURRENT_TASKS=10
```

**If K8s API returns ServiceUnavailable:** STOP IMMEDIATELY. Do not retry. The cluster is overloaded.
@@ -504,6 +504,103 @@ The Workers Dashboard shows:
| `src/routes/worker-registry.ts:148-195` | Heartbeat endpoint handling |
| `cannaiq/src/pages/WorkersDashboard.tsx:233-305` | UI components for resources |

## Browser Task Memory Limits (Updated 2025-12)

Browser-based tasks (Puppeteer/Chrome) have strict memory constraints that limit concurrency.

### Why Browser Tasks Are Different

Each browser task launches a Chrome process. Unlike I/O-bound API calls, browsers consume significant RAM:

| Component | RAM Usage |
|-----------|-----------|
| Node.js runtime | ~150 MB |
| Chrome browser (base) | ~200-250 MB |
| Dutchie menu page (loaded) | ~100-150 MB |
| **Per browser total** | **~350-450 MB** |

### Memory Math for Pod Limits

```
Pod memory limit:        2 GB (2000 MB)
Node.js runtime:         -150 MB
Safety buffer:           -100 MB
────────────────────────────────
Available for browsers:  1750 MB

Per browser + page:      ~400 MB

Max browsers:            1750 ÷ 400 = ~4 browsers

Recommended:             3 browsers (leaves headroom for spikes)
```

### MAX_CONCURRENT_TASKS for Browser Tasks

| Browsers per Pod | RAM Used | Risk Level |
|------------------|----------|------------|
| 1 | ~500 MB | Very safe |
| 2 | ~900 MB | Safe |
| **3** | **~1.3 GB** | **Recommended** |
| 4 | ~1.7 GB | Tight (may OOM) |
| 5+ | >2 GB | Will OOM crash |

**CRITICAL**: `MAX_CONCURRENT_TASKS=3` is the maximum safe value for browser tasks with current pod limits.
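The recommendation in the table follows directly from the memory math above. A minimal sketch of the same calculation (the function name and defaults are illustrative, not part of the codebase):

```ts
// Illustrative only: derives the safe browser count from the pod memory budget
// described above (2000 MB limit, 150 MB runtime, 100 MB buffer, ~400 MB/browser).
function safeBrowserCount(
  podLimitMb = 2000,
  runtimeMb = 150,
  bufferMb = 100,
  perBrowserMb = 400,
): number {
  const available = podLimitMb - runtimeMb - bufferMb; // 1750 MB
  const max = Math.floor(available / perBrowserMb);    // 4 browsers fit
  return Math.max(1, max - 1);                         // keep one browser of headroom -> 3
}

console.log(safeBrowserCount()); // 3, matching MAX_CONCURRENT_TASKS=3
```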
### Scaling Strategy

Scale **horizontally** (more pods) rather than vertically (more concurrency per pod):

```
┌─────────────────────────────────────────────────────────────────────┐
│  Cluster: 8 pods × 3 browsers = 24 concurrent tasks                  │
│                                                                      │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐     │
│  │   Pod 0     │ │   Pod 1     │ │   Pod 2     │ │   Pod 3     │     │
│  │ 3 browsers  │ │ 3 browsers  │ │ 3 browsers  │ │ 3 browsers  │     │
│  └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘     │
│                                                                      │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐     │
│  │   Pod 4     │ │   Pod 5     │ │   Pod 6     │ │   Pod 7     │     │
│  │ 3 browsers  │ │ 3 browsers  │ │ 3 browsers  │ │ 3 browsers  │     │
│  └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘     │
└─────────────────────────────────────────────────────────────────────┘
```

### Browser Lifecycle Per Task

Each task gets a fresh browser with fresh IP/identity:

```
1. Claim task from queue
2. Get fresh proxy from pool
3. Launch browser with proxy
4. Run preflight (verify IP)
5. Execute scrape
6. Close browser
7. Repeat
```

This ensures:
- Fresh IP per task (proxy rotation)
- Fresh fingerprint per task (UA rotation)
- No cookie/session bleed between tasks
- Predictable memory usage
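In code, the lifecycle above corresponds roughly to the following sketch, assuming puppeteer-extra as used elsewhere in this commit. The claimTask/getProxyFromPool/verifyIp/runScrape helpers are hypothetical stand-ins for the real task-worker internals:

```ts
import puppeteer from 'puppeteer-extra';
import type { Page } from 'puppeteer';

type Task = { id: number };
type Proxy = { host: string; port: number };

declare function claimTask(): Promise<Task>;                       // 1. claim task from queue
declare function getProxyFromPool(): Promise<Proxy>;                // 2. fresh proxy per task
declare function verifyIp(page: Page): Promise<void>;               // 4. preflight: confirm proxy IP
declare function runScrape(page: Page, task: Task): Promise<void>;  // 5. execute the scrape

async function runOneBrowserTask(): Promise<void> {
  const task = await claimTask();
  const proxy = await getProxyFromPool();
  const browser = await puppeteer.launch({
    headless: true,
    args: [`--proxy-server=${proxy.host}:${proxy.port}`], // 3. launch browser behind the proxy
  });
  try {
    const page = await browser.newPage();
    await verifyIp(page);
    await runScrape(page, task);
  } finally {
    await browser.close(); // 6. always release the ~400 MB Chrome process
  }
}
```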
### Increasing Capacity

To handle more concurrent tasks:

1. **Add more pods** (up to 8 per CLAUDE.md limit)
2. **Increase pod memory** (allows 4 browsers per pod):
   ```yaml
   resources:
     limits:
       memory: "2.5Gi"  # from 2Gi
   ```

**DO NOT** simply increase `MAX_CONCURRENT_TASKS` without also increasing pod memory limits.

## Monitoring

### Logs
@@ -526,14 +526,17 @@ router.delete('/schedules/:id', async (req: Request, res: Response) => {
/**
 * POST /api/tasks/schedules/:id/run-now
 * Manually trigger a scheduled task to run immediately
 *
 * For product_discovery schedules with state_code, this creates individual
 * tasks for each store in that state (fans out properly).
 */
router.post('/schedules/:id/run-now', async (req: Request, res: Response) => {
  try {
    const scheduleId = parseInt(req.params.id, 10);

    // Get the schedule
    // Get the full schedule
    const scheduleResult = await pool.query(`
      SELECT id, name, role, state_code, platform, priority
      SELECT id, name, role, state_code, platform, priority, interval_hours, method
      FROM task_schedules WHERE id = $1
    `, [scheduleId]);

@@ -542,27 +545,80 @@ router.post('/schedules/:id/run-now', async (req: Request, res: Response) => {
    }

    const schedule = scheduleResult.rows[0];
    let tasksCreated = 0;

    // Create a task based on the schedule
    const task = await taskService.createTask({
    // For product_discovery with state_code, fan out to individual stores
    if (schedule.role === 'product_discovery' && schedule.state_code) {
      // Find stores in this state needing refresh
      const storeResult = await pool.query(`
        SELECT d.id
        FROM dispensaries d
        JOIN states s ON d.state_id = s.id
        WHERE d.crawl_enabled = true
          AND d.platform_dispensary_id IS NOT NULL
          AND s.code = $1
          -- No pending/running product_discovery task already
          AND NOT EXISTS (
            SELECT 1 FROM worker_tasks t
            WHERE t.dispensary_id = d.id
              AND t.role = 'product_discovery'
              AND t.status IN ('pending', 'claimed', 'running')
          )
        ORDER BY d.last_fetch_at NULLS FIRST, d.id
      `, [schedule.state_code]);

      const dispensaryIds = storeResult.rows.map((r: { id: number }) => r.id);

      if (dispensaryIds.length > 0) {
        // Create staggered tasks for all stores
        const result = await taskService.createStaggeredTasks(
          dispensaryIds,
          'product_discovery',
          15, // 15 seconds stagger
          schedule.platform || 'dutchie',
          schedule.method || 'http'
        );
        tasksCreated = result.created;
      } else {
        // No stores need refresh - return early with message
        return res.json({
          success: true,
          message: `No ${schedule.state_code} stores need refresh at this time`,
          tasksCreated: 0,
          stateCode: schedule.state_code,
        });
      }
    } else if (schedule.role !== 'product_discovery') {
      // For other schedules (store_discovery, analytics_refresh), create a single task
      await taskService.createTask({
        role: schedule.role,
        platform: schedule.platform,
        priority: schedule.priority + 10, // Boost priority for manual runs
        priority: schedule.priority + 10,
        method: schedule.method,
      });
      tasksCreated = 1;
    } else {
      // product_discovery without state_code - shouldn't happen, reject
      return res.status(400).json({
        error: 'product_discovery schedules require a state_code',
      });
    }

    // Update last_run_at on the schedule
    await pool.query(`
      UPDATE task_schedules
      SET last_run_at = NOW(),
          next_run_at = NOW() + (interval_hours || ' hours')::interval,
          last_task_count = $2,
          updated_at = NOW()
      WHERE id = $1
    `, [scheduleId]);
    `, [scheduleId, tasksCreated]);

    res.json({
      success: true,
      message: `Schedule "${schedule.name}" triggered`,
      task,
      tasksCreated,
      stateCode: schedule.state_code,
    });
  } catch (error: unknown) {
    console.error('Error running schedule:', error);
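For reference, a hypothetical call against the run-now endpoint above (schedule id, counts, and base URL are illustrative), showing both response shapes the handler can return:

```ts
const res = await fetch('http://localhost:3010/api/tasks/schedules/12/run-now', {
  method: 'POST',
});
const body = await res.json();
// e.g. { success: true, message: 'Schedule "AZ Product Discovery" triggered', tasksCreated: 48, stateCode: 'AZ' }
// or   { success: true, message: 'No AZ stores need refresh at this time', tasksCreated: 0, stateCode: 'AZ' }
```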
@@ -1187,6 +1243,142 @@ router.post('/batch/az-stores', async (req: Request, res: Response) => {
  }
});

// ============================================================
// STATE-BASED CRAWL ENDPOINTS
// ============================================================

/**
 * POST /api/tasks/crawl-state/:stateCode
 * Create product_discovery tasks for all stores in a state
 *
 * This is the primary endpoint for triggering crawls by state.
 * Creates staggered tasks for all crawl-enabled stores in the specified state.
 *
 * Params:
 *   - stateCode: State code (e.g., 'AZ', 'CA', 'CO')
 *
 * Body (optional):
 *   - stagger_seconds: number (default: 15) - Seconds between each task
 *   - priority: number (default: 10) - Task priority
 *   - method: 'curl' | 'http' | null (default: 'http')
 *
 * Returns:
 *   - tasks_created: Number of tasks created
 *   - stores_in_state: Total stores found for the state
 *   - skipped: Number skipped (already have active tasks)
 */
router.post('/crawl-state/:stateCode', async (req: Request, res: Response) => {
  try {
    const stateCode = req.params.stateCode.toUpperCase();
    const {
      stagger_seconds = 15,
      priority = 10,
      method = 'http',
    } = req.body;

    // Verify state exists
    const stateResult = await pool.query(`
      SELECT id, code, name FROM states WHERE code = $1
    `, [stateCode]);

    if (stateResult.rows.length === 0) {
      return res.status(404).json({
        error: 'State not found',
        state_code: stateCode,
      });
    }

    const state = stateResult.rows[0];

    // Get all crawl-enabled dispensaries in this state
    const dispensariesResult = await pool.query(`
      SELECT d.id, d.name
      FROM dispensaries d
      WHERE d.state_id = $1
        AND d.crawl_enabled = true
        AND d.platform_dispensary_id IS NOT NULL
      ORDER BY d.last_fetch_at NULLS FIRST, d.id
    `, [state.id]);

    if (dispensariesResult.rows.length === 0) {
      return res.status(200).json({
        success: true,
        message: `No crawl-enabled stores found in ${state.name}`,
        state_code: stateCode,
        state_name: state.name,
        tasks_created: 0,
        stores_in_state: 0,
      });
    }

    const dispensaryIds = dispensariesResult.rows.map((d: { id: number }) => d.id);

    // Create staggered tasks
    const result = await taskService.createStaggeredTasks(
      dispensaryIds,
      'product_discovery',
      stagger_seconds,
      'dutchie',
      method
    );

    const totalDuration = (result.created - 1) * stagger_seconds;
    const estimatedEndTime = new Date(Date.now() + totalDuration * 1000);

    res.status(201).json({
      success: true,
      state_code: stateCode,
      state_name: state.name,
      tasks_created: result.created,
      stores_in_state: dispensariesResult.rows.length,
      skipped: dispensariesResult.rows.length - result.created,
      stagger_seconds,
      total_duration_seconds: totalDuration,
      estimated_completion: estimatedEndTime.toISOString(),
      message: `Created ${result.created} product_discovery tasks for ${state.name} (${stagger_seconds}s apart, ~${Math.ceil(totalDuration / 60)} min total)`,
    });
  } catch (error: unknown) {
    console.error('Error creating state crawl tasks:', error);
    res.status(500).json({ error: 'Failed to create state crawl tasks' });
  }
});
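A hypothetical invocation of the new crawl-state endpoint (store count and base URL are illustrative), including the stagger arithmetic the handler reports: with 48 eligible stores and a 15 s stagger, (48 - 1) × 15 = 705 s, roughly 12 minutes until the last task is due:

```ts
const res = await fetch('http://localhost:3010/api/tasks/crawl-state/AZ', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ stagger_seconds: 15, priority: 10, method: 'http' }),
});
console.log(await res.json());
// e.g. { success: true, state_code: 'AZ', tasks_created: 48, skipped: 0,
//        total_duration_seconds: 705,
//        message: 'Created 48 product_discovery tasks for Arizona (15s apart, ~12 min total)' }
```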
/**
 * GET /api/tasks/states
 * List all states with their store counts and crawl status
 */
router.get('/states', async (_req: Request, res: Response) => {
  try {
    const result = await pool.query(`
      SELECT
        s.code,
        s.name,
        COUNT(d.id)::int as total_stores,
        COUNT(d.id) FILTER (WHERE d.crawl_enabled = true AND d.platform_dispensary_id IS NOT NULL)::int as crawl_enabled_stores,
        COUNT(d.id) FILTER (WHERE d.crawl_enabled = true AND d.platform_dispensary_id IS NULL)::int as missing_platform_id,
        MAX(d.last_fetch_at) as last_crawl_at,
        (SELECT COUNT(*) FROM worker_tasks t
         JOIN dispensaries d2 ON t.dispensary_id = d2.id
         WHERE d2.state_id = s.id
           AND t.role = 'product_discovery'
           AND t.status IN ('pending', 'claimed', 'running'))::int as active_tasks
      FROM states s
      LEFT JOIN dispensaries d ON d.state_id = s.id
      GROUP BY s.id, s.code, s.name
      HAVING COUNT(d.id) > 0
      ORDER BY COUNT(d.id) DESC
    `);

    res.json({
      states: result.rows,
      total_states: result.rows.length,
    });
  } catch (error: unknown) {
    console.error('Error listing states:', error);
    res.status(500).json({ error: 'Failed to list states' });
  }
});

// ============================================================
// TASK POOL MANAGEMENT
// ============================================================
@@ -155,7 +155,12 @@ router.post('/heartbeat', async (req: Request, res: Response) => {
      active_task_count,
      max_concurrent_tasks,
      status = 'active',
      resources
      resources,
      // Step tracking fields
      current_step,
      current_step_detail,
      current_step_started_at,
      task_steps,
    } = req.body;

    if (!worker_id) {
@@ -168,6 +173,11 @@ router.post('/heartbeat', async (req: Request, res: Response) => {
    if (current_task_ids) metadata.current_task_ids = current_task_ids;
    if (active_task_count !== undefined) metadata.active_task_count = active_task_count;
    if (max_concurrent_tasks !== undefined) metadata.max_concurrent_tasks = max_concurrent_tasks;
    // Step tracking
    if (current_step) metadata.current_step = current_step;
    if (current_step_detail) metadata.current_step_detail = current_step_detail;
    if (current_step_started_at) metadata.current_step_started_at = current_step_started_at;
    if (task_steps) metadata.task_steps = task_steps;

    // Store resources in metadata jsonb column
    const { rows } = await pool.query(`
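For context, a heartbeat body carrying the new step-tracking fields might look like the following (all values invented for illustration; the field names match the destructuring above):

```ts
const heartbeatBody = {
  worker_id: 'scraper-worker-75b8b9b5c9-46p4j',
  active_task_count: 2,
  max_concurrent_tasks: 3,
  status: 'active',
  current_step: 'fetching',
  current_step_detail: 'Executing GraphQL query',
  current_step_started_at: new Date().toISOString(),
  task_steps: [
    { task_id: 101, step: 'fetching', detail: 'Executing GraphQL query', elapsed_ms: 4200 },
    { task_id: 102, step: 'preflight', detail: 'Verifying IP', elapsed_ms: 800 },
  ],
  resources: { memory_mb: 512, memory_total_mb: 1500 },
};
```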
@@ -27,7 +27,7 @@ import { taskService } from '../task-service';
const FILTERED_PRODUCTS_HASH = 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0';

export async function handleProductDiscoveryHttp(ctx: TaskContext): Promise<TaskResult> {
  const { pool, task, crawlRotator } = ctx;
  const { pool, task, crawlRotator, updateStep } = ctx;
  const dispensaryId = task.dispensary_id;

  if (!dispensaryId) {
@@ -40,6 +40,7 @@ export async function handleProductDiscoveryHttp(ctx: TaskContext): Promise<Task
  // ============================================================
  // STEP 1: Load dispensary info
  // ============================================================
  updateStep('loading', 'Loading dispensary info');
  const dispResult = await pool.query(`
    SELECT
      id, name, platform_dispensary_id, menu_url, menu_type, city, state
@@ -70,6 +71,7 @@ export async function handleProductDiscoveryHttp(ctx: TaskContext): Promise<Task
  // ============================================================
  // STEP 2: Setup Puppeteer with proxy
  // ============================================================
  updateStep('preflight', `Launching browser for ${dispensary.name}`);
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  puppeteer.use(StealthPlugin());
@@ -114,6 +116,7 @@ export async function handleProductDiscoveryHttp(ctx: TaskContext): Promise<Task
  // ============================================================
  // STEP 3: Establish session by visiting embedded menu
  // ============================================================
  updateStep('navigating', `Loading menu page`);
  const embedUrl = `https://dutchie.com/embedded-menu/${cName}?menuType=rec`;
  console.log(`[ProductDiscoveryHTTP] Establishing session at ${embedUrl}...`);

@@ -178,6 +181,7 @@ export async function handleProductDiscoveryHttp(ctx: TaskContext): Promise<Task
  // ============================================================
  // STEP 4: Fetch ALL products via GraphQL from browser context
  // ============================================================
  updateStep('fetching', `Executing GraphQL query`);
  const result = await page.evaluate(async (platformId: string, graphqlHash: string) => {
    const allProducts: any[] = [];
    const logs: string[] = [];
@@ -301,6 +305,7 @@ export async function handleProductDiscoveryHttp(ctx: TaskContext): Promise<Task
  // ============================================================
  // STEP 5: Save raw payload to filesystem
  // ============================================================
  updateStep('saving', `Saving ${result.products.length} products`);
  const rawPayload = {
    dispensaryId,
    platformId,
@@ -32,7 +32,7 @@ import { taskService } from '../task-service';
const normalizer = new DutchieNormalizer();

export async function handleProductRefresh(ctx: TaskContext): Promise<TaskResult> {
  const { pool, task } = ctx;
  const { pool, task, updateStep } = ctx;
  const dispensaryId = task.dispensary_id;

  if (!dispensaryId) {
@@ -43,6 +43,7 @@ export async function handleProductRefresh(ctx: TaskContext): Promise<TaskResult
  // ============================================================
  // STEP 1: Load dispensary info
  // ============================================================
  updateStep('loading', 'Loading dispensary info');
  const dispResult = await pool.query(`
    SELECT
      id, name, platform_dispensary_id, menu_url, menu_type, city, state
@@ -68,6 +69,7 @@ export async function handleProductRefresh(ctx: TaskContext): Promise<TaskResult
  // STEP 2: Load payload from filesystem
  // Per TASK_WORKFLOW_2024-12-10.md: Read local payload, not API
  // ============================================================
  updateStep('loading', 'Loading payload from storage');
  let payloadData: any;
  let payloadId: number;

@@ -142,6 +144,7 @@ export async function handleProductRefresh(ctx: TaskContext): Promise<TaskResult
  // ============================================================
  // STEP 3: Normalize data
  // ============================================================
  updateStep('normalizing', `Normalizing ${allProducts.length} products`);
  console.log(`[ProductRefresh] Normalizing ${allProducts.length} products...`);

  // Build RawPayload for the normalizer
@@ -271,52 +271,69 @@ class TaskService {
  }

  /**
   * Mark a task as failed, with auto-retry if under max_retries
   * Returns true if task was re-queued for retry, false if permanently failed
   * Determine if an error is a "soft failure" (transient) that should be requeued
   * Soft failures: timeouts, connection issues, browser launch issues
   * Hard failures: business logic errors like "No products returned"
   */
  async failTask(taskId: number, errorMessage: string): Promise<boolean> {
    // Get current retry state
    const result = await pool.query(
      `SELECT retry_count, max_retries FROM worker_tasks WHERE id = $1`,
      [taskId]
    );
  private isSoftFailure(errorMessage: string): boolean {
    const softFailurePatterns = [
      /timeout/i,
      /timed out/i,
      /connection.*terminated/i,
      /connection.*refused/i,
      /ECONNRESET/i,
      /ECONNREFUSED/i,
      /ETIMEDOUT/i,
      /socket hang up/i,
      /WS endpoint/i,
      /browser process/i,
      /Failed to launch/i,
      /Navigation.*exceeded/i,
      /net::ERR_/i,
      /ENOENT.*storage/i, // Storage path issues (transient)
      /ENOENT.*payload/i, // Payload path issues (transient)
    ];

    if (result.rows.length === 0) {
      return false;
    return softFailurePatterns.some(pattern => pattern.test(errorMessage));
    }

    const { retry_count, max_retries } = result.rows[0];
    const newRetryCount = (retry_count || 0) + 1;
  /**
   * Mark a task as failed
   *
   * Soft failures (timeouts, connection issues): Requeue back to pending for later pickup
   * Hard failures (business logic errors): Mark as failed permanently
   */
  async failTask(taskId: number, errorMessage: string): Promise<boolean> {
    const isSoft = this.isSoftFailure(errorMessage);

    if (newRetryCount < (max_retries || 3)) {
      // Re-queue for retry - reset to pending with incremented retry_count
    if (isSoft) {
      // Soft failure: put back in queue immediately for another worker
      await pool.query(
        `UPDATE worker_tasks
         SET status = 'pending',
             worker_id = NULL,
             claimed_at = NULL,
             started_at = NULL,
             retry_count = $2,
             error_message = $3,
             error_message = $2,
             scheduled_for = NULL,
             updated_at = NOW()
         WHERE id = $1`,
        [taskId, newRetryCount, `Retry ${newRetryCount}: ${errorMessage}`]
        [taskId, `Requeued: ${errorMessage}`]
      );
      console.log(`[TaskService] Task ${taskId} queued for retry ${newRetryCount}/${max_retries || 3}`);
      console.log(`[TaskService] Task ${taskId} requeued for another worker`);
      return true;
    }

    // Max retries exceeded - mark as permanently failed
    // Hard failure: mark as permanently failed
    await pool.query(
      `UPDATE worker_tasks
       SET status = 'failed',
           completed_at = NOW(),
           retry_count = $2,
           error_message = $3
           error_message = $2
       WHERE id = $1`,
      [taskId, newRetryCount, `Failed after ${newRetryCount} attempts: ${errorMessage}`]
      [taskId, `Hard failure: ${errorMessage}`]
    );
    console.log(`[TaskService] Task ${taskId} permanently failed after ${newRetryCount} attempts`);
    console.log(`[TaskService] Task ${taskId} hard failed: ${errorMessage}`);
    return false;
  }
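To illustrate the soft/hard split above, here is a standalone check mirroring a few of the patterns in softFailurePatterns (it does not import the private method; the sample messages are illustrative):

```ts
const softFailurePatterns = [/timeout/i, /ECONNRESET/i, /Failed to launch/i, /net::ERR_/i];
const isSoft = (msg: string) => softFailurePatterns.some(p => p.test(msg));

console.log(isSoft('Navigation timeout of 30000 ms exceeded')); // true  -> requeued as pending
console.log(isSoft('No products returned'));                    // false -> marked failed (hard)
```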
@@ -97,8 +97,12 @@ const API_BASE_URL = process.env.API_BASE_URL || 'http://localhost:3010';
// =============================================================================

// Maximum number of tasks this worker will run concurrently
// Tune based on workload: I/O-bound tasks benefit from higher concurrency
const MAX_CONCURRENT_TASKS = parseInt(process.env.MAX_CONCURRENT_TASKS || '15');
// Browser tasks (Puppeteer) use ~400MB RAM each. With 2GB pod limit:
// - 3 browsers = ~1.3GB = SAFE
// - 4 browsers = ~1.7GB = RISKY
// - 5+ browsers = OOM CRASH
// See: docs/WORKER_TASK_ARCHITECTURE.md#browser-task-memory-limits
const MAX_CONCURRENT_TASKS = parseInt(process.env.MAX_CONCURRENT_TASKS || '3');

// When heap memory usage exceeds this threshold (as decimal 0.0-1.0), stop claiming new tasks
// Default 85% - gives headroom before OOM
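A minimal sketch of how the heap threshold described in the comment above could gate task claiming; the guard function and constant name are illustrative, not the actual TaskWorker implementation:

```ts
// Hypothetical claim guard combining the concurrency cap with the heap threshold.
const MEMORY_THRESHOLD = parseFloat(process.env.MEMORY_THRESHOLD || '0.85');

function canClaimMoreTasks(activeTasks: number, maxConcurrent: number): boolean {
  const { heapUsed, heapTotal } = process.memoryUsage();
  const heapRatio = heapUsed / heapTotal;
  // Stop claiming when either the concurrency cap or the heap threshold is hit
  return activeTasks < maxConcurrent && heapRatio < MEMORY_THRESHOLD;
}
```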
@@ -131,6 +135,8 @@ export interface TaskContext {
  task: WorkerTask;
  heartbeat: () => Promise<void>;
  crawlRotator?: CrawlRotator;
  /** Update the current step being executed (shown in dashboard) */
  updateStep: (step: string, detail?: string) => void;
}

export interface TaskResult {
@@ -264,6 +270,18 @@ export class TaskWorker {
  private preflightsCompleted: boolean = false;
  private initializingPromise: Promise<void> | null = null;

  // ==========================================================================
  // STEP TRACKING FOR DASHBOARD VISIBILITY
  // ==========================================================================
  // Workers report their current step in heartbeats so the dashboard can show
  // real-time progress like "preflight", "loading page", "processing products"
  // ==========================================================================
  private currentStep: string = 'idle';
  private currentStepDetail: string | null = null;
  private currentStepStartedAt: Date | null = null;
  /** Map of task ID -> step info for concurrent tasks */
  private taskSteps: Map<number, { step: string; detail: string | null; startedAt: Date }> = new Map();

  constructor(role: TaskRole | null = null, workerId?: string) {
    this.pool = getPool();
    this.role = role;
@@ -346,6 +364,65 @@ export class TaskWorker {
    return this.activeTasks.size < this.maxConcurrentTasks;
  }

  // ==========================================================================
  // STEP TRACKING METHODS
  // ==========================================================================

  /**
   * Update the current step for a task (for dashboard visibility)
   * @param taskId - The task ID to update
   * @param step - Short step name (e.g., "preflight", "loading", "processing")
   * @param detail - Optional detail (e.g., "Verifying IP 1.2.3.4")
   */
  public updateTaskStep(taskId: number, step: string, detail?: string): void {
    this.taskSteps.set(taskId, {
      step,
      detail: detail || null,
      startedAt: new Date(),
    });

    // Also update the "primary" step for single-task backwards compat
    if (this.activeTasks.size === 1 || taskId === Array.from(this.activeTasks.keys())[0]) {
      this.currentStep = step;
      this.currentStepDetail = detail || null;
      this.currentStepStartedAt = new Date();
    }

    console.log(`[TaskWorker] Step: ${step}${detail ? ` - ${detail}` : ''} (task #${taskId})`);
  }

  /**
   * Clear step tracking for a task (when task completes)
   */
  private clearTaskStep(taskId: number): void {
    this.taskSteps.delete(taskId);

    // Reset primary step if no more active tasks
    if (this.activeTasks.size === 0) {
      this.currentStep = 'idle';
      this.currentStepDetail = null;
      this.currentStepStartedAt = null;
    }
  }

  /**
   * Get current step info for all active tasks (for heartbeat)
   */
  private getTaskStepsInfo(): Array<{
    task_id: number;
    step: string;
    detail: string | null;
    elapsed_ms: number;
  }> {
    const now = Date.now();
    return Array.from(this.taskSteps.entries()).map(([taskId, info]) => ({
      task_id: taskId,
      step: info.step,
      detail: info.detail,
      elapsed_ms: now - info.startedAt.getTime(),
    }));
  }

  /**
   * Initialize stealth systems (proxy rotation, fingerprints)
   * Called LAZILY on first task claim attempt (NOT at worker startup).
@@ -635,7 +712,7 @@ export class TaskWorker {
  }

  /**
   * Send heartbeat to registry with resource usage and proxy location
   * Send heartbeat to registry with resource usage, proxy location, and step info
   */
  private async sendRegistryHeartbeat(): Promise<void> {
    try {
@@ -647,6 +724,9 @@
      // Get array of active task IDs
      const activeTaskIds = Array.from(this.activeTasks.keys());

      // Get step info for all active tasks
      const taskSteps = this.getTaskStepsInfo();

      await fetch(`${API_BASE_URL}/api/worker-registry/heartbeat`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
@@ -657,6 +737,11 @@
          active_task_count: this.activeTasks.size,
          max_concurrent_tasks: this.maxConcurrentTasks,
          status: this.activeTasks.size > 0 ? 'active' : 'idle',
          // Step tracking for dashboard visibility
          current_step: this.currentStep,
          current_step_detail: this.currentStepDetail,
          current_step_started_at: this.currentStepStartedAt?.toISOString() || null,
          task_steps: taskSteps, // Per-task step info for concurrent workers
          resources: {
            memory_mb: Math.round(memUsage.heapUsed / 1024 / 1024),
            memory_total_mb: Math.round(memUsage.heapTotal / 1024 / 1024),
@@ -915,7 +1000,7 @@
      throw new Error(`No handler registered for role: ${task.role}`);
    }

    // Create context
    // Create context with step tracking
    const ctx: TaskContext = {
      pool: this.pool,
      workerId: this.workerId,
@@ -924,12 +1009,21 @@
        await taskService.heartbeat(task.id);
      },
      crawlRotator: this.crawlRotator,
      updateStep: (step: string, detail?: string) => {
        this.updateTaskStep(task.id, step, detail);
      },
    };

    // Initialize step tracking for this task
    this.updateTaskStep(task.id, 'starting', `Initializing ${task.role}`);

    // Execute the task
    const result = await handler(ctx);

    if (result.success) {
      // Clear step tracking
      this.clearTaskStep(task.id);

      // Mark as completed
      await taskService.completeTask(task.id, result);
      await this.reportTaskCompletion(true);
@@ -945,12 +1039,18 @@
        console.log(`[TaskWorker] Chained new task ${chainedTask.id} (${chainedTask.role})`);
      }
    } else {
      // Clear step tracking
      this.clearTaskStep(task.id);

      // Mark as failed
      await taskService.failTask(task.id, result.error || 'Unknown error');
      await this.reportTaskCompletion(false);
      console.log(`[TaskWorker] ${this.friendlyName} failed task ${task.id}: ${result.error}`);
    }
  } catch (error: any) {
    // Clear step tracking
    this.clearTaskStep(task.id);

    // Mark as failed
    await taskService.failTask(task.id, error.message);
    await this.reportTaskCompletion(false);

@@ -3051,7 +3051,7 @@ class ApiClient {
  }

  async runTaskScheduleNow(id: number) {
    return this.request<{ success: boolean; message: string; task: any }>(`/api/tasks/schedules/${id}/run-now`, {
    return this.request<{ success: boolean; message: string; tasksCreated?: number; stateCode?: string }>(`/api/tasks/schedules/${id}/run-now`, {
      method: 'POST',
    });
  }
@@ -718,6 +718,7 @@ export default function TasksDashboard() {
  const [selectedSchedules, setSelectedSchedules] = useState<Set<number>>(new Set());
  const [editingSchedule, setEditingSchedule] = useState<TaskSchedule | null>(null);
  const [showScheduleModal, setShowScheduleModal] = useState(false);
  const [runningScheduleId, setRunningScheduleId] = useState<number | null>(null);

  // Pagination
  const [page, setPage] = useState(0);
@@ -812,13 +813,17 @@
  };

  const handleRunScheduleNow = async (scheduleId: number) => {
    if (runningScheduleId !== null) return; // Prevent duplicate clicks
    setRunningScheduleId(scheduleId);
    try {
      const result = await api.runTaskScheduleNow(scheduleId);
      alert(result.message);
      const result = await api.runTaskScheduleNow(scheduleId) as { success: boolean; message: string; tasksCreated?: number };
      alert(result.message + (result.tasksCreated ? ` (${result.tasksCreated} tasks created)` : ''));
      fetchData();
    } catch (err: any) {
      console.error('Run schedule error:', err);
      alert(err.response?.data?.error || 'Failed to run schedule');
    } finally {
      setRunningScheduleId(null);
    }
  };

@@ -1046,18 +1051,18 @@
  {formatDuration(metric.avg_duration_sec)}
</td>
<td className="px-4 py-3 text-sm text-right text-gray-600">
  {metric.tasks_per_worker_hour?.toFixed(1) || '-'}
  {metric.tasks_per_worker_hour ? Number(metric.tasks_per_worker_hour).toFixed(1) : '-'}
</td>
<td className="px-4 py-3 text-sm text-right">
  {metric.estimated_hours_to_drain ? (
    <span
      className={
        metric.estimated_hours_to_drain > 4
        Number(metric.estimated_hours_to_drain) > 4
          ? 'text-red-600 font-medium'
          : 'text-gray-600'
      }
    >
      {metric.estimated_hours_to_drain.toFixed(1)}h
      {Number(metric.estimated_hours_to_drain).toFixed(1)}h
    </span>
  ) : (
    '-'
@@ -1257,10 +1262,17 @@
<div className="flex items-center gap-1">
  <button
    onClick={() => handleRunScheduleNow(schedule.id)}
    className="p-1.5 text-gray-400 hover:text-emerald-600 hover:bg-emerald-50 rounded transition-colors"
    title="Run now"
    disabled={runningScheduleId !== null}
    className={`p-1.5 rounded transition-colors ${
      runningScheduleId === schedule.id
        ? 'text-emerald-600 bg-emerald-50 cursor-wait'
        : runningScheduleId !== null
          ? 'text-gray-300 cursor-not-allowed'
          : 'text-gray-400 hover:text-emerald-600 hover:bg-emerald-50'
    }`}
    title={runningScheduleId === schedule.id ? 'Running...' : 'Run now'}
  >
    <PlayCircle className="w-4 h-4" />
    <PlayCircle className={`w-4 h-4 ${runningScheduleId === schedule.id ? 'animate-pulse' : ''}`} />
  </button>
  <button
    onClick={() => handleToggleSchedule(schedule.id)}
@@ -78,6 +78,16 @@ interface Worker {
    timezone?: string;
    isRotating?: boolean;
  };
  // Step tracking
  current_step?: string;
  current_step_detail?: string;
  current_step_started_at?: string;
  task_steps?: Array<{
    task_id: number;
    step: string;
    detail: string | null;
    elapsed_ms: number;
  }>;
} | null;
}

@@ -87,11 +97,24 @@ interface Task {
  role: string;
  dispensary_id: number | null;
  dispensary_name?: string;
  dispensary_slug?: string;
  status: string;
  priority: number;
  started_at: string | null;
  completed_at: string | null;
  claimed_by: string | null;
  worker_id: string | null;
  error_message?: string | null;
  result?: {
    success?: boolean;
    productsProcessed?: number;
    snapshotsCreated?: number;
    newProducts?: number;
    updatedProducts?: number;
    storesDiscovered?: number;
    markedOos?: number;
    error?: string;
  } | null;
}

function formatRelativeTime(dateStr: string | null): string {
@@ -349,7 +372,59 @@ function TransportBadge({ worker }: { worker: Worker }) {
  );
}

// Task count badge showing active/max concurrent tasks
// Step badge showing current step with detail
function StepBadge({ worker }: { worker: Worker }) {
  const step = worker.metadata?.current_step;
  const detail = worker.metadata?.current_step_detail;
  const startedAt = worker.metadata?.current_step_started_at;
  const taskSteps = worker.metadata?.task_steps;

  if (!step || step === 'idle') {
    return null;
  }

  // Calculate elapsed time
  let elapsedStr = '';
  if (startedAt) {
    const elapsed = Date.now() - new Date(startedAt).getTime();
    if (elapsed < 60000) {
      elapsedStr = `${Math.round(elapsed / 1000)}s`;
    } else {
      elapsedStr = `${Math.round(elapsed / 60000)}m`;
    }
  }

  // Step colors
  const getStepColor = (s: string) => {
    if (s.includes('preflight')) return 'text-yellow-600 bg-yellow-50';
    if (s.includes('loading') || s.includes('navigating')) return 'text-blue-600 bg-blue-50';
    if (s.includes('processing') || s.includes('normalizing')) return 'text-purple-600 bg-purple-50';
    if (s.includes('saving') || s.includes('upserting')) return 'text-emerald-600 bg-emerald-50';
    if (s.includes('error') || s.includes('failed')) return 'text-red-600 bg-red-50';
    return 'text-gray-600 bg-gray-50';
  };

  const colorClass = getStepColor(step);

  // Build tooltip with all task steps if concurrent
  const tooltipLines = taskSteps?.map(ts =>
    `Task #${ts.task_id}: ${ts.step}${ts.detail ? ` - ${ts.detail}` : ''} (${Math.round(ts.elapsed_ms / 1000)}s)`
  ) || [];

  return (
    <div
      className={`inline-flex items-center gap-1.5 px-2 py-1 rounded text-xs font-medium ${colorClass}`}
      title={tooltipLines.length > 0 ? tooltipLines.join('\n') : undefined}
    >
      <span className="animate-pulse">●</span>
      <span className="font-semibold">{step}</span>
      {detail && <span className="text-gray-500 truncate max-w-[120px]">- {detail}</span>}
      {elapsedStr && <span className="text-gray-400">({elapsedStr})</span>}
    </div>
  );
}

// Task count badge showing active/max concurrent tasks with task details
function TaskCountBadge({ worker, tasks }: { worker: Worker; tasks: Task[] }) {
  const activeCount = worker.active_task_count ?? (worker.current_task_id ? 1 : 0);
  const maxCount = worker.max_concurrent_tasks ?? 1;
@@ -359,20 +434,34 @@ function TaskCountBadge({ worker, tasks }: { worker: Worker; tasks: Task[] }) {
    return <span className="text-gray-400 text-sm">Idle</span>;
  }

  // Get task names for tooltip
  const taskNames = taskIds.map(id => {
    const task = tasks.find(t => t.id === id);
    return task ? `#${id}: ${task.role}${task.dispensary_name ? ` (${task.dispensary_name})` : ''}` : `#${id}`;
  }).join('\n');
  // Get task details for display
  const activeTasks = taskIds.map(id => tasks.find(t => t.id === id)).filter(Boolean) as Task[];

  // Build tooltip with full details
  const tooltipLines = activeTasks.map(task =>
    `#${task.id}: ${task.role}${task.dispensary_name ? ` - ${task.dispensary_name}` : ''}`
  );

  // Show first task details inline
  const firstTask = activeTasks[0];
  const roleLabel = firstTask?.role?.replace(/_/g, ' ') || 'task';
  const storeName = firstTask?.dispensary_name;

  return (
    <div className="flex items-center gap-2" title={taskNames}>
    <div className="flex flex-col gap-0.5" title={tooltipLines.join('\n')}>
      <span className="text-sm font-medium text-blue-600">
        {activeCount}/{maxCount} tasks
        {activeCount}/{maxCount} active
      </span>
      {firstTask && (
        <span className="text-xs text-gray-500 truncate max-w-[140px]">
          {roleLabel}{storeName ? `: ${storeName}` : ''}
        </span>
      {taskIds.length === 1 && (
        <span className="text-xs text-gray-500">#{taskIds[0]}</span>
      )}
      {activeTasks.length > 1 && (
        <span className="text-xs text-gray-400">+{activeTasks.length - 1} more</span>
      )}
      {/* Show current step */}
      <StepBadge worker={worker} />
    </div>
  );
}
@@ -507,6 +596,175 @@ function groupWorkersByPod(workers: Worker[]): Map<string, Worker[]> {
  return pods;
}

// Calculate task duration in seconds
function getTaskDuration(task: Task): number | null {
  if (!task.started_at) return null;
  const start = new Date(task.started_at);
  const end = task.completed_at ? new Date(task.completed_at) : new Date();
  return Math.round((end.getTime() - start.getTime()) / 1000);
}

// Format duration for display
function formatTaskDuration(seconds: number | null): string {
  if (seconds === null) return '-';
  if (seconds < 60) return `${seconds}s`;
  const mins = Math.floor(seconds / 60);
  const secs = seconds % 60;
  if (mins < 60) return `${mins}m ${secs}s`;
  const hrs = Math.floor(mins / 60);
  return `${hrs}h ${mins % 60}m`;
}

// Get friendly worker name from worker_id
function getWorkerShortName(workerId: string | null): string {
  if (!workerId) return 'Unknown';
  // Extract last part after the hash (e.g., "scraper-worker-75b8b9b5c9-46p4j" -> "46p4j")
  const parts = workerId.split('-');
  return parts[parts.length - 1] || workerId.slice(-8);
}

// Live Activity Panel - shows recent task completions and failures
function LiveActivityPanel({
  recentTasks,
  runningTasks
}: {
  recentTasks: Task[];
  runningTasks: Task[];
}) {
  // Combine running and recent completed/failed, sort by most recent activity
  const allActivity = [
    ...runningTasks.map(t => ({ ...t, activityType: 'running' as const })),
    ...recentTasks.map(t => ({ ...t, activityType: t.status as 'completed' | 'failed' })),
  ].sort((a, b) => {
    const aTime = a.activityType === 'running' ? a.started_at : a.completed_at;
    const bTime = b.activityType === 'running' ? b.started_at : b.completed_at;
    if (!aTime || !bTime) return 0;
    return new Date(bTime).getTime() - new Date(aTime).getTime();
  }).slice(0, 15); // Show max 15 items

  const getRoleIcon = (role: string) => {
    switch (role) {
      case 'product_refresh': return '🔄';
      case 'product_discovery': return '🔍';
      case 'store_discovery': return '🏪';
      case 'entry_point_discovery': return '🎯';
      case 'analytics_refresh': return '📊';
      default: return '📋';
    }
  };

  const getStatusConfig = (status: string) => {
    switch (status) {
      case 'running':
        return { bg: 'bg-blue-50', border: 'border-blue-200', icon: '🔵', text: 'text-blue-700' };
      case 'completed':
        return { bg: 'bg-emerald-50', border: 'border-emerald-200', icon: '🟢', text: 'text-emerald-700' };
      case 'failed':
        return { bg: 'bg-red-50', border: 'border-red-200', icon: '🔴', text: 'text-red-700' };
      default:
        return { bg: 'bg-gray-50', border: 'border-gray-200', icon: '⚪', text: 'text-gray-700' };
    }
  };

  const getResultSummary = (task: Task): string => {
    if (!task.result) return '';
    const parts: string[] = [];
    if (task.result.productsProcessed) parts.push(`${task.result.productsProcessed} products`);
    if (task.result.newProducts) parts.push(`${task.result.newProducts} new`);
    if (task.result.storesDiscovered) parts.push(`${task.result.storesDiscovered} stores`);
    if (task.result.markedOos && task.result.markedOos > 0) parts.push(`${task.result.markedOos} OOS`);
    return parts.length > 0 ? ` - ${parts.join(', ')}` : '';
  };

  return (
    <div className="bg-white rounded-lg border border-gray-200 overflow-hidden">
      <div className="px-4 py-3 border-b border-gray-200 bg-gray-50">
        <div className="flex items-center justify-between">
          <h3 className="text-sm font-semibold text-gray-900 flex items-center gap-2">
            <Activity className="w-4 h-4 text-blue-500" />
            Live Activity
          </h3>
          <span className="text-xs text-gray-500">
            {runningTasks.length} running, {recentTasks.length} recent
          </span>
        </div>
      </div>

      <div className="divide-y divide-gray-100 max-h-[400px] overflow-y-auto">
        {allActivity.length === 0 ? (
          <div className="px-4 py-8 text-center text-gray-500">
            <Activity className="w-8 h-8 mx-auto mb-2 text-gray-300" />
            <p className="text-sm">No recent activity</p>
          </div>
        ) : (
          allActivity.map((task) => {
            const config = getStatusConfig(task.activityType);
            const duration = getTaskDuration(task);
            const workerName = getWorkerShortName(task.worker_id);

            return (
              <div
                key={`${task.id}-${task.activityType}`}
                className={`px-4 py-3 ${config.bg} ${task.activityType === 'running' ? 'animate-pulse' : ''}`}
              >
                <div className="flex items-start gap-3">
                  <span className="text-lg flex-shrink-0">{config.icon}</span>
                  <div className="flex-1 min-w-0">
                    <div className="flex items-center gap-2 flex-wrap">
                      <span className="text-xs font-medium text-gray-500 bg-gray-100 px-1.5 py-0.5 rounded">
                        {workerName}
                      </span>
                      <span className={`text-sm font-medium ${config.text}`}>
                        {task.activityType === 'running' ? 'working on' : task.activityType}
                      </span>
                      <span className="text-sm text-gray-700">
                        {getRoleIcon(task.role)} {task.role.replace(/_/g, ' ')}
                      </span>
                    </div>

                    {task.dispensary_name && (
                      <p className="text-sm text-gray-900 font-medium mt-1 truncate">
                        {task.dispensary_name}
                      </p>
                    )}

                    <div className="flex items-center gap-3 mt-1 text-xs text-gray-500">
                      {duration !== null && (
                        <span className="flex items-center gap-1">
                          <Timer className="w-3 h-3" />
                          {formatTaskDuration(duration)}
                        </span>
                      )}
                      {task.activityType === 'completed' && task.result && (
                        <span className="text-emerald-600 font-medium">
                          {getResultSummary(task)}
                        </span>
                      )}
                      {task.activityType === 'failed' && task.error_message && (
                        <span className="text-red-600 truncate max-w-[200px]" title={task.error_message}>
                          {task.error_message.slice(0, 50)}...
                        </span>
                      )}
                      {task.completed_at && (
                        <span className="text-gray-400">
                          {formatRelativeTime(task.completed_at)}
                        </span>
                      )}
                    </div>
                  </div>
                  <span className="text-xs text-gray-400 flex-shrink-0">
                    #{task.id}
                  </span>
                </div>
              </div>
            );
          })
        )}
      </div>
    </div>
  );
}

// Format estimated time remaining
function formatEstimatedTime(hours: number): string {
  if (hours < 1) {
@@ -524,7 +782,8 @@ function formatEstimatedTime(hours: number): string {

export function WorkersDashboard() {
  const [workers, setWorkers] = useState<Worker[]>([]);
  const [tasks, setTasks] = useState<Task[]>([]);
  const [tasks, setTasks] = useState<Task[]>([]); // Running tasks
  const [recentTasks, setRecentTasks] = useState<Task[]>([]); // Recent completed/failed
  const [pendingTaskCount, setPendingTaskCount] = useState<number>(0);
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState<string | null>(null);
@@ -538,15 +797,28 @@ export function WorkersDashboard() {

  const fetchData = useCallback(async () => {
    try {
      // Fetch workers from registry, running tasks, and task counts
      const [workersRes, tasksRes, countsRes] = await Promise.all([
      // Fetch workers from registry, running tasks, recent tasks, and task counts
      const [workersRes, tasksRes, recentCompletedRes, recentFailedRes, countsRes] = await Promise.all([
        api.get('/api/worker-registry/workers'),
        api.get('/api/tasks?status=running&limit=100'),
        api.get('/api/tasks?status=completed&limit=10'),
        api.get('/api/tasks?status=failed&limit=5'),
        api.get('/api/tasks/counts'),
      ]);

      setWorkers(workersRes.data.workers || []);
      setTasks(tasksRes.data.tasks || []);

      // Combine recent completed and failed, sort by completion time
      const recentCompleted = recentCompletedRes.data.tasks || [];
      const recentFailed = recentFailedRes.data.tasks || [];
      const combined = [...recentCompleted, ...recentFailed].sort((a, b) => {
        const aTime = a.completed_at ? new Date(a.completed_at).getTime() : 0;
        const bTime = b.completed_at ? new Date(b.completed_at).getTime() : 0;
        return bTime - aTime;
      });
      setRecentTasks(combined.slice(0, 15));

      setPendingTaskCount(countsRes.data?.pending || 0);
      setError(null);
    } catch (err: any) {
@@ -764,8 +1036,15 @@
        );
      })()}

      {/* Worker Pods Visualization */}
      <div className="bg-white rounded-lg border border-gray-200 overflow-hidden">
      {/* Two Column Layout: Live Activity + Worker Pods */}
      <div className="grid grid-cols-1 lg:grid-cols-3 gap-6">
        {/* Live Activity Panel - Takes 1/3 width on large screens */}
        <div className="lg:col-span-1">
          <LiveActivityPanel recentTasks={recentTasks} runningTasks={tasks} />
        </div>

        {/* Worker Pods Visualization - Takes 2/3 width on large screens */}
        <div className="lg:col-span-2 bg-white rounded-lg border border-gray-200 overflow-hidden">
          <div className="px-4 py-3 border-b border-gray-200 bg-gray-50">
            <div className="flex items-center justify-between">
              <div>
@@ -912,6 +1191,7 @@
          </div>
        )}
      </div>
      </div>

      {/* Workers Table */}
      <div className="bg-white rounded-lg border border-gray-200 overflow-hidden">
||||
name: scraper-worker
|
||||
namespace: dispensary-scraper
|
||||
spec:
|
||||
replicas: 25
|
||||
# MAX 8 PODS - See CLAUDE.md rule #6
|
||||
# Each pod runs up to MAX_CONCURRENT_TASKS browsers (~400MB each)
|
||||
# Scale pods for throughput, not concurrent tasks per pod
|
||||
replicas: 8
|
||||
selector:
|
||||
matchLabels:
|
||||
app: scraper-worker
|
||||
@@ -44,6 +47,10 @@ spec:
|
||||
value: "http://scraper"
|
||||
- name: NODE_OPTIONS
|
||||
value: "--max-old-space-size=1500"
|
||||
# Browser memory limits - see docs/WORKER_TASK_ARCHITECTURE.md
|
||||
# 3 browsers × ~400MB = ~1.3GB (safe for 2GB pod limit)
|
||||
- name: MAX_CONCURRENT_TASKS
|
||||
value: "3"
|
||||
resources:
|
||||
requests:
|
||||
memory: "1Gi"
|
||||
@@ -61,169 +68,3 @@ spec:
|
||||
periodSeconds: 30
|
||||
failureThreshold: 3
|
||||
terminationGracePeriodSeconds: 60
|
||||
---
|
||||
# =============================================================================
|
||||
# ALTERNATIVE: StatefulSet with multiple workers per pod (not currently used)
|
||||
# =============================================================================
|
||||
# Task Worker Pods (StatefulSet)
|
||||
# Each pod runs 5 role-agnostic workers that pull tasks from worker_tasks queue.
|
||||
#
|
||||
# Architecture:
|
||||
# - Pods are named from a predefined list (Aethelgard, Xylos, etc.)
|
||||
# - Each pod spawns 5 worker processes
|
||||
# - Workers register with API and show their pod name
|
||||
# - HPA scales pods 5-15 based on pending task count
|
||||
# - Workers use DB-level locking (FOR UPDATE SKIP LOCKED) to prevent conflicts
|
||||
#
|
||||
# Pod Names (up to 25):
|
||||
# Aethelgard, Xylos, Kryll, Coriolis, Dimidium, Veridia, Zetani, Talos IV,
|
||||
# Onyx, Celestia, Gormand, Betha, Ragnar, Syphon, Axiom, Nadir, Terra Nova,
|
||||
# Acheron, Nexus, Vespera, Helios Prime, Oasis, Mordina, Cygnus, Umbra
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: pod-names
|
||||
namespace: dispensary-scraper
|
||||
data:
|
||||
names: |
|
||||
Aethelgard
|
||||
Xylos
|
||||
Kryll
|
||||
Coriolis
|
||||
Dimidium
|
||||
Veridia
|
||||
Zetani
|
||||
Talos IV
|
||||
Onyx
|
||||
Celestia
|
||||
Gormand
|
||||
Betha
|
||||
Ragnar
|
||||
Syphon
|
||||
Axiom
|
||||
Nadir
|
||||
Terra Nova
|
||||
Acheron
|
||||
Nexus
|
||||
Vespera
|
||||
Helios Prime
|
||||
Oasis
|
||||
Mordina
|
||||
Cygnus
|
||||
Umbra
|
||||
---
|
||||
apiVersion: apps/v1
|
||||
kind: StatefulSet
|
||||
metadata:
|
||||
name: worker-pod
|
||||
namespace: dispensary-scraper
|
||||
spec:
|
||||
serviceName: worker-pods
|
||||
replicas: 5
|
||||
podManagementPolicy: Parallel
|
||||
selector:
|
||||
matchLabels:
|
||||
app: worker-pod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: worker-pod
|
||||
spec:
|
||||
imagePullSecrets:
|
||||
- name: regcred
|
||||
containers:
|
||||
- name: workers
|
||||
image: code.cannabrands.app/creationshop/dispensary-scraper:latest
|
||||
# Run 5 workers per pod
|
||||
command: ["/bin/sh", "-c"]
|
||||
args:
|
||||
- |
|
||||
# Get pod ordinal (0, 1, 2, etc.)
|
||||
ORDINAL=$(echo $HOSTNAME | rev | cut -d'-' -f1 | rev)
|
||||
# Get pod name from configmap
|
||||
POD_NAME=$(sed -n "$((ORDINAL + 1))p" /etc/pod-names/names)
|
||||
echo "Starting pod: $POD_NAME (ordinal: $ORDINAL)"
|
||||
|
||||
# Start 5 workers in this pod
|
||||
for i in 1 2 3 4 5; do
|
||||
WORKER_ID="${POD_NAME}-worker-${i}" \
|
||||
POD_NAME="$POD_NAME" \
|
||||
node dist/tasks/task-worker.js &
|
||||
done
|
||||
|
||||
# Wait for all workers
|
||||
wait
|
||||
envFrom:
|
||||
- configMapRef:
|
||||
name: scraper-config
|
||||
- secretRef:
|
||||
name: scraper-secrets
|
||||
env:
|
||||
- name: API_BASE_URL
|
||||
value: "http://scraper:3010"
|
||||
- name: WORKERS_PER_POD
|
||||
value: "5"
|
||||
volumeMounts:
|
||||
- name: pod-names
|
||||
mountPath: /etc/pod-names
|
||||
resources:
|
||||
requests:
|
||||
memory: "512Mi"
|
||||
cpu: "200m"
|
||||
limits:
|
||||
memory: "1Gi"
|
||||
cpu: "1000m"
|
||||
livenessProbe:
|
||||
exec:
|
||||
command:
|
||||
- /bin/sh
|
||||
- -c
|
||||
- "pgrep -f 'task-worker' > /dev/null"
|
||||
initialDelaySeconds: 15
|
||||
periodSeconds: 30
|
||||
failureThreshold: 3
|
||||
volumes:
|
||||
- name: pod-names
|
||||
configMap:
|
||||
name: pod-names
|
||||
terminationGracePeriodSeconds: 60
|
||||
---
|
||||
# Headless service for StatefulSet
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: worker-pods
|
||||
namespace: dispensary-scraper
|
||||
spec:
|
||||
clusterIP: None
|
||||
selector:
|
||||
app: worker-pod
|
||||
ports:
|
||||
- port: 80
|
||||
name: placeholder
|
||||
---
|
||||
# HPA to scale pods based on pending tasks
|
||||
apiVersion: autoscaling/v2
|
||||
kind: HorizontalPodAutoscaler
|
||||
metadata:
|
||||
name: worker-pod-hpa
|
||||
namespace: dispensary-scraper
|
||||
spec:
|
||||
scaleTargetRef:
|
||||
apiVersion: apps/v1
|
||||
kind: StatefulSet
|
||||
name: worker-pod
|
||||
minReplicas: 5
|
||||
maxReplicas: 15
|
||||
metrics:
|
||||
- type: External
|
||||
external:
|
||||
metric:
|
||||
name: pending_tasks
|
||||
selector:
|
||||
matchLabels:
|
||||
queue: worker_tasks
|
||||
target:
|
||||
type: AverageValue
|
||||
averageValue: "10"
|
||||
|
||||
Reference in New Issue
Block a user