- Fix Run Now to prevent duplicate task creation
- Add loading state to Run Now button in UI
- Return early when no stores need refresh
- Worker dashboard improvements
- Browser pooling architecture updates
- K8s worker config updates (8 replicas, 3 concurrent tasks)
- Update payload-storage.ts to use MinIO when configured
- Payloads stored at: cannaiq/payloads/{year}/{month}/{day}/store_{id}_{ts}.json.gz
- Falls back to local filesystem when MINIO_* env vars not set
- Enables shared storage across all worker pods
- Fixes ephemeral storage issue where payloads were lost on pod restart
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change MAX_CONCURRENT_TASKS default from 3 to 15
- Add CLAUDE.md rule requiring explicit permission before kubectl commands
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Changes
- **Migration 089**: Add is_immutable and method columns to task_schedules
- Per-state product_discovery schedules (4h default)
- Store discovery weekly (168h)
- All schedules use HTTP transport (Puppeteer/browser)
- **Task Scheduler**: HTTP-only product discovery with per-state scheduling
- Each state has its own immutable schedule
- Schedules can be edited (interval/priority) but not deleted
- **TasksDashboard UI**: Full immutability support
- Lock icon for immutable schedules
- State and Method columns in schedules table
- Disabled delete for immutable, restricted edit fields
- **Store Discovery HTTP**: Auto-queue product_discovery for new stores
- **Migration 088**: Discovery payloads storage schema
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Rename product-discovery.ts to product-discovery-curl.ts (axios-based)
- Rename payload-fetch.ts to payload-fetch-curl.ts
- Add product-discovery-http.ts (Puppeteer browser-based handler)
- Add method field to CreateTaskParams for transport selection
- Update task-service to insert method column on task creation
- Update task-worker with getHandlerForTask() for dual transport routing
- product_refresh now queues upstream tasks when no payload exists:
- Has platform_dispensary_id → queues product_discovery (http)
- No platform_dispensary_id → queues entry_point_discovery
This enables HTTP workers to pick up browser-based tasks while curl
workers handle axios-based tasks, and prevents product_refresh from
failing repeatedly when no crawl has been performed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- CrawlRotator.getProxyUrl() now converts non-standard format (http://host:port:user:pass) to standard format (http://user:pass@host:port)
- Simplify puppeteer preflight to only use ipify.org for IP verification (much lighter than fingerprint.com)
- Remove heavy anti-detect site tests from preflight - not needed, trust stealth plugin
- Fixes 503 errors when using session-based residential proxies
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add schedule CRUD endpoints to /api/tasks/schedules
- Add Schedules section to TasksDashboard with edit/delete/bulk actions
- Deprecate job_schedules table (entries disabled in DB)
- Mark CrawlSchedulePage as deprecated (removed from menu)
- Add deprecation comments to legacy schedule methods in api.ts
- Add migration comments to workers.ts explaining consolidation
Key changes:
- Schedule management now at /admin/tasks instead of /admin/schedule
- task_schedules uses interval_hours (simpler than base_interval_minutes + jitter)
- All schedule routes placed before /:id to avoid Express route conflicts
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix update_worker_preflight call to use correct parameter order:
(worker_id, transport, status, ip, response_ms, error, fingerprint)
- Add proxyIp to both curl and http preflight reports
- Add fingerprint JSONB with timezone, location, and bot detection data
- Log HTTP IP and timezone after preflight completes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add IP geolocation lookup via ip-api.com to get timezone from proxy IP
- Use ipify.org API for reliable proxy IP detection (replaces unreliable fingerprint.com scraping)
- Set browser timezone via CDP Emulation.setTimezoneOverride to match proxy location
- Add detectedTimezone and detectedLocation to preflight result
- Add /api/worker-registry/preflight-test endpoint for smoke testing
Fixes timezone mismatch where browser showed America/Phoenix while proxy was in America/New_York
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Query API:
- GET /api/payloads/store/:id/query - Filter products with flexible params
(brand, category, price_min/max, thc_min/max, search, sort, pagination)
- GET /api/payloads/store/:id/aggregate - Group by brand/category with metrics
(count, avg_price, min_price, max_price, avg_thc, in_stock_count)
- Documentation at docs/QUERY_API.md
Trusted Origins Admin:
- GET/POST/PUT/DELETE /api/admin/trusted-origins - Manage auth bypass list
- Trusted IPs, domains, and regex patterns stored in DB
- 5-minute cache with invalidation on admin updates
- Fallback to hardcoded defaults if DB unavailable
- Migration 085 creates table with seed data
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Exposes curl_ip, http_ip, preflight_status, preflight_at, and fingerprint_data
in the /api/worker-registry/workers response.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Moved hydration module back from _deprecated (needed for product_refresh)
- Restored product_refresh handler for processing stored payloads
- Restored geolocation service for findadispo/findagram
- Stubbed system routes that depend on deprecated SyncOrchestrator
- Removed crawler-sandbox route (deprecated)
- Fixed all TypeScript compilation errors
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Workers now run both curl and http (Puppeteer) preflights on startup:
- curl-preflight.ts: Tests axios + proxy via httpbin.org
- puppeteer-preflight.ts: Tests browser + StealthPlugin via fingerprint.com
(with amiunique.org fallback)
- Migration 084: Adds preflight columns to worker_registry and method
column to worker_tasks
- Workers report preflight status, IP, fingerprint, and response time
- Tasks can require specific transport method (curl/http)
- Dashboard shows Transport column with preflight status badges
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove cache_from/cache_to from CI (plugin bug splitting commas)
- Add preflight() method to CrawlRotator - tests proxy + anti-detect
- Add pre-task preflight check - workers MUST pass before executing
- Add releaseTask() to release tasks back to pending on preflight fail
- Rename proxy_test task to whoami for clarity
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add proxy_test task handler that fetches IP via proxy to verify connectivity
- Add discovery_runs migration (083) for tracking store discovery progress
- Register proxy_test in task service and worker
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Re-registration no longer overwrites pod_name with K8s name
- New workers get fantasy name (Aethelgard, Xylos, etc.) as pod_name
- Document worker naming convention in CLAUDE.md
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix memory calculation to use max-old-space-size (1500MB) instead of
V8's dynamic heapTotal. This prevents false 95%+ readings when idle.
- Add yellow color for backing-off workers in pod visualization
- Update legend and tooltips with backing-off status
- Remove pool toggle from TasksDashboard (moved to Workers page)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Workers can now process multiple tasks concurrently (default: 3 max).
Self-regulate based on resource usage - back off at 85% memory or 90% CPU.
Backend changes:
- TaskWorker handles concurrent tasks using async Maps
- Resource monitoring (memory %, CPU %) with backoff logic
- Heartbeat reports active_task_count, max_concurrent_tasks, resource stats
- Decommission support via worker_commands table
Frontend changes:
- Workers Dashboard shows tasks per worker (N/M format)
- Resource badges with color-coded thresholds
- Pod visualization with clickable selection
- Decommission controls per worker
New env vars:
- MAX_CONCURRENT_TASKS (default: 3)
- MEMORY_BACKOFF_THRESHOLD (default: 0.85)
- CPU_BACKOFF_THRESHOLD (default: 0.90)
- BACKOFF_DURATION_MS (default: 10000)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add /api/auth/verify-password endpoint for re-authentication
- Add PasswordConfirmModal component for sensitive actions
- Worker scaling (+/-) now requires password confirmation
- Add RBAC (ServiceAccount, Role, RoleBinding) for scraper pod
- Scraper pod can now read/scale worker deployment via k8s API
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Increase worker replicas from 5 to 25
- Task pool now starts PAUSED on deploy, admin must click Start Pool
- Prevents workers from grabbing tasks before system is ready
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add task-pool-state.ts for shared pause/resume state
- Add /api/tasks/pool/status, pause, resume endpoints
- Add Start/Stop Pool toggle button to TasksDashboard
- Spinner stops when pool is closed
- Fix is_active column name in store-discovery.ts
- Fix missing active column in task-service.ts claimTask
- Auto-refresh every 15 seconds
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change 'active' to 'is_active' in states table query (store-discovery.ts)
- Remove non-existent 'active' column check from worker_tasks query (task-service.ts)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add DispensarySchedule page showing crawl history and upcoming schedule
- Add /dispensaries/:state/:city/:slug/schedule route
- Add API endpoint for store crawl history
- Update View Schedule link to use dispensary-specific route
- Remove colored badges from DispensaryDetail product table (plain text)
- Make Details button ghost style in product table
- Add "Sort by States" option to IntelligenceBrands
- Remove status filter dropdown from Dispensaries page
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The "Active States" metric was arbitrary and confusing. Changed to
show total states count - all states in the system regardless of
whether they have data or not.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Active states should count states with actual product data, not just
states where crawling is enabled. A state can have historical data
even if crawling is currently disabled.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change active_states to count states with crawl_enabled=true dispensaries
- Filter all national summary queries by crawl_enabled=true
- Remove unused StateBadge from National Dashboard header
- StateBadge was showing "Arizona" with no way to change it
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add state filter to Intelligence Brands API and frontend
- Add state filter to Intelligence Pricing API and frontend
- Add state filter to Intelligence Stores API and frontend
- Fix null safety issues with toLocaleString() calls
- Update backend /stores endpoint to return skuCount, snapshotCount, chainName
- Add overall stats to pricing endpoint
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Install curl in Docker container for Dutchie HTTP requests
- Add 'active' column to worker_tasks (default false) to prevent
accidental task execution on startup
- Update task-service to only claim tasks where active=true
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The states dropdown was showing count of ALL dispensaries instead of
just crawl-enabled ones. Now correctly filters to match the actual
stores that would be displayed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Workers now use PostgreSQL LISTEN/NOTIFY to wake up immediately when proxies are added
- Added trigger on proxies table to NOTIFY 'proxy_added' when active proxy inserted/updated
- Falls back to 30s polling if LISTEN fails
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Task worker now waits up to 60 minutes for active proxies
- Retries every 30 seconds with clear logging
- Updated K8s scraper-worker.yaml with Deployment definition
- Deployment uses task-worker.js entrypoint with correct liveness probe
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add authMiddleware to analytics-v2.ts to require authentication
- Add permanent rule #6 to CLAUDE.md: "ALL API ROUTES REQUIRE AUTHENTICATION"
- Add forbidden action #19: "Creating API routes without authMiddleware"
- Document authentication flow and trusted origins
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When a user logs in and has a Bearer token, use their actual identity
instead of falling back to internal@system. This ensures logged-in
users see their real email in the admin UI.
Order of auth:
1. If Bearer token provided → use JWT/API token (real user identity)
2. If no token → check trusted origins (for API access like WordPress)
3. Otherwise → 401 unauthorized
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>