Was using rpc.evomi.com which doesn't exist.
Residential proxies use rp.evomi.com:1000
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The preflight was only checking DB proxy pool which was empty.
Now checks Evomi API first (when configured), then falls back to DB.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Platform badge now shows green (emerald) for dutchie even when null/undefined
- State badge shows "ALL" (uppercase) with indigo color when no state specified
- Remove "(HTTP transport)" from store discovery description
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Document priority order (Evomi API first, DB fallback)
- List environment variables and defaults
- Show K8s secret location
- Explain proxy URL format with geo targeting
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Check Evomi API availability before waiting for DB proxies
- If EVOMI_USER/EVOMI_PASS configured, proceed immediately
- Only fall back to DB proxy polling if Evomi not configured
- Added clear comments explaining proxy initialization order
This fixes workers getting stuck waiting for DB proxies when
Evomi API is available for on-demand geo-targeted proxies.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add WorkerFingerprint interface with timezone, city, state, ip, locale
- Store fingerprint in TaskWorker after preflight passes
- Pass fingerprint through TaskContext to handlers
- Apply timezone via CDP and locale via Accept-Language header
- Ensures browser fingerprint matches proxy IP location
This fixes anti-detect detection where timezone/locale mismatch
with proxy IP was getting blocked by Cloudflare.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add fetchProductsByStoreIdDirect() for reliable Algolia product fetching
- Update product-discovery-jane to use direct Algolia instead of network interception
- Fix product-refresh handler to support both Dutchie and Jane payloads
- Handle both `products` (Dutchie) and `hits` (Jane) formats
- Use platform-appropriate raw_json structure for normalizers
- Fix consecutive_misses tracking to use correct provider
- Extract product IDs correctly (Dutchie _id vs Jane product_id)
- Add store discovery deduplication (prefer REC over MED at same location)
- Add storeTypes field to DiscoveredStore interface
- Add scripts: run-jane-store-discovery.ts, run-jane-product-discovery.ts
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Workers are now geo-locked to a specific state for their session:
- Session = 60 minutes OR 7 store visits (whichever comes first)
- Workers ONLY claim tasks matching their assigned state
- State assignment prioritizes: most pending tasks, fewest workers
Changes:
- Migration 108: geo session columns, claim_task with geo filter,
assign_worker_geo(), check_worker_geo_session(), worker_state_capacity view
- task-worker.ts: ensureGeoSession() method before task claiming
- worker-registry.ts: /state-capacity and /geo-sessions API endpoints
- WorkersDashboard: Show qualified icon + geo state in Preflight column
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Platform isolation:
- Rename handlers to {task}-{platform}.ts convention
- Deprecate -curl variants (now _deprecated-*)
- Platform-based routing in task-worker.ts
- Add Jane platform handlers and client
Evomi geo-targeting:
- Add dynamic proxy URL builder with state/city targeting
- Session stickiness per worker per state (30 min)
- Fallback to static proxy table when API unavailable
- Add proxy tracking columns to worker_tasks
Proxy management:
- New /proxies admin page for visibility
- Track proxy_ip, proxy_geo, proxy_source per task
- Show active sessions and task history
Validation filtering:
- Filter by validated stores (platform_dispensary_id + menu_url)
- Mark incomplete stores as deprecated
- Update all dashboard/stats queries
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Rename 'store_discovery_dutchie' to 'Store Discovery' (platform badge via platform field)
- Add self-healing: scan for stores missing payloads and queue product_discovery
- Catches stores added before chaining was implemented
- Limits to 50 stores per run to avoid overwhelming the system
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add dispensary_id column to task_schedules table
- Update scheduler to handle single-dispensary schedules
- Update run-now endpoint to handle single-dispensary schedules
- Update frontend modal to pass dispensary_id when 1 store selected
- Fix existing "Deeply Rooted Hourly" schedule with dispensary_id=112
Now when you select ONE store and check "Make recurring", it creates
a schedule that runs for that specific store every interval.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The run-now endpoint only fanned out to stores for product_discovery
schedules, not product_refresh or payload_fetch. This caused single
tasks to be created without dispensary_id, which then failed.
Now all crawl roles (product_discovery, product_refresh, payload_fetch)
with state_code properly fan out to individual store tasks.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add request interception to all Puppeteer handlers to block unnecessary
resources (images, fonts, media, stylesheets). We only need HTML/JS for
the session cookie, then the GraphQL JSON response.
This was causing 2.4GB of bandwidth from assets2.dutchie.com - every
page visit downloaded all product thumbnails, logos, etc.
Files updated:
- product-discovery-http.ts
- entry-point-discovery.ts
- store-discovery-http.ts
- store-discovery-state.ts
- puppeteer-preflight.ts
Note: Product images from payload are still downloaded once to MinIO
via image-storage.ts - this only blocks browser-rendered page images.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Workers check their timezone (from preflight IP geolocation) and current
hour's weight probability to determine availability. This creates natural
traffic patterns - more workers active during peak hours, fewer during
off-peak. Tasks queue up at night and drain during the day.
Migrations:
- 099: working_hours table with hourly weights by profile
- 100: Add timezone column to worker_registry
- 101: Store timezone from preflight IP geolocation
- 102: check_working_hours() function with probability roll
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Rewrite image-storage.ts to use MinIO instead of ephemeral local filesystem
- Images downloaded ONCE from Dutchie CDN, stored permanently in MinIO
- Check MinIO before downloading (skipIfExists) to avoid re-downloads
- Convert images to webp before storage
- Storage path: images/products/<state>/<store>/<brand>/<product>/image-<hash>.webp
- Public URL: https://cdn.cannabrands.app/cannaiq/images/...
This fixes the 2.4GB bandwidth issue from repeatedly downloading images
that were lost when K8s pods restarted.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
WordPress Plugin:
- Add dynamic tags for all product payload fields (name, brand, price, THC, effects, etc.)
- Add Product Loop widget with filtering, sorting, and layout options
- Register CannaIQ widget category in Elementor
- Update build script to auto-upload to MinIO CDN
- Remove legacy dutchie references
- Bump version to 1.7.0
Backend:
- Redirect /downloads/* to CDN instead of serving from local filesystem
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Now stores the complete raw product JSON from Dutchie on every
product refresh. This enables querying any Dutchie field
(terpenes, effects, description, etc.) without schema changes.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Dutchie already provides unique brand UUIDs (provider_brand_id).
No need for separate brand normalization/aliasing logic.
Use provider_brand_id for brand grouping instead.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- product_discovery → product_refresh now sets method: 'http'
- product_refresh → entry_point_discovery now sets method: 'http'
- All crawl tasks now require HTTP preflight to claim
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- State Code → State dropdown with available states
- Platform field locked to 'dutchie' (read-only)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Hard failures now auto-retry up to 3 times
- Exponential backoff: 5, 10, 20 minutes
- Only permanently fails after max retries exceeded
- Soft failures still requeue immediately
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resets failed tasks back to pending for retry.
Options: role (filter), max_age_hours (default 24), limit (default 100)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Backend: Add 'search' param to /api/admin/intelligence/brands
- Frontend: Debounced search triggers server-side query
- Now searches ALL brands, not just top 500
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use pg_stat for approximate product count (instant vs full scan)
- LIMIT on DISTINCT queries for brand/category counts
- Single combined query (reduces round trips)
- Add index on store_product_snapshots.captured_at
- Add index on worker_tasks.worker_id and created_at
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When resolving platform_dispensary_id, check if it already exists on
another dispensary. If so, mark current dispensary as duplicate instead
of failing with unique constraint violation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Stage checkpoints (observational, non-blocking):
- product_refresh: success → 'production', failure tracking → 'failing' after 3
- product_discovery: success → 'hydrating', failure tracking
- entry_point_discovery: success → 'promoted', failure tracking
Worker name fix:
- Join worker_registry in tasks query to get friendly_name directly
- Update TasksDashboard to use worker_name from joined query
- Fallback to registry lookup then pod ID suffix
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- market-summary now counts from store_products table (not product_variants)
- Added trigram indexes for fast ILIKE product searches
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Filter by menu_type = 'dutchie'
- Use INNER JOIN + HAVING to only return stores with products
- Stores without product discovery are excluded
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add last_failed_at column to track failure time
- Failed proxies auto-retry after 4 hours (configurable)
- Proxies permanently failed after 10 failures
- Add /retry-stats and /reenable-failed API endpoints
- markProxySuccess() re-enables recovered proxies
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update unique constraint to include username/password for session-based proxies
- All proxies imported as inactive (run Test All to verify and activate)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Changed unique_brands from COUNT(brand_id) to COUNT(brand_name_raw)
- brand_id is often NULL, brand_name_raw has actual data
- AZ now correctly shows 462 brands (was 144)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix stores endpoint to only show stores with actual products (INNER JOIN + HAVING)
- Update badge colors to match Workers/Tasks dashboard style
- Use emerald/amber/red/gray color scheme consistently
- Chain badge now uses purple (bg-purple-100)
- Add migration 092 to fix Trulieve store URLs
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Rewrote entry_point_discovery with auto-healing scheme:
1. Check dutchie_discovery_locations for existing platform_location_id
2. Browser-based GraphQL with 5x network retries
3. Mark as needs_investigation on hard failure
- Browser (Puppeteer) is now DEFAULT transport - curl only when explicit
- Added migration 091 for tracking columns:
- last_store_discovery_at: When store_discovery updated record
- last_payload_at: When last product payload was saved
- Updated CODEBASE_MAP.md with transport rules documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Store Discovery Parallelization:
- Add store_discovery_state handler for per-state parallel discovery
- Add POST /api/tasks/batch/store-discovery endpoint
- 8 workers can now process states in parallel (~30-45 min vs 3+ hours)
Modification Tracking (Migration 090):
- Add last_modified_at, last_modified_by_task, last_modified_task_id to dispensaries
- Add same columns to store_products
- Update all handlers to set tracking info on modifications
Stale Task Recovery:
- Add periodic stale cleanup every 10 minutes (worker-0 only)
- Prevents orphaned tasks from blocking queue after worker crashes
Task Deduplication:
- createStaggeredTasks now skips if pending/active task exists for same role
- Skips if same role completed within last 4 hours
- API responses include skipped count
🤖 Generated with [Claude Code](https://claude.com/claude-code)
- Track id_resolution_status, attempts, and errors in handler
- Add POST /api/tasks/batch/entry-point-discovery endpoint
- Skip already-resolved stores, retry failed with force flag
- Fix Run Now to prevent duplicate task creation
- Add loading state to Run Now button in UI
- Return early when no stores need refresh
- Worker dashboard improvements
- Browser pooling architecture updates
- K8s worker config updates (8 replicas, 3 concurrent tasks)