Add request interception to all Puppeteer handlers to block unnecessary
resources (images, fonts, media, stylesheets). We only need HTML/JS for
the session cookie, then the GraphQL JSON response.
This was causing 2.4GB of bandwidth from assets2.dutchie.com - every
page visit downloaded all product thumbnails, logos, etc.
Files updated:
- product-discovery-http.ts
- entry-point-discovery.ts
- store-discovery-http.ts
- store-discovery-state.ts
- puppeteer-preflight.ts
Note: Product images from payload are still downloaded once to MinIO
via image-storage.ts - this only blocks browser-rendered page images.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- CrawlRotator.getProxyUrl() now converts non-standard format (http://host:port:user:pass) to standard format (http://user:pass@host:port)
- Simplify puppeteer preflight to only use ipify.org for IP verification (much lighter than fingerprint.com)
- Remove heavy anti-detect site tests from preflight - not needed, trust stealth plugin
- Fixes 503 errors when using session-based residential proxies
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add IP geolocation lookup via ip-api.com to get timezone from proxy IP
- Use ipify.org API for reliable proxy IP detection (replaces unreliable fingerprint.com scraping)
- Set browser timezone via CDP Emulation.setTimezoneOverride to match proxy location
- Add detectedTimezone and detectedLocation to preflight result
- Add /api/worker-registry/preflight-test endpoint for smoke testing
Fixes timezone mismatch where browser showed America/Phoenix while proxy was in America/New_York
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Workers now run both curl and http (Puppeteer) preflights on startup:
- curl-preflight.ts: Tests axios + proxy via httpbin.org
- puppeteer-preflight.ts: Tests browser + StealthPlugin via fingerprint.com
(with amiunique.org fallback)
- Migration 084: Adds preflight columns to worker_registry and method
column to worker_tasks
- Workers report preflight status, IP, fingerprint, and response time
- Tasks can require specific transport method (curl/http)
- Dashboard shows Transport column with preflight status badges
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>