Commit Graph

4 Commits

Author SHA1 Message Date
Kelly
5084cb1a85 fix: Block images/fonts/media in Puppeteer to save bandwidth
Add request interception to all Puppeteer handlers to block unnecessary
resources (images, fonts, media, stylesheets). We only need HTML/JS for
the session cookie, then the GraphQL JSON response.

This was causing 2.4GB of bandwidth from assets2.dutchie.com - every
page visit downloaded all product thumbnails, logos, etc.

Files updated:
- product-discovery-http.ts
- entry-point-discovery.ts
- store-discovery-http.ts
- store-discovery-state.ts
- puppeteer-preflight.ts

Note: Product images from payload are still downloaded once to MinIO
via image-storage.ts - this only blocks browser-rendered page images.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 03:28:12 -07:00
Kelly
dd8fce6e35 fix(proxy): Convert non-standard proxy URL format and simplify preflight
- CrawlRotator.getProxyUrl() now converts non-standard format (http://host:port:user:pass) to standard format (http://user:pass@host:port)
- Simplify puppeteer preflight to only use ipify.org for IP verification (much lighter than fingerprint.com)
- Remove heavy anti-detect site tests from preflight - not needed, trust stealth plugin
- Fixes 503 errors when using session-based residential proxies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 02:13:51 -07:00
Kelly
33feca3138 fix(antidetect): Match browser timezone to proxy IP location
- Add IP geolocation lookup via ip-api.com to get timezone from proxy IP
- Use ipify.org API for reliable proxy IP detection (replaces unreliable fingerprint.com scraping)
- Set browser timezone via CDP Emulation.setTimezoneOverride to match proxy location
- Add detectedTimezone and detectedLocation to preflight result
- Add /api/worker-registry/preflight-test endpoint for smoke testing

Fixes timezone mismatch where browser showed America/Phoenix while proxy was in America/New_York

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 00:25:39 -07:00
Kelly
cdab71a1ee feat(workers): Add dual-transport preflight system
Workers now run both curl and http (Puppeteer) preflights on startup:
- curl-preflight.ts: Tests axios + proxy via httpbin.org
- puppeteer-preflight.ts: Tests browser + StealthPlugin via fingerprint.com
  (with amiunique.org fallback)
- Migration 084: Adds preflight columns to worker_registry and method
  column to worker_tasks
- Workers report preflight status, IP, fingerprint, and response time
- Tasks can require specific transport method (curl/http)
- Dashboard shows Transport column with preflight status badges

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 22:47:52 -07:00