Commit Graph

7 Commits

Author SHA1 Message Date
Kelly
a020e31a46 feat(treez): CDP interception client for Elasticsearch API capture
Rewrites Treez platform client to use CDP (Chrome DevTools Protocol)
interception instead of DOM scraping. Key changes:

- Uses Puppeteer Stealth plugin to bypass headless detection
- Intercepts Elasticsearch API responses via CDP Network.responseReceived
- Captures full product data including inventory levels (availableUnits)
- Adds comprehensive TypeScript types for all Treez data structures
- Updates queries.ts with automatic session management
- Fixes product-discovery-treez handler for new API shape

Tested with Best Dispensary: 142 products across 10 categories captured
with inventory data, pricing, and lab results.
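
The interception flow described above can be sketched without a browser as a small tracker driven by CDP Network events. This is an illustration only, not the code in this commit: `ResponseTracker` and the `isTarget` predicate are hypothetical names, and the real Treez URL pattern is not stated here, so none is hard-coded. The sketch also assumes the body is fetched once `Network.loadingFinished` fires, since a body may not be fully available at `Network.responseReceived` time.

```typescript
// Trimmed shapes of the two CDP Network events used in this sketch.
interface ResponseReceivedEvent {
  requestId: string;
  response: { url: string; status: number; mimeType: string };
}
interface LoadingFinishedEvent {
  requestId: string;
}

// Hypothetical helper: remembers which in-flight requests look like
// JSON API responses worth capturing.
class ResponseTracker {
  private pending = new Map<string, string>(); // requestId -> url
  private isTarget: (url: string) => boolean;

  constructor(isTarget: (url: string) => boolean) {
    this.isTarget = isTarget;
  }

  // Call from the Network.responseReceived handler.
  onResponseReceived(e: ResponseReceivedEvent): void {
    const { url, status, mimeType } = e.response;
    if (status === 200 && mimeType.includes("json") && this.isTarget(url)) {
      this.pending.set(e.requestId, url);
    }
  }

  // Call from the Network.loadingFinished handler. Returns the URL when the
  // body for this request should now be read, otherwise undefined.
  onLoadingFinished(e: LoadingFinishedEvent): string | undefined {
    const url = this.pending.get(e.requestId);
    this.pending.delete(e.requestId);
    return url;
  }
}

// Rough Puppeteer wiring (assumed, not executed here):
//   const client = await page.createCDPSession();
//   await client.send("Network.enable");
//   client.on("Network.responseReceived", (e) => tracker.onResponseReceived(e));
//   client.on("Network.loadingFinished", async (e) => {
//     if (tracker.onLoadingFinished(e)) {
//       const { body } = await client.send("Network.getResponseBody", {
//         requestId: e.requestId,
//       });
//       // parse `body` as JSON and hand it to the product pipeline
//     }
//   });
```

Keeping the matching logic separate from the CDP session makes it possible to unit test the capture rules without launching a browser.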

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-13 19:25:49 -07:00
Kelly
ba15802a77 perf(puppeteer): Block analytics/tracking domains to save proxy bandwidth
Block requests to non-essential domains:
- googletagmanager.com, google-analytics.com (analytics)
- launchdarkly.com (feature flags)
- assets2.dutchie.com (CDN assets - we only need GraphQL)
- sentry.io (error tracking)
- segment.io/segment.com, amplitude.com, mixpanel.com (analytics)
- hotjar.com, fullstory.com (session recording)

Applied to both product-discovery-dutchie.ts and puppeteer-preflight.ts
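
A minimal sketch of the blocking rule, using the domains listed above. The function name `shouldBlock` and the subdomain-matching behaviour are assumptions for illustration; the actual handlers may match differently.

```typescript
// Domains taken from the list in this commit message.
const BLOCKED_DOMAINS: readonly string[] = [
  "googletagmanager.com",
  "google-analytics.com",
  "launchdarkly.com",
  "assets2.dutchie.com",
  "sentry.io",
  "segment.io",
  "segment.com",
  "amplitude.com",
  "mixpanel.com",
  "hotjar.com",
  "fullstory.com",
];

// Hypothetical helper: true when the request host is a blocked domain
// or a subdomain of one.
function shouldBlock(requestUrl: string): boolean {
  let host: string;
  try {
    host = new URL(requestUrl).hostname;
  } catch {
    return false; // unparseable URL: let the request through
  }
  return BLOCKED_DOMAINS.some((d) => host === d || host.endsWith("." + d));
}

// Rough Puppeteer wiring (assumed, not executed here):
//   await page.setRequestInterception(true);
//   page.on("request", (req) =>
//     shouldBlock(req.url()) ? req.abort() : req.continue()
//   );
```

Matching on the parsed hostname rather than a substring of the URL avoids blocking unrelated hosts that merely contain one of these names.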

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 16:50:28 -07:00
Kelly
d8a22fba53 docs: Add Evomi residential proxy API documentation
- Document priority order (Evomi API first, DB fallback)
- List environment variables and defaults
- Show K8s secret location
- Explain proxy URL format with geo targeting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 16:47:58 -07:00
Kelly
cf99ef9e09 fix(worker): Use Evomi API first, DB proxies as fallback
- Check Evomi API availability before waiting for DB proxies
- If EVOMI_USER/EVOMI_PASS configured, proceed immediately
- Only fall back to DB proxy polling if Evomi not configured
- Added clear comments explaining proxy initialization order

This fixes workers getting stuck waiting for DB proxies when the
Evomi API is available for on-demand geo-targeted proxies.
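
The priority order can be sketched as a single decision function. `chooseProxySource` and the `ProxySource` labels are hypothetical; only the `EVOMI_USER` and `EVOMI_PASS` variable names come from this commit.

```typescript
type ProxySource = "evomi-api" | "db-polling";

// Hypothetical helper: prefer the Evomi API when both credentials are set,
// otherwise fall back to polling the database for proxies.
function chooseProxySource(
  env: Record<string, string | undefined>
): ProxySource {
  const hasEvomi = Boolean(env.EVOMI_USER) && Boolean(env.EVOMI_PASS);
  return hasEvomi ? "evomi-api" : "db-polling";
}

// Typical call site (assumed): chooseProxySource(process.env)
```

An empty string counts as "not configured" here, which is one reasonable reading of the rule; the worker itself may treat it differently.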

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 16:45:52 -07:00
Kelly
023cfc127f fix(preflight): Apply stored fingerprint to task browser
- Add WorkerFingerprint interface with timezone, city, state, ip, locale
- Store fingerprint in TaskWorker after preflight passes
- Pass fingerprint through TaskContext to handlers
- Apply timezone via CDP and locale via Accept-Language header
- Ensures browser fingerprint matches proxy IP location

This fixes a detection issue where a timezone/locale mismatch with
the proxy IP caused requests to be blocked by Cloudflare.
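
As a rough sketch of how a stored fingerprint might be turned into browser overrides: the field names come from this commit, but their types, the helper names, and the `Accept-Language` format are assumptions. `Emulation.setTimezoneOverride` is the CDP method that takes a `timezoneId`.

```typescript
// Field names from the commit; string types are assumed.
interface WorkerFingerprint {
  timezone: string; // IANA zone id, e.g. "America/Denver"
  city: string;
  state: string;
  ip: string;
  locale: string; // e.g. "en-US"
}

// Hypothetical helper: "en-US" becomes "en-US,en;q=0.9"; a bare "en" is kept.
function acceptLanguage(locale: string): string {
  const base = locale.split("-")[0];
  return base === locale ? locale : `${locale},${base};q=0.9`;
}

// Hypothetical helper: builds the CDP command and extra headers to apply.
function buildFingerprintOverrides(fp: WorkerFingerprint) {
  return {
    cdpCommand: {
      method: "Emulation.setTimezoneOverride" as const,
      params: { timezoneId: fp.timezone },
    },
    headers: { "Accept-Language": acceptLanguage(fp.locale) },
  };
}

// Rough Puppeteer wiring (assumed, not executed here):
//   const o = buildFingerprintOverrides(fingerprint);
//   const client = await page.createCDPSession();
//   await client.send(o.cdpCommand.method, o.cdpCommand.params);
//   await page.setExtraHTTPHeaders(o.headers);
```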

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 16:40:52 -07:00
Kelly
d7da0b938d feat(jane): Direct Algolia product fetch and multi-platform product-refresh
- Add fetchProductsByStoreIdDirect() for reliable Algolia product fetching
- Update product-discovery-jane to use direct Algolia instead of network interception
- Fix product-refresh handler to support both Dutchie and Jane payloads
  - Handle both `products` (Dutchie) and `hits` (Jane) formats
  - Use platform-appropriate raw_json structure for normalizers
  - Fix consecutive_misses tracking to use correct provider
  - Extract product IDs correctly (Dutchie _id vs Jane product_id)
- Add store discovery deduplication (prefer REC over MED at same location)
- Add storeTypes field to DiscoveredStore interface
- Add scripts: run-jane-store-discovery.ts, run-jane-product-discovery.ts
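
The two payload shapes and the REC-over-MED preference can be illustrated as follows. This is a sketch under assumptions: only `products`, `hits`, `_id`, `product_id`, and `storeTypes` appear in this commit; the interfaces, `locationKey`, and both function names are hypothetical.

```typescript
// Trimmed payload shapes: Dutchie returns `products`, Jane returns `hits`.
interface DutchiePayload {
  products: Array<{ _id: string }>;
}
interface JanePayload {
  hits: Array<{ product_id: string | number }>;
}

// Hypothetical helper: extract product IDs as strings from either payload.
function extractProductIds(raw: DutchiePayload | JanePayload): string[] {
  if ("products" in raw) {
    return raw.products.map((p) => String(p._id));
  }
  return raw.hits.map((h) => String(h.product_id));
}

// Hypothetical store shape; `locationKey` stands in for whatever identifies
// "the same location" in the real discovery code.
interface StoreLite {
  id: string;
  locationKey: string;
  storeTypes: string[];
}

// Keep one store per location, preferring a REC store over a MED one.
function dedupeStores(stores: StoreLite[]): StoreLite[] {
  const byLocation = new Map<string, StoreLite>();
  for (const s of stores) {
    const kept = byLocation.get(s.locationKey);
    const isRec = s.storeTypes.includes("REC");
    if (!kept || (isRec && !kept.storeTypes.includes("REC"))) {
      byLocation.set(s.locationKey, s);
    }
  }
  return Array.from(byLocation.values());
}
```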

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 16:05:50 -07:00
Kelly
56cc171287 feat: Stealth worker system with mandatory proxy rotation
## Worker System
- Role-agnostic workers that can handle any task type
- Pod-based architecture with StatefulSet (5-15 pods, 5 workers each)
- Custom pod names (Aethelgard, Xylos, Kryll, etc.)
- Worker registry with friendly names and resource monitoring
- Hub-and-spoke visualization on JobQueue page

## Stealth & Anti-Detection (REQUIRED)
- Proxies are MANDATORY - workers fail to start without active proxies
- CrawlRotator initializes on worker startup
- Loads proxies from `proxies` table
- Auto-rotates proxy + fingerprint on 403 errors
- 12 browser fingerprints (Chrome, Firefox, Safari, Edge)
- Locale/timezone matching for geographic consistency
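
The rotation rules above can be sketched as a small class. This is not the `CrawlRotator` from this commit; `SimpleRotator` is a hypothetical stand-in that only shows the two stated behaviours: refusing to start without proxies, and advancing both proxy and fingerprint on a 403.

```typescript
// Hypothetical rotator: P is the proxy type, F the fingerprint type.
class SimpleRotator<P, F> {
  private proxyIndex = 0;
  private fingerprintIndex = 0;
  private proxies: P[];
  private fingerprints: F[];

  constructor(proxies: P[], fingerprints: F[]) {
    // Mirrors the "proxies are mandatory" rule: fail fast at startup.
    if (proxies.length === 0) {
      throw new Error("No active proxies: refusing to start");
    }
    if (fingerprints.length === 0) {
      throw new Error("No fingerprints configured");
    }
    this.proxies = proxies;
    this.fingerprints = fingerprints;
  }

  current(): { proxy: P; fingerprint: F } {
    return {
      proxy: this.proxies[this.proxyIndex],
      fingerprint: this.fingerprints[this.fingerprintIndex],
    };
  }

  // Returns true when a rotation happened (HTTP 403), false otherwise.
  onResponse(status: number): boolean {
    if (status !== 403) return false;
    this.proxyIndex = (this.proxyIndex + 1) % this.proxies.length;
    this.fingerprintIndex =
      (this.fingerprintIndex + 1) % this.fingerprints.length;
    return true;
  }
}
```

Round-robin selection is an assumption made for brevity; the real rotator may pick proxies by health, geography, or other criteria.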

## Task System
- Renamed product_resync → product_refresh
- Task chaining: store_discovery → entry_point → product_discovery
- Priority-based claiming with FOR UPDATE SKIP LOCKED
- Heartbeat and stale task recovery
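
A sketch of the chaining and claiming rules. The role names and `FOR UPDATE SKIP LOCKED` come from this commit; the `tasks` table, its columns, the PostgreSQL-style `$1` placeholder, and the helper names are assumptions for illustration.

```typescript
// Task chain from the commit: store_discovery -> entry_point -> product_discovery.
const NEXT_TASK = new Map<string, string>([
  ["store_discovery", "entry_point"],
  ["entry_point", "product_discovery"],
]);

// Hypothetical helper: the role to enqueue after `role` completes, if any.
function nextTaskRole(role: string): string | undefined {
  return NEXT_TASK.get(role);
}

// Hypothetical claim query (PostgreSQL syntax, invented schema).
// SKIP LOCKED lets concurrent workers skip rows another worker has already
// locked, so each pending task is claimed by at most one worker.
const CLAIM_TASK_SQL = `
UPDATE tasks
SET status = 'running', worker_id = $1, claimed_at = now()
WHERE id = (
  SELECT id
  FROM tasks
  WHERE status = 'pending'
  ORDER BY priority DESC, created_at ASC
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING id, role;
`;
```

Ordering by priority first and creation time second gives highest-priority, oldest-first claiming, which is one plausible interpretation of "priority-based claiming".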

## UI Updates
- JobQueue: Pod visualization, resource monitoring on hover
- WorkersDashboard: Simplified worker list
- Removed unused filters from task list

## Other
- IP2Location service for visitor analytics
- Findagram consumer features scaffolding
- Documentation updates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 00:44:59 -07:00