From cdab71a1ee4807742e11361d4b64fab4d3af7813 Mon Sep 17 00:00:00 2001 From: Kelly Date: Thu, 11 Dec 2025 22:47:52 -0700 Subject: [PATCH] feat(workers): Add dual-transport preflight system MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Workers now run both curl and http (Puppeteer) preflights on startup: - curl-preflight.ts: Tests axios + proxy via httpbin.org - puppeteer-preflight.ts: Tests browser + StealthPlugin via fingerprint.com (with amiunique.org fallback) - Migration 084: Adds preflight columns to worker_registry and method column to worker_tasks - Workers report preflight status, IP, fingerprint, and response time - Tasks can require specific transport method (curl/http) - Dashboard shows Transport column with preflight status badges 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- CLAUDE.md | 106 +++++ .../084_dual_transport_preflight.sql | 253 +++++++++++ backend/src/services/curl-preflight.ts | 100 +++++ backend/src/services/puppeteer-preflight.ts | 399 ++++++++++++++++++ backend/src/tasks/task-service.ts | 29 +- backend/src/tasks/task-worker.ts | 130 +++++- cannaiq/src/pages/WorkersDashboard.tsx | 76 ++++ 7 files changed, 1085 insertions(+), 8 deletions(-) create mode 100644 backend/migrations/084_dual_transport_preflight.sql create mode 100644 backend/src/services/curl-preflight.ts create mode 100644 backend/src/services/puppeteer-preflight.ts diff --git a/CLAUDE.md b/CLAUDE.md index 42e79383..8c98d998 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -99,6 +99,112 @@ Bump `wordpress-plugin/VERSION` on changes: --- +## Puppeteer Scraping (Browser-Based) + +### Age Gate Bypass + +Most dispensary sites require age verification. The browser scraper handles this automatically: + +**Utility File**: `src/utils/age-gate.ts` + +**Key Functions**: +- `setAgeGateCookies(page, url, state)` - Set cookies BEFORE navigation to prevent gate +- `hasAgeGate(page)` - Detect if page shows age verification +- `bypassAgeGate(page, state)` - Click through age gate if displayed +- `detectStateFromUrl(url)` - Extract state from URL (e.g., `-az-` → Arizona) + +**Cookie Names Set**: +- `age_gate_passed: 'true'` +- `selected_state: ''` +- `age_verified: 'true'` + +**Bypass Methods** (tried in order): +1. Custom dropdown (shadcn/radix style) - Curaleaf pattern +2. Standard `