fix(preflight): Correct parameter order and add IP/fingerprint reporting

- Fix update_worker_preflight call to use correct parameter order: (worker_id, transport, status, ip, response_ms, error, fingerprint) - Add proxyIp to both curl and http preflight reports - Add fingerprint JSONB with timezone, location, and bot detection data - Log HTTP IP and timezone after preflight completes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Merge pull request 'feat(workers): Preflight phase 1 - Schema, StatefulSet, and timezone matching' (#55 ) from feat/preflight-phase1-schema into master
2025-12-12 00:32:45 -07:00 · 2025-12-12 07:30:53 +00:00 · 2025-12-12 00:25:39 -07:00 · 2025-12-12 07:14:40 +00:00 · 2025-12-11 23:45:04 -07:00 · 2025-12-12 06:22:06 +00:00
73 changed files with 3180 additions and 1905 deletions
--- a/.woodpecker.yml
+++ b/.woodpecker.yml
@@ -69,6 +69,7 @@ steps:
  # ===========================================
  # MASTER DEPLOY: Parallel Docker builds
  # NOTE: cache_from/cache_to removed due to plugin bug splitting on commas
  # ===========================================
  docker-backend:
    image: woodpeckerci/plugin-docker-buildx
@@ -86,10 +87,6 @@ steps:
        from_secret: registry_password
      platforms: linux/amd64
      provenance: false
      cache_from:
        - "type=registry,ref=code.cannabrands.app/creationshop/dispensary-scraper:cache"
      cache_to:
        - "type=registry,ref=code.cannabrands.app/creationshop/dispensary-scraper:cache,mode=max"
      build_args:
        APP_BUILD_VERSION: ${CI_COMMIT_SHA:0:8}
        APP_GIT_SHA: ${CI_COMMIT_SHA}
@@ -116,10 +113,6 @@ steps:
        from_secret: registry_password
      platforms: linux/amd64
      provenance: false
      cache_from:
        - "type=registry,ref=code.cannabrands.app/creationshop/cannaiq-frontend:cache"
      cache_to:
        - "type=registry,ref=code.cannabrands.app/creationshop/cannaiq-frontend:cache,mode=max"
    depends_on: []
    when:
      branch: master
@@ -141,10 +134,6 @@ steps:
        from_secret: registry_password
      platforms: linux/amd64
      provenance: false
      cache_from:
        - "type=registry,ref=code.cannabrands.app/creationshop/findadispo-frontend:cache"
      cache_to:
        - "type=registry,ref=code.cannabrands.app/creationshop/findadispo-frontend:cache,mode=max"
    depends_on: []
    when:
      branch: master
@@ -166,10 +155,6 @@ steps:
        from_secret: registry_password
      platforms: linux/amd64
      provenance: false
      cache_from:
        - "type=registry,ref=code.cannabrands.app/creationshop/findagram-frontend:cache"
      cache_to:
        - "type=registry,ref=code.cannabrands.app/creationshop/findagram-frontend:cache,mode=max"
    depends_on: []
    when:
      branch: master
--- a/CLAUDE.md
+++ b/CLAUDE.md
--- a/backend/docs/CODEBASE_MAP.md
+++ b/backend/docs/CODEBASE_MAP.md
@@ -0,0 +1,218 @@
 # CannaiQ Backend Codebase Map
 **Last Updated:** 2025-12-12
 **Purpose:** Help Claude and developers understand which code is current vs deprecated
 ---
 ## Quick Reference: What to Use
 ### For Crawling/Scraping
 | Task | Use This | NOT This |
 |------|----------|----------|
 | Fetch products | `src/tasks/handlers/payload-fetch.ts` | `src/hydration/*` |
 | Process products | `src/tasks/handlers/product-refresh.ts` | `src/scraper-v2/*` |
 | GraphQL client | `src/platforms/dutchie/client.ts` | `src/dutchie-az/services/graphql-client.ts` |
 | Worker system | `src/tasks/task-worker.ts` | `src/dutchie-az/services/worker.ts` |
 ### For Database
 | Task | Use This | NOT This |
 |------|----------|----------|
 | Get DB pool | `src/db/pool.ts` | `src/dutchie-az/db/connection.ts` |
 | Run migrations | `src/db/migrate.ts` (CLI only) | Never import at runtime |
 | Query products | `store_products` table | `products`, `dutchie_products` |
 | Query stores | `dispensaries` table | `stores` table |
 ### For Discovery
 | Task | Use This |
 |------|----------|
 | Discover stores | `src/discovery/*.ts` |
 | Run discovery | `npx tsx src/scripts/run-discovery.ts` |
 ---
 ## Directory Status
 ### ACTIVE DIRECTORIES (Use These)
 ```
 src/
 ├── auth/               # JWT/session auth, middleware
 ├── db/                 # Database pool, migrations
 ├── discovery/          # Dutchie store discovery pipeline
 ├── middleware/         # Express middleware
 ├── multi-state/        # Multi-state query support
 ├── platforms/          # Platform-specific clients (Dutchie, Jane, etc)
 │   └── dutchie/        # THE Dutchie client - use this one
 ├── routes/             # Express API routes
 ├── services/           # Core services (logger, scheduler, etc)
 ├── tasks/              # Task system (workers, handlers, scheduler)
 │   └── handlers/       # Task handlers (payload_fetch, product_refresh, etc)
 ├── types/              # TypeScript types
 └── utils/              # Utilities (storage, image processing)
 ```
 ### DEPRECATED DIRECTORIES (DO NOT USE)
 ```
 src/
 ├── hydration/          # DEPRECATED - Old pipeline approach
 ├── scraper-v2/         # DEPRECATED - Old scraper engine
 ├── canonical-hydration/# DEPRECATED - Merged into tasks/handlers
 ├── dutchie-az/         # PARTIAL - Some parts deprecated, some active
 │   ├── db/             # DEPRECATED - Use src/db/pool.ts
 │   └── services/       # PARTIAL - worker.ts still runs, graphql-client.ts deprecated
 ├── portals/            # FUTURE - Not yet implemented
 ├── seo/                # PARTIAL - Settings work, templates WIP
 └── system/             # DEPRECATED - Old orchestration system
 ```
 ### DEPRECATED FILES (DO NOT USE)
 ```
 src/dutchie-az/db/connection.ts      # Use src/db/pool.ts instead
 src/dutchie-az/services/graphql-client.ts  # Use src/platforms/dutchie/client.ts
 src/hydration/*.ts                   # Entire directory deprecated
 src/scraper-v2/*.ts                  # Entire directory deprecated
 ```
 ---
 ## Key Files Reference
 ### Entry Points
 | File | Purpose | Status |
 |------|---------|--------|
 | `src/index.ts` | Main Express server | ACTIVE |
 | `src/dutchie-az/services/worker.ts` | Worker process entry | ACTIVE |
 | `src/tasks/task-worker.ts` | Task worker (new system) | ACTIVE |
 ### Dutchie Integration
 | File | Purpose | Status |
 |------|---------|--------|
 | `src/platforms/dutchie/client.ts` | GraphQL client, hashes, curl | **PRIMARY** |
 | `src/platforms/dutchie/queries.ts` | High-level query functions | ACTIVE |
 | `src/platforms/dutchie/index.ts` | Re-exports | ACTIVE |
 ### Task Handlers
 | File | Purpose | Status |
 |------|---------|--------|
 | `src/tasks/handlers/payload-fetch.ts` | Fetch products from Dutchie | **PRIMARY** |
 | `src/tasks/handlers/product-refresh.ts` | Process payload into DB | **PRIMARY** |
 | `src/tasks/handlers/menu-detection.ts` | Detect menu type | ACTIVE |
 | `src/tasks/handlers/id-resolution.ts` | Resolve platform IDs | ACTIVE |
 | `src/tasks/handlers/image-download.ts` | Download product images | ACTIVE |
 ### Database
 | File | Purpose | Status |
 |------|---------|--------|
 | `src/db/pool.ts` | Canonical DB pool | **PRIMARY** |
 | `src/db/migrate.ts` | Migration runner (CLI only) | CLI ONLY |
 | `src/db/auto-migrate.ts` | Auto-run migrations on startup | ACTIVE |
 ### Configuration
 | File | Purpose | Status |
 |------|---------|--------|
 | `.env` | Environment variables | ACTIVE |
 | `package.json` | Dependencies | ACTIVE |
 | `tsconfig.json` | TypeScript config | ACTIVE |
 ---
 ## GraphQL Hashes (CRITICAL)
 The correct hashes are in `src/platforms/dutchie/client.ts`:
 ```typescript
 export const GRAPHQL_HASHES = {
  FilteredProducts: 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0',
  GetAddressBasedDispensaryData: '13461f73abf7268770dfd05fe7e10c523084b2bb916a929c08efe3d87531977b',
  ConsumerDispensaries: '0a5bfa6ca1d64ae47bcccb7c8077c87147cbc4e6982c17ceec97a2a4948b311b',
  GetAllCitiesByState: 'ae547a0466ace5a48f91e55bf6699eacd87e3a42841560f0c0eabed5a0a920e6',
 };
 ```
 **ALWAYS** use `Status: 'Active'` for FilteredProducts (not `null` or `'All'`).
 ---
 ## Scripts Reference
 ### Useful Scripts (in `src/scripts/`)
 | Script | Purpose |
 |--------|---------|
 | `run-discovery.ts` | Run Dutchie discovery |
 | `crawl-single-store.ts` | Test crawl a single store |
 | `test-dutchie-graphql.ts` | Test GraphQL queries |
 ### One-Off Scripts (probably don't need)
 | Script | Purpose |
 |--------|---------|
 | `harmonize-az-dispensaries.ts` | One-time data cleanup |
 | `bootstrap-stores-for-dispensaries.ts` | One-time migration |
 | `backfill-*.ts` | Historical backfill scripts |
 ---
 ## API Routes
 ### Active Routes (in `src/routes/`)
 | Route File | Mount Point | Purpose |
 |------------|-------------|---------|
 | `auth.ts` | `/api/auth` | Login/logout/session |
 | `stores.ts` | `/api/stores` | Store CRUD |
 | `dashboard.ts` | `/api/dashboard` | Dashboard stats |
 | `workers.ts` | `/api/workers` | Worker monitoring |
 | `pipeline.ts` | `/api/pipeline` | Crawl triggers |
 | `discovery.ts` | `/api/discovery` | Discovery management |
 | `analytics.ts` | `/api/analytics` | Analytics queries |
 | `wordpress.ts` | `/api/v1/wordpress` | WordPress plugin API |
 ---
 ## Documentation Files
 ### Current Docs (in `backend/docs/`)
 | Doc | Purpose | Currency |
 |-----|---------|----------|
 | `TASK_WORKFLOW_2024-12-10.md` | Task system architecture | CURRENT |
 | `WORKER_TASK_ARCHITECTURE.md` | Worker/task design | CURRENT |
 | `CRAWL_PIPELINE.md` | Crawl pipeline overview | CURRENT |
 | `ORGANIC_SCRAPING_GUIDE.md` | Browser-based scraping | CURRENT |
 | `CODEBASE_MAP.md` | This file | CURRENT |
 | `ANALYTICS_V2_EXAMPLES.md` | Analytics API examples | CURRENT |
 | `BRAND_INTELLIGENCE_API.md` | Brand API docs | CURRENT |
 ### Root Docs
 | Doc | Purpose | Currency |
 |-----|---------|----------|
 | `CLAUDE.md` | Claude instructions | **PRIMARY** |
 | `README.md` | Project overview | NEEDS UPDATE |
 ---
 ## Common Mistakes to Avoid
 1. **Don't use `src/hydration/`** - It's an old approach that was superseded by the task system
 2. **Don't use `src/dutchie-az/db/connection.ts`** - Use `src/db/pool.ts` instead
 3. **Don't import `src/db/migrate.ts` at runtime** - It will crash. Only use for CLI migrations.
 4. **Don't query `stores` table** - It's empty. Use `dispensaries`.
 5. **Don't query `products` table** - It's empty. Use `store_products`.
 6. **Don't use wrong GraphQL hash** - Always get hash from `GRAPHQL_HASHES` in client.ts
 7. **Don't use `Status: null`** - It returns 0 products. Use `Status: 'Active'`.
 ---
 ## When in Doubt
 1. Check if the file is imported in `src/index.ts` - if not, it may be deprecated
 2. Check the last modified date - older files may be stale
 3. Look for `DEPRECATED` comments in the code
 4. Ask: "Is there a newer version of this in `src/tasks/` or `src/platforms/`?"
 5. Read the relevant doc in `docs/` before modifying code
--- a/backend/docs/_archive/ANALYTICS_RUNBOOK.md
+++ b/backend/docs/_archive/ANALYTICS_RUNBOOK.md
--- a/backend/docs/_archive/ANALYTICS_V2_EXAMPLES.md
+++ b/backend/docs/_archive/ANALYTICS_V2_EXAMPLES.md
--- a/backend/docs/_archive/BRAND_INTELLIGENCE_API.md
+++ b/backend/docs/_archive/BRAND_INTELLIGENCE_API.md
--- a/backend/docs/_archive/CRAWL_PIPELINE.md
+++ b/backend/docs/_archive/CRAWL_PIPELINE.md
--- a/backend/docs/_archive/ORGANIC_SCRAPING_GUIDE.md
+++ b/backend/docs/_archive/ORGANIC_SCRAPING_GUIDE.md
@@ -0,0 +1,297 @@
 # Organic Browser-Based Scraping Guide
 **Last Updated:** 2025-12-12
 **Status:** Production-ready proof of concept
 ---
 ## Overview
 This document describes the "organic" browser-based approach to scraping Dutchie dispensary menus. Unlike direct curl/axios requests, this method uses a real browser session to make API calls, making requests appear natural and reducing detection risk.
 ---
 ## Why Organic Scraping?
 | Approach | Detection Risk | Speed | Complexity |
 |----------|---------------|-------|------------|
 | Direct curl | Higher | Fast | Low |
 | curl-impersonate | Medium | Fast | Medium |
 | **Browser-based (organic)** | **Lowest** | Slower | Higher |
 Direct curl requests can be fingerprinted via:
 - TLS fingerprint (cipher suites, extensions)
 - Header order and values
 - Missing cookies/session data
 - Request patterns
 Browser-based requests inherit:
 - Real Chrome TLS fingerprint
 - Session cookies from page visit
 - Natural header order
 - JavaScript execution environment
 ---
 ## Implementation
 ### Dependencies
 ```bash
 npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
 ```
 ### Core Script: `test-intercept.js`
 Located at: `backend/test-intercept.js`
 ```javascript
 const puppeteer = require('puppeteer-extra');
 const StealthPlugin = require('puppeteer-extra-plugin-stealth');
 const fs = require('fs');
 puppeteer.use(StealthPlugin());
 async function capturePayload(config) {
  const { dispensaryId, platformId, cName, outputPath } = config;
  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  const page = await browser.newPage();
  // STEP 1: Establish session by visiting the menu
  const embedUrl = `https://dutchie.com/embedded-menu/${cName}?menuType=rec`;
  await page.goto(embedUrl, { waitUntil: 'networkidle2', timeout: 60000 });
  // STEP 2: Fetch ALL products using GraphQL from browser context
  const result = await page.evaluate(async (platformId) => {
    const allProducts = [];
    let pageNum = 0;
    const perPage = 100;
    let totalCount = 0;
    const sessionId = 'browser-session-' + Date.now();
    while (pageNum < 30) {
      const variables = {
        includeEnterpriseSpecials: false,
        productsFilter: {
          dispensaryId: platformId,
          pricingType: 'rec',
          Status: 'Active',  // CRITICAL: Must be 'Active', not null
          types: [],
          useCache: true,
          isDefaultSort: true,
          sortBy: 'popularSortIdx',
          sortDirection: 1,
          bypassOnlineThresholds: true,
          isKioskMenu: false,
          removeProductsBelowOptionThresholds: false,
        },
        page: pageNum,
        perPage: perPage,
      };
      const extensions = {
        persistedQuery: {
          version: 1,
          sha256Hash: 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0'
        }
      };
      const qs = new URLSearchParams({
        operationName: 'FilteredProducts',
        variables: JSON.stringify(variables),
        extensions: JSON.stringify(extensions)
      });
      const response = await fetch(`https://dutchie.com/api-3/graphql?${qs}`, {
        method: 'GET',
        headers: {
          'Accept': 'application/json',
          'content-type': 'application/json',
          'x-dutchie-session': sessionId,
          'apollographql-client-name': 'Marketplace (production)',
        },
        credentials: 'include'
      });
      const json = await response.json();
      const data = json?.data?.filteredProducts;
      if (!data?.products) break;
      allProducts.push(...data.products);
      if (pageNum === 0) totalCount = data.queryInfo?.totalCount || 0;
      if (allProducts.length >= totalCount) break;
      pageNum++;
      await new Promise(r => setTimeout(r, 200)); // Polite delay
    }
    return { products: allProducts, totalCount };
  }, platformId);
  await browser.close();
  // STEP 3: Save payload
  const payload = {
    dispensaryId,
    platformId,
    cName,
    fetchedAt: new Date().toISOString(),
    productCount: result.products.length,
    products: result.products,
  };
  fs.writeFileSync(outputPath, JSON.stringify(payload, null, 2));
  return payload;
 }
 ```
 ---
 ## Critical Parameters
 ### GraphQL Hash (FilteredProducts)
 ```
 ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0
 ```
 **WARNING:** Using the wrong hash returns HTTP 400.
 ### Status Parameter
 | Value | Result |
 |-------|--------|
 | `'Active'` | Returns in-stock products (1019 in test) |
 | `null` | Returns 0 products |
 | `'All'` | Returns HTTP 400 |
 **ALWAYS use `Status: 'Active'`**
 ### Required Headers
 ```javascript
 {
  'Accept': 'application/json',
  'content-type': 'application/json',
  'x-dutchie-session': 'unique-session-id',
  'apollographql-client-name': 'Marketplace (production)',
 }
 ```
 ### Endpoint
 ```
 https://dutchie.com/api-3/graphql
 ```
 ---
 ## Performance Benchmarks
 Test store: AZ-Deeply-Rooted (1019 products)
 | Metric | Value |
 |--------|-------|
 | Total products | 1019 |
 | Time | 18.5 seconds |
 | Payload size | 11.8 MB |
 | Pages fetched | 11 (100 per page) |
 | Success rate | 100% |
 ---
 ## Payload Format
 The output matches the existing `payload-fetch.ts` handler format:
 ```json
 {
  "dispensaryId": 123,
  "platformId": "6405ef617056e8014d79101b",
  "cName": "AZ-Deeply-Rooted",
  "fetchedAt": "2025-12-12T05:05:19.837Z",
  "productCount": 1019,
  "products": [
    {
      "id": "6927508db4851262f629a869",
      "Name": "Product Name",
      "brand": { "name": "Brand Name", ... },
      "type": "Flower",
      "THC": "25%",
      "Prices": [...],
      "Options": [...],
      ...
    }
  ]
 }
 ```
 ---
 ## Integration Points
 ### As a Task Handler
 The organic approach can be integrated as an alternative to curl-based fetching:
 ```typescript
 // In src/tasks/handlers/organic-payload-fetch.ts
 export async function handleOrganicPayloadFetch(ctx: TaskContext): Promise<TaskResult> {
  // Use puppeteer-based capture
  // Save to same payload storage
  // Queue product_refresh task
 }
 ```
 ### Worker Configuration
 Add to job_schedules:
 ```sql
 INSERT INTO job_schedules (name, role, cron_expression)
 VALUES ('organic_product_crawl', 'organic_payload_fetch', '0 */6 * * *');
 ```
 ---
 ## Troubleshooting
 ### HTTP 400 Bad Request
 - Check hash is correct: `ee29c060...`
 - Verify Status is `'Active'` (string, not null)
 ### 0 Products Returned
 - Status was likely `null` or `'All'` - use `'Active'`
 - Check platformId is valid MongoDB ObjectId
 ### Session Not Established
 - Increase timeout on initial page.goto()
 - Check cName is valid (matches embedded-menu URL)
 ### Detection/Blocking
 - StealthPlugin should handle most cases
 - Add random delays between pages
 - Use headless: 'new' (not true/false)
 ---
 ## Files Reference
 | File | Purpose |
 |------|---------|
 | `backend/test-intercept.js` | Proof of concept script |
 | `backend/src/platforms/dutchie/client.ts` | GraphQL hashes, curl implementation |
 | `backend/src/tasks/handlers/payload-fetch.ts` | Current curl-based handler |
 | `backend/src/utils/payload-storage.ts` | Payload save/load utilities |
 ---
 ## See Also
 - `DUTCHIE_CRAWL_WORKFLOW.md` - Full crawl pipeline documentation
 - `TASK_WORKFLOW_2024-12-10.md` - Task system architecture
 - `CLAUDE.md` - Project rules and constraints
--- a/backend/docs/_archive/README.md
+++ b/backend/docs/_archive/README.md
@@ -0,0 +1,25 @@
 # ARCHIVED DOCUMENTATION
 **WARNING: These docs may be outdated or inaccurate.**
 The code has evolved significantly. These docs are kept for historical reference only.
 ## What to Use Instead
 **The single source of truth is:**
 - `CLAUDE.md` (root) - Essential rules and quick reference
 - `docs/CODEBASE_MAP.md` - Current file/directory reference
 ## Why Archive?
 These docs were written during development iterations and may reference:
 - Old file paths that no longer exist
 - Deprecated approaches (hydration, scraper-v2)
 - APIs that have changed
 - Database schemas that evolved
 ## If You Need Details
 1. First check CODEBASE_MAP.md for current file locations
 2. Then read the actual source code
 3. Only use archive docs as a last resort for historical context
--- a/backend/docs/_archive/TASK_WORKFLOW_2024-12-10.md
+++ b/backend/docs/_archive/TASK_WORKFLOW_2024-12-10.md
--- a/backend/docs/_archive/WORKER_TASK_ARCHITECTURE.md
+++ b/backend/docs/_archive/WORKER_TASK_ARCHITECTURE.md
--- a/backend/k8s/scraper-worker-statefulset.yaml
+++ b/backend/k8s/scraper-worker-statefulset.yaml
@@ -0,0 +1,77 @@
 apiVersion: v1
 kind: Service
 metadata:
  name: scraper-worker
  namespace: dispensary-scraper
  labels:
    app: scraper-worker
 spec:
  clusterIP: None  # Headless service required for StatefulSet
  selector:
    app: scraper-worker
  ports:
  - port: 3010
    name: http
 ---
 apiVersion: apps/v1
 kind: StatefulSet
 metadata:
  name: scraper-worker
  namespace: dispensary-scraper
 spec:
  serviceName: scraper-worker
  replicas: 8
  podManagementPolicy: Parallel  # Start all pods at once
  updateStrategy:
    type: OnDelete  # Pods only update when manually deleted - no automatic restarts
  selector:
    matchLabels:
      app: scraper-worker
  template:
    metadata:
      labels:
        app: scraper-worker
    spec:
      terminationGracePeriodSeconds: 60
      imagePullSecrets:
      - name: regcred
      containers:
      - name: worker
        image: code.cannabrands.app/creationshop/dispensary-scraper:latest
        imagePullPolicy: Always
        command: ["node"]
        args: ["dist/tasks/task-worker.js"]
        env:
        - name: WORKER_MODE
          value: "true"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MAX_CONCURRENT_TASKS
          value: "50"
        - name: API_BASE_URL
          value: http://scraper
        - name: NODE_OPTIONS
          value: --max-old-space-size=1500
        envFrom:
        - configMapRef:
            name: scraper-config
        - secretRef:
            name: scraper-secrets
        resources:
          requests:
            cpu: 100m
            memory: 1Gi
          limits:
            cpu: 500m
            memory: 2Gi
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - pgrep -f 'task-worker' > /dev/null
          initialDelaySeconds: 10
          periodSeconds: 30
          failureThreshold: 3
--- a/backend/migrations/084_dual_transport_preflight.sql
+++ b/backend/migrations/084_dual_transport_preflight.sql
@@ -0,0 +1,253 @@
 -- Migration 084: Dual Transport Preflight System
 -- Workers run both curl and http (Puppeteer) preflights on startup
 -- Tasks can require a specific transport method
 -- ===================================================================
 -- PART 1: Add preflight columns to worker_registry
 -- ===================================================================
 -- Preflight status for curl/axios transport (proxy-based)
 ALTER TABLE worker_registry
 ADD COLUMN IF NOT EXISTS preflight_curl_status VARCHAR(20) DEFAULT 'pending';
 -- Preflight status for http/Puppeteer transport (browser-based)
 ALTER TABLE worker_registry
 ADD COLUMN IF NOT EXISTS preflight_http_status VARCHAR(20) DEFAULT 'pending';
 -- Timestamps for when each preflight completed
 ALTER TABLE worker_registry
 ADD COLUMN IF NOT EXISTS preflight_curl_at TIMESTAMPTZ;
 ALTER TABLE worker_registry
 ADD COLUMN IF NOT EXISTS preflight_http_at TIMESTAMPTZ;
 -- Error messages for failed preflights
 ALTER TABLE worker_registry
 ADD COLUMN IF NOT EXISTS preflight_curl_error TEXT;
 ALTER TABLE worker_registry
 ADD COLUMN IF NOT EXISTS preflight_http_error TEXT;
 -- Response time for successful preflights (ms)
 ALTER TABLE worker_registry
 ADD COLUMN IF NOT EXISTS preflight_curl_ms INTEGER;
 ALTER TABLE worker_registry
 ADD COLUMN IF NOT EXISTS preflight_http_ms INTEGER;
 -- Constraints for preflight status values
 ALTER TABLE worker_registry
 DROP CONSTRAINT IF EXISTS valid_preflight_curl_status;
 ALTER TABLE worker_registry
 ADD CONSTRAINT valid_preflight_curl_status
 CHECK (preflight_curl_status IN ('pending', 'passed', 'failed', 'skipped'));
 ALTER TABLE worker_registry
 DROP CONSTRAINT IF EXISTS valid_preflight_http_status;
 ALTER TABLE worker_registry
 ADD CONSTRAINT valid_preflight_http_status
 CHECK (preflight_http_status IN ('pending', 'passed', 'failed', 'skipped'));
 -- ===================================================================
 -- PART 2: Add method column to worker_tasks
 -- ===================================================================
 -- Transport method requirement for the task
 -- NULL = no preference (any worker can claim)
 -- 'curl' = requires curl/axios transport (proxy-based, fast)
 -- 'http' = requires http/Puppeteer transport (browser-based, anti-detect)
 ALTER TABLE worker_tasks
 ADD COLUMN IF NOT EXISTS method VARCHAR(10);
 -- Constraint for valid method values
 ALTER TABLE worker_tasks
 DROP CONSTRAINT IF EXISTS valid_task_method;
 ALTER TABLE worker_tasks
 ADD CONSTRAINT valid_task_method
 CHECK (method IS NULL OR method IN ('curl', 'http'));
 -- Index for method-based task claiming
 CREATE INDEX IF NOT EXISTS idx_worker_tasks_method
  ON worker_tasks(method)
  WHERE status = 'pending';
 -- Set default method for all existing pending tasks to 'http'
 -- ALL current tasks require Puppeteer/browser-based transport
 UPDATE worker_tasks
 SET method = 'http'
 WHERE method IS NULL;
 -- ===================================================================
 -- PART 3: Update claim_task function for method compatibility
 -- ===================================================================
 CREATE OR REPLACE FUNCTION claim_task(
  p_role VARCHAR(50),
  p_worker_id VARCHAR(100),
  p_curl_passed BOOLEAN DEFAULT TRUE,
  p_http_passed BOOLEAN DEFAULT FALSE
 ) RETURNS worker_tasks AS $$
 DECLARE
  claimed_task worker_tasks;
 BEGIN
  UPDATE worker_tasks
  SET
    status = 'claimed',
    worker_id = p_worker_id,
    claimed_at = NOW(),
    updated_at = NOW()
  WHERE id = (
    SELECT id FROM worker_tasks
    WHERE role = p_role
      AND status = 'pending'
      AND (scheduled_for IS NULL OR scheduled_for <= NOW())
      -- Method compatibility: worker must have passed the required preflight
      AND (
        method IS NULL  -- No preference, any worker can claim
        OR (method = 'curl' AND p_curl_passed = TRUE)
        OR (method = 'http' AND p_http_passed = TRUE)
      )
      -- Exclude stores that already have an active task
      AND (dispensary_id IS NULL OR dispensary_id NOT IN (
        SELECT dispensary_id FROM worker_tasks
        WHERE status IN ('claimed', 'running')
        AND dispensary_id IS NOT NULL
      ))
    ORDER BY priority DESC, created_at ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
  )
  RETURNING * INTO claimed_task;
  RETURN claimed_task;
 END;
 $$ LANGUAGE plpgsql;
 -- ===================================================================
 -- PART 4: Update v_active_workers view
 -- ===================================================================
 DROP VIEW IF EXISTS v_active_workers;
 CREATE VIEW v_active_workers AS
 SELECT
  wr.id,
  wr.worker_id,
  wr.friendly_name,
  wr.role,
  wr.status,
  wr.pod_name,
  wr.hostname,
  wr.started_at,
  wr.last_heartbeat_at,
  wr.last_task_at,
  wr.tasks_completed,
  wr.tasks_failed,
  wr.current_task_id,
  -- Preflight status
  wr.preflight_curl_status,
  wr.preflight_http_status,
  wr.preflight_curl_at,
  wr.preflight_http_at,
  wr.preflight_curl_error,
  wr.preflight_http_error,
  wr.preflight_curl_ms,
  wr.preflight_http_ms,
  -- Computed fields
  EXTRACT(EPOCH FROM (NOW() - wr.last_heartbeat_at)) as seconds_since_heartbeat,
  CASE
    WHEN wr.status = 'offline' THEN 'offline'
    WHEN wr.last_heartbeat_at < NOW() - INTERVAL '2 minutes' THEN 'stale'
    WHEN wr.current_task_id IS NOT NULL THEN 'busy'
    ELSE 'ready'
  END as health_status,
  -- Capability flags (can this worker handle curl/http tasks?)
  (wr.preflight_curl_status = 'passed') as can_curl,
  (wr.preflight_http_status = 'passed') as can_http
 FROM worker_registry wr
 WHERE wr.status != 'terminated'
 ORDER BY wr.status = 'active' DESC, wr.last_heartbeat_at DESC;
 -- ===================================================================
 -- PART 5: View for task queue with method info
 -- ===================================================================
 DROP VIEW IF EXISTS v_task_history;
 CREATE VIEW v_task_history AS
 SELECT
  t.id,
  t.role,
  t.dispensary_id,
  d.name as dispensary_name,
  t.platform,
  t.status,
  t.priority,
  t.method,
  t.worker_id,
  t.scheduled_for,
  t.claimed_at,
  t.started_at,
  t.completed_at,
  t.error_message,
  t.retry_count,
  t.created_at,
  EXTRACT(EPOCH FROM (t.completed_at - t.started_at)) as duration_sec
 FROM worker_tasks t
 LEFT JOIN dispensaries d ON d.id = t.dispensary_id
 ORDER BY t.created_at DESC;
 -- ===================================================================
 -- PART 6: Helper function to update worker preflight status
 -- ===================================================================
 CREATE OR REPLACE FUNCTION update_worker_preflight(
  p_worker_id VARCHAR(100),
  p_transport VARCHAR(10),  -- 'curl' or 'http'
  p_status VARCHAR(20),     -- 'passed', 'failed', 'skipped'
  p_response_ms INTEGER DEFAULT NULL,
  p_error TEXT DEFAULT NULL
 ) RETURNS VOID AS $$
 BEGIN
  IF p_transport = 'curl' THEN
    UPDATE worker_registry
    SET
      preflight_curl_status = p_status,
      preflight_curl_at = NOW(),
      preflight_curl_ms = p_response_ms,
      preflight_curl_error = p_error,
      updated_at = NOW()
    WHERE worker_id = p_worker_id;
  ELSIF p_transport = 'http' THEN
    UPDATE worker_registry
    SET
      preflight_http_status = p_status,
      preflight_http_at = NOW(),
      preflight_http_ms = p_response_ms,
      preflight_http_error = p_error,
      updated_at = NOW()
    WHERE worker_id = p_worker_id;
  END IF;
 END;
 $$ LANGUAGE plpgsql;
 -- ===================================================================
 -- Comments
 -- ===================================================================
 COMMENT ON COLUMN worker_registry.preflight_curl_status IS 'Status of curl/axios preflight: pending, passed, failed, skipped';
 COMMENT ON COLUMN worker_registry.preflight_http_status IS 'Status of http/Puppeteer preflight: pending, passed, failed, skipped';
 COMMENT ON COLUMN worker_registry.preflight_curl_at IS 'When curl preflight completed';
 COMMENT ON COLUMN worker_registry.preflight_http_at IS 'When http preflight completed';
 COMMENT ON COLUMN worker_registry.preflight_curl_error IS 'Error message if curl preflight failed';
 COMMENT ON COLUMN worker_registry.preflight_http_error IS 'Error message if http preflight failed';
 COMMENT ON COLUMN worker_registry.preflight_curl_ms IS 'Response time of successful curl preflight (ms)';
 COMMENT ON COLUMN worker_registry.preflight_http_ms IS 'Response time of successful http preflight (ms)';
 COMMENT ON COLUMN worker_tasks.method IS 'Transport method required: NULL=any, curl=proxy-based, http=browser-based';
 COMMENT ON FUNCTION claim_task IS 'Atomically claim a task, respecting method requirements and per-store locking';
 COMMENT ON FUNCTION update_worker_preflight IS 'Update a workers preflight status for a given transport';
--- a/backend/migrations/085_preflight_ip_fingerprint.sql
+++ b/backend/migrations/085_preflight_ip_fingerprint.sql
@@ -0,0 +1,168 @@
 -- Migration 085: Add IP and fingerprint columns for preflight reporting
 -- These columns were missing from migration 084
 -- ===================================================================
 -- PART 1: Add IP address columns to worker_registry
 -- ===================================================================
 -- IP address detected during curl/axios preflight
 ALTER TABLE worker_registry
 ADD COLUMN IF NOT EXISTS curl_ip VARCHAR(45);
 -- IP address detected during http/Puppeteer preflight
 ALTER TABLE worker_registry
 ADD COLUMN IF NOT EXISTS http_ip VARCHAR(45);
 -- ===================================================================
 -- PART 2: Add fingerprint data column
 -- ===================================================================
 -- Browser fingerprint data captured during Puppeteer preflight
 ALTER TABLE worker_registry
 ADD COLUMN IF NOT EXISTS fingerprint_data JSONB;
 -- ===================================================================
 -- PART 3: Add combined preflight status/timestamp for convenience
 -- ===================================================================
 -- Overall preflight status (computed from both transports)
 -- Values: 'pending', 'passed', 'partial', 'failed'
 --   - 'pending': neither transport tested
 --   - 'passed': both transports passed (or http passed for browser-only)
 --   - 'partial': at least one passed
 --   - 'failed': no transport passed
 ALTER TABLE worker_registry
 ADD COLUMN IF NOT EXISTS preflight_status VARCHAR(20) DEFAULT 'pending';
 -- Most recent preflight completion timestamp
 ALTER TABLE worker_registry
 ADD COLUMN IF NOT EXISTS preflight_at TIMESTAMPTZ;
 -- ===================================================================
 -- PART 4: Update function to set preflight status
 -- ===================================================================
 CREATE OR REPLACE FUNCTION update_worker_preflight(
  p_worker_id VARCHAR(100),
  p_transport VARCHAR(10),  -- 'curl' or 'http'
  p_status VARCHAR(20),     -- 'passed', 'failed', 'skipped'
  p_ip VARCHAR(45) DEFAULT NULL,
  p_response_ms INTEGER DEFAULT NULL,
  p_error TEXT DEFAULT NULL,
  p_fingerprint JSONB DEFAULT NULL
 ) RETURNS VOID AS $$
 DECLARE
  v_curl_status VARCHAR(20);
  v_http_status VARCHAR(20);
  v_overall_status VARCHAR(20);
 BEGIN
  IF p_transport = 'curl' THEN
    UPDATE worker_registry
    SET
      preflight_curl_status = p_status,
      preflight_curl_at = NOW(),
      preflight_curl_ms = p_response_ms,
      preflight_curl_error = p_error,
      curl_ip = p_ip,
      updated_at = NOW()
    WHERE worker_id = p_worker_id;
  ELSIF p_transport = 'http' THEN
    UPDATE worker_registry
    SET
      preflight_http_status = p_status,
      preflight_http_at = NOW(),
      preflight_http_ms = p_response_ms,
      preflight_http_error = p_error,
      http_ip = p_ip,
      fingerprint_data = COALESCE(p_fingerprint, fingerprint_data),
      updated_at = NOW()
    WHERE worker_id = p_worker_id;
  END IF;
  -- Update overall preflight status
  SELECT preflight_curl_status, preflight_http_status
  INTO v_curl_status, v_http_status
  FROM worker_registry
  WHERE worker_id = p_worker_id;
  -- Compute overall status
  IF v_curl_status = 'passed' AND v_http_status = 'passed' THEN
    v_overall_status := 'passed';
  ELSIF v_curl_status = 'passed' OR v_http_status = 'passed' THEN
    v_overall_status := 'partial';
  ELSIF v_curl_status = 'failed' OR v_http_status = 'failed' THEN
    v_overall_status := 'failed';
  ELSE
    v_overall_status := 'pending';
  END IF;
  UPDATE worker_registry
  SET
    preflight_status = v_overall_status,
    preflight_at = NOW()
  WHERE worker_id = p_worker_id;
 END;
 $$ LANGUAGE plpgsql;
 -- ===================================================================
 -- PART 5: Update v_active_workers view
 -- ===================================================================
 DROP VIEW IF EXISTS v_active_workers;
 CREATE VIEW v_active_workers AS
 SELECT
  wr.id,
  wr.worker_id,
  wr.friendly_name,
  wr.role,
  wr.status,
  wr.pod_name,
  wr.hostname,
  wr.started_at,
  wr.last_heartbeat_at,
  wr.last_task_at,
  wr.tasks_completed,
  wr.tasks_failed,
  wr.current_task_id,
  -- IP addresses from preflights
  wr.curl_ip,
  wr.http_ip,
  -- Combined preflight status
  wr.preflight_status,
  wr.preflight_at,
  -- Detailed preflight status per transport
  wr.preflight_curl_status,
  wr.preflight_http_status,
  wr.preflight_curl_at,
  wr.preflight_http_at,
  wr.preflight_curl_error,
  wr.preflight_http_error,
  wr.preflight_curl_ms,
  wr.preflight_http_ms,
  -- Fingerprint data
  wr.fingerprint_data,
  -- Computed fields
  EXTRACT(EPOCH FROM (NOW() - wr.last_heartbeat_at)) as seconds_since_heartbeat,
  CASE
    WHEN wr.status = 'offline' THEN 'offline'
    WHEN wr.last_heartbeat_at < NOW() - INTERVAL '2 minutes' THEN 'stale'
    WHEN wr.current_task_id IS NOT NULL THEN 'busy'
    ELSE 'ready'
  END as health_status,
  -- Capability flags (can this worker handle curl/http tasks?)
  (wr.preflight_curl_status = 'passed') as can_curl,
  (wr.preflight_http_status = 'passed') as can_http
 FROM worker_registry wr
 WHERE wr.status != 'terminated'
 ORDER BY wr.status = 'active' DESC, wr.last_heartbeat_at DESC;
 -- ===================================================================
 -- Comments
 -- ===================================================================
 COMMENT ON COLUMN worker_registry.curl_ip IS 'IP address detected during curl/axios preflight';
 COMMENT ON COLUMN worker_registry.http_ip IS 'IP address detected during Puppeteer preflight';
 COMMENT ON COLUMN worker_registry.fingerprint_data IS 'Browser fingerprint captured during Puppeteer preflight';
 COMMENT ON COLUMN worker_registry.preflight_status IS 'Overall preflight status: pending, passed, partial, failed';
 COMMENT ON COLUMN worker_registry.preflight_at IS 'Most recent preflight completion timestamp';
--- a/backend/src/_deprecated/DONT_USE.md
+++ b/backend/src/_deprecated/DONT_USE.md
@@ -0,0 +1,46 @@
 # DEPRECATED CODE - DO NOT USE
 **These directories contain OLD, ABANDONED code.**
 ## What's Here
 | Directory | What It Was | Why Deprecated |
 |-----------|-------------|----------------|
 | `hydration/` | Old pipeline for processing crawl data | Replaced by `src/tasks/handlers/` |
 | `scraper-v2/` | Old Puppeteer-based scraper engine | Replaced by curl-based `src/platforms/dutchie/client.ts` |
 | `canonical-hydration/` | Intermediate step toward canonical schema | Merged into task handlers |
 ## What to Use Instead
 | Old (DONT USE) | New (USE THIS) |
 |----------------|----------------|
 | `hydration/normalizers/dutchie.ts` | `src/tasks/handlers/product-refresh.ts` |
 | `hydration/producer.ts` | `src/tasks/handlers/payload-fetch.ts` |
 | `scraper-v2/engine.ts` | `src/platforms/dutchie/client.ts` |
 | `scraper-v2/scheduler.ts` | `src/services/task-scheduler.ts` |
 ## Why Keep This Code?
 - Historical reference only
 - Some patterns may be useful for debugging
 - Will be deleted once confirmed not needed
 ## Claude Instructions
 **IF YOU ARE CLAUDE:**
 1. NEVER import from `src/_deprecated/`
 2. NEVER reference these files as examples
 3. NEVER try to "fix" or "update" code in here
 4. If you see imports from these directories, suggest replacing them
 **Correct imports:**
 ```typescript
 // GOOD
 import { executeGraphQL } from '../platforms/dutchie/client';
 import { pool } from '../db/pool';
 // BAD - DO NOT USE
 import { something } from '../_deprecated/hydration/...';
 import { something } from '../_deprecated/scraper-v2/...';
 ```
--- a/backend/src/_deprecated/canonical-hydration/RUNBOOK.md
+++ b/backend/src/_deprecated/canonical-hydration/RUNBOOK.md
--- a/backend/src/_deprecated/canonical-hydration/cli/backfill.ts
+++ b/backend/src/_deprecated/canonical-hydration/cli/backfill.ts
--- a/backend/src/_deprecated/canonical-hydration/cli/incremental.ts
+++ b/backend/src/_deprecated/canonical-hydration/cli/incremental.ts
--- a/backend/src/_deprecated/canonical-hydration/cli/products-only.ts
+++ b/backend/src/_deprecated/canonical-hydration/cli/products-only.ts
--- a/backend/src/_deprecated/canonical-hydration/crawl-run-recorder.ts
+++ b/backend/src/_deprecated/canonical-hydration/crawl-run-recorder.ts
--- a/backend/src/_deprecated/canonical-hydration/hydration-service.ts
+++ b/backend/src/_deprecated/canonical-hydration/hydration-service.ts
--- a/backend/src/_deprecated/canonical-hydration/index.ts
+++ b/backend/src/_deprecated/canonical-hydration/index.ts
--- a/backend/src/_deprecated/canonical-hydration/snapshot-writer.ts
+++ b/backend/src/_deprecated/canonical-hydration/snapshot-writer.ts
--- a/backend/src/_deprecated/canonical-hydration/store-product-normalizer.ts
+++ b/backend/src/_deprecated/canonical-hydration/store-product-normalizer.ts
--- a/backend/src/_deprecated/canonical-hydration/types.ts
+++ b/backend/src/_deprecated/canonical-hydration/types.ts
--- a/backend/src/_deprecated/routes/crawler-sandbox.ts
+++ b/backend/src/_deprecated/routes/crawler-sandbox.ts
--- a/backend/src/_deprecated/scraper-v2/README.md
+++ b/backend/src/_deprecated/scraper-v2/README.md
--- a/backend/src/_deprecated/scraper-v2/canonical-pipeline.ts
+++ b/backend/src/_deprecated/scraper-v2/canonical-pipeline.ts
--- a/backend/src/_deprecated/scraper-v2/downloader.ts
+++ b/backend/src/_deprecated/scraper-v2/downloader.ts
--- a/backend/src/_deprecated/scraper-v2/engine.ts
+++ b/backend/src/_deprecated/scraper-v2/engine.ts
--- a/backend/src/_deprecated/scraper-v2/index.ts
+++ b/backend/src/_deprecated/scraper-v2/index.ts
--- a/backend/src/_deprecated/scraper-v2/middlewares.ts
+++ b/backend/src/_deprecated/scraper-v2/middlewares.ts
--- a/backend/src/_deprecated/scraper-v2/navigation.ts
+++ b/backend/src/_deprecated/scraper-v2/navigation.ts
--- a/backend/src/_deprecated/scraper-v2/pipelines.ts
+++ b/backend/src/_deprecated/scraper-v2/pipelines.ts
--- a/backend/src/_deprecated/scraper-v2/scheduler.ts
+++ b/backend/src/_deprecated/scraper-v2/scheduler.ts
--- a/backend/src/_deprecated/scraper-v2/types.ts
+++ b/backend/src/_deprecated/scraper-v2/types.ts
--- a/backend/src/_deprecated/scripts/queue-dispensaries.ts
+++ b/backend/src/_deprecated/scripts/queue-dispensaries.ts
--- a/backend/src/_deprecated/scripts/run-backfill.ts
+++ b/backend/src/_deprecated/scripts/run-backfill.ts
--- a/backend/src/_deprecated/scripts/run-hydration.ts
+++ b/backend/src/_deprecated/scripts/run-hydration.ts
--- a/backend/src/_deprecated/scripts/test-crawl-to-canonical.ts
+++ b/backend/src/_deprecated/scripts/test-crawl-to-canonical.ts
--- a/backend/src/_deprecated/services/DiscoveryGeoService.ts
+++ b/backend/src/_deprecated/services/DiscoveryGeoService.ts
--- a/backend/src/_deprecated/services/GeoValidationService.ts
+++ b/backend/src/_deprecated/services/GeoValidationService.ts
--- a/backend/src/_deprecated/services/availability.ts
+++ b/backend/src/_deprecated/services/availability.ts
--- a/backend/src/_deprecated/services/crawler-jobs.ts
+++ b/backend/src/_deprecated/services/crawler-jobs.ts
--- a/backend/src/_deprecated/services/crawler-logger.ts
+++ b/backend/src/_deprecated/services/crawler-logger.ts
--- a/backend/src/_deprecated/services/crawler-profiles.ts
+++ b/backend/src/_deprecated/services/crawler-profiles.ts
--- a/backend/src/_deprecated/services/intelligence-detector.ts
+++ b/backend/src/_deprecated/services/intelligence-detector.ts
--- a/backend/src/_deprecated/services/menu-provider-detector.ts
+++ b/backend/src/_deprecated/services/menu-provider-detector.ts
--- a/backend/src/_deprecated/services/scraper-debug.ts
+++ b/backend/src/_deprecated/services/scraper-debug.ts
--- a/backend/src/_deprecated/services/scraper.ts
+++ b/backend/src/_deprecated/services/scraper.ts
--- a/backend/src/_deprecated/system/routes/index.ts
+++ b/backend/src/_deprecated/system/routes/index.ts
@@ -0,0 +1,584 @@
 /**
 * System API Routes
 *
 * Provides REST API endpoints for system monitoring and control:
 * - /api/system/sync/* - Sync orchestrator
 * - /api/system/dlq/* - Dead-letter queue
 * - /api/system/integrity/* - Integrity checks
 * - /api/system/fix/* - Auto-fix routines
 * - /api/system/alerts/* - System alerts
 * - /metrics - Prometheus metrics
 *
 * Phase 5: Full Production Sync + Monitoring
 */
 import { Router, Request, Response } from 'express';
 import { Pool } from 'pg';
 import {
  SyncOrchestrator,
  MetricsService,
  DLQService,
  AlertService,
  IntegrityService,
  AutoFixService,
 } from '../services';
 export function createSystemRouter(pool: Pool): Router {
  const router = Router();
  // Initialize services
  const metrics = new MetricsService(pool);
  const dlq = new DLQService(pool);
  const alerts = new AlertService(pool);
  const integrity = new IntegrityService(pool, alerts);
  const autoFix = new AutoFixService(pool, alerts);
  const orchestrator = new SyncOrchestrator(pool, metrics, dlq, alerts);
  // ============================================================
  // SYNC ORCHESTRATOR ENDPOINTS
  // ============================================================
  /**
   * GET /api/system/sync/status
   * Get current sync status
   */
  router.get('/sync/status', async (_req: Request, res: Response) => {
    try {
      const status = await orchestrator.getStatus();
      res.json(status);
    } catch (error) {
      console.error('[System] Sync status error:', error);
      res.status(500).json({ error: 'Failed to get sync status' });
    }
  });
  /**
   * POST /api/system/sync/run
   * Trigger a sync run
   */
  router.post('/sync/run', async (req: Request, res: Response) => {
    try {
      const triggeredBy = req.body.triggeredBy || 'api';
      const result = await orchestrator.runSync();
      res.json({
        success: true,
        triggeredBy,
        metrics: result,
      });
    } catch (error) {
      console.error('[System] Sync run error:', error);
      res.status(500).json({
        success: false,
        error: error instanceof Error ? error.message : 'Sync run failed',
      });
    }
  });
  /**
   * GET /api/system/sync/queue-depth
   * Get queue depth information
   */
  router.get('/sync/queue-depth', async (_req: Request, res: Response) => {
    try {
      const depth = await orchestrator.getQueueDepth();
      res.json(depth);
    } catch (error) {
      console.error('[System] Queue depth error:', error);
      res.status(500).json({ error: 'Failed to get queue depth' });
    }
  });
  /**
   * GET /api/system/sync/health
   * Get sync health status
   */
  router.get('/sync/health', async (_req: Request, res: Response) => {
    try {
      const health = await orchestrator.getHealth();
      res.status(health.healthy ? 200 : 503).json(health);
    } catch (error) {
      console.error('[System] Health check error:', error);
      res.status(500).json({ healthy: false, error: 'Health check failed' });
    }
  });
  /**
   * POST /api/system/sync/pause
   * Pause the orchestrator
   */
  router.post('/sync/pause', async (req: Request, res: Response) => {
    try {
      const reason = req.body.reason || 'Manual pause';
      await orchestrator.pause(reason);
      res.json({ success: true, message: 'Orchestrator paused' });
    } catch (error) {
      console.error('[System] Pause error:', error);
      res.status(500).json({ error: 'Failed to pause orchestrator' });
    }
  });
  /**
   * POST /api/system/sync/resume
   * Resume the orchestrator
   */
  router.post('/sync/resume', async (_req: Request, res: Response) => {
    try {
      await orchestrator.resume();
      res.json({ success: true, message: 'Orchestrator resumed' });
    } catch (error) {
      console.error('[System] Resume error:', error);
      res.status(500).json({ error: 'Failed to resume orchestrator' });
    }
  });
  // ============================================================
  // DLQ ENDPOINTS
  // ============================================================
  /**
   * GET /api/system/dlq
   * List DLQ payloads
   */
  router.get('/dlq', async (req: Request, res: Response) => {
    try {
      const options = {
        status: req.query.status as string,
        errorType: req.query.errorType as string,
        dispensaryId: req.query.dispensaryId ? parseInt(req.query.dispensaryId as string) : undefined,
        limit: req.query.limit ? parseInt(req.query.limit as string) : 50,
        offset: req.query.offset ? parseInt(req.query.offset as string) : 0,
      };
      const result = await dlq.listPayloads(options);
      res.json(result);
    } catch (error) {
      console.error('[System] DLQ list error:', error);
      res.status(500).json({ error: 'Failed to list DLQ payloads' });
    }
  });
  /**
   * GET /api/system/dlq/stats
   * Get DLQ statistics
   */
  router.get('/dlq/stats', async (_req: Request, res: Response) => {
    try {
      const stats = await dlq.getStats();
      res.json(stats);
    } catch (error) {
      console.error('[System] DLQ stats error:', error);
      res.status(500).json({ error: 'Failed to get DLQ stats' });
    }
  });
  /**
   * GET /api/system/dlq/summary
   * Get DLQ summary by error type
   */
  router.get('/dlq/summary', async (_req: Request, res: Response) => {
    try {
      const summary = await dlq.getSummary();
      res.json(summary);
    } catch (error) {
      console.error('[System] DLQ summary error:', error);
      res.status(500).json({ error: 'Failed to get DLQ summary' });
    }
  });
  /**
   * GET /api/system/dlq/:id
   * Get a specific DLQ payload
   */
  router.get('/dlq/:id', async (req: Request, res: Response) => {
    try {
      const payload = await dlq.getPayload(req.params.id);
      if (!payload) {
        return res.status(404).json({ error: 'Payload not found' });
      }
      res.json(payload);
    } catch (error) {
      console.error('[System] DLQ get error:', error);
      res.status(500).json({ error: 'Failed to get DLQ payload' });
    }
  });
  /**
   * POST /api/system/dlq/:id/retry
   * Retry a DLQ payload
   */
  router.post('/dlq/:id/retry', async (req: Request, res: Response) => {
    try {
      const result = await dlq.retryPayload(req.params.id);
      if (result.success) {
        res.json(result);
      } else {
        res.status(400).json(result);
      }
    } catch (error) {
      console.error('[System] DLQ retry error:', error);
      res.status(500).json({ error: 'Failed to retry payload' });
    }
  });
  /**
   * POST /api/system/dlq/:id/abandon
   * Abandon a DLQ payload
   */
  router.post('/dlq/:id/abandon', async (req: Request, res: Response) => {
    try {
      const reason = req.body.reason || 'Manually abandoned';
      const abandonedBy = req.body.abandonedBy || 'api';
      const success = await dlq.abandonPayload(req.params.id, reason, abandonedBy);
      res.json({ success });
    } catch (error) {
      console.error('[System] DLQ abandon error:', error);
      res.status(500).json({ error: 'Failed to abandon payload' });
    }
  });
  /**
   * POST /api/system/dlq/bulk-retry
   * Bulk retry payloads by error type
   */
  router.post('/dlq/bulk-retry', async (req: Request, res: Response) => {
    try {
      const { errorType } = req.body;
      if (!errorType) {
        return res.status(400).json({ error: 'errorType is required' });
      }
      const result = await dlq.bulkRetryByErrorType(errorType);
      res.json(result);
    } catch (error) {
      console.error('[System] DLQ bulk retry error:', error);
      res.status(500).json({ error: 'Failed to bulk retry' });
    }
  });
  // ============================================================
  // INTEGRITY CHECK ENDPOINTS
  // ============================================================
  /**
   * POST /api/system/integrity/run
   * Run all integrity checks
   */
  router.post('/integrity/run', async (req: Request, res: Response) => {
    try {
      const triggeredBy = req.body.triggeredBy || 'api';
      const result = await integrity.runAllChecks(triggeredBy);
      res.json(result);
    } catch (error) {
      console.error('[System] Integrity run error:', error);
      res.status(500).json({ error: 'Failed to run integrity checks' });
    }
  });
  /**
   * GET /api/system/integrity/runs
   * Get recent integrity check runs
   */
  router.get('/integrity/runs', async (req: Request, res: Response) => {
    try {
      const limit = req.query.limit ? parseInt(req.query.limit as string) : 10;
      const runs = await integrity.getRecentRuns(limit);
      res.json(runs);
    } catch (error) {
      console.error('[System] Integrity runs error:', error);
      res.status(500).json({ error: 'Failed to get integrity runs' });
    }
  });
  /**
   * GET /api/system/integrity/runs/:runId
   * Get results for a specific integrity run
   */
  router.get('/integrity/runs/:runId', async (req: Request, res: Response) => {
    try {
      const results = await integrity.getRunResults(req.params.runId);
      res.json(results);
    } catch (error) {
      console.error('[System] Integrity run results error:', error);
      res.status(500).json({ error: 'Failed to get run results' });
    }
  });
  // ============================================================
  // AUTO-FIX ENDPOINTS
  // ============================================================
  /**
   * GET /api/system/fix/routines
   * Get available fix routines
   */
  router.get('/fix/routines', (_req: Request, res: Response) => {
    try {
      const routines = autoFix.getAvailableRoutines();
      res.json(routines);
    } catch (error) {
      console.error('[System] Get routines error:', error);
      res.status(500).json({ error: 'Failed to get routines' });
    }
  });
  /**
   * POST /api/system/fix/:routine
   * Run a fix routine
   */
  router.post('/fix/:routine', async (req: Request, res: Response) => {
    try {
      const routineName = req.params.routine;
      const dryRun = req.body.dryRun === true;
      const triggeredBy = req.body.triggeredBy || 'api';
      const result = await autoFix.runRoutine(routineName as any, triggeredBy, { dryRun });
      res.json(result);
    } catch (error) {
      console.error('[System] Fix routine error:', error);
      res.status(500).json({ error: 'Failed to run fix routine' });
    }
  });
  /**
   * GET /api/system/fix/runs
   * Get recent fix runs
   */
  router.get('/fix/runs', async (req: Request, res: Response) => {
    try {
      const limit = req.query.limit ? parseInt(req.query.limit as string) : 20;
      const runs = await autoFix.getRecentRuns(limit);
      res.json(runs);
    } catch (error) {
      console.error('[System] Fix runs error:', error);
      res.status(500).json({ error: 'Failed to get fix runs' });
    }
  });
  // ============================================================
  // ALERTS ENDPOINTS
  // ============================================================
  /**
   * GET /api/system/alerts
   * List alerts
   */
  router.get('/alerts', async (req: Request, res: Response) => {
    try {
      const options = {
        status: req.query.status as any,
        severity: req.query.severity as any,
        type: req.query.type as string,
        limit: req.query.limit ? parseInt(req.query.limit as string) : 50,
        offset: req.query.offset ? parseInt(req.query.offset as string) : 0,
      };
      const result = await alerts.listAlerts(options);
      res.json(result);
    } catch (error) {
      console.error('[System] Alerts list error:', error);
      res.status(500).json({ error: 'Failed to list alerts' });
    }
  });
  /**
   * GET /api/system/alerts/active
   * Get active alerts
   */
  router.get('/alerts/active', async (_req: Request, res: Response) => {
    try {
      const activeAlerts = await alerts.getActiveAlerts();
      res.json(activeAlerts);
    } catch (error) {
      console.error('[System] Active alerts error:', error);
      res.status(500).json({ error: 'Failed to get active alerts' });
    }
  });
  /**
   * GET /api/system/alerts/summary
   * Get alert summary
   */
  router.get('/alerts/summary', async (_req: Request, res: Response) => {
    try {
      const summary = await alerts.getSummary();
      res.json(summary);
    } catch (error) {
      console.error('[System] Alerts summary error:', error);
      res.status(500).json({ error: 'Failed to get alerts summary' });
    }
  });
  /**
   * POST /api/system/alerts/:id/acknowledge
   * Acknowledge an alert
   */
  router.post('/alerts/:id/acknowledge', async (req: Request, res: Response) => {
    try {
      const alertId = parseInt(req.params.id);
      const acknowledgedBy = req.body.acknowledgedBy || 'api';
      const success = await alerts.acknowledgeAlert(alertId, acknowledgedBy);
      res.json({ success });
    } catch (error) {
      console.error('[System] Acknowledge alert error:', error);
      res.status(500).json({ error: 'Failed to acknowledge alert' });
    }
  });
  /**
   * POST /api/system/alerts/:id/resolve
   * Resolve an alert
   */
  router.post('/alerts/:id/resolve', async (req: Request, res: Response) => {
    try {
      const alertId = parseInt(req.params.id);
      const resolvedBy = req.body.resolvedBy || 'api';
      const success = await alerts.resolveAlert(alertId, resolvedBy);
      res.json({ success });
    } catch (error) {
      console.error('[System] Resolve alert error:', error);
      res.status(500).json({ error: 'Failed to resolve alert' });
    }
  });
  /**
   * POST /api/system/alerts/bulk-acknowledge
   * Bulk acknowledge alerts
   */
  router.post('/alerts/bulk-acknowledge', async (req: Request, res: Response) => {
    try {
      const { ids, acknowledgedBy } = req.body;
      if (!ids || !Array.isArray(ids)) {
        return res.status(400).json({ error: 'ids array is required' });
      }
      const count = await alerts.bulkAcknowledge(ids, acknowledgedBy || 'api');
      res.json({ acknowledged: count });
    } catch (error) {
      console.error('[System] Bulk acknowledge error:', error);
      res.status(500).json({ error: 'Failed to bulk acknowledge' });
    }
  });
  // ============================================================
  // METRICS ENDPOINTS
  // ============================================================
  /**
   * GET /api/system/metrics
   * Get all current metrics
   */
  router.get('/metrics', async (_req: Request, res: Response) => {
    try {
      const allMetrics = await metrics.getAllMetrics();
      res.json(allMetrics);
    } catch (error) {
      console.error('[System] Metrics error:', error);
      res.status(500).json({ error: 'Failed to get metrics' });
    }
  });
  /**
   * GET /api/system/metrics/:name
   * Get a specific metric
   */
  router.get('/metrics/:name', async (req: Request, res: Response) => {
    try {
      const metric = await metrics.getMetric(req.params.name);
      if (!metric) {
        return res.status(404).json({ error: 'Metric not found' });
      }
      res.json(metric);
    } catch (error) {
      console.error('[System] Metric error:', error);
      res.status(500).json({ error: 'Failed to get metric' });
    }
  });
  /**
   * GET /api/system/metrics/:name/history
   * Get metric time series
   */
  router.get('/metrics/:name/history', async (req: Request, res: Response) => {
    try {
      const hours = req.query.hours ? parseInt(req.query.hours as string) : 24;
      const history = await metrics.getMetricHistory(req.params.name, hours);
      res.json(history);
    } catch (error) {
      console.error('[System] Metric history error:', error);
      res.status(500).json({ error: 'Failed to get metric history' });
    }
  });
  /**
   * GET /api/system/errors
   * Get error summary
   */
  router.get('/errors', async (_req: Request, res: Response) => {
    try {
      const summary = await metrics.getErrorSummary();
      res.json(summary);
    } catch (error) {
      console.error('[System] Error summary error:', error);
      res.status(500).json({ error: 'Failed to get error summary' });
    }
  });
  /**
   * GET /api/system/errors/recent
   * Get recent errors
   */
  router.get('/errors/recent', async (req: Request, res: Response) => {
    try {
      const limit = req.query.limit ? parseInt(req.query.limit as string) : 50;
      const errorType = req.query.type as string;
      const errors = await metrics.getRecentErrors(limit, errorType);
      res.json(errors);
    } catch (error) {
      console.error('[System] Recent errors error:', error);
      res.status(500).json({ error: 'Failed to get recent errors' });
    }
  });
  /**
   * POST /api/system/errors/acknowledge
   * Acknowledge errors
   */
  router.post('/errors/acknowledge', async (req: Request, res: Response) => {
    try {
      const { ids, acknowledgedBy } = req.body;
      if (!ids || !Array.isArray(ids)) {
        return res.status(400).json({ error: 'ids array is required' });
      }
      const count = await metrics.acknowledgeErrors(ids, acknowledgedBy || 'api');
      res.json({ acknowledged: count });
    } catch (error) {
      console.error('[System] Acknowledge errors error:', error);
      res.status(500).json({ error: 'Failed to acknowledge errors' });
    }
  });
  return router;
 }
 /**
 * Create Prometheus metrics endpoint (standalone)
 */
 export function createPrometheusRouter(pool: Pool): Router {
  const router = Router();
  const metrics = new MetricsService(pool);
  /**
   * GET /metrics
   * Prometheus-compatible metrics endpoint
   */
  router.get('/', async (_req: Request, res: Response) => {
    try {
      const prometheusOutput = await metrics.getPrometheusMetrics();
      res.set('Content-Type', 'text/plain; version=0.0.4');
      res.send(prometheusOutput);
    } catch (error) {
      console.error('[Prometheus] Metrics error:', error);
      res.status(500).send('# Error generating metrics');
    }
  });
  return router;
 }
--- a/backend/src/_deprecated/system/services/sync-orchestrator.ts
+++ b/backend/src/_deprecated/system/services/sync-orchestrator.ts
--- a/backend/src/_deprecated/utils/HomepageValidator.ts
+++ b/backend/src/_deprecated/utils/HomepageValidator.ts
--- a/backend/src/_deprecated/utils/age-gate-playwright.ts
+++ b/backend/src/_deprecated/utils/age-gate-playwright.ts
--- a/backend/src/_deprecated/utils/stealthBrowser.ts
+++ b/backend/src/_deprecated/utils/stealthBrowser.ts
--- a/backend/src/index.ts
+++ b/backend/src/index.ts
@@ -109,7 +109,7 @@ import scraperMonitorRoutes from './routes/scraper-monitor';
 import apiTokensRoutes from './routes/api-tokens';
 import apiPermissionsRoutes from './routes/api-permissions';
 import parallelScrapeRoutes from './routes/parallel-scrape';
-import crawlerSandboxRoutes from './routes/crawler-sandbox';
+// crawler-sandbox moved to _deprecated
 import versionRoutes from './routes/version';
 import deployStatusRoutes from './routes/deploy-status';
 import publicApiRoutes from './routes/public-api';
@@ -187,7 +187,7 @@ app.use('/api/scraper-monitor', scraperMonitorRoutes);
 app.use('/api/api-tokens', apiTokensRoutes);
 app.use('/api/api-permissions', apiPermissionsRoutes);
 app.use('/api/parallel-scrape', parallelScrapeRoutes);
-app.use('/api/crawler-sandbox', crawlerSandboxRoutes);
+// crawler-sandbox moved to _deprecated
 app.use('/api/version', versionRoutes);
 app.use('/api/admin/deploy-status', deployStatusRoutes);
 console.log('[DeployStatus] Routes registered at /api/admin/deploy-status');
--- a/backend/src/routes/proxies.ts
+++ b/backend/src/routes/proxies.ts
@@ -278,7 +278,7 @@ router.post('/update-locations', requireRole('superadmin', 'admin'), async (req,
    // Run in background
    updateAllProxyLocations().catch(err => {
-      console.error('❌ Location update failed:', err);
+      console.error('Location update failed:', err);
    });
    res.json({ message: 'Location update job started' });
--- a/backend/src/routes/worker-registry.ts
+++ b/backend/src/routes/worker-registry.ts
@@ -23,6 +23,8 @@
 import { Router, Request, Response } from 'express';
 import { pool } from '../db/pool';
 import os from 'os';
 import { runPuppeteerPreflightWithRetry } from '../services/puppeteer-preflight';
 import { CrawlRotator } from '../services/crawl-rotator';
 const router = Router();
@@ -355,6 +357,12 @@ router.get('/workers', async (req: Request, res: Response) => {
        -- Decommission fields
        COALESCE(decommission_requested, false) as decommission_requested,
        decommission_reason,
        -- Preflight fields (dual-transport verification)
        curl_ip,
        http_ip,
        preflight_status,
        preflight_at,
        fingerprint_data,
        -- Full metadata for resources
        metadata,
        EXTRACT(EPOCH FROM (NOW() - last_heartbeat_at)) as seconds_since_heartbeat,
@@ -858,4 +866,58 @@ router.get('/pods', async (_req: Request, res: Response) => {
  }
 });
 // ============================================================
 // PREFLIGHT SMOKE TEST
 // ============================================================
 /**
 * POST /api/worker-registry/preflight-test
 * Run an HTTP (Puppeteer) preflight test and return results
 *
 * This is a smoke test endpoint to verify the preflight system works.
 * Returns IP, fingerprint data, bot detection results, and products fetched.
 */
 router.post('/preflight-test', async (_req: Request, res: Response) => {
  try {
    console.log('[PreflightTest] Starting HTTP preflight smoke test...');
    // Create a temporary CrawlRotator for the test
    const crawlRotator = new CrawlRotator();
    // Run the Puppeteer preflight (with 1 retry)
    const startTime = Date.now();
    const result = await runPuppeteerPreflightWithRetry(crawlRotator, 1);
    const duration = Date.now() - startTime;
    console.log(`[PreflightTest] Completed in ${duration}ms - passed: ${result.passed}`);
    res.json({
      success: true,
      test: 'http_preflight',
      duration_ms: duration,
      result: {
        passed: result.passed,
        proxy_ip: result.proxyIp,
        fingerprint: result.fingerprint,
        bot_detection: result.botDetection,
        products_returned: result.productsReturned,
        browser_user_agent: result.browserUserAgent,
        ip_verified: result.ipVerified,
        proxy_available: result.proxyAvailable,
        proxy_connected: result.proxyConnected,
        antidetect_ready: result.antidetectReady,
        response_time_ms: result.responseTimeMs,
        error: result.error
      }
    });
  } catch (error: any) {
    console.error('[PreflightTest] Error:', error.message);
    res.status(500).json({
      success: false,
      test: 'http_preflight',
      error: error.message
    });
  }
 });
 export default router;
--- a/backend/src/services/crawl-rotator.ts
+++ b/backend/src/services/crawl-rotator.ts
@@ -683,6 +683,118 @@ export class CrawlRotator {
    const current = this.proxy.getCurrent();
    return current?.timezone;
  }
  /**
   * Preflight check - verifies proxy and anti-detect are working
   * MUST be called before any task execution to ensure anonymity.
   *
   * Tests:
   * 1. Proxy available - a proxy must be loaded and active
   * 2. Proxy connectivity - makes HTTP request through proxy to verify connection
   * 3. Anti-detect headers - verifies fingerprint is set with required headers
   *
   * @returns Promise<PreflightResult> with pass/fail status and details
   */
  async preflight(): Promise<PreflightResult> {
    const result: PreflightResult = {
      passed: false,
      proxyAvailable: false,
      proxyConnected: false,
      antidetectReady: false,
      proxyIp: null,
      fingerprint: null,
      error: null,
      responseTimeMs: null,
    };
    // Step 1: Check proxy is available
    const currentProxy = this.proxy.getCurrent();
    if (!currentProxy) {
      result.error = 'No proxy available';
      console.log('[Preflight] FAILED - No proxy available');
      return result;
    }
    result.proxyAvailable = true;
    result.proxyIp = currentProxy.host;
    // Step 2: Check fingerprint/anti-detect is ready
    const fingerprint = this.userAgent.getCurrent();
    if (!fingerprint || !fingerprint.userAgent) {
      result.error = 'Anti-detect fingerprint not initialized';
      console.log('[Preflight] FAILED - No fingerprint');
      return result;
    }
    result.antidetectReady = true;
    result.fingerprint = {
      userAgent: fingerprint.userAgent,
      browserName: fingerprint.browserName,
      deviceCategory: fingerprint.deviceCategory,
    };
    // Step 3: Test proxy connectivity with an actual HTTP request
    // Use httpbin.org/ip to verify request goes through proxy
    const proxyUrl = this.proxy.getProxyUrl(currentProxy);
    const testUrl = 'https://httpbin.org/ip';
    try {
      const { default: axios } = await import('axios');
      const { HttpsProxyAgent } = await import('https-proxy-agent');
      const agent = new HttpsProxyAgent(proxyUrl);
      const startTime = Date.now();
      const response = await axios.get(testUrl, {
        httpsAgent: agent,
        timeout: 15000, // 15 second timeout
        headers: {
          'User-Agent': fingerprint.userAgent,
          'Accept-Language': fingerprint.acceptLanguage,
          ...(fingerprint.secChUa && { 'sec-ch-ua': fingerprint.secChUa }),
          ...(fingerprint.secChUaPlatform && { 'sec-ch-ua-platform': fingerprint.secChUaPlatform }),
          ...(fingerprint.secChUaMobile && { 'sec-ch-ua-mobile': fingerprint.secChUaMobile }),
        },
      });
      result.responseTimeMs = Date.now() - startTime;
      result.proxyConnected = true;
      result.passed = true;
      // Mark success on proxy stats
      await this.proxy.markSuccess(currentProxy.id, result.responseTimeMs);
      console.log(`[Preflight] PASSED - Proxy ${currentProxy.host} connected (${result.responseTimeMs}ms), UA: ${fingerprint.browserName}/${fingerprint.deviceCategory}`);
    } catch (err: any) {
      result.error = `Proxy connection failed: ${err.message || 'Unknown error'}`;
      console.log(`[Preflight] FAILED - Proxy connection error: ${err.message}`);
      // Mark failure on proxy stats
      await this.proxy.markFailed(currentProxy.id, err.message);
    }
    return result;
  }
 }
 /**
 * Result from preflight check
 */
 export interface PreflightResult {
  /** Overall pass/fail */
  passed: boolean;
  /** Step 1: Is a proxy loaded? */
  proxyAvailable: boolean;
  /** Step 2: Did HTTP request through proxy succeed? */
  proxyConnected: boolean;
  /** Step 3: Is fingerprint/anti-detect ready? */
  antidetectReady: boolean;
  /** Current proxy IP */
  proxyIp: string | null;
  /** Fingerprint summary */
  fingerprint: { userAgent: string; browserName: string; deviceCategory: string } | null;
  /** Error message if failed */
  error: string | null;
  /** Proxy response time in ms */
  responseTimeMs: number | null;
 }
 // ============================================================
--- a/backend/src/services/curl-preflight.ts
+++ b/backend/src/services/curl-preflight.ts
@@ -0,0 +1,100 @@
 /**
 * Curl Preflight - Verify curl/axios transport works through proxy
 *
 * Tests:
 * 1. Proxy is available and active
 * 2. HTTP request through proxy succeeds
 * 3. Anti-detect headers are properly set
 *
 * Use case: Fast, simple API requests that don't need browser fingerprint
 */
 import axios from 'axios';
 import { HttpsProxyAgent } from 'https-proxy-agent';
 import { CrawlRotator, PreflightResult } from './crawl-rotator';
 export interface CurlPreflightResult extends PreflightResult {
  method: 'curl';
 }
 /**
 * Run curl preflight check
 * Tests proxy connectivity using axios/curl through the proxy
 */
 export async function runCurlPreflight(
  crawlRotator: CrawlRotator
 ): Promise<CurlPreflightResult> {
  const result: CurlPreflightResult = {
    method: 'curl',
    passed: false,
    proxyAvailable: false,
    proxyConnected: false,
    antidetectReady: false,
    proxyIp: null,
    fingerprint: null,
    error: null,
    responseTimeMs: null,
  };
  // Step 1: Check proxy is available
  const currentProxy = crawlRotator.proxy.getCurrent();
  if (!currentProxy) {
    result.error = 'No proxy available';
    console.log('[CurlPreflight] FAILED - No proxy available');
    return result;
  }
  result.proxyAvailable = true;
  result.proxyIp = currentProxy.host;
  // Step 2: Check fingerprint/anti-detect is ready
  const fingerprint = crawlRotator.userAgent.getCurrent();
  if (!fingerprint || !fingerprint.userAgent) {
    result.error = 'Anti-detect fingerprint not initialized';
    console.log('[CurlPreflight] FAILED - No fingerprint');
    return result;
  }
  result.antidetectReady = true;
  result.fingerprint = {
    userAgent: fingerprint.userAgent,
    browserName: fingerprint.browserName,
    deviceCategory: fingerprint.deviceCategory,
  };
  // Step 3: Test proxy connectivity with an actual HTTP request
  const proxyUrl = crawlRotator.proxy.getProxyUrl(currentProxy);
  const testUrl = 'https://httpbin.org/ip';
  try {
    const agent = new HttpsProxyAgent(proxyUrl);
    const startTime = Date.now();
    const response = await axios.get(testUrl, {
      httpsAgent: agent,
      timeout: 15000, // 15 second timeout
      headers: {
        'User-Agent': fingerprint.userAgent,
        'Accept-Language': fingerprint.acceptLanguage,
        ...(fingerprint.secChUa && { 'sec-ch-ua': fingerprint.secChUa }),
        ...(fingerprint.secChUaPlatform && { 'sec-ch-ua-platform': fingerprint.secChUaPlatform }),
        ...(fingerprint.secChUaMobile && { 'sec-ch-ua-mobile': fingerprint.secChUaMobile }),
      },
    });
    result.responseTimeMs = Date.now() - startTime;
    result.proxyConnected = true;
    result.passed = true;
    // Mark success on proxy stats
    await crawlRotator.proxy.markSuccess(currentProxy.id, result.responseTimeMs);
    console.log(`[CurlPreflight] PASSED - Proxy ${currentProxy.host} connected (${result.responseTimeMs}ms), UA: ${fingerprint.browserName}/${fingerprint.deviceCategory}`);
  } catch (err: any) {
    result.error = `Proxy connection failed: ${err.message || 'Unknown error'}`;
    console.log(`[CurlPreflight] FAILED - Proxy connection error: ${err.message}`);
    // Mark failure on proxy stats
    await crawlRotator.proxy.markFailed(currentProxy.id, err.message);
  }
  return result;
 }
--- a/backend/src/services/puppeteer-preflight.ts
+++ b/backend/src/services/puppeteer-preflight.ts
@@ -0,0 +1,481 @@
 /**
 * Puppeteer Preflight - Verify browser-based transport works with anti-detect
 *
 * Uses Puppeteer + StealthPlugin to:
 * 1. Launch headless browser with stealth mode + PROXY
 * 2. Visit fingerprint.com demo to verify anti-detect and confirm proxy IP
 * 3. Establish session by visiting Dutchie embedded menu
 * 4. Make GraphQL request from browser context
 * 5. Verify we get a valid response (not blocked)
 *
 * Use case: Anti-detect scraping that needs real browser fingerprint through proxy
 *
 * Based on test-intercept.js which successfully captures 1000+ products
 */
 import { PreflightResult, CrawlRotator } from './crawl-rotator';
 // GraphQL hash for FilteredProducts query - MUST match CLAUDE.md
 const FILTERED_PRODUCTS_HASH = 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0';
 // Test dispensary - AZ-Deeply-Rooted (known working)
 const TEST_CNAME = 'AZ-Deeply-Rooted';
 const TEST_PLATFORM_ID = '6405ef617056e8014d79101b';
 // Anti-detect verification sites (primary + fallback)
 const FINGERPRINT_DEMO_URL = 'https://demo.fingerprint.com/';
 const AMIUNIQUE_URL = 'https://amiunique.org/fingerprint';
 // IP geolocation API for timezone lookup (free, no key required)
 const IP_API_URL = 'http://ip-api.com/json';
 /**
 * Look up timezone from IP address using ip-api.com
 * Returns IANA timezone (e.g., 'America/New_York') or null on failure
 */
 async function getTimezoneFromIp(ip: string): Promise<{ timezone: string; city?: string; region?: string } | null> {
  try {
    const axios = require('axios');
    const response = await axios.get(`${IP_API_URL}/${ip}?fields=status,timezone,city,regionName`, {
      timeout: 5000,
    });
    if (response.data?.status === 'success' && response.data?.timezone) {
      return {
        timezone: response.data.timezone,
        city: response.data.city,
        region: response.data.regionName,
      };
    }
    return null;
  } catch (err: any) {
    console.log(`[PuppeteerPreflight] IP geolocation lookup failed: ${err.message}`);
    return null;
  }
 }
 export interface PuppeteerPreflightResult extends PreflightResult {
  method: 'http';
  /** Number of products returned (proves API access) */
  productsReturned?: number;
  /** Browser user agent used */
  browserUserAgent?: string;
  /** Bot detection result from fingerprint.com */
  botDetection?: {
    detected: boolean;
    probability?: number;
    type?: string;
  };
  /** Expected proxy IP (from pool) */
  expectedProxyIp?: string;
  /** Whether IP verification passed (detected IP matches proxy) */
  ipVerified?: boolean;
  /** Detected timezone from IP geolocation */
  detectedTimezone?: string;
  /** Detected location from IP geolocation */
  detectedLocation?: {
    city?: string;
    region?: string;
  };
 }
 /**
 * Run Puppeteer preflight check with proxy
 * Tests browser-based access with anti-detect verification via fingerprint.com
 *
 * @param crawlRotator - CrawlRotator instance to get proxy from pool
 */
 export async function runPuppeteerPreflight(
  crawlRotator?: CrawlRotator
 ): Promise<PuppeteerPreflightResult> {
  const result: PuppeteerPreflightResult = {
    method: 'http',
    passed: false,
    proxyAvailable: false,
    proxyConnected: false,
    antidetectReady: false,
    proxyIp: null,
    fingerprint: null,
    error: null,
    responseTimeMs: null,
    productsReturned: 0,
    ipVerified: false,
  };
  let browser: any = null;
  try {
    // Step 0: Get a proxy from the pool
    let proxyUrl: string | null = null;
    let expectedProxyHost: string | null = null;
    if (crawlRotator) {
      const currentProxy = crawlRotator.proxy.getCurrent();
      if (currentProxy) {
        result.proxyAvailable = true;
        proxyUrl = crawlRotator.proxy.getProxyUrl(currentProxy);
        expectedProxyHost = currentProxy.host;
        result.expectedProxyIp = expectedProxyHost;
        console.log(`[PuppeteerPreflight] Using proxy: ${currentProxy.host}:${currentProxy.port}`);
      } else {
        result.error = 'No proxy available from pool';
        console.log(`[PuppeteerPreflight] FAILED - No proxy available`);
        return result;
      }
    } else {
      console.log(`[PuppeteerPreflight] WARNING: No CrawlRotator provided - using direct connection`);
      result.proxyAvailable = true; // No proxy needed for direct
    }
    // Dynamic imports to avoid loading Puppeteer unless needed
    const puppeteer = require('puppeteer-extra');
    const StealthPlugin = require('puppeteer-extra-plugin-stealth');
    puppeteer.use(StealthPlugin());
    const startTime = Date.now();
    // Build browser args
    const browserArgs = ['--no-sandbox', '--disable-setuid-sandbox'];
    if (proxyUrl) {
      // Extract host:port for Puppeteer (it handles auth separately)
      const proxyUrlParsed = new URL(proxyUrl);
      browserArgs.push(`--proxy-server=${proxyUrlParsed.host}`);
    }
    // Launch browser with stealth + proxy
    browser = await puppeteer.launch({
      headless: 'new',
      args: browserArgs,
    });
    const page = await browser.newPage();
    // If proxy has auth, set it up
    if (proxyUrl) {
      const proxyUrlParsed = new URL(proxyUrl);
      if (proxyUrlParsed.username && proxyUrlParsed.password) {
        await page.authenticate({
          username: decodeURIComponent(proxyUrlParsed.username),
          password: decodeURIComponent(proxyUrlParsed.password),
        });
      }
    }
    // Get browser user agent
    const userAgent = await page.evaluate(() => navigator.userAgent);
    result.browserUserAgent = userAgent;
    result.fingerprint = {
      userAgent,
      browserName: 'Chrome (Puppeteer)',
      deviceCategory: 'desktop',
    };
    // =========================================================================
    // STEP 1a: Get IP address directly via simple API (more reliable than scraping)
    // =========================================================================
    console.log(`[PuppeteerPreflight] Getting proxy IP address...`);
    try {
      const ipApiResponse = await page.evaluate(async () => {
        try {
          const response = await fetch('https://api.ipify.org?format=json');
          const data = await response.json();
          return { ip: data.ip, error: null };
        } catch (err: any) {
          return { ip: null, error: err.message };
        }
      });
      if (ipApiResponse.ip) {
        result.proxyIp = ipApiResponse.ip;
        result.proxyConnected = true;
        console.log(`[PuppeteerPreflight] Detected proxy IP: ${ipApiResponse.ip}`);
        // Look up timezone from IP
        const geoData = await getTimezoneFromIp(ipApiResponse.ip);
        if (geoData) {
          result.detectedTimezone = geoData.timezone;
          result.detectedLocation = { city: geoData.city, region: geoData.region };
          console.log(`[PuppeteerPreflight] IP Geolocation: ${geoData.city}, ${geoData.region} (${geoData.timezone})`);
          // Set browser timezone to match proxy location via CDP
          try {
            const client = await page.target().createCDPSession();
            await client.send('Emulation.setTimezoneOverride', { timezoneId: geoData.timezone });
            console.log(`[PuppeteerPreflight] Browser timezone set to: ${geoData.timezone}`);
          } catch (tzErr: any) {
            console.log(`[PuppeteerPreflight] Failed to set browser timezone: ${tzErr.message}`);
          }
        } else {
          console.log(`[PuppeteerPreflight] WARNING: Could not determine timezone from IP - timezone mismatch possible`);
        }
      } else {
        console.log(`[PuppeteerPreflight] IP lookup failed: ${ipApiResponse.error || 'unknown error'}`);
      }
    } catch (ipErr: any) {
      console.log(`[PuppeteerPreflight] IP API error: ${ipErr.message}`);
    }
    // =========================================================================
    // STEP 1b: Visit fingerprint.com demo to verify anti-detect
    // =========================================================================
    console.log(`[PuppeteerPreflight] Testing anti-detect at ${FINGERPRINT_DEMO_URL}...`);
    try {
      await page.goto(FINGERPRINT_DEMO_URL, {
        waitUntil: 'networkidle2',
        timeout: 30000,
      });
      result.proxyConnected = true; // If we got here, proxy is working
      // Wait for fingerprint results to load
      await page.waitForSelector('[data-test="visitor-id"]', { timeout: 10000 }).catch(() => {});
      // Extract fingerprint data from the page
      const fingerprintData = await page.evaluate(() => {
        // Try to find the IP address displayed on the page
        const ipElement = document.querySelector('[data-test="ip-address"]');
        const ip = ipElement?.textContent?.trim() || null;
        // Try to find bot detection info
        const botElement = document.querySelector('[data-test="bot-detected"]');
        const botDetected = botElement?.textContent?.toLowerCase().includes('true') || false;
        // Try to find visitor ID (proves fingerprinting worked)
        const visitorIdElement = document.querySelector('[data-test="visitor-id"]');
        const visitorId = visitorIdElement?.textContent?.trim() || null;
        // Alternative: look for common UI patterns if data-test attrs not present
        let detectedIp = ip;
        if (!detectedIp) {
          // Look for IP in any element containing IP-like pattern
          const allText = document.body.innerText;
          const ipMatch = allText.match(/\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b/);
          detectedIp = ipMatch ? ipMatch[1] : null;
        }
        return {
          ip: detectedIp,
          botDetected,
          visitorId,
          pageLoaded: !!document.body,
        };
      });
      if (fingerprintData.ip) {
        result.proxyIp = fingerprintData.ip;
        console.log(`[PuppeteerPreflight] Detected IP: ${fingerprintData.ip}`);
        // Verify IP matches expected proxy
        if (expectedProxyHost) {
          // Check if detected IP contains the proxy host (or is close match)
          if (fingerprintData.ip === expectedProxyHost ||
              expectedProxyHost.includes(fingerprintData.ip) ||
              fingerprintData.ip.includes(expectedProxyHost.split('.').slice(0, 3).join('.'))) {
            result.ipVerified = true;
            console.log(`[PuppeteerPreflight] IP VERIFIED - matches proxy`);
          } else {
            console.log(`[PuppeteerPreflight] IP mismatch: expected ${expectedProxyHost}, got ${fingerprintData.ip}`);
            // Don't fail - residential proxies often show different egress IPs
          }
        }
        // Note: Timezone already set earlier via ipify.org IP lookup
      }
      if (fingerprintData.visitorId) {
        console.log(`[PuppeteerPreflight] Fingerprint visitor ID: ${fingerprintData.visitorId}`);
      }
      result.botDetection = {
        detected: fingerprintData.botDetected,
      };
      if (fingerprintData.botDetected) {
        console.log(`[PuppeteerPreflight] WARNING: Bot detection triggered!`);
      } else {
        console.log(`[PuppeteerPreflight] Anti-detect check: NOT detected as bot`);
        result.antidetectReady = true;
      }
    } catch (fpErr: any) {
      // Could mean proxy connection failed
      console.log(`[PuppeteerPreflight] Fingerprint.com check failed: ${fpErr.message}`);
      if (fpErr.message.includes('net::ERR_PROXY') || fpErr.message.includes('ECONNREFUSED')) {
        result.error = `Proxy connection failed: ${fpErr.message}`;
        return result;
      }
      // Try fallback: amiunique.org
      console.log(`[PuppeteerPreflight] Trying fallback: ${AMIUNIQUE_URL}...`);
      try {
        await page.goto(AMIUNIQUE_URL, {
          waitUntil: 'networkidle2',
          timeout: 30000,
        });
        result.proxyConnected = true;
        // Extract IP from amiunique.org page
        const amiData = await page.evaluate(() => {
          const allText = document.body.innerText;
          const ipMatch = allText.match(/\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b/);
          return {
            ip: ipMatch ? ipMatch[1] : null,
            pageLoaded: !!document.body,
          };
        });
        if (amiData.ip) {
          result.proxyIp = amiData.ip;
          console.log(`[PuppeteerPreflight] Detected IP via amiunique.org: ${amiData.ip}`);
        }
        result.antidetectReady = true;
        console.log(`[PuppeteerPreflight] amiunique.org fallback succeeded`);
      } catch (amiErr: any) {
        console.log(`[PuppeteerPreflight] amiunique.org fallback also failed: ${amiErr.message}`);
        // Continue with Dutchie test anyway
        result.proxyConnected = true;
        result.antidetectReady = true;
      }
    }
    // =========================================================================
    // STEP 2: Test Dutchie API access (the real test)
    // =========================================================================
    const embedUrl = `https://dutchie.com/embedded-menu/${TEST_CNAME}?menuType=rec`;
    console.log(`[PuppeteerPreflight] Establishing session at ${embedUrl}...`);
    await page.goto(embedUrl, {
      waitUntil: 'networkidle2',
      timeout: 30000,
    });
    // Make GraphQL request from browser context
    const graphqlResult = await page.evaluate(
      async (platformId: string, hash: string) => {
        try {
          const variables = {
            includeEnterpriseSpecials: false,
            productsFilter: {
              dispensaryId: platformId,
              pricingType: 'rec',
              Status: 'Active', // CRITICAL: Must be 'Active' per CLAUDE.md
              types: [],
              useCache: true,
              isDefaultSort: true,
              sortBy: 'popularSortIdx',
              sortDirection: 1,
              bypassOnlineThresholds: true,
              isKioskMenu: false,
              removeProductsBelowOptionThresholds: false,
            },
            page: 0,
            perPage: 10, // Just need a few to prove it works
          };
          const extensions = {
            persistedQuery: {
              version: 1,
              sha256Hash: hash,
            },
          };
          const qs = new URLSearchParams({
            operationName: 'FilteredProducts',
            variables: JSON.stringify(variables),
            extensions: JSON.stringify(extensions),
          });
          const url = `https://dutchie.com/api-3/graphql?${qs.toString()}`;
          const sessionId = 'preflight-' + Date.now();
          const response = await fetch(url, {
            method: 'GET',
            headers: {
              Accept: 'application/json',
              'content-type': 'application/json',
              'x-dutchie-session': sessionId,
              'apollographql-client-name': 'Marketplace (production)',
            },
            credentials: 'include',
          });
          if (!response.ok) {
            return { error: `HTTP ${response.status}`, products: 0 };
          }
          const json = await response.json();
          if (json.errors) {
            return { error: JSON.stringify(json.errors).slice(0, 200), products: 0 };
          }
          const products = json?.data?.filteredProducts?.products || [];
          return { error: null, products: products.length };
        } catch (err: any) {
          return { error: err.message || 'Unknown error', products: 0 };
        }
      },
      TEST_PLATFORM_ID,
      FILTERED_PRODUCTS_HASH
    );
    result.responseTimeMs = Date.now() - startTime;
    if (graphqlResult.error) {
      result.error = `GraphQL error: ${graphqlResult.error}`;
      console.log(`[PuppeteerPreflight] FAILED - ${result.error}`);
    } else if (graphqlResult.products === 0) {
      result.error = 'GraphQL returned 0 products';
      console.log(`[PuppeteerPreflight] FAILED - No products returned`);
    } else {
      result.passed = true;
      result.productsReturned = graphqlResult.products;
      console.log(
        `[PuppeteerPreflight] PASSED - Got ${graphqlResult.products} products in ${result.responseTimeMs}ms`
      );
      if (result.proxyIp) {
        console.log(`[PuppeteerPreflight] Browser IP via proxy: ${result.proxyIp}`);
      }
    }
  } catch (err: any) {
    result.error = `Browser error: ${err.message || 'Unknown error'}`;
    console.log(`[PuppeteerPreflight] FAILED - ${result.error}`);
  } finally {
    if (browser) {
      await browser.close().catch(() => {});
    }
  }
  return result;
 }
 /**
 * Run Puppeteer preflight with retry
 * Retries once on failure to handle transient issues
 *
 * @param crawlRotator - CrawlRotator instance to get proxy from pool
 * @param maxRetries - Number of retry attempts (default 1)
 */
 export async function runPuppeteerPreflightWithRetry(
  crawlRotator?: CrawlRotator,
  maxRetries: number = 1
 ): Promise<PuppeteerPreflightResult> {
  let lastResult: PuppeteerPreflightResult | null = null;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    if (attempt > 0) {
      console.log(`[PuppeteerPreflight] Retry attempt ${attempt}/${maxRetries}...`);
      await new Promise((r) => setTimeout(r, 5000)); // Wait 5s between retries
    }
    lastResult = await runPuppeteerPreflight(crawlRotator);
    if (lastResult.passed) {
      return lastResult;
    }
  }
  return lastResult!;
 }
--- a/backend/src/system/routes/index.ts
+++ b/backend/src/system/routes/index.ts
@@ -1,566 +1,30 @@
 /**
- * System API Routes
+ * System API Routes (Stub)
 *
- * Provides REST API endpoints for system monitoring and control:
+ * The full system routes depend on SyncOrchestrator which was moved to _deprecated.
- * - /api/system/sync/* - Sync orchestrator
+ * This stub provides empty routers to maintain backward compatibility.
 * - /api/system/dlq/* - Dead-letter queue
 * - /api/system/integrity/* - Integrity checks
 * - /api/system/fix/* - Auto-fix routines
 * - /api/system/alerts/* - System alerts
 * - /metrics - Prometheus metrics
 *
- * Phase 5: Full Production Sync + Monitoring
+ * Full implementation available at: src/_deprecated/system/routes/index.ts
 */
 import { Router, Request, Response } from 'express';
 import { Pool } from 'pg';
-import {
+import { MetricsService } from '../services';
  SyncOrchestrator,
  MetricsService,
  DLQService,
  AlertService,
  IntegrityService,
  AutoFixService,
 } from '../services';
-export function createSystemRouter(pool: Pool): Router {
+export function createSystemRouter(_pool: Pool): Router {
  const router = Router();
-  // Initialize services
+  // Stub - full sync/dlq/integrity/fix/alerts routes moved to _deprecated
-  const metrics = new MetricsService(pool);
+  router.get('/status', (_req: Request, res: Response) => {
  const dlq = new DLQService(pool);
  const alerts = new AlertService(pool);
  const integrity = new IntegrityService(pool, alerts);
  const autoFix = new AutoFixService(pool, alerts);
  const orchestrator = new SyncOrchestrator(pool, metrics, dlq, alerts);
  // ============================================================
  // SYNC ORCHESTRATOR ENDPOINTS
  // ============================================================
  /**
   * GET /api/system/sync/status
   * Get current sync status
   */
  router.get('/sync/status', async (_req: Request, res: Response) => {
    try {
      const status = await orchestrator.getStatus();
      res.json(status);
    } catch (error) {
      console.error('[System] Sync status error:', error);
      res.status(500).json({ error: 'Failed to get sync status' });
    }
  });
  /**
   * POST /api/system/sync/run
   * Trigger a sync run
   */
  router.post('/sync/run', async (req: Request, res: Response) => {
    try {
      const triggeredBy = req.body.triggeredBy || 'api';
      const result = await orchestrator.runSync();
    res.json({
-        success: true,
+      message: 'System routes temporarily disabled - see _deprecated/system/routes',
-        triggeredBy,
+      status: 'stub',
        metrics: result,
    });
    } catch (error) {
      console.error('[System] Sync run error:', error);
      res.status(500).json({
        success: false,
        error: error instanceof Error ? error.message : 'Sync run failed',
      });
    }
  });
  /**
   * GET /api/system/sync/queue-depth
   * Get queue depth information
   */
  router.get('/sync/queue-depth', async (_req: Request, res: Response) => {
    try {
      const depth = await orchestrator.getQueueDepth();
      res.json(depth);
    } catch (error) {
      console.error('[System] Queue depth error:', error);
      res.status(500).json({ error: 'Failed to get queue depth' });
    }
  });
  /**
   * GET /api/system/sync/health
   * Get sync health status
   */
  router.get('/sync/health', async (_req: Request, res: Response) => {
    try {
      const health = await orchestrator.getHealth();
      res.status(health.healthy ? 200 : 503).json(health);
    } catch (error) {
      console.error('[System] Health check error:', error);
      res.status(500).json({ healthy: false, error: 'Health check failed' });
    }
  });
  /**
   * POST /api/system/sync/pause
   * Pause the orchestrator
   */
  router.post('/sync/pause', async (req: Request, res: Response) => {
    try {
      const reason = req.body.reason || 'Manual pause';
      await orchestrator.pause(reason);
      res.json({ success: true, message: 'Orchestrator paused' });
    } catch (error) {
      console.error('[System] Pause error:', error);
      res.status(500).json({ error: 'Failed to pause orchestrator' });
    }
  });
  /**
   * POST /api/system/sync/resume
   * Resume the orchestrator
   */
  router.post('/sync/resume', async (_req: Request, res: Response) => {
    try {
      await orchestrator.resume();
      res.json({ success: true, message: 'Orchestrator resumed' });
    } catch (error) {
      console.error('[System] Resume error:', error);
      res.status(500).json({ error: 'Failed to resume orchestrator' });
    }
  });
  // ============================================================
  // DLQ ENDPOINTS
  // ============================================================
  /**
   * GET /api/system/dlq
   * List DLQ payloads
   */
  router.get('/dlq', async (req: Request, res: Response) => {
    try {
      const options = {
        status: req.query.status as string,
        errorType: req.query.errorType as string,
        dispensaryId: req.query.dispensaryId ? parseInt(req.query.dispensaryId as string) : undefined,
        limit: req.query.limit ? parseInt(req.query.limit as string) : 50,
        offset: req.query.offset ? parseInt(req.query.offset as string) : 0,
      };
      const result = await dlq.listPayloads(options);
      res.json(result);
    } catch (error) {
      console.error('[System] DLQ list error:', error);
      res.status(500).json({ error: 'Failed to list DLQ payloads' });
    }
  });
  /**
   * GET /api/system/dlq/stats
   * Get DLQ statistics
   */
  router.get('/dlq/stats', async (_req: Request, res: Response) => {
    try {
      const stats = await dlq.getStats();
      res.json(stats);
    } catch (error) {
      console.error('[System] DLQ stats error:', error);
      res.status(500).json({ error: 'Failed to get DLQ stats' });
    }
  });
  /**
   * GET /api/system/dlq/summary
   * Get DLQ summary by error type
   */
  router.get('/dlq/summary', async (_req: Request, res: Response) => {
    try {
      const summary = await dlq.getSummary();
      res.json(summary);
    } catch (error) {
      console.error('[System] DLQ summary error:', error);
      res.status(500).json({ error: 'Failed to get DLQ summary' });
    }
  });
  /**
   * GET /api/system/dlq/:id
   * Get a specific DLQ payload
   */
  router.get('/dlq/:id', async (req: Request, res: Response) => {
    try {
      const payload = await dlq.getPayload(req.params.id);
      if (!payload) {
        return res.status(404).json({ error: 'Payload not found' });
      }
      res.json(payload);
    } catch (error) {
      console.error('[System] DLQ get error:', error);
      res.status(500).json({ error: 'Failed to get DLQ payload' });
    }
  });
  /**
   * POST /api/system/dlq/:id/retry
   * Retry a DLQ payload
   */
  router.post('/dlq/:id/retry', async (req: Request, res: Response) => {
    try {
      const result = await dlq.retryPayload(req.params.id);
      if (result.success) {
        res.json(result);
      } else {
        res.status(400).json(result);
      }
    } catch (error) {
      console.error('[System] DLQ retry error:', error);
      res.status(500).json({ error: 'Failed to retry payload' });
    }
  });
  /**
   * POST /api/system/dlq/:id/abandon
   * Abandon a DLQ payload
   */
  router.post('/dlq/:id/abandon', async (req: Request, res: Response) => {
    try {
      const reason = req.body.reason || 'Manually abandoned';
      const abandonedBy = req.body.abandonedBy || 'api';
      const success = await dlq.abandonPayload(req.params.id, reason, abandonedBy);
      res.json({ success });
    } catch (error) {
      console.error('[System] DLQ abandon error:', error);
      res.status(500).json({ error: 'Failed to abandon payload' });
    }
  });
  /**
   * POST /api/system/dlq/bulk-retry
   * Bulk retry payloads by error type
   */
  router.post('/dlq/bulk-retry', async (req: Request, res: Response) => {
    try {
      const { errorType } = req.body;
      if (!errorType) {
        return res.status(400).json({ error: 'errorType is required' });
      }
      const result = await dlq.bulkRetryByErrorType(errorType);
      res.json(result);
    } catch (error) {
      console.error('[System] DLQ bulk retry error:', error);
      res.status(500).json({ error: 'Failed to bulk retry' });
    }
  });
  // ============================================================
  // INTEGRITY CHECK ENDPOINTS
  // ============================================================
  /**
   * POST /api/system/integrity/run
   * Run all integrity checks
   */
  router.post('/integrity/run', async (req: Request, res: Response) => {
    try {
      const triggeredBy = req.body.triggeredBy || 'api';
      const result = await integrity.runAllChecks(triggeredBy);
      res.json(result);
    } catch (error) {
      console.error('[System] Integrity run error:', error);
      res.status(500).json({ error: 'Failed to run integrity checks' });
    }
  });
  /**
   * GET /api/system/integrity/runs
   * Get recent integrity check runs
   */
  router.get('/integrity/runs', async (req: Request, res: Response) => {
    try {
      const limit = req.query.limit ? parseInt(req.query.limit as string) : 10;
      const runs = await integrity.getRecentRuns(limit);
      res.json(runs);
    } catch (error) {
      console.error('[System] Integrity runs error:', error);
      res.status(500).json({ error: 'Failed to get integrity runs' });
    }
  });
  /**
   * GET /api/system/integrity/runs/:runId
   * Get results for a specific integrity run
   */
  router.get('/integrity/runs/:runId', async (req: Request, res: Response) => {
    try {
      const results = await integrity.getRunResults(req.params.runId);
      res.json(results);
    } catch (error) {
      console.error('[System] Integrity run results error:', error);
      res.status(500).json({ error: 'Failed to get run results' });
    }
  });
  // ============================================================
  // AUTO-FIX ENDPOINTS
  // ============================================================
  /**
   * GET /api/system/fix/routines
   * Get available fix routines
   */
  router.get('/fix/routines', (_req: Request, res: Response) => {
    try {
      const routines = autoFix.getAvailableRoutines();
      res.json(routines);
    } catch (error) {
      console.error('[System] Get routines error:', error);
      res.status(500).json({ error: 'Failed to get routines' });
    }
  });
  /**
   * POST /api/system/fix/:routine
   * Run a fix routine
   */
  router.post('/fix/:routine', async (req: Request, res: Response) => {
    try {
      const routineName = req.params.routine;
      const dryRun = req.body.dryRun === true;
      const triggeredBy = req.body.triggeredBy || 'api';
      const result = await autoFix.runRoutine(routineName as any, triggeredBy, { dryRun });
      res.json(result);
    } catch (error) {
      console.error('[System] Fix routine error:', error);
      res.status(500).json({ error: 'Failed to run fix routine' });
    }
  });
  /**
   * GET /api/system/fix/runs
   * Get recent fix runs
   */
  router.get('/fix/runs', async (req: Request, res: Response) => {
    try {
      const limit = req.query.limit ? parseInt(req.query.limit as string) : 20;
      const runs = await autoFix.getRecentRuns(limit);
      res.json(runs);
    } catch (error) {
      console.error('[System] Fix runs error:', error);
      res.status(500).json({ error: 'Failed to get fix runs' });
    }
  });
  // ============================================================
  // ALERTS ENDPOINTS
  // ============================================================
  /**
   * GET /api/system/alerts
   * List alerts
   */
  router.get('/alerts', async (req: Request, res: Response) => {
    try {
      const options = {
        status: req.query.status as any,
        severity: req.query.severity as any,
        type: req.query.type as string,
        limit: req.query.limit ? parseInt(req.query.limit as string) : 50,
        offset: req.query.offset ? parseInt(req.query.offset as string) : 0,
      };
      const result = await alerts.listAlerts(options);
      res.json(result);
    } catch (error) {
      console.error('[System] Alerts list error:', error);
      res.status(500).json({ error: 'Failed to list alerts' });
    }
  });
  /**
   * GET /api/system/alerts/active
   * Get active alerts
   */
  router.get('/alerts/active', async (_req: Request, res: Response) => {
    try {
      const activeAlerts = await alerts.getActiveAlerts();
      res.json(activeAlerts);
    } catch (error) {
      console.error('[System] Active alerts error:', error);
      res.status(500).json({ error: 'Failed to get active alerts' });
    }
  });
  /**
   * GET /api/system/alerts/summary
   * Get alert summary
   */
  router.get('/alerts/summary', async (_req: Request, res: Response) => {
    try {
      const summary = await alerts.getSummary();
      res.json(summary);
    } catch (error) {
      console.error('[System] Alerts summary error:', error);
      res.status(500).json({ error: 'Failed to get alerts summary' });
    }
  });
  /**
   * POST /api/system/alerts/:id/acknowledge
   * Acknowledge an alert
   */
  router.post('/alerts/:id/acknowledge', async (req: Request, res: Response) => {
    try {
      const alertId = parseInt(req.params.id);
      const acknowledgedBy = req.body.acknowledgedBy || 'api';
      const success = await alerts.acknowledgeAlert(alertId, acknowledgedBy);
      res.json({ success });
    } catch (error) {
      console.error('[System] Acknowledge alert error:', error);
      res.status(500).json({ error: 'Failed to acknowledge alert' });
    }
  });
  /**
   * POST /api/system/alerts/:id/resolve
   * Resolve an alert
   */
  router.post('/alerts/:id/resolve', async (req: Request, res: Response) => {
    try {
      const alertId = parseInt(req.params.id);
      const resolvedBy = req.body.resolvedBy || 'api';
      const success = await alerts.resolveAlert(alertId, resolvedBy);
      res.json({ success });
    } catch (error) {
      console.error('[System] Resolve alert error:', error);
      res.status(500).json({ error: 'Failed to resolve alert' });
    }
  });
  /**
   * POST /api/system/alerts/bulk-acknowledge
   * Bulk acknowledge alerts
   */
  router.post('/alerts/bulk-acknowledge', async (req: Request, res: Response) => {
    try {
      const { ids, acknowledgedBy } = req.body;
      if (!ids || !Array.isArray(ids)) {
        return res.status(400).json({ error: 'ids array is required' });
      }
      const count = await alerts.bulkAcknowledge(ids, acknowledgedBy || 'api');
      res.json({ acknowledged: count });
    } catch (error) {
      console.error('[System] Bulk acknowledge error:', error);
      res.status(500).json({ error: 'Failed to bulk acknowledge' });
    }
  });
  // ============================================================
  // METRICS ENDPOINTS
  // ============================================================
  /**
   * GET /api/system/metrics
   * Get all current metrics
   */
  router.get('/metrics', async (_req: Request, res: Response) => {
    try {
      const allMetrics = await metrics.getAllMetrics();
      res.json(allMetrics);
    } catch (error) {
      console.error('[System] Metrics error:', error);
      res.status(500).json({ error: 'Failed to get metrics' });
    }
  });
  /**
   * GET /api/system/metrics/:name
   * Get a specific metric
   */
  router.get('/metrics/:name', async (req: Request, res: Response) => {
    try {
      const metric = await metrics.getMetric(req.params.name);
      if (!metric) {
        return res.status(404).json({ error: 'Metric not found' });
      }
      res.json(metric);
    } catch (error) {
      console.error('[System] Metric error:', error);
      res.status(500).json({ error: 'Failed to get metric' });
    }
  });
  /**
   * GET /api/system/metrics/:name/history
   * Get metric time series
   */
  router.get('/metrics/:name/history', async (req: Request, res: Response) => {
    try {
      const hours = req.query.hours ? parseInt(req.query.hours as string) : 24;
      const history = await metrics.getMetricHistory(req.params.name, hours);
      res.json(history);
    } catch (error) {
      console.error('[System] Metric history error:', error);
      res.status(500).json({ error: 'Failed to get metric history' });
    }
  });
  /**
   * GET /api/system/errors
   * Get error summary
   */
  router.get('/errors', async (_req: Request, res: Response) => {
    try {
      const summary = await metrics.getErrorSummary();
      res.json(summary);
    } catch (error) {
      console.error('[System] Error summary error:', error);
      res.status(500).json({ error: 'Failed to get error summary' });
    }
  });
  /**
   * GET /api/system/errors/recent
   * Get recent errors
   */
  router.get('/errors/recent', async (req: Request, res: Response) => {
    try {
      const limit = req.query.limit ? parseInt(req.query.limit as string) : 50;
      const errorType = req.query.type as string;
      const errors = await metrics.getRecentErrors(limit, errorType);
      res.json(errors);
    } catch (error) {
      console.error('[System] Recent errors error:', error);
      res.status(500).json({ error: 'Failed to get recent errors' });
    }
  });
  /**
   * POST /api/system/errors/acknowledge
   * Acknowledge errors
   */
  router.post('/errors/acknowledge', async (req: Request, res: Response) => {
    try {
      const { ids, acknowledgedBy } = req.body;
      if (!ids || !Array.isArray(ids)) {
        return res.status(400).json({ error: 'ids array is required' });
      }
      const count = await metrics.acknowledgeErrors(ids, acknowledgedBy || 'api');
      res.json({ acknowledged: count });
    } catch (error) {
      console.error('[System] Acknowledge errors error:', error);
      res.status(500).json({ error: 'Failed to acknowledge errors' });
    }
  });
  return router;
 }
 /**
 * Create Prometheus metrics endpoint (standalone)
 */
 export function createPrometheusRouter(pool: Pool): Router {
  const router = Router();
  const metrics = new MetricsService(pool);
--- a/backend/src/system/services/index.ts
+++ b/backend/src/system/services/index.ts
@@ -4,7 +4,7 @@
 * Phase 5: Full Production Sync + Monitoring
 */
-export { SyncOrchestrator, type SyncStatus, type QueueDepth, type SyncRunMetrics, type OrchestratorStatus } from './sync-orchestrator';
+// SyncOrchestrator moved to _deprecated (depends on hydration module)
 export { MetricsService, ERROR_TYPES, type Metric, type MetricTimeSeries, type ErrorBucket, type ErrorType } from './metrics';
 export { DLQService, type DLQPayload, type DLQStats } from './dlq';
 export { AlertService, type SystemAlert, type AlertSummary, type AlertSeverity, type AlertStatus } from './alerts';
--- a/backend/src/tasks/handlers/index.ts
+++ b/backend/src/tasks/handlers/index.ts
@@ -4,9 +4,9 @@
 * Exports all task handlers for the task worker.
 */
 export { handleProductRefresh } from './product-refresh';
 export { handleProductDiscovery } from './product-discovery';
 export { handleProductRefresh } from './product-refresh';
 export { handleStoreDiscovery } from './store-discovery';
 export { handleEntryPointDiscovery } from './entry-point-discovery';
 export { handleAnalyticsRefresh } from './analytics-refresh';
-export { handleProxyTest } from './proxy-test';
+export { handleWhoami } from './whoami';
--- a/backend/src/tasks/handlers/proxy-test.ts
+++ b/backend/src/tasks/handlers/proxy-test.ts
@@ -1,51 +0,0 @@
 /**
 * Proxy Test Handler
 * Tests proxy connectivity by fetching public IP via ipify
 */
 import { TaskContext, TaskResult } from '../task-worker';
 import { execSync } from 'child_process';
 export async function handleProxyTest(ctx: TaskContext): Promise<TaskResult> {
  const { pool } = ctx;
  console.log('[ProxyTest] Testing proxy connection...');
  try {
    // Get active proxy from DB
    const proxyResult = await pool.query(`
      SELECT host, port, username, password
      FROM proxies
      WHERE is_active = true
      LIMIT 1
    `);
    if (proxyResult.rows.length === 0) {
      return { success: false, error: 'No active proxy configured' };
    }
    const p = proxyResult.rows[0];
    const proxyUrl = p.username
      ? `http://${p.username}:${p.password}@${p.host}:${p.port}`
      : `http://${p.host}:${p.port}`;
    console.log(`[ProxyTest] Using proxy: ${p.host}:${p.port}`);
    // Fetch IP via proxy
    const cmd = `curl -s --proxy '${proxyUrl}' 'https://api.ipify.org?format=json'`;
    const output = execSync(cmd, { timeout: 30000 }).toString().trim();
    const data = JSON.parse(output);
    console.log(`[ProxyTest] Proxy IP: ${data.ip}`);
    return {
      success: true,
      proxyIp: data.ip,
      proxyHost: p.host,
      proxyPort: p.port,
    };
  } catch (error: any) {
    console.error('[ProxyTest] Error:', error.message);
    return { success: false, error: error.message };
  }
 }
--- a/backend/src/tasks/handlers/whoami.ts
+++ b/backend/src/tasks/handlers/whoami.ts
@@ -0,0 +1,80 @@
 /**
 * WhoAmI Handler
 * Tests proxy connectivity and anti-detect by fetching public IP
 * Reports: proxy IP, fingerprint info, and connection status
 */
 import { TaskContext, TaskResult } from '../task-worker';
 import { execSync } from 'child_process';
 export async function handleWhoami(ctx: TaskContext): Promise<TaskResult> {
  const { pool, crawlRotator } = ctx;
  console.log('[WhoAmI] Testing proxy and anti-detect...');
  try {
    // Use the preflight check which tests proxy + anti-detect
    if (crawlRotator) {
      const preflight = await crawlRotator.preflight();
      if (!preflight.passed) {
        return {
          success: false,
          error: preflight.error || 'Preflight check failed',
          proxyAvailable: preflight.proxyAvailable,
          proxyConnected: preflight.proxyConnected,
          antidetectReady: preflight.antidetectReady,
        };
      }
      console.log(`[WhoAmI] Proxy IP: ${preflight.proxyIp}, Response: ${preflight.responseTimeMs}ms`);
      console.log(`[WhoAmI] Fingerprint: ${preflight.fingerprint?.browserName}/${preflight.fingerprint?.deviceCategory}`);
      return {
        success: true,
        proxyIp: preflight.proxyIp,
        responseTimeMs: preflight.responseTimeMs,
        fingerprint: preflight.fingerprint,
        proxyAvailable: preflight.proxyAvailable,
        proxyConnected: preflight.proxyConnected,
        antidetectReady: preflight.antidetectReady,
      };
    }
    // Fallback: Direct proxy test without CrawlRotator
    const proxyResult = await pool.query(`
      SELECT host, port, username, password
      FROM proxies
      WHERE is_active = true
      LIMIT 1
    `);
    if (proxyResult.rows.length === 0) {
      return { success: false, error: 'No active proxy configured' };
    }
    const p = proxyResult.rows[0];
    const proxyUrl = p.username
      ? `http://${p.username}:${p.password}@${p.host}:${p.port}`
      : `http://${p.host}:${p.port}`;
    console.log(`[WhoAmI] Using proxy: ${p.host}:${p.port}`);
    // Fetch IP via proxy
    const cmd = `curl -s --proxy '${proxyUrl}' 'https://api.ipify.org?format=json'`;
    const output = execSync(cmd, { timeout: 30000 }).toString().trim();
    const data = JSON.parse(output);
    console.log(`[WhoAmI] Proxy IP: ${data.ip}`);
    return {
      success: true,
      proxyIp: data.ip,
      proxyHost: p.host,
      proxyPort: p.port,
    };
  } catch (error: any) {
    console.error('[WhoAmI] Error:', error.message);
    return { success: false, error: error.message };
  }
 }
--- a/backend/src/tasks/index.ts
+++ b/backend/src/tasks/index.ts
@@ -17,8 +17,8 @@ export {
 export { TaskWorker, TaskContext, TaskResult } from './task-worker';
 export {
  handleProductRefresh,
  handleProductDiscovery,
  handleProductRefresh,
  handleStoreDiscovery,
  handleEntryPointDiscovery,
  handleAnalyticsRefresh,
--- a/backend/src/tasks/task-service.ts
+++ b/backend/src/tasks/task-service.ts
@@ -24,15 +24,16 @@ async function tableExists(tableName: string): Promise<boolean> {
 // Per TASK_WORKFLOW_2024-12-10.md: Task roles
 // payload_fetch: Hits Dutchie API, saves raw payload to filesystem
-// product_refresh: Reads local payload, normalizes, upserts to DB
+// product_discovery: Main product crawl handler
 // product_refresh: Legacy role (deprecated but kept for compatibility)
 export type TaskRole =
  | 'store_discovery'
  | 'entry_point_discovery'
  | 'product_discovery'
-  | 'payload_fetch'      // NEW: Fetches from API, saves to disk
+  | 'payload_fetch'      // Fetches from API, saves to disk
-  | 'product_refresh'    // CHANGED: Now reads from local payload
+  | 'product_refresh'    // DEPRECATED: Use product_discovery instead
  | 'analytics_refresh'
-  | 'proxy_test';        // Tests proxy connectivity via ipify
+  | 'whoami';            // Tests proxy + anti-detect connectivity
 export type TaskStatus =
  | 'pending'
@@ -51,6 +52,7 @@ export interface WorkerTask {
  platform: string | null;
  status: TaskStatus;
  priority: number;
  method: 'curl' | 'http' | null;  // Transport method: curl=axios/proxy, http=Puppeteer/browser
  scheduled_for: Date | null;
  worker_id: string | null;
  claimed_at: Date | null;
@@ -152,23 +154,33 @@ class TaskService {
   * Claim a task atomically for a worker
   * If role is null, claims ANY available task (role-agnostic worker)
   * Returns null if task pool is paused.
   *
   * @param role - Task role to claim, or null for any task
   * @param workerId - Worker ID claiming the task
   * @param curlPassed - Whether worker passed curl preflight (default true for backward compat)
   * @param httpPassed - Whether worker passed http/Puppeteer preflight (default false)
   */
-  async claimTask(role: TaskRole | null, workerId: string): Promise<WorkerTask | null> {
+  async claimTask(
    role: TaskRole | null,
    workerId: string,
    curlPassed: boolean = true,
    httpPassed: boolean = false
  ): Promise<WorkerTask | null> {
    // Check if task pool is paused - don't claim any tasks
    if (isTaskPoolPaused()) {
      return null;
    }
    if (role) {
-      // Role-specific claiming - use the SQL function
+      // Role-specific claiming - use the SQL function with preflight capabilities
      const result = await pool.query(
-        `SELECT * FROM claim_task($1, $2)`,
+        `SELECT * FROM claim_task($1, $2, $3, $4)`,
-        [role, workerId]
+        [role, workerId, curlPassed, httpPassed]
      );
      return (result.rows[0] as WorkerTask) || null;
    }
-    // Role-agnostic claiming - claim ANY pending task
+    // Role-agnostic claiming - claim ANY pending task matching worker capabilities
    const result = await pool.query(`
      UPDATE worker_tasks
      SET
@@ -179,6 +191,12 @@ class TaskService {
        SELECT id FROM worker_tasks
        WHERE status = 'pending'
          AND (scheduled_for IS NULL OR scheduled_for <= NOW())
          -- Method compatibility: worker must have passed the required preflight
          AND (
            method IS NULL  -- No preference, any worker can claim
            OR (method = 'curl' AND $2 = TRUE)
            OR (method = 'http' AND $3 = TRUE)
          )
          -- Exclude stores that already have an active task
          AND (dispensary_id IS NULL OR dispensary_id NOT IN (
            SELECT dispensary_id FROM worker_tasks
@@ -190,7 +208,7 @@ class TaskService {
        FOR UPDATE SKIP LOCKED
      )
      RETURNING *
-    `, [workerId]);
+    `, [workerId, curlPassed, httpPassed]);
    return (result.rows[0] as WorkerTask) || null;
  }
@@ -231,6 +249,24 @@ class TaskService {
    );
  }
  /**
   * Release a claimed task back to pending (e.g., when preflight fails)
   * This allows another worker to pick it up.
   */
  async releaseTask(taskId: number): Promise<void> {
    await pool.query(
      `UPDATE worker_tasks
       SET status = 'pending',
           worker_id = NULL,
           claimed_at = NULL,
           started_at = NULL,
           updated_at = NOW()
       WHERE id = $1 AND status IN ('claimed', 'running')`,
      [taskId]
    );
    console.log(`[TaskService] Task ${taskId} released back to pending`);
  }
  /**
   * Mark a task as failed, with auto-retry if under max_retries
   * Returns true if task was re-queued for retry, false if permanently failed
--- a/backend/src/tasks/task-worker.ts
+++ b/backend/src/tasks/task-worker.ts
@@ -51,6 +51,10 @@ import os from 'os';
 import { CrawlRotator } from '../services/crawl-rotator';
 import { setCrawlRotator } from '../platforms/dutchie';
 // Dual-transport preflight system
 import { runCurlPreflight, CurlPreflightResult } from '../services/curl-preflight';
 import { runPuppeteerPreflightWithRetry, PuppeteerPreflightResult } from '../services/puppeteer-preflight';
 // Task handlers by role
 // Per TASK_WORKFLOW_2024-12-10.md: payload_fetch and product_refresh are now separate
 import { handlePayloadFetch } from './handlers/payload-fetch';
@@ -59,7 +63,7 @@ import { handleProductDiscovery } from './handlers/product-discovery';
 import { handleStoreDiscovery } from './handlers/store-discovery';
 import { handleEntryPointDiscovery } from './handlers/entry-point-discovery';
 import { handleAnalyticsRefresh } from './handlers/analytics-refresh';
-import { handleProxyTest } from './handlers/proxy-test';
+import { handleWhoami } from './handlers/whoami';
 const POLL_INTERVAL_MS = parseInt(process.env.POLL_INTERVAL_MS || '5000');
 const HEARTBEAT_INTERVAL_MS = parseInt(process.env.HEARTBEAT_INTERVAL_MS || '30000');
@@ -111,6 +115,7 @@ export interface TaskContext {
  workerId: string;
  task: WorkerTask;
  heartbeat: () => Promise<void>;
  crawlRotator?: CrawlRotator;
 }
 export interface TaskResult {
@@ -125,16 +130,17 @@ export interface TaskResult {
 type TaskHandler = (ctx: TaskContext) => Promise<TaskResult>;
 // Per TASK_WORKFLOW_2024-12-10.md: Handler registry
-// payload_fetch: Fetches from Dutchie API, saves to disk, chains to product_refresh
+// payload_fetch: Fetches from Dutchie API, saves to disk
 // product_refresh: Reads local payload, normalizes, upserts to DB
 // product_discovery: Main handler for product crawling
 const TASK_HANDLERS: Record<TaskRole, TaskHandler> = {
-  payload_fetch: handlePayloadFetch,       // NEW: API fetch -> disk
+  payload_fetch: handlePayloadFetch,       // API fetch -> disk
-  product_refresh: handleProductRefresh,   // CHANGED: disk -> DB
+  product_refresh: handleProductRefresh,   // disk -> DB
  product_discovery: handleProductDiscovery,
  store_discovery: handleStoreDiscovery,
  entry_point_discovery: handleEntryPointDiscovery,
  analytics_refresh: handleAnalyticsRefresh,
-  proxy_test: handleProxyTest,             // Tests proxy via ipify
+  whoami: handleWhoami,                     // Tests proxy + anti-detect
 };
 /**
@@ -188,6 +194,21 @@ export class TaskWorker {
  private isBackingOff: boolean = false;
  private backoffReason: string | null = null;
  // ==========================================================================
  // DUAL-TRANSPORT PREFLIGHT STATUS
  // ==========================================================================
  // Workers run BOTH preflights on startup:
  // - curl: axios/proxy transport - fast, for simple API calls
  // - http: Puppeteer/browser transport - anti-detect, for Dutchie GraphQL
  //
  // Task claiming checks method compatibility - worker must have passed
  // the preflight for the task's required method.
  // ==========================================================================
  private preflightCurlPassed: boolean = false;
  private preflightHttpPassed: boolean = false;
  private preflightCurlResult: CurlPreflightResult | null = null;
  private preflightHttpResult: PuppeteerPreflightResult | null = null;
  constructor(role: TaskRole | null = null, workerId?: string) {
    this.pool = getPool();
    this.role = role;
@@ -350,6 +371,117 @@ export class TaskWorker {
    }
  }
  /**
   * Run dual-transport preflights on startup
   * Tests both curl (axios/proxy) and http (Puppeteer/browser) transport methods.
   * Results are reported to worker_registry and used for task claiming.
   *
   * NOTE: All current tasks require 'http' method, so http preflight must pass
   * for the worker to claim any tasks. Curl preflight is for future use.
   */
  private async runDualPreflights(): Promise<void> {
    console.log(`[TaskWorker] Running dual-transport preflights...`);
    // Run both preflights in parallel for efficiency
    const [curlResult, httpResult] = await Promise.all([
      runCurlPreflight(this.crawlRotator).catch((err): CurlPreflightResult => ({
        method: 'curl',
        passed: false,
        proxyAvailable: false,
        proxyConnected: false,
        antidetectReady: false,
        proxyIp: null,
        fingerprint: null,
        error: `Preflight error: ${err.message}`,
        responseTimeMs: null,
      })),
      runPuppeteerPreflightWithRetry(this.crawlRotator, 1).catch((err): PuppeteerPreflightResult => ({
        method: 'http',
        passed: false,
        proxyAvailable: false,
        proxyConnected: false,
        antidetectReady: false,
        proxyIp: null,
        fingerprint: null,
        error: `Preflight error: ${err.message}`,
        responseTimeMs: null,
        productsReturned: 0,
      })),
    ]);
    // Store results
    this.preflightCurlResult = curlResult;
    this.preflightHttpResult = httpResult;
    this.preflightCurlPassed = curlResult.passed;
    this.preflightHttpPassed = httpResult.passed;
    // Log results
    console.log(`[TaskWorker] CURL preflight: ${curlResult.passed ? 'PASSED' : 'FAILED'}${curlResult.error ? ` - ${curlResult.error}` : ''}`);
    console.log(`[TaskWorker] HTTP preflight: ${httpResult.passed ? 'PASSED' : 'FAILED'}${httpResult.error ? ` - ${httpResult.error}` : ''}`);
    if (httpResult.passed && httpResult.productsReturned) {
      console.log(`[TaskWorker] HTTP preflight returned ${httpResult.productsReturned} products from test store`);
    }
    // Report to worker_registry via API
    await this.reportPreflightStatus();
    // Since all tasks require 'http', warn if http preflight failed
    if (!this.preflightHttpPassed) {
      console.warn(`[TaskWorker] WARNING: HTTP preflight failed - this worker cannot claim any tasks!`);
      console.warn(`[TaskWorker] Error: ${httpResult.error}`);
    }
  }
  /**
   * Report preflight status to worker_registry
   * Function signature: update_worker_preflight(worker_id, transport, status, ip, response_ms, error, fingerprint)
   */
  private async reportPreflightStatus(): Promise<void> {
    try {
      // Update worker_registry directly via SQL (more reliable than API)
      // CURL preflight - includes IP address
      await this.pool.query(`
        SELECT update_worker_preflight($1, 'curl', $2, $3, $4, $5, $6)
      `, [
        this.workerId,
        this.preflightCurlPassed ? 'passed' : 'failed',
        this.preflightCurlResult?.proxyIp || null,
        this.preflightCurlResult?.responseTimeMs || null,
        this.preflightCurlResult?.error || null,
        null, // No fingerprint for curl
      ]);
      // HTTP preflight - includes IP, fingerprint, and timezone data
      const httpFingerprint = this.preflightHttpResult ? {
        ...this.preflightHttpResult.fingerprint,
        detectedTimezone: (this.preflightHttpResult as any).detectedTimezone,
        detectedLocation: (this.preflightHttpResult as any).detectedLocation,
        productsReturned: this.preflightHttpResult.productsReturned,
        botDetection: (this.preflightHttpResult as any).botDetection,
      } : null;
      await this.pool.query(`
        SELECT update_worker_preflight($1, 'http', $2, $3, $4, $5, $6)
      `, [
        this.workerId,
        this.preflightHttpPassed ? 'passed' : 'failed',
        this.preflightHttpResult?.proxyIp || null,
        this.preflightHttpResult?.responseTimeMs || null,
        this.preflightHttpResult?.error || null,
        httpFingerprint ? JSON.stringify(httpFingerprint) : null,
      ]);
      console.log(`[TaskWorker] Preflight status reported to worker_registry`);
      if (this.preflightHttpResult?.proxyIp) {
        console.log(`[TaskWorker] HTTP IP: ${this.preflightHttpResult.proxyIp}, Timezone: ${(this.preflightHttpResult as any).detectedTimezone || 'unknown'}`);
      }
    } catch (err: any) {
      // Non-fatal - worker can still function
      console.warn(`[TaskWorker] Could not report preflight status: ${err.message}`);
    }
  }
  /**
   * Register worker with the registry (get friendly name)
   */
@@ -493,11 +625,15 @@ export class TaskWorker {
    // Register with the API to get a friendly name
    await this.register();
    // Run dual-transport preflights
    await this.runDualPreflights();
    // Start registry heartbeat
    this.startRegistryHeartbeat();
    const roleMsg = this.role ? `for role: ${this.role}` : '(role-agnostic - any task)';
-    console.log(`[TaskWorker] ${this.friendlyName} starting ${roleMsg} (max ${this.maxConcurrentTasks} concurrent tasks)`);
+    const preflightMsg = `curl=${this.preflightCurlPassed ? '✓' : '✗'} http=${this.preflightHttpPassed ? '✓' : '✗'}`;
    console.log(`[TaskWorker] ${this.friendlyName} starting ${roleMsg} (${preflightMsg}, max ${this.maxConcurrentTasks} concurrent tasks)`);
    while (this.isRunning) {
      try {
@@ -551,10 +687,36 @@ export class TaskWorker {
    // Try to claim more tasks if we have capacity
    if (this.canAcceptMoreTasks()) {
-      const task = await taskService.claimTask(this.role, this.workerId);
+      // Pass preflight capabilities to only claim compatible tasks
      const task = await taskService.claimTask(
        this.role,
        this.workerId,
        this.preflightCurlPassed,
        this.preflightHttpPassed
      );
      if (task) {
        console.log(`[TaskWorker] ${this.friendlyName} claimed task ${task.id} (${task.role}) [${this.activeTasks.size + 1}/${this.maxConcurrentTasks}]`);
        // =================================================================
        // PREFLIGHT CHECK - CRITICAL: Worker MUST pass before task execution
        // Verifies: 1) Proxy available 2) Proxy connected 3) Anti-detect ready
        // =================================================================
        const preflight = await this.crawlRotator.preflight();
        if (!preflight.passed) {
          console.log(`[TaskWorker] ${this.friendlyName} PREFLIGHT FAILED for task ${task.id}: ${preflight.error}`);
          console.log(`[TaskWorker] Releasing task ${task.id} back to pending - worker cannot proceed without proxy/anti-detect`);
          // Release task back to pending so another worker can pick it up
          await taskService.releaseTask(task.id);
          // Wait before trying again - give proxies time to recover
          await this.sleep(30000); // 30 second wait on preflight failure
          return;
        }
        console.log(`[TaskWorker] ${this.friendlyName} preflight PASSED for task ${task.id} (proxy: ${preflight.proxyIp}, ${preflight.responseTimeMs}ms)`);
        this.activeTasks.set(task.id, task);
        // Start task in background (don't await)
@@ -611,6 +773,7 @@ export class TaskWorker {
        heartbeat: async () => {
          await taskService.heartbeat(task.id);
        },
        crawlRotator: this.crawlRotator,
      };
      // Execute the task
@@ -716,6 +879,8 @@ export class TaskWorker {
    maxConcurrentTasks: number;
    isBackingOff: boolean;
    backoffReason: string | null;
    preflightCurlPassed: boolean;
    preflightHttpPassed: boolean;
  } {
    return {
      workerId: this.workerId,
@@ -726,6 +891,8 @@ export class TaskWorker {
      maxConcurrentTasks: this.maxConcurrentTasks,
      isBackingOff: this.isBackingOff,
      backoffReason: this.backoffReason,
      preflightCurlPassed: this.preflightCurlPassed,
      preflightHttpPassed: this.preflightHttpPassed,
    };
  }
 }
@@ -742,8 +909,8 @@ async function main(): Promise<void> {
    'store_discovery',
    'entry_point_discovery',
    'product_discovery',
-    'payload_fetch',      // NEW: Fetches from API, saves to disk
+    'payload_fetch',      // Fetches from API, saves to disk
-    'product_refresh',    // CHANGED: Reads from disk, processes to DB
+    'product_refresh',    // Reads from disk, processes to DB
    'analytics_refresh',
  ];
--- a/backend/test-intercept.js
+++ b/backend/test-intercept.js
@@ -0,0 +1,180 @@
 /**
 * Stealth Browser Payload Capture - Direct GraphQL Injection
 *
 * Uses the browser session to make GraphQL requests that look organic.
 * Adds proper headers matching what Dutchie's frontend sends.
 */
 const puppeteer = require('puppeteer-extra');
 const StealthPlugin = require('puppeteer-extra-plugin-stealth');
 const fs = require('fs');
 puppeteer.use(StealthPlugin());
 async function capturePayload(config) {
  const {
    dispensaryId = null,
    platformId,
    cName,
    outputPath = `/tmp/payload_${cName}_${Date.now()}.json`,
  } = config;
  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  const page = await browser.newPage();
  // Establish session by visiting the embedded menu
  const embedUrl = `https://dutchie.com/embedded-menu/${cName}?menuType=rec`;
  console.log(`[Capture] Establishing session at ${embedUrl}...`);
  await page.goto(embedUrl, {
    waitUntil: 'networkidle2',
    timeout: 60000
  });
  console.log('[Capture] Session established, fetching ALL products...');
  // Fetch all products using GET requests with proper headers
  const result = await page.evaluate(async (platformId, cName) => {
    const allProducts = [];
    const logs = [];
    let pageNum = 0;
    const perPage = 100;
    let totalCount = 0;
    const sessionId = 'browser-session-' + Date.now();
    try {
      while (pageNum < 30) {  // Max 30 pages = 3000 products
        const variables = {
          includeEnterpriseSpecials: false,
          productsFilter: {
            dispensaryId: platformId,
            pricingType: 'rec',
            Status: 'Active',  // 'Active' for in-stock products per CLAUDE.md
            types: [],
            useCache: true,
            isDefaultSort: true,
            sortBy: 'popularSortIdx',
            sortDirection: 1,
            bypassOnlineThresholds: true,
            isKioskMenu: false,
            removeProductsBelowOptionThresholds: false,
          },
          page: pageNum,
          perPage: perPage,
        };
        const extensions = {
          persistedQuery: {
            version: 1,
            sha256Hash: 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0'
          }
        };
        // Build GET URL like the browser does
        const qs = new URLSearchParams({
          operationName: 'FilteredProducts',
          variables: JSON.stringify(variables),
          extensions: JSON.stringify(extensions)
        });
        const url = `https://dutchie.com/api-3/graphql?${qs.toString()}`;
        const response = await fetch(url, {
          method: 'GET',
          headers: {
            'Accept': 'application/json',
            'content-type': 'application/json',
            'x-dutchie-session': sessionId,
            'apollographql-client-name': 'Marketplace (production)',
          },
          credentials: 'include'
        });
        logs.push(`Page ${pageNum}: HTTP ${response.status}`);
        if (!response.ok) {
          const text = await response.text();
          logs.push(`HTTP error: ${response.status} - ${text.slice(0, 200)}`);
          break;
        }
        const json = await response.json();
        if (json.errors) {
          logs.push(`GraphQL error: ${JSON.stringify(json.errors).slice(0, 200)}`);
          break;
        }
        const data = json?.data?.filteredProducts;
        if (!data || !data.products) {
          logs.push('No products in response');
          break;
        }
        const products = data.products;
        allProducts.push(...products);
        if (pageNum === 0) {
          totalCount = data.queryInfo?.totalCount || 0;
          logs.push(`Total reported: ${totalCount}`);
        }
        logs.push(`Got ${products.length} products (total: ${allProducts.length}/${totalCount})`);
        if (allProducts.length >= totalCount || products.length < perPage) {
          break;
        }
        pageNum++;
        // Small delay between pages to be polite
        await new Promise(r => setTimeout(r, 200));
      }
    } catch (err) {
      logs.push(`Error: ${err.message}`);
    }
    return { products: allProducts, totalCount, logs };
  }, platformId, cName);
  await browser.close();
  // Print logs from browser context
  result.logs.forEach(log => console.log(`[Browser] ${log}`));
  console.log(`[Capture] Got ${result.products.length} products (API reported ${result.totalCount})`);
  const payload = {
    dispensaryId: dispensaryId,
    platformId: platformId,
    cName,
    fetchedAt: new Date().toISOString(),
    productCount: result.products.length,
    products: result.products,
  };
  fs.writeFileSync(outputPath, JSON.stringify(payload, null, 2));
  console.log(`\n=== Capture Complete ===`);
  console.log(`Total products: ${result.products.length}`);
  console.log(`Saved to: ${outputPath}`);
  console.log(`File size: ${(fs.statSync(outputPath).size / 1024).toFixed(1)} KB`);
  return payload;
 }
 // Run
 (async () => {
  const payload = await capturePayload({
    cName: 'AZ-Deeply-Rooted',
    platformId: '6405ef617056e8014d79101b',
  });
  if (payload.products.length > 0) {
    const sample = payload.products[0];
    console.log(`\nSample: ${sample.Name || sample.name} - ${sample.brand?.name || sample.brandName}`);
  }
 })().catch(console.error);
--- a/backend/tsconfig.json
+++ b/backend/tsconfig.json
@@ -14,5 +14,5 @@
    "allowSyntheticDefaultImports": true
  },
  "include": ["src/**/*"],
-  "exclude": ["node_modules", "dist", "src/**/*.test.ts", "src/**/__tests__/**"]
+  "exclude": ["node_modules", "dist", "src/**/*.test.ts", "src/**/__tests__/**", "src/_deprecated/**"]
 }
--- a/cannaiq/src/pages/WorkersDashboard.tsx
+++ b/cannaiq/src/pages/WorkersDashboard.tsx
@@ -48,6 +48,17 @@ interface Worker {
  seconds_since_heartbeat: number;
  decommission_requested?: boolean;
  decommission_reason?: string;
  // Dual-transport preflight status
  preflight_curl_status?: 'pending' | 'passed' | 'failed' | 'skipped';
  preflight_http_status?: 'pending' | 'passed' | 'failed' | 'skipped';
  preflight_curl_at?: string;
  preflight_http_at?: string;
  preflight_curl_error?: string;
  preflight_http_error?: string;
  preflight_curl_ms?: number;
  preflight_http_ms?: number;
  can_curl?: boolean;
  can_http?: boolean;
  metadata: {
    cpu?: number;
    memory?: number;
@@ -277,6 +288,67 @@ function ResourceBadge({ worker }: { worker: Worker }) {
  );
 }
 // Transport capability badge showing curl/http preflight status
 function TransportBadge({ worker }: { worker: Worker }) {
  const curlStatus = worker.preflight_curl_status || 'pending';
  const httpStatus = worker.preflight_http_status || 'pending';
  const getStatusConfig = (status: string, label: string, ms?: number, error?: string) => {
    switch (status) {
      case 'passed':
        return {
          bg: 'bg-emerald-100',
          text: 'text-emerald-700',
          icon: <CheckCircle className="w-3 h-3" />,
          tooltip: ms ? `${label}: Passed (${ms}ms)` : `${label}: Passed`,
        };
      case 'failed':
        return {
          bg: 'bg-red-100',
          text: 'text-red-700',
          icon: <XCircle className="w-3 h-3" />,
          tooltip: error ? `${label}: Failed - ${error}` : `${label}: Failed`,
        };
      case 'skipped':
        return {
          bg: 'bg-gray-100',
          text: 'text-gray-500',
          icon: <Clock className="w-3 h-3" />,
          tooltip: `${label}: Skipped`,
        };
      default:
        return {
          bg: 'bg-yellow-100',
          text: 'text-yellow-700',
          icon: <Clock className="w-3 h-3 animate-pulse" />,
          tooltip: `${label}: Pending`,
        };
    }
  };
  const curlConfig = getStatusConfig(curlStatus, 'CURL', worker.preflight_curl_ms, worker.preflight_curl_error);
  const httpConfig = getStatusConfig(httpStatus, 'HTTP', worker.preflight_http_ms, worker.preflight_http_error);
  return (
    <div className="flex flex-col gap-1">
      <div
        className={`inline-flex items-center gap-1 px-1.5 py-0.5 rounded text-xs font-medium ${curlConfig.bg} ${curlConfig.text}`}
        title={curlConfig.tooltip}
      >
        {curlConfig.icon}
        <span>curl</span>
      </div>
      <div
        className={`inline-flex items-center gap-1 px-1.5 py-0.5 rounded text-xs font-medium ${httpConfig.bg} ${httpConfig.text}`}
        title={httpConfig.tooltip}
      >
        {httpConfig.icon}
        <span>http</span>
      </div>
    </div>
  );
 }
 // Task count badge showing active/max concurrent tasks
 function TaskCountBadge({ worker, tasks }: { worker: Worker; tasks: Task[] }) {
  const activeCount = worker.active_task_count ?? (worker.current_task_id ? 1 : 0);
@@ -883,6 +955,7 @@ export function WorkersDashboard() {
                  <th className="px-4 py-3 text-left text-xs font-medium text-gray-500 uppercase">Worker</th>
                  <th className="px-4 py-3 text-left text-xs font-medium text-gray-500 uppercase">Role</th>
                  <th className="px-4 py-3 text-left text-xs font-medium text-gray-500 uppercase">Status</th>
                  <th className="px-4 py-3 text-left text-xs font-medium text-gray-500 uppercase">Transport</th>
                  <th className="px-4 py-3 text-left text-xs font-medium text-gray-500 uppercase">Resources</th>
                  <th className="px-4 py-3 text-left text-xs font-medium text-gray-500 uppercase">Tasks</th>
                  <th className="px-4 py-3 text-left text-xs font-medium text-gray-500 uppercase">Duration</th>
@@ -934,6 +1007,9 @@ export function WorkersDashboard() {
                      <td className="px-4 py-3">
                        <HealthBadge status={worker.status} healthStatus={worker.health_status} />
                      </td>
                      <td className="px-4 py-3">
                        <TransportBadge worker={worker} />
                      </td>
                      <td className="px-4 py-3">
                        <ResourceBadge worker={worker} />
                      </td>
--- a/k8s/woodpecker-agent-compose.yml
+++ b/k8s/woodpecker-agent-compose.yml
@@ -0,0 +1,18 @@
 # Woodpecker Agent Docker Compose
 # Path: /opt/woodpecker/docker-compose.yml
 # Deploy: cd /opt/woodpecker && docker compose up -d
 version: '3.8'
 services:
  woodpecker-agent:
    image: woodpeckerci/woodpecker-agent:latest
    container_name: woodpecker-agent
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - WOODPECKER_SERVER=localhost:9000
      - WOODPECKER_AGENT_SECRET=${WOODPECKER_AGENT_SECRET}
      - WOODPECKER_MAX_WORKFLOWS=5
      - WOODPECKER_HEALTHCHECK=true
      - WOODPECKER_LOG_LEVEL=info
Author	SHA1	Message	Date
Kelly	6bcadd9e71	fix(preflight): Correct parameter order and add IP/fingerprint reporting - Fix update_worker_preflight call to use correct parameter order: (worker_id, transport, status, ip, response_ms, error, fingerprint) - Add proxyIp to both curl and http preflight reports - Add fingerprint JSONB with timezone, location, and bot detection data - Log HTTP IP and timezone after preflight completes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 00:32:45 -07:00
kelly	a77bf8611a	Merge pull request 'feat(workers): Preflight phase 1 - Schema, StatefulSet, and timezone matching' (#55 ) from feat/preflight-phase1-schema into master	2025-12-12 07:30:53 +00:00
Kelly	33feca3138	fix(antidetect): Match browser timezone to proxy IP location - Add IP geolocation lookup via ip-api.com to get timezone from proxy IP - Use ipify.org API for reliable proxy IP detection (replaces unreliable fingerprint.com scraping) - Set browser timezone via CDP Emulation.setTimezoneOverride to match proxy location - Add detectedTimezone and detectedLocation to preflight result - Add /api/worker-registry/preflight-test endpoint for smoke testing Fixes timezone mismatch where browser showed America/Phoenix while proxy was in America/New_York 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 00:25:39 -07:00
kelly	7d85a97b63	Merge pull request 'feat: Preflight schema and StatefulSet' (#54 ) from feat/preflight-phase1-schema into master Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/54	2025-12-12 07:14:40 +00:00
Kelly	ce081effd4	feat(workers): Add preflight schema and StatefulSet - Migration 085: Add curl_ip, http_ip, fingerprint_data, preflight_status, preflight_at columns to worker_registry - StatefulSet manifest for 8 persistent workers with OnDelete update strategy 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 23:45:04 -07:00
kelly	2ed088b4d8	Merge pull request 'feat(api): Add preflight columns to worker registry API response' (#50 ) from feat/preflight-api-fields into master	2025-12-12 06:22:06 +00:00
Kelly	d3c49fa246	feat(api): Add preflight columns to worker registry API response Exposes curl_ip, http_ip, preflight_status, preflight_at, and fingerprint_data in the /api/worker-registry/workers response. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 23:12:55 -07:00
kelly	52cb5014fd	Merge pull request 'feat: Dual-transport preflight system for worker fingerprinting' (#49 ) from fix/ci-and-preflight-enforcement into master	2025-12-12 06:08:13 +00:00
Kelly	50654be910	fix: Restore hydration and product_refresh for store updates - Moved hydration module back from _deprecated (needed for product_refresh) - Restored product_refresh handler for processing stored payloads - Restored geolocation service for findadispo/findagram - Stubbed system routes that depend on deprecated SyncOrchestrator - Removed crawler-sandbox route (deprecated) - Fixed all TypeScript compilation errors 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 23:03:39 -07:00
Kelly	cdab71a1ee	feat(workers): Add dual-transport preflight system Workers now run both curl and http (Puppeteer) preflights on startup: - curl-preflight.ts: Tests axios + proxy via httpbin.org - puppeteer-preflight.ts: Tests browser + StealthPlugin via fingerprint.com (with amiunique.org fallback) - Migration 084: Adds preflight columns to worker_registry and method column to worker_tasks - Workers report preflight status, IP, fingerprint, and response time - Tasks can require specific transport method (curl/http) - Dashboard shows Transport column with preflight status badges 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 22:47:52 -07:00
Kelly	a35976b9e9	chore: Clean up deprecated code and docs - Move deprecated directories to src/_deprecated/: - hydration/ (old pipeline approach) - scraper-v2/ (old Puppeteer scraper) - canonical-hydration/ (merged into tasks) - Unused services: availability, crawler-logger, geolocation, etc - Unused utils: age-gate-playwright, HomepageValidator, stealthBrowser - Archive outdated docs to docs/_archive/: - ANALYTICS_RUNBOOK.md - ANALYTICS_V2_EXAMPLES.md - BRAND_INTELLIGENCE_API.md - CRAWL_PIPELINE.md - TASK_WORKFLOW_2024-12-10.md - WORKER_TASK_ARCHITECTURE.md - ORGANIC_SCRAPING_GUIDE.md - Add docs/CODEBASE_MAP.md as single source of truth - Add warning files to deprecated/archived directories - Slim down CLAUDE.md to essential rules only 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 22:17:40 -07:00
kelly	c68210c485	Merge pull request 'fix(ci): Remove buildx cache and add preflight enforcement' (#48 ) from fix/ci-and-preflight-enforcement into master Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/48	2025-12-12 04:53:20 +00:00
Kelly	f2864bd2ad	fix(ci): Remove buildx cache and add preflight enforcement - Remove cache_from/cache_to from CI (plugin bug splitting commas) - Add preflight() method to CrawlRotator - tests proxy + anti-detect - Add pre-task preflight check - workers MUST pass before executing - Add releaseTask() to release tasks back to pending on preflight fail - Rename proxy_test task to whoami for clarity 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 21:37:22 -07:00
kelly	eca9e85242	Merge pull request 'fix(ci): Fix buildx cache syntax and add proxy_test task' (#47 ) from feat/ui-polish-and-ci-caching into master Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/47	2025-12-12 04:30:21 +00:00
kelly	e1c67dcee5	Merge pull request 'feat: UI polish and CI caching improvements' (#46 ) from feat/ui-polish-and-ci-caching into master Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/46	2025-12-12 03:47:46 +00:00
kelly	34c8a8cc67	Merge pull request 'feat(cannaiq): Add clickable logo, favicon, and remove state selector' (#45 ) from feat/cannaiq-ui-polish into master Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/45	2025-12-12 03:36:04 +00:00