Compare commits: feat/task-...fix/analyt (1 commit)

| Author | SHA1 | Date |
|---|---|---|
| | 2e22b439e0 | |
@@ -163,32 +163,7 @@ steps:
       event: push

   # ===========================================
-  # STAGE 3: Run Database Migrations (before deploy)
+  # STAGE 3: Deploy (after Docker builds)
-  # ===========================================
-  migrate:
-    image: code.cannabrands.app/creationshop/dispensary-scraper:${CI_COMMIT_SHA:0:8}
-    environment:
-      CANNAIQ_DB_HOST:
-        from_secret: db_host
-      CANNAIQ_DB_PORT:
-        from_secret: db_port
-      CANNAIQ_DB_NAME:
-        from_secret: db_name
-      CANNAIQ_DB_USER:
-        from_secret: db_user
-      CANNAIQ_DB_PASS:
-        from_secret: db_pass
-    commands:
-      - cd /app
-      - node dist/db/migrate.js
-    depends_on:
-      - docker-backend
-    when:
-      branch: master
-      event: push
-
-  # ===========================================
-  # STAGE 4: Deploy (after migrations)
   # ===========================================
   deploy:
     image: bitnami/kubectl:latest
@@ -207,7 +182,7 @@ steps:
       - kubectl rollout status deployment/scraper -n dispensary-scraper --timeout=300s
       - kubectl rollout status deployment/cannaiq-frontend -n dispensary-scraper --timeout=120s
     depends_on:
-      - migrate
+      - docker-backend
       - docker-cannaiq
       - docker-findadispo
       - docker-findagram
@@ -25,9 +25,8 @@ ENV APP_GIT_SHA=${APP_GIT_SHA}
 ENV APP_BUILD_TIME=${APP_BUILD_TIME}
 ENV CONTAINER_IMAGE_TAG=${CONTAINER_IMAGE_TAG}

-# Install Chromium dependencies and curl for HTTP requests
+# Install Chromium dependencies
 RUN apt-get update && apt-get install -y \
-    curl \
     chromium \
     fonts-liberation \
     libnss3 \
@@ -500,18 +500,17 @@ CREATE TABLE proxies (

 Proxies are mandatory. There is no environment variable to disable them. Workers will refuse to start without active proxies in the database.

-### User-Agent Generation
+### Fingerprints Available

-See `workflow-12102025.md` for full specification.
+The client includes 6 browser fingerprints:
+- Chrome 131 on Windows
+- Chrome 131 on macOS
+- Chrome 120 on Windows
+- Firefox 133 on Windows
+- Safari 17.2 on macOS
+- Edge 131 on Windows

-**Summary:**
+Each includes proper `sec-ch-ua`, `sec-ch-ua-platform`, and `sec-ch-ua-mobile` headers.
-- Uses `intoli/user-agents` library (daily-updated market share data)
-- Device distribution: Mobile 62%, Desktop 36%, Tablet 2%
-- Browser whitelist: Chrome, Safari, Edge, Firefox only
-- UA sticks until IP rotates (403 or manual rotation)
-- Failure = alert admin + stop crawl (no fallback)
-
-Each fingerprint includes proper `sec-ch-ua`, `sec-ch-ua-platform`, and `sec-ch-ua-mobile` headers.

 ---
|
|||||||
@@ -1,584 +0,0 @@
|
|||||||
# Task Workflow Documentation
|
|
||||||
**Date: 2024-12-10**
|
|
||||||
|
|
||||||
This document describes the complete task/job processing architecture after the 2024-12-10 rewrite.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Complete Architecture
|
|
||||||
|
|
||||||
```
|
|
||||||
┌─────────────────────────────────────────────────────────────────────────────────┐
|
|
||||||
│ KUBERNETES CLUSTER │
|
|
||||||
├─────────────────────────────────────────────────────────────────────────────────┤
|
|
||||||
│ │
|
|
||||||
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
|
|
||||||
│ │ API SERVER POD (scraper) │ │
|
|
||||||
│ │ │ │
|
|
||||||
│ │ ┌──────────────────┐ ┌────────────────────────────────────────┐ │ │
|
|
||||||
│ │ │ Express API │ │ TaskScheduler │ │ │
|
|
||||||
│ │ │ │ │ (src/services/task-scheduler.ts) │ │ │
|
|
||||||
│ │ │ /api/job-queue │ │ │ │ │
|
|
||||||
│ │ │ /api/tasks │ │ • Polls every 60s │ │ │
|
|
||||||
│ │ │ /api/schedules │ │ • Checks task_schedules table │ │ │
|
|
||||||
│ │ └────────┬─────────┘ │ • SELECT FOR UPDATE SKIP LOCKED │ │ │
|
|
||||||
│ │ │ │ • Generates tasks when due │ │ │
|
|
||||||
│ │ │ └──────────────────┬─────────────────────┘ │ │
|
|
||||||
│ │ │ │ │ │
|
|
||||||
│ └────────────┼──────────────────────────────────┼──────────────────────────┘ │
|
|
||||||
│ │ │ │
|
|
||||||
│ │ ┌────────────────────────┘ │
|
|
||||||
│ │ │ │
|
|
||||||
│ ▼ ▼ │
|
|
||||||
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
|
|
||||||
│ │ POSTGRESQL DATABASE │ │
|
|
||||||
│ │ │ │
|
|
||||||
│ │ ┌─────────────────────┐ ┌─────────────────────┐ │ │
|
|
||||||
│ │ │ task_schedules │ │ worker_tasks │ │ │
|
|
||||||
│ │ │ │ │ │ │ │
|
|
||||||
│ │ │ • product_refresh │───────►│ • pending tasks │ │ │
|
|
||||||
│ │ │ • store_discovery │ create │ • claimed tasks │ │ │
|
|
||||||
│ │ │ • analytics_refresh │ tasks │ • running tasks │ │ │
|
|
||||||
│ │ │ │ │ • completed tasks │ │ │
|
|
||||||
│ │ │ next_run_at │ │ │ │ │
|
|
||||||
│ │ │ last_run_at │ │ role, dispensary_id │ │ │
|
|
||||||
│ │ │ interval_hours │ │ priority, status │ │ │
|
|
||||||
│ │ └─────────────────────┘ └──────────┬──────────┘ │ │
|
|
||||||
│ │ │ │ │
|
|
||||||
│ └─────────────────────────────────────────────┼────────────────────────────┘ │
|
|
||||||
│ │ │
|
|
||||||
│ ┌──────────────────────┘ │
|
|
||||||
│ │ Workers poll for tasks │
|
|
||||||
│ │ (SELECT FOR UPDATE SKIP LOCKED) │
|
|
||||||
│ ▼ │
|
|
||||||
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
|
|
||||||
│ │ WORKER PODS (StatefulSet: scraper-worker) │ │
|
|
||||||
│ │ │ │
|
|
||||||
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
|
|
||||||
│ │ │ Worker 0 │ │ Worker 1 │ │ Worker 2 │ │ Worker N │ │ │
|
|
||||||
│ │ │ │ │ │ │ │ │ │ │ │
|
|
||||||
│ │ │ task-worker │ │ task-worker │ │ task-worker │ │ task-worker │ │ │
|
|
||||||
│ │ │ .ts │ │ .ts │ │ .ts │ │ .ts │ │ │
|
|
||||||
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
|
|
||||||
│ │ │ │
|
|
||||||
│ └──────────────────────────────────────────────────────────────────────────┘ │
|
|
||||||
│ │
|
|
||||||
└──────────────────────────────────────────────────────────────────────────────────┘
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Startup Sequence
|
|
||||||
|
|
||||||
```
|
|
||||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
||||||
│ API SERVER STARTUP │
|
|
||||||
├─────────────────────────────────────────────────────────────────────────────┤
|
|
||||||
│ │
|
|
||||||
│ 1. Express app initializes │
|
|
||||||
│ │ │
|
|
||||||
│ ▼ │
|
|
||||||
│ 2. runAutoMigrations() │
|
|
||||||
│ • Runs pending migrations (including 079_task_schedules.sql) │
|
|
||||||
│ │ │
|
|
||||||
│ ▼ │
|
|
||||||
│ 3. initializeMinio() / initializeImageStorage() │
|
|
||||||
│ │ │
|
|
||||||
│ ▼ │
|
|
||||||
│ 4. cleanupOrphanedJobs() │
|
|
||||||
│ │ │
|
|
||||||
│ ▼ │
|
|
||||||
│ 5. taskScheduler.start() ◄─── NEW (per TASK_WORKFLOW_2024-12-10.md) │
|
|
||||||
│ │ │
|
|
||||||
│ ├── Recover stale tasks (workers that died) │
|
|
||||||
│ ├── Ensure default schedules exist in task_schedules │
|
|
||||||
│ ├── Check and run any due schedules immediately │
|
|
||||||
│ └── Start 60-second poll interval │
|
|
||||||
│ │ │
|
|
||||||
│ ▼ │
|
|
||||||
│ 6. app.listen(PORT) │
|
|
||||||
│ │
|
|
||||||
└─────────────────────────────────────────────────────────────────────────────┘
|
|
||||||
|
|
||||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
||||||
│ WORKER POD STARTUP │
|
|
||||||
├─────────────────────────────────────────────────────────────────────────────┤
|
|
||||||
│ │
|
|
||||||
│ 1. K8s starts pod from StatefulSet │
|
|
||||||
│ │ │
|
|
||||||
│ ▼ │
|
|
||||||
│ 2. TaskWorker.constructor() │
|
|
||||||
│ • Create DB pool │
|
|
||||||
│ • Create CrawlRotator │
|
|
||||||
│ │ │
|
|
||||||
│ ▼ │
|
|
||||||
│ 3. initializeStealth() │
|
|
||||||
│ • Load proxies from DB (REQUIRED - fails if none) │
|
|
||||||
│ • Wire rotator to Dutchie client │
|
|
||||||
│ │ │
|
|
||||||
│ ▼ │
|
|
||||||
│ 4. register() with API │
|
|
||||||
│ • Optional - continues if fails │
|
|
||||||
│ │ │
|
|
||||||
│ ▼ │
|
|
||||||
│ 5. startRegistryHeartbeat() every 30s │
|
|
||||||
│ │ │
|
|
||||||
│ ▼ │
|
|
||||||
│ 6. processNextTask() loop │
|
|
||||||
│ │ │
|
|
||||||
│ ├── Poll for pending task (FOR UPDATE SKIP LOCKED) │
|
|
||||||
│ ├── Claim task atomically │
|
|
||||||
│ ├── Execute handler (product_refresh, store_discovery, etc.) │
|
|
||||||
│ ├── Mark complete/failed │
|
|
||||||
│ ├── Chain next task if applicable │
|
|
||||||
│ └── Loop │
|
|
||||||
│ │
|
|
||||||
└─────────────────────────────────────────────────────────────────────────────┘
|
|
||||||
```
|
|
||||||
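The "recover stale tasks" step in the startup sequence can be sketched as a heartbeat check. This is a minimal illustration under assumptions: `RunningTask`, `findStaleTaskIds`, and the `staleAfterMs` threshold are hypothetical names, the document does not state the real threshold, and skipping tasks that have no heartbeat yet is a choice made here, not something the source specifies.

```typescript
// Hypothetical in-memory view of a worker_tasks row (only the fields used here).
interface RunningTask {
  id: number;
  status: string;
  lastHeartbeatAt: Date | null;
}

// Returns ids of claimed/running tasks whose last heartbeat is older than staleAfterMs.
// Tasks with no heartbeat recorded are skipped (an assumption of this sketch).
function findStaleTaskIds(tasks: RunningTask[], now: Date, staleAfterMs: number): number[] {
  const staleIds: number[] = [];
  for (const task of tasks) {
    const inFlight = task.status === "claimed" || task.status === "running";
    if (!inFlight || task.lastHeartbeatAt === null) continue;
    const silentForMs = now.getTime() - task.lastHeartbeatAt.getTime();
    if (silentForMs > staleAfterMs) staleIds.push(task.id);
  }
  return staleIds;
}
```

The returned ids would then be reset to `pending` so another worker can claim them.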
|
|
||||||
---
|
|
||||||
|
|
||||||
## Schedule Flow
|
|
||||||
|
|
||||||
```
|
|
||||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
||||||
│ SCHEDULER POLL (every 60 seconds) │
|
|
||||||
├─────────────────────────────────────────────────────────────────────────────┤
|
|
||||||
│ │
|
|
||||||
│ BEGIN TRANSACTION │
|
|
||||||
│ │ │
|
|
||||||
│ ▼ │
|
|
||||||
│ SELECT * FROM task_schedules │
|
|
||||||
│ WHERE enabled = true AND next_run_at <= NOW() │
|
|
||||||
│ FOR UPDATE SKIP LOCKED ◄─── Prevents duplicate execution across replicas │
|
|
||||||
│ │ │
|
|
||||||
│ ▼ │
|
|
||||||
│ For each due schedule: │
|
|
||||||
│ │ │
|
|
||||||
│ ├── product_refresh_all │
|
|
||||||
│ │ └─► Query dispensaries needing crawl │
|
|
||||||
│ │ └─► Create product_refresh tasks in worker_tasks │
|
|
||||||
│ │ │
|
|
||||||
│ ├── store_discovery_dutchie │
|
|
||||||
│ │ └─► Create single store_discovery task │
|
|
||||||
│ │ │
|
|
||||||
│ └── analytics_refresh │
|
|
||||||
│ └─► Create single analytics_refresh task │
|
|
||||||
│ │ │
|
|
||||||
│ ▼ │
|
|
||||||
│ UPDATE task_schedules SET │
|
|
||||||
│ last_run_at = NOW(), │
|
|
||||||
│ next_run_at = NOW() + interval_hours * INTERVAL '1 hour' │
|
|
||||||
│ │ │
|
|
||||||
│ ▼ │
|
|
||||||
│ COMMIT │
|
|
||||||
│ │
|
|
||||||
└─────────────────────────────────────────────────────────────────────────────┘
|
|
||||||
```
|
|
||||||
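The poll cycle above can be sketched as a function over in-memory rows. This is an illustration only: `ScheduleRow`, `runDueSchedules`, and the `createTasks` callback are hypothetical names, and the real scheduler runs inside a transaction with `FOR UPDATE SKIP LOCKED`, which an in-memory sketch does not reproduce.

```typescript
// Hypothetical in-memory model of one task_schedules row.
interface ScheduleRow {
  name: string;
  enabled: boolean;
  intervalHours: number;
  lastRunAt: Date | null;
  nextRunAt: Date | null;
}

// Runs every enabled schedule whose next_run_at has passed, then advances it.
// `createTasks` stands in for the role-specific task generation and returns
// the number of tasks it created.
function runDueSchedules(
  schedules: ScheduleRow[],
  now: Date,
  createTasks: (schedule: ScheduleRow) => number,
): Record<string, number> {
  const created: Record<string, number> = {};
  for (const schedule of schedules) {
    const due =
      schedule.enabled &&
      schedule.nextRunAt !== null &&
      schedule.nextRunAt.getTime() <= now.getTime();
    if (!due) continue;
    created[schedule.name] = createTasks(schedule);
    // Mirrors: last_run_at = NOW(), next_run_at = NOW() + interval_hours
    schedule.lastRunAt = now;
    schedule.nextRunAt = new Date(now.getTime() + schedule.intervalHours * 60 * 60 * 1000);
  }
  return created;
}
```

A row with a NULL `next_run_at` is never due here, matching how `next_run_at <= NOW()` evaluates in SQL.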
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task Lifecycle
|
|
||||||
|
|
||||||
```
|
|
||||||
┌──────────┐
|
|
||||||
│ SCHEDULE │
|
|
||||||
│ DUE │
|
|
||||||
└────┬─────┘
|
|
||||||
│
|
|
||||||
▼
|
|
||||||
┌──────────────┐ claim ┌──────────────┐ start ┌──────────────┐
|
|
||||||
│ PENDING │────────────►│ CLAIMED │────────────►│ RUNNING │
|
|
||||||
└──────────────┘ └──────────────┘ └──────┬───────┘
|
|
||||||
▲ │
|
|
||||||
│ ┌──────────────┼──────────────┐
|
|
||||||
│ retry │ │ │
|
|
||||||
│ (if retries < max) ▼ ▼ ▼
|
|
||||||
│ ┌──────────┐ ┌──────────┐ ┌──────────┐
|
|
||||||
└──────────────────────────────────│ FAILED │ │ COMPLETED│ │ STALE │
|
|
||||||
└──────────┘ └──────────┘ └────┬─────┘
|
|
||||||
│
|
|
||||||
recover_stale_tasks()
|
|
||||||
│
|
|
||||||
▼
|
|
||||||
┌──────────┐
|
|
||||||
│ PENDING │
|
|
||||||
└──────────┘
|
|
||||||
```
|
|
||||||
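The lifecycle diagram can be read as a small state machine. The sketch below is illustrative: `TaskRecord` and `transition` are hypothetical names, it assumes `retry_count` is incremented when a failed task is put back to `pending`, and it treats `stale` as a status even though the document does not say whether it is stored or derived.

```typescript
type TaskStatus = "pending" | "claimed" | "running" | "completed" | "failed" | "stale";

// Allowed transitions, read off the lifecycle diagram.
const TRANSITIONS: Record<TaskStatus, TaskStatus[]> = {
  pending: ["claimed"],
  claimed: ["running"],
  running: ["completed", "failed", "stale"],
  failed: ["pending"], // retry, only while retries < max
  stale: ["pending"], // via recover_stale_tasks()
  completed: [],
};

interface TaskRecord {
  status: TaskStatus;
  retryCount: number;
  maxRetries: number;
}

// Returns the task in its next state, or throws if the move is not allowed.
function transition(task: TaskRecord, next: TaskStatus): TaskRecord {
  if (TRANSITIONS[task.status].indexOf(next) === -1) {
    throw new Error(`illegal transition ${task.status} -> ${next}`);
  }
  if (task.status === "failed") {
    if (task.retryCount >= task.maxRetries) {
      throw new Error("retries exhausted");
    }
    return { ...task, status: next, retryCount: task.retryCount + 1 };
  }
  return { ...task, status: next };
}
```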
|
|
||||||
---
|
|
||||||
|
|
||||||
## Database Tables
|
|
||||||
|
|
||||||
### task_schedules (NEW - migration 079)
|
|
||||||
|
|
||||||
Stores schedule definitions. Survives restarts.
|
|
||||||
|
|
||||||
```sql
|
|
||||||
CREATE TABLE task_schedules (
|
|
||||||
id SERIAL PRIMARY KEY,
|
|
||||||
name VARCHAR(100) NOT NULL UNIQUE,
|
|
||||||
role VARCHAR(50) NOT NULL, -- product_refresh, store_discovery, etc.
|
|
||||||
enabled BOOLEAN DEFAULT TRUE,
|
|
||||||
interval_hours INTEGER NOT NULL, -- How often to run
|
|
||||||
priority INTEGER DEFAULT 0, -- Task priority when created
|
|
||||||
state_code VARCHAR(2), -- Optional filter
|
|
||||||
last_run_at TIMESTAMPTZ, -- When it last ran
|
|
||||||
next_run_at TIMESTAMPTZ, -- When it's due next
|
|
||||||
last_task_count INTEGER, -- Tasks created last run
|
|
||||||
last_error TEXT -- Error message if failed
|
|
||||||
);
|
|
||||||
```
|
|
||||||
|
|
||||||
### worker_tasks (migration 074)
|
|
||||||
|
|
||||||
The task queue. Workers pull from here.
|
|
||||||
|
|
||||||
```sql
|
|
||||||
CREATE TABLE worker_tasks (
|
|
||||||
id SERIAL PRIMARY KEY,
|
|
||||||
role task_role NOT NULL, -- What type of work
|
|
||||||
dispensary_id INTEGER, -- Which store (if applicable)
|
|
||||||
platform VARCHAR(50), -- Which platform
|
|
||||||
status task_status DEFAULT 'pending',
|
|
||||||
priority INTEGER DEFAULT 0, -- Higher = process first
|
|
||||||
scheduled_for TIMESTAMP, -- Don't process before this time
|
|
||||||
worker_id VARCHAR(100), -- Which worker claimed it
|
|
||||||
claimed_at TIMESTAMP,
|
|
||||||
started_at TIMESTAMP,
|
|
||||||
completed_at TIMESTAMP,
|
|
||||||
last_heartbeat_at TIMESTAMP, -- For stale detection
|
|
||||||
result JSONB,
|
|
||||||
error_message TEXT,
|
|
||||||
retry_count INTEGER DEFAULT 0,
|
|
||||||
max_retries INTEGER DEFAULT 3
|
|
||||||
);
|
|
||||||
```
|
|
||||||
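The column comments on `worker_tasks` imply a selection rule: only `pending` rows, nothing before `scheduled_for`, higher `priority` first. A minimal in-memory sketch of that rule follows. `QueuedTask` and `pickNextTask` are hypothetical names, and the lower-id tie-breaker is an assumption; the real worker does this in SQL with `FOR UPDATE SKIP LOCKED`.

```typescript
// Hypothetical in-memory view of a worker_tasks row (only the fields used here).
interface QueuedTask {
  id: number;
  status: string;
  priority: number;
  scheduledFor: Date | null;
}

// Returns the task a worker would claim next, or null when nothing is eligible.
function pickNextTask(tasks: QueuedTask[], now: Date): QueuedTask | null {
  const eligible = tasks.filter(
    (task) =>
      task.status === "pending" &&
      (task.scheduledFor === null || task.scheduledFor.getTime() <= now.getTime()),
  );
  // Higher priority first; lower id breaks ties (the tie-breaker is an assumption).
  eligible.sort((a, b) => b.priority - a.priority || a.id - b.id);
  return eligible[0] ?? null;
}
```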
|
|
||||||
---
|
|
||||||
|
|
||||||
## Default Schedules
|
|
||||||
|
|
||||||
| Name | Role | Interval | Priority | Description |
|
|
||||||
|------|------|----------|----------|-------------|
|
|
||||||
| `payload_fetch_all` | payload_fetch | 4 hours | 0 | Fetch payloads from Dutchie API (chains to product_refresh) |
|
|
||||||
| `store_discovery_dutchie` | store_discovery | 24 hours | 5 | Find new Dutchie stores |
|
|
||||||
| `analytics_refresh` | analytics_refresh | 6 hours | 0 | Refresh analytics materialized views |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task Roles
|
|
||||||
|
|
||||||
| Role | Description | Creates Tasks For |
|
|
||||||
|------|-------------|-------------------|
|
|
||||||
| `payload_fetch` | **NEW** - Fetch from Dutchie API, save to disk | Each dispensary needing crawl |
|
|
||||||
| `product_refresh` | **CHANGED** - Read local payload, normalize, upsert to DB | Chained from payload_fetch |
|
|
||||||
| `store_discovery` | Find new dispensaries, returns newStoreIds[] | Single task per platform |
|
|
||||||
| `entry_point_discovery` | **DEPRECATED** - Resolve platform IDs | No longer used |
|
|
||||||
| `product_discovery` | Initial product fetch for new stores | Chained from store_discovery |
|
|
||||||
| `analytics_refresh` | Refresh analytics materialized views | Single global task |
|
|
||||||
|
|
||||||
### Payload/Refresh Separation (2024-12-10)
|
|
||||||
|
|
||||||
The crawl workflow is now split into two phases:
|
|
||||||
|
|
||||||
```
|
|
||||||
payload_fetch (scheduled every 4h)
|
|
||||||
└─► Hit Dutchie GraphQL API
|
|
||||||
└─► Save raw JSON to /storage/payloads/{year}/{month}/{day}/store_{id}_{ts}.json.gz
|
|
||||||
└─► Record metadata in raw_crawl_payloads table
|
|
||||||
└─► Queue product_refresh task with payload_id
|
|
||||||
|
|
||||||
product_refresh (chained from payload_fetch)
|
|
||||||
└─► Load payload from filesystem (NOT from API)
|
|
||||||
└─► Normalize via DutchieNormalizer
|
|
||||||
└─► Upsert to store_products
|
|
||||||
└─► Create snapshots
|
|
||||||
└─► Track missing products
|
|
||||||
└─► Download images
|
|
||||||
```
|
|
||||||
|
|
||||||
**Benefits:**
|
|
||||||
- **Retry-friendly**: If normalize fails, re-run product_refresh without re-crawling
|
|
||||||
- **Replay-able**: Run product_refresh against any historical payload
|
|
||||||
- **Faster refreshes**: Local file read vs network call
|
|
||||||
- **Historical diffs**: Compare payloads to see what changed between crawls
|
|
||||||
- **Less API pressure**: Only payload_fetch hits Dutchie
|
|
||||||
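The storage layout `/storage/payloads/{year}/{month}/{day}/store_{id}_{ts}.json.gz` can be sketched as a path builder. `payloadPath` is a hypothetical helper, not the project's `payload-storage.ts` API; it assumes UTC dates, zero-padded month and day, and a Unix timestamp in seconds.

```typescript
// Builds the on-disk path for a raw payload (sketch; padding, UTC, and
// seconds-resolution timestamps are assumptions).
function payloadPath(
  dispensaryId: number,
  fetchedAt: Date,
  baseDir: string = "/storage/payloads",
): string {
  const pad = (n: number): string => (n < 10 ? `0${n}` : `${n}`);
  const year = fetchedAt.getUTCFullYear();
  const month = pad(fetchedAt.getUTCMonth() + 1); // getUTCMonth() is zero-based
  const day = pad(fetchedAt.getUTCDate());
  const ts = Math.floor(fetchedAt.getTime() / 1000);
  return `${baseDir}/${year}/${month}/${day}/store_${dispensaryId}_${ts}.json.gz`;
}
```

For example, `payloadPath(7, new Date(0))` yields `/storage/payloads/1970/01/01/store_7_0.json.gz`.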
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task Chaining
|
|
||||||
|
|
||||||
Tasks automatically queue follow-up tasks upon successful completion. This creates two main flows:
|
|
||||||
|
|
||||||
### Discovery Flow (New Stores)
|
|
||||||
|
|
||||||
When `store_discovery` finds new dispensaries, they automatically get their initial product data:
|
|
||||||
|
|
||||||
```
|
|
||||||
store_discovery
|
|
||||||
└─► Discovers new locations via Dutchie GraphQL
|
|
||||||
└─► Auto-promotes valid locations to dispensaries table
|
|
||||||
└─► Collects newDispensaryIds[] from promotions
|
|
||||||
└─► Returns { newStoreIds: [...] } in result
|
|
||||||
|
|
||||||
chainNextTask() detects newStoreIds
|
|
||||||
└─► Creates product_discovery task for each new store
|
|
||||||
|
|
||||||
product_discovery
|
|
||||||
└─► Calls handlePayloadFetch() internally
|
|
||||||
└─► payload_fetch hits Dutchie API
|
|
||||||
└─► Saves raw JSON to /storage/payloads/
|
|
||||||
└─► Queues product_refresh task with payload_id
|
|
||||||
|
|
||||||
product_refresh
|
|
||||||
└─► Loads payload from filesystem
|
|
||||||
└─► Normalizes and upserts to store_products
|
|
||||||
└─► Creates snapshots, downloads images
|
|
||||||
```
|
|
||||||
|
|
||||||
**Complete Discovery Chain:**
|
|
||||||
```
|
|
||||||
store_discovery → product_discovery → payload_fetch → product_refresh
|
|
||||||
(internal call) (queues next)
|
|
||||||
```
|
|
||||||
|
|
||||||
### Scheduled Flow (Existing Stores)
|
|
||||||
|
|
||||||
For existing stores, `payload_fetch_all` schedule runs every 4 hours:
|
|
||||||
|
|
||||||
```
|
|
||||||
TaskScheduler (every 60s)
|
|
||||||
└─► Checks task_schedules for due schedules
|
|
||||||
└─► payload_fetch_all is due
|
|
||||||
└─► Generates payload_fetch task for each dispensary
|
|
||||||
|
|
||||||
payload_fetch
|
|
||||||
└─► Hits Dutchie GraphQL API
|
|
||||||
└─► Saves raw JSON to /storage/payloads/
|
|
||||||
└─► Queues product_refresh task with payload_id
|
|
||||||
|
|
||||||
product_refresh
|
|
||||||
└─► Loads payload from filesystem (NOT API)
|
|
||||||
└─► Normalizes via DutchieNormalizer
|
|
||||||
└─► Upserts to store_products
|
|
||||||
└─► Creates snapshots
|
|
||||||
```
|
|
||||||
|
|
||||||
**Complete Scheduled Chain:**
|
|
||||||
```
|
|
||||||
payload_fetch → product_refresh
|
|
||||||
(queues) (reads local)
|
|
||||||
```
|
|
||||||
|
|
||||||
### Chaining Implementation
|
|
||||||
|
|
||||||
Task chaining is handled in three places:
|
|
||||||
|
|
||||||
1. **Internal chaining (handler calls handler):**
|
|
||||||
- `product_discovery` calls `handlePayloadFetch()` directly
|
|
||||||
|
|
||||||
2. **External chaining (chainNextTask() in task-service.ts):**
|
|
||||||
- Called after task completion
|
|
||||||
- `store_discovery` → queues `product_discovery` for each newStoreId
|
|
||||||
|
|
||||||
3. **Queue-based chaining (taskService.createTask):**
|
|
||||||
- `payload_fetch` queues `product_refresh` with `payload: { payload_id }`
|
|
||||||
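The external chaining step (`store_discovery` queues `product_discovery` for each new store) can be sketched as a pure mapping from a completed task to follow-up tasks. The types and the `followUpTasks` name are hypothetical stand-ins; the actual `chainNextTask()` in `task-service.ts` is not shown in this document.

```typescript
// Hypothetical shapes for a completed task and the tasks it spawns.
interface CompletedTask {
  role: string;
  platform: string | null;
  result: { newStoreIds?: number[] } | null;
}

interface NewTask {
  role: string;
  dispensaryId: number;
  platform: string | null;
}

// Maps a completed task to the follow-up tasks that should be queued.
function followUpTasks(task: CompletedTask): NewTask[] {
  if (task.role !== "store_discovery") return [];
  const newStoreIds = task.result?.newStoreIds ?? [];
  return newStoreIds.map((dispensaryId) => ({
    role: "product_discovery",
    dispensaryId,
    platform: task.platform,
  }));
}
```

Keeping the mapping free of database calls makes the chaining rule easy to unit test; the caller would insert the returned rows into `worker_tasks`.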
|
|
||||||
---
|
|
||||||
|
|
||||||
## Payload API Endpoints
|
|
||||||
|
|
||||||
Raw crawl payloads can be accessed via the Payloads API:
|
|
||||||
|
|
||||||
| Endpoint | Method | Description |
|
|
||||||
|----------|--------|-------------|
|
|
||||||
| `GET /api/payloads` | GET | List payload metadata (paginated) |
|
|
||||||
| `GET /api/payloads/:id` | GET | Get payload metadata by ID |
|
|
||||||
| `GET /api/payloads/:id/data` | GET | Get full payload JSON (decompressed) |
|
|
||||||
| `GET /api/payloads/store/:dispensaryId` | GET | List payloads for a store |
|
|
||||||
| `GET /api/payloads/store/:dispensaryId/latest` | GET | Get latest payload for a store |
|
|
||||||
| `GET /api/payloads/store/:dispensaryId/diff` | GET | Diff two payloads for changes |
|
|
||||||
|
|
||||||
### Payload Diff Response
|
|
||||||
|
|
||||||
The diff endpoint returns:
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"success": true,
|
|
||||||
"from": { "id": 123, "fetchedAt": "...", "productCount": 100 },
|
|
||||||
"to": { "id": 456, "fetchedAt": "...", "productCount": 105 },
|
|
||||||
"diff": {
|
|
||||||
"added": 10,
|
|
||||||
"removed": 5,
|
|
||||||
"priceChanges": 8,
|
|
||||||
"stockChanges": 12
|
|
||||||
},
|
|
||||||
"details": {
|
|
||||||
"added": [...],
|
|
||||||
"removed": [...],
|
|
||||||
"priceChanges": [...],
|
|
||||||
"stockChanges": [...]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
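One way to compute the four categories in the diff response is sketched below. The `ProductEntry` fields (`id`, `price`, `inStock`) are assumptions made for illustration; the real payloads are raw Dutchie GraphQL responses whose shape is not described here.

```typescript
// Hypothetical, simplified product record extracted from a payload.
interface ProductEntry {
  id: string;
  price: number;
  inStock: boolean;
}

interface PayloadDiff {
  added: string[];
  removed: string[];
  priceChanges: string[];
  stockChanges: string[];
}

// Compares two payloads by product id and collects the ids in each category.
function diffPayloads(from: ProductEntry[], to: ProductEntry[]): PayloadDiff {
  const before = new Map<string, ProductEntry>();
  for (const product of from) before.set(product.id, product);

  const seen = new Set<string>();
  const diff: PayloadDiff = { added: [], removed: [], priceChanges: [], stockChanges: [] };

  for (const product of to) {
    seen.add(product.id);
    const old = before.get(product.id);
    if (old === undefined) {
      diff.added.push(product.id);
      continue;
    }
    if (old.price !== product.price) diff.priceChanges.push(product.id);
    if (old.inStock !== product.inStock) diff.stockChanges.push(product.id);
  }
  for (const product of from) {
    if (!seen.has(product.id)) diff.removed.push(product.id);
  }
  return diff;
}
```

The counts under `diff` in the response would then be the lengths of these arrays, with the arrays themselves under `details`.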
|
|
||||||
---
|
|
||||||
|
|
||||||
## API Endpoints
|
|
||||||
|
|
||||||
### Schedules (NEW)
|
|
||||||
|
|
||||||
| Endpoint | Method | Description |
|
|
||||||
|----------|--------|-------------|
|
|
||||||
| `GET /api/schedules` | GET | List all schedules |
|
|
||||||
| `PUT /api/schedules/:id` | PUT | Update schedule |
|
|
||||||
| `POST /api/schedules/:id/trigger` | POST | Run schedule immediately |
|
|
||||||
|
|
||||||
### Task Creation (rewired 2024-12-10)
|
|
||||||
|
|
||||||
| Endpoint | Method | Description |
|
|
||||||
|----------|--------|-------------|
|
|
||||||
| `POST /api/job-queue/enqueue` | POST | Create single task |
|
|
||||||
| `POST /api/job-queue/enqueue-batch` | POST | Create batch tasks |
|
|
||||||
| `POST /api/job-queue/enqueue-state` | POST | Create tasks for state |
|
|
||||||
| `POST /api/tasks` | POST | Direct task creation |
|
|
||||||
|
|
||||||
### Task Management
|
|
||||||
|
|
||||||
| Endpoint | Method | Description |
|
|
||||||
|----------|--------|-------------|
|
|
||||||
| `GET /api/tasks` | GET | List tasks |
|
|
||||||
| `GET /api/tasks/:id` | GET | Get single task |
|
|
||||||
| `GET /api/tasks/counts` | GET | Task counts by status |
|
|
||||||
| `POST /api/tasks/recover-stale` | POST | Recover stale tasks |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Key Files
|
|
||||||
|
|
||||||
| File | Purpose |
|
|
||||||
|------|---------|
|
|
||||||
| `src/services/task-scheduler.ts` | **NEW** - DB-driven scheduler |
|
|
||||||
| `src/tasks/task-worker.ts` | Worker that processes tasks |
|
|
||||||
| `src/tasks/task-service.ts` | Task CRUD operations |
|
|
||||||
| `src/tasks/handlers/payload-fetch.ts` | **NEW** - Fetches from API, saves to disk |
|
|
||||||
| `src/tasks/handlers/product-refresh.ts` | **CHANGED** - Reads from disk, processes to DB |
|
|
||||||
| `src/utils/payload-storage.ts` | **NEW** - Payload save/load utilities |
|
|
||||||
| `src/routes/tasks.ts` | Task API endpoints |
|
|
||||||
| `src/routes/job-queue.ts` | Job Queue UI endpoints (rewired) |
|
|
||||||
| `migrations/079_task_schedules.sql` | Schedule table |
|
|
||||||
| `migrations/080_raw_crawl_payloads.sql` | Payload metadata table |
|
|
||||||
| `migrations/081_payload_fetch_columns.sql` | payload, last_fetch_at columns |
|
|
||||||
| `migrations/074_worker_task_queue.sql` | Task queue table |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Legacy Code (DEPRECATED)
|
|
||||||
|
|
||||||
| File | Status | Replacement |
|
|
||||||
|------|--------|-------------|
|
|
||||||
| `src/services/scheduler.ts` | DEPRECATED | `task-scheduler.ts` |
|
|
||||||
| `dispensary_crawl_jobs` table | ORPHANED | `worker_tasks` |
|
|
||||||
| `job_schedules` table | LEGACY | `task_schedules` |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Dashboard Integration
|
|
||||||
|
|
||||||
Both pages remain wired to the dashboard:
|
|
||||||
|
|
||||||
| Page | Data Source | Actions |
|
|
||||||
|------|-------------|---------|
|
|
||||||
| **Job Queue** | `worker_tasks`, `task_schedules` | Create tasks, view schedules |
|
|
||||||
| **Task Queue** | `worker_tasks` | View tasks, recover stale |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Multi-Replica Safety
|
|
||||||
|
|
||||||
The scheduler keeps its state in the database and claims due schedules with `SELECT FOR UPDATE SKIP LOCKED`, which together ensure:
|
|
||||||
|
|
||||||
1. **Only one replica** executes a schedule at a time
|
|
||||||
2. **No duplicate tasks** created
|
|
||||||
3. **Survives pod restarts** - state in DB, not memory
|
|
||||||
4. **Self-healing** - recovers stale tasks on startup
|
|
||||||
|
|
||||||
```sql
|
|
||||||
-- Safe across API server replicas: each due row is locked by one transaction; other replicas skip locked rows
|
|
||||||
SELECT * FROM task_schedules
|
|
||||||
WHERE enabled = true AND next_run_at <= NOW()
|
|
||||||
FOR UPDATE SKIP LOCKED
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Worker Scaling (K8s)
|
|
||||||
|
|
||||||
Workers run as a StatefulSet in Kubernetes. You can scale from the admin UI or CLI.
|
|
||||||
|
|
||||||
### From Admin UI
|
|
||||||
|
|
||||||
The Workers page (`/admin/workers`) provides:
|
|
||||||
- Current replica count display
|
|
||||||
- Scale up/down buttons
|
|
||||||
- Target replica input
|
|
||||||
|
|
||||||
### API Endpoints
|
|
||||||
|
|
||||||
| Endpoint | Method | Description |
|
|
||||||
|----------|--------|-------------|
|
|
||||||
| `GET /api/workers/k8s/replicas` | GET | Get current/desired replica counts |
|
|
||||||
| `POST /api/workers/k8s/scale` | POST | Scale to N replicas (body: `{ replicas: N }`) |
|
|
||||||
|
|
||||||
### From CLI
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# View current replicas
|
|
||||||
kubectl get statefulset scraper-worker -n dispensary-scraper
|
|
||||||
|
|
||||||
# Scale to 10 workers
|
|
||||||
kubectl scale statefulset scraper-worker -n dispensary-scraper --replicas=10
|
|
||||||
|
|
||||||
# Scale down to 3 workers
|
|
||||||
kubectl scale statefulset scraper-worker -n dispensary-scraper --replicas=3
|
|
||||||
```
|
|
||||||
|
|
||||||
### Configuration
|
|
||||||
|
|
||||||
Environment variables for the API server:
|
|
||||||
|
|
||||||
| Variable | Default | Description |
|
|
||||||
|----------|---------|-------------|
|
|
||||||
| `K8S_NAMESPACE` | `dispensary-scraper` | Kubernetes namespace |
|
|
||||||
| `K8S_WORKER_STATEFULSET` | `scraper-worker` | StatefulSet name |
|
|
||||||
|
|
||||||
### RBAC Requirements
|
|
||||||
|
|
||||||
The API server pod needs these K8s permissions:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
apiVersion: rbac.authorization.k8s.io/v1
|
|
||||||
kind: Role
|
|
||||||
metadata:
|
|
||||||
name: worker-scaler
|
|
||||||
namespace: dispensary-scraper
|
|
||||||
rules:
|
|
||||||
- apiGroups: ["apps"]
|
|
||||||
resources: ["statefulsets"]
|
|
||||||
verbs: ["get", "patch"]
|
|
||||||
---
|
|
||||||
apiVersion: rbac.authorization.k8s.io/v1
|
|
||||||
kind: RoleBinding
|
|
||||||
metadata:
|
|
||||||
name: scraper-worker-scaler
|
|
||||||
namespace: dispensary-scraper
|
|
||||||
subjects:
|
|
||||||
- kind: ServiceAccount
|
|
||||||
name: default
|
|
||||||
namespace: dispensary-scraper
|
|
||||||
roleRef:
|
|
||||||
kind: Role
|
|
||||||
name: worker-scaler
|
|
||||||
apiGroup: rbac.authorization.k8s.io
|
|
||||||
```
|
|
||||||
@@ -1,8 +0,0 @@
|
|||||||
-- Migration 078: Add consecutive_403_count to proxies table
|
|
||||||
-- Per workflow-12102025.md: Track consecutive 403s per proxy
|
|
||||||
-- After 3 consecutive 403s with different fingerprints → disable proxy
|
|
||||||
|
|
||||||
ALTER TABLE proxies ADD COLUMN IF NOT EXISTS consecutive_403_count INTEGER DEFAULT 0;
|
|
||||||
|
|
||||||
-- Add comment explaining the column
|
|
||||||
COMMENT ON COLUMN proxies.consecutive_403_count IS 'Tracks consecutive 403 blocks. Reset to 0 on success. Proxy disabled at 3.';
|
|
||||||
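The counter semantics in the column comment (reset to 0 on success, proxy disabled at 3) can be sketched as follows. `ProxyState` and `recordResponse` are hypothetical names. The sketch treats any 2xx status as success, leaves the count unchanged for other non-403 statuses, and does not model the "different fingerprints" condition; all three are assumptions.

```typescript
// Hypothetical in-memory view of the relevant proxies columns.
interface ProxyState {
  consecutive403Count: number;
  active: boolean;
}

const MAX_CONSECUTIVE_403 = 3;

// Returns the proxy state after observing one HTTP response through it.
function recordResponse(proxy: ProxyState, httpStatus: number): ProxyState {
  if (httpStatus === 403) {
    const count = proxy.consecutive403Count + 1;
    // Disable the proxy once the threshold is reached.
    return { consecutive403Count: count, active: proxy.active && count < MAX_CONSECUTIVE_403 };
  }
  if (httpStatus >= 200 && httpStatus < 300) {
    // Success resets the streak.
    return { ...proxy, consecutive403Count: 0 };
  }
  // Other statuses (timeouts, 5xx) leave the streak untouched in this sketch.
  return proxy;
}
```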
@@ -1,49 +0,0 @@
|
|||||||
-- Migration 079: Task Schedules for Database-Driven Scheduler
|
|
||||||
-- Per TASK_WORKFLOW_2024-12-10.md: Replaces node-cron with DB-driven scheduling
|
|
||||||
--
|
|
||||||
-- 2024-12-10: Created for reliable, multi-replica-safe task scheduling
|
|
||||||
|
|
||||||
-- task_schedules: Stores schedule definitions and state
|
|
||||||
CREATE TABLE IF NOT EXISTS task_schedules (
|
|
||||||
id SERIAL PRIMARY KEY,
|
|
||||||
name VARCHAR(100) NOT NULL UNIQUE,
|
|
||||||
role VARCHAR(50) NOT NULL, -- TaskRole: product_refresh, store_discovery, etc.
|
|
||||||
description TEXT,
|
|
||||||
|
|
||||||
-- Schedule configuration
|
|
||||||
enabled BOOLEAN DEFAULT TRUE,
|
|
||||||
interval_hours INTEGER NOT NULL DEFAULT 4,
|
|
||||||
priority INTEGER DEFAULT 0,
|
|
||||||
|
|
||||||
-- Optional scope filters
|
|
||||||
state_code VARCHAR(2), -- NULL = all states
|
|
||||||
platform VARCHAR(50), -- NULL = all platforms
|
|
||||||
|
|
||||||
-- Execution state (updated by scheduler)
|
|
||||||
last_run_at TIMESTAMPTZ,
|
|
||||||
next_run_at TIMESTAMPTZ,
|
|
||||||
last_task_count INTEGER DEFAULT 0,
|
|
||||||
last_error TEXT,
|
|
||||||
|
|
||||||
created_at TIMESTAMPTZ DEFAULT NOW(),
|
|
||||||
updated_at TIMESTAMPTZ DEFAULT NOW()
|
|
||||||
);
|
|
||||||
|
|
||||||
-- Indexes for scheduler queries
|
|
||||||
CREATE INDEX IF NOT EXISTS idx_task_schedules_enabled ON task_schedules(enabled) WHERE enabled = TRUE;
|
|
||||||
CREATE INDEX IF NOT EXISTS idx_task_schedules_next_run ON task_schedules(next_run_at) WHERE enabled = TRUE;
|
|
||||||
|
|
||||||
-- Insert default schedules
|
|
||||||
INSERT INTO task_schedules (name, role, interval_hours, priority, description, next_run_at)
|
|
||||||
VALUES
|
|
||||||
('product_refresh_all', 'product_refresh', 4, 0, 'Generate product refresh tasks for all crawl-enabled stores every 4 hours', NOW()),
|
|
||||||
('store_discovery_dutchie', 'store_discovery', 24, 5, 'Discover new Dutchie stores daily', NOW()),
|
|
||||||
('analytics_refresh', 'analytics_refresh', 6, 0, 'Refresh analytics materialized views every 6 hours', NOW())
|
|
||||||
ON CONFLICT (name) DO NOTHING;
|
|
||||||
|
|
||||||
-- Comment for documentation
|
|
||||||
COMMENT ON TABLE task_schedules IS 'Database-driven task scheduler configuration. Per TASK_WORKFLOW_2024-12-10.md:
|
|
||||||
- Schedules persist in DB (survive restarts)
|
|
||||||
- Uses SELECT FOR UPDATE SKIP LOCKED for multi-replica safety
|
|
||||||
- Scheduler polls every 60s and executes due schedules
|
|
||||||
- Creates tasks in worker_tasks for task-worker.ts to process';
|
|
||||||
@@ -1,58 +0,0 @@
-- Migration 080: Raw Crawl Payloads Metadata Table
-- Per TASK_WORKFLOW_2024-12-10.md: Store full GraphQL payloads for historical analysis
--
-- Design Pattern: Metadata/Payload Separation
-- - Metadata (this table): Small, indexed, queryable
-- - Payload (filesystem): Gzipped JSON at storage_path
--
-- Benefits:
-- - Compare any two crawls to see what changed
-- - Replay/re-normalize historical data if logic changes
-- - Debug issues by seeing exactly what the API returned
-- - DB stays small, backups stay fast
--
-- Storage location: /storage/payloads/{year}/{month}/{day}/store_{id}_{timestamp}.json.gz
-- Compression: ~90% reduction (1.5MB -> 150KB per crawl)

CREATE TABLE IF NOT EXISTS raw_crawl_payloads (
  id SERIAL PRIMARY KEY,

  -- Links to crawl tracking
  crawl_run_id INTEGER REFERENCES crawl_runs(id) ON DELETE SET NULL,
  dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE,

  -- File location (gzipped JSON)
  storage_path TEXT NOT NULL,

  -- Metadata for quick queries without loading file
  product_count INTEGER NOT NULL DEFAULT 0,
  size_bytes INTEGER, -- Compressed size
  size_bytes_raw INTEGER, -- Uncompressed size

  -- Timestamps
  fetched_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  -- Optional: checksum for integrity verification
  checksum_sha256 VARCHAR(64)
);

-- Indexes for common queries
CREATE INDEX IF NOT EXISTS idx_raw_crawl_payloads_dispensary
  ON raw_crawl_payloads(dispensary_id);

CREATE INDEX IF NOT EXISTS idx_raw_crawl_payloads_dispensary_fetched
  ON raw_crawl_payloads(dispensary_id, fetched_at DESC);

CREATE INDEX IF NOT EXISTS idx_raw_crawl_payloads_fetched
  ON raw_crawl_payloads(fetched_at DESC);

CREATE INDEX IF NOT EXISTS idx_raw_crawl_payloads_crawl_run
  ON raw_crawl_payloads(crawl_run_id)
  WHERE crawl_run_id IS NOT NULL;

-- Comments
COMMENT ON TABLE raw_crawl_payloads IS 'Metadata for raw GraphQL payloads stored on filesystem. Per TASK_WORKFLOW_2024-12-10.md: Full payloads enable historical diffs and replay.';
COMMENT ON COLUMN raw_crawl_payloads.storage_path IS 'Path to gzipped JSON file, e.g. /storage/payloads/2024/12/10/store_123_1702234567.json.gz';
COMMENT ON COLUMN raw_crawl_payloads.size_bytes IS 'Compressed file size in bytes';
COMMENT ON COLUMN raw_crawl_payloads.size_bytes_raw IS 'Uncompressed payload size in bytes';
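The metadata/payload split in migration 080 could be exercised along these lines. This is a sketch only: the helpers `buildPayloadPath`, `packPayload`, and `unpackPayload` are hypothetical names, and several details are assumptions the migration does not state (UTC date parts, zero-padded month and day, a Unix-seconds timestamp, and a SHA-256 taken over the compressed bytes).

```typescript
import { createHash } from "node:crypto";
import { gunzipSync, gzipSync } from "node:zlib";

// Assumption: date parts are UTC and zero-padded; the timestamp is Unix seconds.
function buildPayloadPath(dispensaryId: number, fetchedAt: Date): string {
  const year = fetchedAt.getUTCFullYear();
  const month = String(fetchedAt.getUTCMonth() + 1).padStart(2, "0");
  const day = String(fetchedAt.getUTCDate()).padStart(2, "0");
  const unixSeconds = Math.floor(fetchedAt.getTime() / 1000);
  return `/storage/payloads/${year}/${month}/${day}/store_${dispensaryId}_${unixSeconds}.json.gz`;
}

// Mirrors the metadata columns of raw_crawl_payloads that describe the file.
interface PayloadMetadata {
  storagePath: string;
  sizeBytes: number; // compressed size
  sizeBytesRaw: number; // uncompressed size
  checksumSha256: string; // 64 hex characters
}

// Serializes and gzips a payload, returning the bytes to write to disk and
// the metadata row values to insert alongside it.
function packPayload(
  dispensaryId: number,
  fetchedAt: Date,
  payload: unknown,
): { compressed: Buffer; metadata: PayloadMetadata } {
  const raw = Buffer.from(JSON.stringify(payload), "utf8");
  const compressed = gzipSync(raw);
  return {
    compressed,
    metadata: {
      storagePath: buildPayloadPath(dispensaryId, fetchedAt),
      sizeBytes: compressed.length,
      sizeBytesRaw: raw.length,
      // Assumption: the checksum covers the compressed file contents.
      checksumSha256: createHash("sha256").update(compressed).digest("hex"),
    },
  };
}

// Reverses packPayload: gunzips the stored bytes and parses the JSON.
function unpackPayload(compressed: Buffer): unknown {
  return JSON.parse(gunzipSync(compressed).toString("utf8"));
}
```

Keeping only these small fields in the table means a query such as "latest payload per dispensary" never has to open the gzipped file.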
@@ -1,37 +0,0 @@
-- Migration 081: Payload Fetch Columns
-- Per TASK_WORKFLOW_2024-12-10.md: Separates API fetch from data processing
--
-- New architecture:
-- - payload_fetch: Hits Dutchie API, saves raw payload to disk
-- - product_refresh: Reads local payload, normalizes, upserts to DB
--
-- This migration adds:
-- 1. payload column to worker_tasks (for task chaining data)
-- 2. processed_at column to raw_crawl_payloads (track when payload was processed)
-- 3. last_fetch_at column to dispensaries (track when last payload was fetched)

-- Add payload column to worker_tasks for task chaining
-- Used by payload_fetch to pass payload_id to product_refresh
ALTER TABLE worker_tasks
  ADD COLUMN IF NOT EXISTS payload JSONB DEFAULT NULL;

COMMENT ON COLUMN worker_tasks.payload IS 'Per TASK_WORKFLOW_2024-12-10.md: Task chaining data (e.g., payload_id from payload_fetch to product_refresh)';

-- Add processed_at to raw_crawl_payloads
-- Tracks when the payload was processed by product_refresh
ALTER TABLE raw_crawl_payloads
  ADD COLUMN IF NOT EXISTS processed_at TIMESTAMPTZ DEFAULT NULL;

COMMENT ON COLUMN raw_crawl_payloads.processed_at IS 'When this payload was processed by product_refresh handler';

-- Index for finding unprocessed payloads
CREATE INDEX IF NOT EXISTS idx_raw_crawl_payloads_unprocessed
  ON raw_crawl_payloads(dispensary_id, fetched_at DESC)
  WHERE processed_at IS NULL;

-- Add last_fetch_at to dispensaries
-- Tracks when the last payload was fetched (separate from last_crawl_at which is when processing completed)
ALTER TABLE dispensaries
  ADD COLUMN IF NOT EXISTS last_fetch_at TIMESTAMPTZ DEFAULT NULL;

COMMENT ON COLUMN dispensaries.last_fetch_at IS 'Per TASK_WORKFLOW_2024-12-10.md: When last payload was fetched from API (separate from last_crawl_at which is when processing completed)';
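The task-chaining column from migration 081 is nullable JSONB, so a consumer would likely need to validate it before use. The sketch below shows one way to do that. The interface `ProductRefreshPayload` and the function `parseProductRefreshPayload` are hypothetical, and the exact JSON shape is an assumption based only on the `payload_id` key named in the column comment.

```typescript
// Hypothetical shape of worker_tasks.payload for a chained product_refresh task.
interface ProductRefreshPayload {
  payload_id: number;
}

// Returns the chained payload if the JSONB value carries a positive integer
// payload_id, otherwise null (for example when the column is NULL).
function parseProductRefreshPayload(value: unknown): ProductRefreshPayload | null {
  if (typeof value !== "object" || value === null) {
    return null;
  }
  const id = (value as Record<string, unknown>).payload_id;
  if (typeof id === "number" && Number.isInteger(id) && id > 0) {
    return { payload_id: id };
  }
  return null;
}
```

What a handler does when this returns `null` is outside the sketch; the migration does not say.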
@@ -1,27 +0,0 @@
-- Migration: 082_proxy_notification_trigger
-- Date: 2024-12-11
-- Description: Add PostgreSQL NOTIFY trigger to alert workers when proxies are added

-- Create function to notify workers when active proxy is added/activated
CREATE OR REPLACE FUNCTION notify_proxy_added()
RETURNS TRIGGER AS $$
BEGIN
  -- Only notify if proxy is active
  IF NEW.active = true THEN
    PERFORM pg_notify('proxy_added', NEW.id::text);
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Drop existing trigger if any
DROP TRIGGER IF EXISTS proxy_added_trigger ON proxies;

-- Create trigger on insert and update of active column
CREATE TRIGGER proxy_added_trigger
  AFTER INSERT OR UPDATE OF active ON proxies
  FOR EACH ROW
  EXECUTE FUNCTION notify_proxy_added();

COMMENT ON FUNCTION notify_proxy_added() IS
  'Sends PostgreSQL NOTIFY to proxy_added channel when an active proxy is added or activated. Workers LISTEN on this channel to wake up immediately.';
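On the worker side, the `proxy_added` notifications from migration 082 arrive as a channel name plus a text payload. A minimal sketch of decoding them follows. `PgNotification` is a local structural type and `proxyIdFromNotification` is a hypothetical helper; treating `proxies.id` as a non-negative integer is an assumption.

```typescript
// Local structural type covering only the notification fields used here.
interface PgNotification {
  channel: string;
  payload?: string;
}

// Returns the proxy id carried by a proxy_added notification, or null if the
// message is for another channel or the payload is not a plain integer.
function proxyIdFromNotification(msg: PgNotification): number | null {
  if (msg.channel !== "proxy_added" || msg.payload === undefined) {
    return null;
  }
  // Assumption: the trigger sends NEW.id::text for an integer id column.
  return /^\d+$/.test(msg.payload) ? Number.parseInt(msg.payload, 10) : null;
}

// With node-postgres, wiring this up would look roughly like:
//   client.on("notification", (msg) => {
//     const proxyId = proxyIdFromNotification(msg);
//     if (proxyId !== null) { /* wake the worker */ }
//   });
//   await client.query("LISTEN proxy_added");
```

PostgreSQL delivers notifications only after the sending transaction commits, so a worker woken this way can expect the new or activated proxy row to be visible.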
286
backend/node_modules/.package-lock.json
generated
vendored
@@ -1,6 +1,6 @@
|
|||||||
{
|
{
|
||||||
"name": "dutchie-menus-backend",
|
"name": "dutchie-menus-backend",
|
||||||
"version": "1.6.0",
|
"version": "1.5.1",
|
||||||
"lockfileVersion": 3,
|
"lockfileVersion": 3,
|
||||||
"requires": true,
|
"requires": true,
|
||||||
"packages": {
|
"packages": {
|
||||||
@@ -46,97 +46,6 @@
|
|||||||
"resolved": "https://registry.npmjs.org/@ioredis/commands/-/commands-1.4.0.tgz",
|
"resolved": "https://registry.npmjs.org/@ioredis/commands/-/commands-1.4.0.tgz",
|
||||||
"integrity": "sha512-aFT2yemJJo+TZCmieA7qnYGQooOS7QfNmYrzGtsYd3g9j5iDP8AimYYAesf79ohjbLG12XxC4nG5DyEnC88AsQ=="
|
"integrity": "sha512-aFT2yemJJo+TZCmieA7qnYGQooOS7QfNmYrzGtsYd3g9j5iDP8AimYYAesf79ohjbLG12XxC4nG5DyEnC88AsQ=="
|
||||||
},
|
},
|
||||||
"node_modules/@jsep-plugin/assignment": {
|
|
||||||
"version": "1.3.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/@jsep-plugin/assignment/-/assignment-1.3.0.tgz",
|
|
||||||
"integrity": "sha512-VVgV+CXrhbMI3aSusQyclHkenWSAm95WaiKrMxRFam3JSUiIaQjoMIw2sEs/OX4XifnqeQUN4DYbJjlA8EfktQ==",
|
|
||||||
"engines": {
|
|
||||||
"node": ">= 10.16.0"
|
|
||||||
},
|
|
||||||
"peerDependencies": {
|
|
||||||
"jsep": "^0.4.0||^1.0.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@jsep-plugin/regex": {
|
|
||||||
"version": "1.0.4",
|
|
||||||
"resolved": "https://registry.npmjs.org/@jsep-plugin/regex/-/regex-1.0.4.tgz",
|
|
||||||
"integrity": "sha512-q7qL4Mgjs1vByCaTnDFcBnV9HS7GVPJX5vyVoCgZHNSC9rjwIlmbXG5sUuorR5ndfHAIlJ8pVStxvjXHbNvtUg==",
|
|
||||||
"engines": {
|
|
||||||
"node": ">= 10.16.0"
|
|
||||||
},
|
|
||||||
"peerDependencies": {
|
|
||||||
"jsep": "^0.4.0||^1.0.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@kubernetes/client-node": {
|
|
||||||
"version": "1.4.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/@kubernetes/client-node/-/client-node-1.4.0.tgz",
|
|
||||||
"integrity": "sha512-Zge3YvF7DJi264dU1b3wb/GmzR99JhUpqTvp+VGHfwZT+g7EOOYNScDJNZwXy9cszyIGPIs0VHr+kk8e95qqrA==",
|
|
||||||
"dependencies": {
|
|
||||||
"@types/js-yaml": "^4.0.1",
|
|
||||||
"@types/node": "^24.0.0",
|
|
||||||
"@types/node-fetch": "^2.6.13",
|
|
||||||
"@types/stream-buffers": "^3.0.3",
|
|
||||||
"form-data": "^4.0.0",
|
|
||||||
"hpagent": "^1.2.0",
|
|
||||||
"isomorphic-ws": "^5.0.0",
|
|
||||||
"js-yaml": "^4.1.0",
|
|
||||||
"jsonpath-plus": "^10.3.0",
|
|
||||||
"node-fetch": "^2.7.0",
|
|
||||||
"openid-client": "^6.1.3",
|
|
||||||
"rfc4648": "^1.3.0",
|
|
||||||
"socks-proxy-agent": "^8.0.4",
|
|
||||||
"stream-buffers": "^3.0.2",
|
|
||||||
"tar-fs": "^3.0.9",
|
|
||||||
"ws": "^8.18.2"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@kubernetes/client-node/node_modules/@types/node": {
|
|
||||||
"version": "24.10.3",
|
|
||||||
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.10.3.tgz",
|
|
||||||
"integrity": "sha512-gqkrWUsS8hcm0r44yn7/xZeV1ERva/nLgrLxFRUGb7aoNMIJfZJ3AC261zDQuOAKC7MiXai1WCpYc48jAHoShQ==",
|
|
||||||
"dependencies": {
|
|
||||||
"undici-types": "~7.16.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@kubernetes/client-node/node_modules/tar-fs": {
|
|
||||||
"version": "3.1.1",
|
|
||||||
"resolved": "https://registry.npmjs.org/tar-fs/-/tar-fs-3.1.1.tgz",
|
|
||||||
"integrity": "sha512-LZA0oaPOc2fVo82Txf3gw+AkEd38szODlptMYejQUhndHMLQ9M059uXR+AfS7DNo0NpINvSqDsvyaCrBVkptWg==",
|
|
||||||
"dependencies": {
|
|
||||||
"pump": "^3.0.0",
|
|
||||||
"tar-stream": "^3.1.5"
|
|
||||||
},
|
|
||||||
"optionalDependencies": {
|
|
||||||
"bare-fs": "^4.0.1",
|
|
||||||
"bare-path": "^3.0.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@kubernetes/client-node/node_modules/undici-types": {
|
|
||||||
"version": "7.16.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-7.16.0.tgz",
|
|
||||||
"integrity": "sha512-Zz+aZWSj8LE6zoxD+xrjh4VfkIG8Ya6LvYkZqtUQGJPZjYl53ypCaUwWqo7eI0x66KBGeRo+mlBEkMSeSZ38Nw=="
|
|
||||||
},
|
|
||||||
"node_modules/@kubernetes/client-node/node_modules/ws": {
|
|
||||||
"version": "8.18.3",
|
|
||||||
"resolved": "https://registry.npmjs.org/ws/-/ws-8.18.3.tgz",
|
|
||||||
"integrity": "sha512-PEIGCY5tSlUt50cqyMXfCzX+oOPqN0vuGqWzbcJ2xvnkzkq46oOpz7dQaTDBdfICb4N14+GARUDw2XV2N4tvzg==",
|
|
||||||
"engines": {
|
|
||||||
"node": ">=10.0.0"
|
|
||||||
},
|
|
||||||
"peerDependencies": {
|
|
||||||
"bufferutil": "^4.0.1",
|
|
||||||
"utf-8-validate": ">=5.0.2"
|
|
||||||
},
|
|
||||||
"peerDependenciesMeta": {
|
|
||||||
"bufferutil": {
|
|
||||||
"optional": true
|
|
||||||
},
|
|
||||||
"utf-8-validate": {
|
|
||||||
"optional": true
|
|
||||||
}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@mapbox/node-pre-gyp": {
|
"node_modules/@mapbox/node-pre-gyp": {
|
||||||
"version": "1.0.11",
|
"version": "1.0.11",
|
||||||
"resolved": "https://registry.npmjs.org/@mapbox/node-pre-gyp/-/node-pre-gyp-1.0.11.tgz",
|
"resolved": "https://registry.npmjs.org/@mapbox/node-pre-gyp/-/node-pre-gyp-1.0.11.tgz",
|
||||||
@@ -342,11 +251,6 @@
|
|||||||
"integrity": "sha512-r8Tayk8HJnX0FztbZN7oVqGccWgw98T/0neJphO91KkmOzug1KkofZURD4UaD5uH8AqcFLfdPErnBod0u71/qg==",
|
"integrity": "sha512-r8Tayk8HJnX0FztbZN7oVqGccWgw98T/0neJphO91KkmOzug1KkofZURD4UaD5uH8AqcFLfdPErnBod0u71/qg==",
|
||||||
"dev": true
|
"dev": true
|
||||||
},
|
},
|
||||||
"node_modules/@types/js-yaml": {
|
|
||||||
"version": "4.0.9",
|
|
||||||
"resolved": "https://registry.npmjs.org/@types/js-yaml/-/js-yaml-4.0.9.tgz",
|
|
||||||
"integrity": "sha512-k4MGaQl5TGo/iipqb2UDG2UwjXziSWkh0uysQelTlJpX1qGlpUZYm8PnO4DxG1qBomtJUdYJ6qR6xdIah10JLg=="
|
|
||||||
},
|
|
||||||
"node_modules/@types/jsonwebtoken": {
|
"node_modules/@types/jsonwebtoken": {
|
||||||
"version": "9.0.10",
|
"version": "9.0.10",
|
||||||
"resolved": "https://registry.npmjs.org/@types/jsonwebtoken/-/jsonwebtoken-9.0.10.tgz",
|
"resolved": "https://registry.npmjs.org/@types/jsonwebtoken/-/jsonwebtoken-9.0.10.tgz",
|
||||||
@@ -372,6 +276,7 @@
|
|||||||
"version": "20.19.25",
|
"version": "20.19.25",
|
||||||
"resolved": "https://registry.npmjs.org/@types/node/-/node-20.19.25.tgz",
|
"resolved": "https://registry.npmjs.org/@types/node/-/node-20.19.25.tgz",
|
||||||
"integrity": "sha512-ZsJzA5thDQMSQO788d7IocwwQbI8B5OPzmqNvpf3NY/+MHDAS759Wo0gd2WQeXYt5AAAQjzcrTVC6SKCuYgoCQ==",
|
"integrity": "sha512-ZsJzA5thDQMSQO788d7IocwwQbI8B5OPzmqNvpf3NY/+MHDAS759Wo0gd2WQeXYt5AAAQjzcrTVC6SKCuYgoCQ==",
|
||||||
|
"devOptional": true,
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
"undici-types": "~6.21.0"
|
"undici-types": "~6.21.0"
|
||||||
}
|
}
|
||||||
@@ -382,15 +287,6 @@
|
|||||||
"integrity": "sha512-0ikrnug3/IyneSHqCBeslAhlK2aBfYek1fGo4bP4QnZPmiqSGRK+Oy7ZMisLWkesffJvQ1cqAcBnJC+8+nxIAg==",
|
"integrity": "sha512-0ikrnug3/IyneSHqCBeslAhlK2aBfYek1fGo4bP4QnZPmiqSGRK+Oy7ZMisLWkesffJvQ1cqAcBnJC+8+nxIAg==",
|
||||||
"dev": true
|
"dev": true
|
||||||
},
|
},
|
||||||
"node_modules/@types/node-fetch": {
|
|
||||||
"version": "2.6.13",
|
|
||||||
"resolved": "https://registry.npmjs.org/@types/node-fetch/-/node-fetch-2.6.13.tgz",
|
|
||||||
"integrity": "sha512-QGpRVpzSaUs30JBSGPjOg4Uveu384erbHBoT1zeONvyCfwQxIkUshLAOqN/k9EjGviPRmWTTe6aH2qySWKTVSw==",
|
|
||||||
"dependencies": {
|
|
||||||
"@types/node": "*",
|
|
||||||
"form-data": "^4.0.4"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@types/pg": {
|
"node_modules/@types/pg": {
|
||||||
"version": "8.15.6",
|
"version": "8.15.6",
|
||||||
"resolved": "https://registry.npmjs.org/@types/pg/-/pg-8.15.6.tgz",
|
"resolved": "https://registry.npmjs.org/@types/pg/-/pg-8.15.6.tgz",
|
||||||
@@ -444,14 +340,6 @@
|
|||||||
"@types/node": "*"
|
"@types/node": "*"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/@types/stream-buffers": {
|
|
||||||
"version": "3.0.8",
|
|
||||||
"resolved": "https://registry.npmjs.org/@types/stream-buffers/-/stream-buffers-3.0.8.tgz",
|
|
||||||
"integrity": "sha512-J+7VaHKNvlNPJPEJXX/fKa9DZtR/xPMwuIbe+yNOwp1YB+ApUOBv2aUpEoBJEi8nJgbgs1x8e73ttg0r1rSUdw==",
|
|
||||||
"dependencies": {
|
|
||||||
"@types/node": "*"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@types/uuid": {
|
"node_modules/@types/uuid": {
|
||||||
"version": "9.0.8",
|
"version": "9.0.8",
|
||||||
"resolved": "https://registry.npmjs.org/@types/uuid/-/uuid-9.0.8.tgz",
|
"resolved": "https://registry.npmjs.org/@types/uuid/-/uuid-9.0.8.tgz",
|
||||||
@@ -632,78 +520,6 @@
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/bare-fs": {
|
|
||||||
"version": "4.5.2",
|
|
||||||
"resolved": "https://registry.npmjs.org/bare-fs/-/bare-fs-4.5.2.tgz",
|
|
||||||
"integrity": "sha512-veTnRzkb6aPHOvSKIOy60KzURfBdUflr5VReI+NSaPL6xf+XLdONQgZgpYvUuZLVQ8dCqxpBAudaOM1+KpAUxw==",
|
|
||||||
"optional": true,
|
|
||||||
"dependencies": {
|
|
||||||
"bare-events": "^2.5.4",
|
|
||||||
"bare-path": "^3.0.0",
|
|
||||||
"bare-stream": "^2.6.4",
|
|
||||||
"bare-url": "^2.2.2",
|
|
||||||
"fast-fifo": "^1.3.2"
|
|
||||||
},
|
|
||||||
"engines": {
|
|
||||||
"bare": ">=1.16.0"
|
|
||||||
},
|
|
||||||
"peerDependencies": {
|
|
||||||
"bare-buffer": "*"
|
|
||||||
},
|
|
||||||
"peerDependenciesMeta": {
|
|
||||||
"bare-buffer": {
|
|
||||||
"optional": true
|
|
||||||
}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/bare-os": {
|
|
||||||
"version": "3.6.2",
|
|
||||||
"resolved": "https://registry.npmjs.org/bare-os/-/bare-os-3.6.2.tgz",
|
|
||||||
"integrity": "sha512-T+V1+1srU2qYNBmJCXZkUY5vQ0B4FSlL3QDROnKQYOqeiQR8UbjNHlPa+TIbM4cuidiN9GaTaOZgSEgsvPbh5A==",
|
|
||||||
"optional": true,
|
|
||||||
"engines": {
|
|
||||||
"bare": ">=1.14.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/bare-path": {
|
|
||||||
"version": "3.0.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/bare-path/-/bare-path-3.0.0.tgz",
|
|
||||||
"integrity": "sha512-tyfW2cQcB5NN8Saijrhqn0Zh7AnFNsnczRcuWODH0eYAXBsJ5gVxAUuNr7tsHSC6IZ77cA0SitzT+s47kot8Mw==",
|
|
||||||
"optional": true,
|
|
||||||
"dependencies": {
|
|
||||||
"bare-os": "^3.0.1"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/bare-stream": {
|
|
||||||
"version": "2.7.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/bare-stream/-/bare-stream-2.7.0.tgz",
|
|
||||||
"integrity": "sha512-oyXQNicV1y8nc2aKffH+BUHFRXmx6VrPzlnaEvMhram0nPBrKcEdcyBg5r08D0i8VxngHFAiVyn1QKXpSG0B8A==",
|
|
||||||
"optional": true,
|
|
||||||
"dependencies": {
|
|
||||||
"streamx": "^2.21.0"
|
|
||||||
},
|
|
||||||
"peerDependencies": {
|
|
||||||
"bare-buffer": "*",
|
|
||||||
"bare-events": "*"
|
|
||||||
},
|
|
||||||
"peerDependenciesMeta": {
|
|
||||||
"bare-buffer": {
|
|
||||||
"optional": true
|
|
||||||
},
|
|
||||||
"bare-events": {
|
|
||||||
"optional": true
|
|
||||||
}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/bare-url": {
|
|
||||||
"version": "2.3.2",
|
|
||||||
"resolved": "https://registry.npmjs.org/bare-url/-/bare-url-2.3.2.tgz",
|
|
||||||
"integrity": "sha512-ZMq4gd9ngV5aTMa5p9+UfY0b3skwhHELaDkhEHetMdX0LRkW9kzaym4oo/Eh+Ghm0CCDuMTsRIGM/ytUc1ZYmw==",
|
|
||||||
"optional": true,
|
|
||||||
"dependencies": {
|
|
||||||
"bare-path": "^3.0.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/base64-js": {
|
"node_modules/base64-js": {
|
||||||
"version": "1.5.1",
|
"version": "1.5.1",
|
||||||
"resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz",
|
"resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz",
|
||||||
@@ -2203,14 +2019,6 @@
|
|||||||
"node": ">=16.0.0"
|
"node": ">=16.0.0"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/hpagent": {
|
|
||||||
"version": "1.2.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/hpagent/-/hpagent-1.2.0.tgz",
|
|
||||||
"integrity": "sha512-A91dYTeIB6NoXG+PxTQpCCDDnfHsW9kc06Lvpu1TEe9gnd6ZFeiBoRO9JvzEv6xK7EX97/dUE8g/vBMTqTS3CA==",
|
|
||||||
"engines": {
|
|
||||||
"node": ">=14"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/htmlparser2": {
|
"node_modules/htmlparser2": {
|
||||||
"version": "10.0.0",
|
"version": "10.0.0",
|
||||||
"resolved": "https://registry.npmjs.org/htmlparser2/-/htmlparser2-10.0.0.tgz",
|
"resolved": "https://registry.npmjs.org/htmlparser2/-/htmlparser2-10.0.0.tgz",
|
||||||
@@ -2574,22 +2382,6 @@
|
|||||||
"node": ">=0.10.0"
|
"node": ">=0.10.0"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/isomorphic-ws": {
|
|
||||||
"version": "5.0.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/isomorphic-ws/-/isomorphic-ws-5.0.0.tgz",
|
|
||||||
"integrity": "sha512-muId7Zzn9ywDsyXgTIafTry2sV3nySZeUDe6YedVd1Hvuuep5AsIlqK+XefWpYTyJG5e503F2xIuT2lcU6rCSw==",
|
|
||||||
"peerDependencies": {
|
|
||||||
"ws": "*"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/jose": {
|
|
||||||
"version": "6.1.3",
|
|
||||||
"resolved": "https://registry.npmjs.org/jose/-/jose-6.1.3.tgz",
|
|
||||||
"integrity": "sha512-0TpaTfihd4QMNwrz/ob2Bp7X04yuxJkjRGi4aKmOqwhov54i6u79oCv7T+C7lo70MKH6BesI3vscD1yb/yzKXQ==",
|
|
||||||
"funding": {
|
|
||||||
"url": "https://github.com/sponsors/panva"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/js-tokens": {
|
"node_modules/js-tokens": {
|
||||||
"version": "4.0.0",
|
"version": "4.0.0",
|
||||||
"resolved": "https://registry.npmjs.org/js-tokens/-/js-tokens-4.0.0.tgz",
|
"resolved": "https://registry.npmjs.org/js-tokens/-/js-tokens-4.0.0.tgz",
|
||||||
@@ -2606,14 +2398,6 @@
|
|||||||
"js-yaml": "bin/js-yaml.js"
|
"js-yaml": "bin/js-yaml.js"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/jsep": {
|
|
||||||
"version": "1.4.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/jsep/-/jsep-1.4.0.tgz",
|
|
||||||
"integrity": "sha512-B7qPcEVE3NVkmSJbaYxvv4cHkVW7DQsZz13pUMrfS8z8Q/BuShN+gcTXrUlPiGqM2/t/EEaI030bpxMqY8gMlw==",
|
|
||||||
"engines": {
|
|
||||||
"node": ">= 10.16.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/json-parse-even-better-errors": {
|
"node_modules/json-parse-even-better-errors": {
|
||||||
"version": "2.3.1",
|
"version": "2.3.1",
|
||||||
"resolved": "https://registry.npmjs.org/json-parse-even-better-errors/-/json-parse-even-better-errors-2.3.1.tgz",
|
"resolved": "https://registry.npmjs.org/json-parse-even-better-errors/-/json-parse-even-better-errors-2.3.1.tgz",
|
||||||
@@ -2635,23 +2419,6 @@
|
|||||||
"graceful-fs": "^4.1.6"
|
"graceful-fs": "^4.1.6"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/jsonpath-plus": {
|
|
||||||
"version": "10.3.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/jsonpath-plus/-/jsonpath-plus-10.3.0.tgz",
|
|
||||||
"integrity": "sha512-8TNmfeTCk2Le33A3vRRwtuworG/L5RrgMvdjhKZxvyShO+mBu2fP50OWUjRLNtvw344DdDarFh9buFAZs5ujeA==",
|
|
||||||
"dependencies": {
|
|
||||||
"@jsep-plugin/assignment": "^1.3.0",
|
|
||||||
"@jsep-plugin/regex": "^1.0.4",
|
|
||||||
"jsep": "^1.4.0"
|
|
||||||
},
|
|
||||||
"bin": {
|
|
||||||
"jsonpath": "bin/jsonpath-cli.js",
|
|
||||||
"jsonpath-plus": "bin/jsonpath-cli.js"
|
|
||||||
},
|
|
||||||
"engines": {
|
|
||||||
"node": ">=18.0.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/jsonwebtoken": {
|
"node_modules/jsonwebtoken": {
|
||||||
"version": "9.0.2",
|
"version": "9.0.2",
|
||||||
"resolved": "https://registry.npmjs.org/jsonwebtoken/-/jsonwebtoken-9.0.2.tgz",
|
"resolved": "https://registry.npmjs.org/jsonwebtoken/-/jsonwebtoken-9.0.2.tgz",
|
||||||
@@ -2726,11 +2493,6 @@
|
|||||||
"resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
|
"resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
|
||||||
"integrity": "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg=="
|
"integrity": "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg=="
|
||||||
},
|
},
|
||||||
"node_modules/lodash.clonedeep": {
|
|
||||||
"version": "4.5.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/lodash.clonedeep/-/lodash.clonedeep-4.5.0.tgz",
|
|
||||||
"integrity": "sha512-H5ZhCF25riFd9uB5UCkVKo61m3S/xZk1x4wA6yp/L3RFP6Z/eHH1ymQcGLo7J3GMPfm0V/7m1tryHuGVxpqEBQ=="
|
|
||||||
},
|
|
||||||
"node_modules/lodash.defaults": {
|
"node_modules/lodash.defaults": {
|
||||||
"version": "4.2.0",
|
"version": "4.2.0",
|
||||||
"resolved": "https://registry.npmjs.org/lodash.defaults/-/lodash.defaults-4.2.0.tgz",
|
"resolved": "https://registry.npmjs.org/lodash.defaults/-/lodash.defaults-4.2.0.tgz",
|
||||||
@@ -3180,14 +2942,6 @@
|
|||||||
"url": "https://github.com/fb55/nth-check?sponsor=1"
|
"url": "https://github.com/fb55/nth-check?sponsor=1"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/oauth4webapi": {
|
|
||||||
"version": "3.8.3",
|
|
||||||
"resolved": "https://registry.npmjs.org/oauth4webapi/-/oauth4webapi-3.8.3.tgz",
|
|
||||||
"integrity": "sha512-pQ5BsX3QRTgnt5HxgHwgunIRaDXBdkT23tf8dfzmtTIL2LTpdmxgbpbBm0VgFWAIDlezQvQCTgnVIUmHupXHxw==",
|
|
||||||
"funding": {
|
|
||||||
"url": "https://github.com/sponsors/panva"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/object-assign": {
|
"node_modules/object-assign": {
|
||||||
"version": "4.1.1",
|
"version": "4.1.1",
|
||||||
"resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz",
|
"resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz",
|
||||||
@@ -3226,18 +2980,6 @@
|
|||||||
"wrappy": "1"
|
"wrappy": "1"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/openid-client": {
|
|
||||||
"version": "6.8.1",
|
|
||||||
"resolved": "https://registry.npmjs.org/openid-client/-/openid-client-6.8.1.tgz",
|
|
||||||
"integrity": "sha512-VoYT6enBo6Vj2j3Q5Ec0AezS+9YGzQo1f5Xc42lreMGlfP4ljiXPKVDvCADh+XHCV/bqPu/wWSiCVXbJKvrODw==",
|
|
||||||
"dependencies": {
|
|
||||||
"jose": "^6.1.0",
|
|
||||||
"oauth4webapi": "^3.8.2"
|
|
||||||
},
|
|
||||||
"funding": {
|
|
||||||
"url": "https://github.com/sponsors/panva"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/pac-proxy-agent": {
|
"node_modules/pac-proxy-agent": {
|
||||||
"version": "7.2.0",
|
"version": "7.2.0",
|
||||||
"resolved": "https://registry.npmjs.org/pac-proxy-agent/-/pac-proxy-agent-7.2.0.tgz",
|
"resolved": "https://registry.npmjs.org/pac-proxy-agent/-/pac-proxy-agent-7.2.0.tgz",
|
||||||
@@ -4141,11 +3883,6 @@
|
|||||||
"url": "https://github.com/privatenumber/resolve-pkg-maps?sponsor=1"
|
"url": "https://github.com/privatenumber/resolve-pkg-maps?sponsor=1"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/rfc4648": {
|
|
||||||
"version": "1.5.4",
|
|
||||||
"resolved": "https://registry.npmjs.org/rfc4648/-/rfc4648-1.5.4.tgz",
|
|
||||||
"integrity": "sha512-rRg/6Lb+IGfJqO05HZkN50UtY7K/JhxJag1kP23+zyMfrvoB0B7RWv06MbOzoc79RgCdNTiUaNsTT1AJZ7Z+cg=="
|
|
||||||
},
|
|
||||||
"node_modules/rimraf": {
|
"node_modules/rimraf": {
|
||||||
"version": "3.0.2",
|
"version": "3.0.2",
|
||||||
"resolved": "https://registry.npmjs.org/rimraf/-/rimraf-3.0.2.tgz",
|
"resolved": "https://registry.npmjs.org/rimraf/-/rimraf-3.0.2.tgz",
|
||||||
@@ -4576,14 +4313,6 @@
|
|||||||
"node": ">= 0.8"
|
"node": ">= 0.8"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/stream-buffers": {
|
|
||||||
"version": "3.0.3",
|
|
||||||
"resolved": "https://registry.npmjs.org/stream-buffers/-/stream-buffers-3.0.3.tgz",
|
|
||||||
"integrity": "sha512-pqMqwQCso0PBJt2PQmDO0cFj0lyqmiwOMiMSkVtRokl7e+ZTRYgDHKnuZNbqjiJXgsg4nuqtD/zxuo9KqTp0Yw==",
|
|
||||||
"engines": {
|
|
||||||
"node": ">= 0.10.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/streamx": {
|
"node_modules/streamx": {
|
||||||
"version": "2.23.0",
|
"version": "2.23.0",
|
||||||
"resolved": "https://registry.npmjs.org/streamx/-/streamx-2.23.0.tgz",
|
"resolved": "https://registry.npmjs.org/streamx/-/streamx-2.23.0.tgz",
|
||||||
@@ -4803,7 +4532,8 @@
|
|||||||
"node_modules/undici-types": {
|
"node_modules/undici-types": {
|
||||||
"version": "6.21.0",
|
"version": "6.21.0",
|
||||||
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz",
|
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz",
|
||||||
"integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ=="
|
"integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ==",
|
||||||
|
"devOptional": true
|
||||||
},
|
},
|
||||||
"node_modules/universalify": {
|
"node_modules/universalify": {
|
||||||
"version": "2.0.1",
|
"version": "2.0.1",
|
||||||
@@ -4826,14 +4556,6 @@
|
|||||||
"resolved": "https://registry.npmjs.org/urlpattern-polyfill/-/urlpattern-polyfill-10.0.0.tgz",
|
"resolved": "https://registry.npmjs.org/urlpattern-polyfill/-/urlpattern-polyfill-10.0.0.tgz",
|
||||||
"integrity": "sha512-H/A06tKD7sS1O1X2SshBVeA5FLycRpjqiBeqGKmBwBDBy28EnRjORxTNe269KSSr5un5qyWi1iL61wLxpd+ZOg=="
|
"integrity": "sha512-H/A06tKD7sS1O1X2SshBVeA5FLycRpjqiBeqGKmBwBDBy28EnRjORxTNe269KSSr5un5qyWi1iL61wLxpd+ZOg=="
|
||||||
},
|
},
|
||||||
"node_modules/user-agents": {
|
|
||||||
"version": "1.1.669",
|
|
||||||
"resolved": "https://registry.npmjs.org/user-agents/-/user-agents-1.1.669.tgz",
|
|
||||||
"integrity": "sha512-pbIzG+AOqCaIpySKJ4IAm1l0VyE4jMnK4y1thV8lm8PYxI+7X5uWcppOK7zY79TCKKTAnJH3/4gaVIZHsjrmJA==",
|
|
||||||
"dependencies": {
|
|
||||||
"lodash.clonedeep": "^4.5.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/util": {
|
"node_modules/util": {
|
||||||
"version": "0.12.5",
|
"version": "0.12.5",
|
||||||
"resolved": "https://registry.npmjs.org/util/-/util-0.12.5.tgz",
|
"resolved": "https://registry.npmjs.org/util/-/util-0.12.5.tgz",
|
||||||
|
|||||||
290
backend/package-lock.json
generated
@@ -1,14 +1,13 @@
|
|||||||
{
|
{
|
||||||
"name": "dutchie-menus-backend",
|
"name": "dutchie-menus-backend",
|
||||||
"version": "1.6.0",
|
"version": "1.5.1",
|
||||||
"lockfileVersion": 3,
|
"lockfileVersion": 3,
|
||||||
"requires": true,
|
"requires": true,
|
||||||
"packages": {
|
"packages": {
|
||||||
"": {
|
"": {
|
||||||
"name": "dutchie-menus-backend",
|
"name": "dutchie-menus-backend",
|
||||||
"version": "1.6.0",
|
"version": "1.5.1",
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
"@kubernetes/client-node": "^1.4.0",
|
|
||||||
"@types/bcryptjs": "^3.0.0",
|
"@types/bcryptjs": "^3.0.0",
|
||||||
"axios": "^1.6.2",
|
"axios": "^1.6.2",
|
||||||
"bcrypt": "^5.1.1",
|
"bcrypt": "^5.1.1",
|
||||||
@@ -35,7 +34,6 @@
|
|||||||
"puppeteer-extra-plugin-stealth": "^2.11.2",
|
"puppeteer-extra-plugin-stealth": "^2.11.2",
|
||||||
"sharp": "^0.32.0",
|
"sharp": "^0.32.0",
|
||||||
"socks-proxy-agent": "^8.0.2",
|
"socks-proxy-agent": "^8.0.2",
|
||||||
"user-agents": "^1.1.669",
|
|
||||||
"uuid": "^9.0.1",
|
"uuid": "^9.0.1",
|
||||||
"zod": "^3.22.4"
|
"zod": "^3.22.4"
|
||||||
},
|
},
|
||||||
@@ -494,97 +492,6 @@
|
|||||||
"resolved": "https://registry.npmjs.org/@ioredis/commands/-/commands-1.4.0.tgz",
|
"resolved": "https://registry.npmjs.org/@ioredis/commands/-/commands-1.4.0.tgz",
|
||||||
"integrity": "sha512-aFT2yemJJo+TZCmieA7qnYGQooOS7QfNmYrzGtsYd3g9j5iDP8AimYYAesf79ohjbLG12XxC4nG5DyEnC88AsQ=="
|
"integrity": "sha512-aFT2yemJJo+TZCmieA7qnYGQooOS7QfNmYrzGtsYd3g9j5iDP8AimYYAesf79ohjbLG12XxC4nG5DyEnC88AsQ=="
|
||||||
},
|
},
|
||||||
"node_modules/@jsep-plugin/assignment": {
|
|
||||||
"version": "1.3.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/@jsep-plugin/assignment/-/assignment-1.3.0.tgz",
|
|
||||||
"integrity": "sha512-VVgV+CXrhbMI3aSusQyclHkenWSAm95WaiKrMxRFam3JSUiIaQjoMIw2sEs/OX4XifnqeQUN4DYbJjlA8EfktQ==",
|
|
||||||
"engines": {
|
|
||||||
"node": ">= 10.16.0"
|
|
||||||
},
|
|
||||||
"peerDependencies": {
|
|
||||||
"jsep": "^0.4.0||^1.0.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@jsep-plugin/regex": {
|
|
||||||
"version": "1.0.4",
|
|
||||||
"resolved": "https://registry.npmjs.org/@jsep-plugin/regex/-/regex-1.0.4.tgz",
|
|
||||||
"integrity": "sha512-q7qL4Mgjs1vByCaTnDFcBnV9HS7GVPJX5vyVoCgZHNSC9rjwIlmbXG5sUuorR5ndfHAIlJ8pVStxvjXHbNvtUg==",
|
|
||||||
"engines": {
|
|
||||||
"node": ">= 10.16.0"
|
|
||||||
},
|
|
||||||
"peerDependencies": {
|
|
||||||
"jsep": "^0.4.0||^1.0.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@kubernetes/client-node": {
|
|
||||||
"version": "1.4.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/@kubernetes/client-node/-/client-node-1.4.0.tgz",
|
|
||||||
"integrity": "sha512-Zge3YvF7DJi264dU1b3wb/GmzR99JhUpqTvp+VGHfwZT+g7EOOYNScDJNZwXy9cszyIGPIs0VHr+kk8e95qqrA==",
|
|
||||||
"dependencies": {
|
|
||||||
"@types/js-yaml": "^4.0.1",
|
|
||||||
"@types/node": "^24.0.0",
|
|
||||||
"@types/node-fetch": "^2.6.13",
|
|
||||||
"@types/stream-buffers": "^3.0.3",
|
|
||||||
"form-data": "^4.0.0",
|
|
||||||
"hpagent": "^1.2.0",
|
|
||||||
"isomorphic-ws": "^5.0.0",
|
|
||||||
"js-yaml": "^4.1.0",
|
|
||||||
"jsonpath-plus": "^10.3.0",
|
|
||||||
"node-fetch": "^2.7.0",
|
|
||||||
"openid-client": "^6.1.3",
|
|
||||||
"rfc4648": "^1.3.0",
|
|
||||||
"socks-proxy-agent": "^8.0.4",
|
|
||||||
"stream-buffers": "^3.0.2",
|
|
||||||
"tar-fs": "^3.0.9",
|
|
||||||
"ws": "^8.18.2"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@kubernetes/client-node/node_modules/@types/node": {
|
|
||||||
"version": "24.10.3",
|
|
||||||
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.10.3.tgz",
|
|
||||||
"integrity": "sha512-gqkrWUsS8hcm0r44yn7/xZeV1ERva/nLgrLxFRUGb7aoNMIJfZJ3AC261zDQuOAKC7MiXai1WCpYc48jAHoShQ==",
|
|
||||||
"dependencies": {
|
|
||||||
"undici-types": "~7.16.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@kubernetes/client-node/node_modules/tar-fs": {
|
|
||||||
"version": "3.1.1",
|
|
||||||
"resolved": "https://registry.npmjs.org/tar-fs/-/tar-fs-3.1.1.tgz",
|
|
||||||
"integrity": "sha512-LZA0oaPOc2fVo82Txf3gw+AkEd38szODlptMYejQUhndHMLQ9M059uXR+AfS7DNo0NpINvSqDsvyaCrBVkptWg==",
|
|
||||||
"dependencies": {
|
|
||||||
"pump": "^3.0.0",
|
|
||||||
"tar-stream": "^3.1.5"
|
|
||||||
},
|
|
||||||
"optionalDependencies": {
|
|
||||||
"bare-fs": "^4.0.1",
|
|
||||||
"bare-path": "^3.0.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@kubernetes/client-node/node_modules/undici-types": {
|
|
||||||
"version": "7.16.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-7.16.0.tgz",
|
|
||||||
"integrity": "sha512-Zz+aZWSj8LE6zoxD+xrjh4VfkIG8Ya6LvYkZqtUQGJPZjYl53ypCaUwWqo7eI0x66KBGeRo+mlBEkMSeSZ38Nw=="
|
|
||||||
},
|
|
||||||
"node_modules/@kubernetes/client-node/node_modules/ws": {
|
|
||||||
"version": "8.18.3",
|
|
||||||
"resolved": "https://registry.npmjs.org/ws/-/ws-8.18.3.tgz",
|
|
||||||
"integrity": "sha512-PEIGCY5tSlUt50cqyMXfCzX+oOPqN0vuGqWzbcJ2xvnkzkq46oOpz7dQaTDBdfICb4N14+GARUDw2XV2N4tvzg==",
|
|
||||||
"engines": {
|
|
||||||
"node": ">=10.0.0"
|
|
||||||
},
|
|
||||||
"peerDependencies": {
|
|
||||||
"bufferutil": "^4.0.1",
|
|
||||||
"utf-8-validate": ">=5.0.2"
|
|
||||||
},
|
|
||||||
"peerDependenciesMeta": {
|
|
||||||
"bufferutil": {
|
|
||||||
"optional": true
|
|
||||||
},
|
|
||||||
"utf-8-validate": {
|
|
||||||
"optional": true
|
|
||||||
}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@mapbox/node-pre-gyp": {
|
"node_modules/@mapbox/node-pre-gyp": {
|
||||||
"version": "1.0.11",
|
"version": "1.0.11",
|
||||||
"resolved": "https://registry.npmjs.org/@mapbox/node-pre-gyp/-/node-pre-gyp-1.0.11.tgz",
|
"resolved": "https://registry.npmjs.org/@mapbox/node-pre-gyp/-/node-pre-gyp-1.0.11.tgz",
|
||||||
@@ -850,11 +757,6 @@
|
|||||||
"integrity": "sha512-r8Tayk8HJnX0FztbZN7oVqGccWgw98T/0neJphO91KkmOzug1KkofZURD4UaD5uH8AqcFLfdPErnBod0u71/qg==",
|
"integrity": "sha512-r8Tayk8HJnX0FztbZN7oVqGccWgw98T/0neJphO91KkmOzug1KkofZURD4UaD5uH8AqcFLfdPErnBod0u71/qg==",
|
||||||
"dev": true
|
"dev": true
|
||||||
},
|
},
|
||||||
"node_modules/@types/js-yaml": {
|
|
||||||
"version": "4.0.9",
|
|
||||||
"resolved": "https://registry.npmjs.org/@types/js-yaml/-/js-yaml-4.0.9.tgz",
|
|
||||||
"integrity": "sha512-k4MGaQl5TGo/iipqb2UDG2UwjXziSWkh0uysQelTlJpX1qGlpUZYm8PnO4DxG1qBomtJUdYJ6qR6xdIah10JLg=="
|
|
||||||
},
|
|
||||||
"node_modules/@types/jsonwebtoken": {
|
"node_modules/@types/jsonwebtoken": {
|
||||||
"version": "9.0.10",
|
"version": "9.0.10",
|
||||||
"resolved": "https://registry.npmjs.org/@types/jsonwebtoken/-/jsonwebtoken-9.0.10.tgz",
|
"resolved": "https://registry.npmjs.org/@types/jsonwebtoken/-/jsonwebtoken-9.0.10.tgz",
|
||||||
@@ -880,6 +782,7 @@
|
|||||||
"version": "20.19.25",
|
"version": "20.19.25",
|
||||||
"resolved": "https://registry.npmjs.org/@types/node/-/node-20.19.25.tgz",
|
"resolved": "https://registry.npmjs.org/@types/node/-/node-20.19.25.tgz",
|
||||||
"integrity": "sha512-ZsJzA5thDQMSQO788d7IocwwQbI8B5OPzmqNvpf3NY/+MHDAS759Wo0gd2WQeXYt5AAAQjzcrTVC6SKCuYgoCQ==",
|
"integrity": "sha512-ZsJzA5thDQMSQO788d7IocwwQbI8B5OPzmqNvpf3NY/+MHDAS759Wo0gd2WQeXYt5AAAQjzcrTVC6SKCuYgoCQ==",
|
||||||
|
"devOptional": true,
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
"undici-types": "~6.21.0"
|
"undici-types": "~6.21.0"
|
||||||
}
|
}
|
||||||
@@ -890,15 +793,6 @@
|
|||||||
"integrity": "sha512-0ikrnug3/IyneSHqCBeslAhlK2aBfYek1fGo4bP4QnZPmiqSGRK+Oy7ZMisLWkesffJvQ1cqAcBnJC+8+nxIAg==",
|
"integrity": "sha512-0ikrnug3/IyneSHqCBeslAhlK2aBfYek1fGo4bP4QnZPmiqSGRK+Oy7ZMisLWkesffJvQ1cqAcBnJC+8+nxIAg==",
|
||||||
"dev": true
|
"dev": true
|
||||||
},
|
},
|
||||||
"node_modules/@types/node-fetch": {
|
|
||||||
"version": "2.6.13",
|
|
||||||
"resolved": "https://registry.npmjs.org/@types/node-fetch/-/node-fetch-2.6.13.tgz",
|
|
||||||
"integrity": "sha512-QGpRVpzSaUs30JBSGPjOg4Uveu384erbHBoT1zeONvyCfwQxIkUshLAOqN/k9EjGviPRmWTTe6aH2qySWKTVSw==",
|
|
||||||
"dependencies": {
|
|
||||||
"@types/node": "*",
|
|
||||||
"form-data": "^4.0.4"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@types/pg": {
|
"node_modules/@types/pg": {
|
||||||
"version": "8.15.6",
|
"version": "8.15.6",
|
||||||
"resolved": "https://registry.npmjs.org/@types/pg/-/pg-8.15.6.tgz",
|
"resolved": "https://registry.npmjs.org/@types/pg/-/pg-8.15.6.tgz",
|
||||||
@@ -952,14 +846,6 @@
|
|||||||
"@types/node": "*"
|
"@types/node": "*"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/@types/stream-buffers": {
|
|
||||||
"version": "3.0.8",
|
|
||||||
"resolved": "https://registry.npmjs.org/@types/stream-buffers/-/stream-buffers-3.0.8.tgz",
|
|
||||||
"integrity": "sha512-J+7VaHKNvlNPJPEJXX/fKa9DZtR/xPMwuIbe+yNOwp1YB+ApUOBv2aUpEoBJEi8nJgbgs1x8e73ttg0r1rSUdw==",
|
|
||||||
"dependencies": {
|
|
||||||
"@types/node": "*"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/@types/uuid": {
|
"node_modules/@types/uuid": {
|
||||||
"version": "9.0.8",
|
"version": "9.0.8",
|
||||||
"resolved": "https://registry.npmjs.org/@types/uuid/-/uuid-9.0.8.tgz",
|
"resolved": "https://registry.npmjs.org/@types/uuid/-/uuid-9.0.8.tgz",
|
||||||
@@ -1140,78 +1026,6 @@
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/bare-fs": {
|
|
||||||
"version": "4.5.2",
|
|
||||||
"resolved": "https://registry.npmjs.org/bare-fs/-/bare-fs-4.5.2.tgz",
|
|
||||||
"integrity": "sha512-veTnRzkb6aPHOvSKIOy60KzURfBdUflr5VReI+NSaPL6xf+XLdONQgZgpYvUuZLVQ8dCqxpBAudaOM1+KpAUxw==",
|
|
||||||
"optional": true,
|
|
||||||
"dependencies": {
|
|
||||||
"bare-events": "^2.5.4",
|
|
||||||
"bare-path": "^3.0.0",
|
|
||||||
"bare-stream": "^2.6.4",
|
|
||||||
"bare-url": "^2.2.2",
|
|
||||||
"fast-fifo": "^1.3.2"
|
|
||||||
},
|
|
||||||
"engines": {
|
|
||||||
"bare": ">=1.16.0"
|
|
||||||
},
|
|
||||||
"peerDependencies": {
|
|
||||||
"bare-buffer": "*"
|
|
||||||
},
|
|
||||||
"peerDependenciesMeta": {
|
|
||||||
"bare-buffer": {
|
|
||||||
"optional": true
|
|
||||||
}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/bare-os": {
|
|
||||||
"version": "3.6.2",
|
|
||||||
"resolved": "https://registry.npmjs.org/bare-os/-/bare-os-3.6.2.tgz",
|
|
||||||
"integrity": "sha512-T+V1+1srU2qYNBmJCXZkUY5vQ0B4FSlL3QDROnKQYOqeiQR8UbjNHlPa+TIbM4cuidiN9GaTaOZgSEgsvPbh5A==",
|
|
||||||
"optional": true,
|
|
||||||
"engines": {
|
|
||||||
"bare": ">=1.14.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/bare-path": {
|
|
||||||
"version": "3.0.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/bare-path/-/bare-path-3.0.0.tgz",
|
|
||||||
"integrity": "sha512-tyfW2cQcB5NN8Saijrhqn0Zh7AnFNsnczRcuWODH0eYAXBsJ5gVxAUuNr7tsHSC6IZ77cA0SitzT+s47kot8Mw==",
|
|
||||||
"optional": true,
|
|
||||||
"dependencies": {
|
|
||||||
"bare-os": "^3.0.1"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/bare-stream": {
|
|
||||||
"version": "2.7.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/bare-stream/-/bare-stream-2.7.0.tgz",
|
|
||||||
"integrity": "sha512-oyXQNicV1y8nc2aKffH+BUHFRXmx6VrPzlnaEvMhram0nPBrKcEdcyBg5r08D0i8VxngHFAiVyn1QKXpSG0B8A==",
|
|
||||||
"optional": true,
|
|
||||||
"dependencies": {
|
|
||||||
"streamx": "^2.21.0"
|
|
||||||
},
|
|
||||||
"peerDependencies": {
|
|
||||||
"bare-buffer": "*",
|
|
||||||
"bare-events": "*"
|
|
||||||
},
|
|
||||||
"peerDependenciesMeta": {
|
|
||||||
"bare-buffer": {
|
|
||||||
"optional": true
|
|
||||||
},
|
|
||||||
"bare-events": {
|
|
||||||
"optional": true
|
|
||||||
}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/bare-url": {
|
|
||||||
"version": "2.3.2",
|
|
||||||
"resolved": "https://registry.npmjs.org/bare-url/-/bare-url-2.3.2.tgz",
|
|
||||||
"integrity": "sha512-ZMq4gd9ngV5aTMa5p9+UfY0b3skwhHELaDkhEHetMdX0LRkW9kzaym4oo/Eh+Ghm0CCDuMTsRIGM/ytUc1ZYmw==",
|
|
||||||
"optional": true,
|
|
||||||
"dependencies": {
|
|
||||||
"bare-path": "^3.0.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/base64-js": {
|
"node_modules/base64-js": {
|
||||||
"version": "1.5.1",
|
"version": "1.5.1",
|
||||||
"resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz",
|
"resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz",
|
||||||
@@ -2725,14 +2539,6 @@
|
|||||||
"node": ">=16.0.0"
|
"node": ">=16.0.0"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/hpagent": {
|
|
||||||
"version": "1.2.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/hpagent/-/hpagent-1.2.0.tgz",
|
|
||||||
"integrity": "sha512-A91dYTeIB6NoXG+PxTQpCCDDnfHsW9kc06Lvpu1TEe9gnd6ZFeiBoRO9JvzEv6xK7EX97/dUE8g/vBMTqTS3CA==",
|
|
||||||
"engines": {
|
|
||||||
"node": ">=14"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/htmlparser2": {
|
"node_modules/htmlparser2": {
|
||||||
"version": "10.0.0",
|
"version": "10.0.0",
|
||||||
"resolved": "https://registry.npmjs.org/htmlparser2/-/htmlparser2-10.0.0.tgz",
|
"resolved": "https://registry.npmjs.org/htmlparser2/-/htmlparser2-10.0.0.tgz",
|
||||||
@@ -3096,22 +2902,6 @@
|
|||||||
"node": ">=0.10.0"
|
"node": ">=0.10.0"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/isomorphic-ws": {
|
|
||||||
"version": "5.0.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/isomorphic-ws/-/isomorphic-ws-5.0.0.tgz",
|
|
||||||
"integrity": "sha512-muId7Zzn9ywDsyXgTIafTry2sV3nySZeUDe6YedVd1Hvuuep5AsIlqK+XefWpYTyJG5e503F2xIuT2lcU6rCSw==",
|
|
||||||
"peerDependencies": {
|
|
||||||
"ws": "*"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/jose": {
|
|
||||||
"version": "6.1.3",
|
|
||||||
"resolved": "https://registry.npmjs.org/jose/-/jose-6.1.3.tgz",
|
|
||||||
"integrity": "sha512-0TpaTfihd4QMNwrz/ob2Bp7X04yuxJkjRGi4aKmOqwhov54i6u79oCv7T+C7lo70MKH6BesI3vscD1yb/yzKXQ==",
|
|
||||||
"funding": {
|
|
||||||
"url": "https://github.com/sponsors/panva"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/js-tokens": {
|
"node_modules/js-tokens": {
|
||||||
"version": "4.0.0",
|
"version": "4.0.0",
|
||||||
"resolved": "https://registry.npmjs.org/js-tokens/-/js-tokens-4.0.0.tgz",
|
"resolved": "https://registry.npmjs.org/js-tokens/-/js-tokens-4.0.0.tgz",
|
||||||
@@ -3128,14 +2918,6 @@
|
|||||||
"js-yaml": "bin/js-yaml.js"
|
"js-yaml": "bin/js-yaml.js"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/jsep": {
|
|
||||||
"version": "1.4.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/jsep/-/jsep-1.4.0.tgz",
|
|
||||||
"integrity": "sha512-B7qPcEVE3NVkmSJbaYxvv4cHkVW7DQsZz13pUMrfS8z8Q/BuShN+gcTXrUlPiGqM2/t/EEaI030bpxMqY8gMlw==",
|
|
||||||
"engines": {
|
|
||||||
"node": ">= 10.16.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/json-parse-even-better-errors": {
|
"node_modules/json-parse-even-better-errors": {
|
||||||
"version": "2.3.1",
|
"version": "2.3.1",
|
||||||
"resolved": "https://registry.npmjs.org/json-parse-even-better-errors/-/json-parse-even-better-errors-2.3.1.tgz",
|
"resolved": "https://registry.npmjs.org/json-parse-even-better-errors/-/json-parse-even-better-errors-2.3.1.tgz",
|
||||||
@@ -3157,23 +2939,6 @@
|
|||||||
"graceful-fs": "^4.1.6"
|
"graceful-fs": "^4.1.6"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/jsonpath-plus": {
|
|
||||||
"version": "10.3.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/jsonpath-plus/-/jsonpath-plus-10.3.0.tgz",
|
|
||||||
"integrity": "sha512-8TNmfeTCk2Le33A3vRRwtuworG/L5RrgMvdjhKZxvyShO+mBu2fP50OWUjRLNtvw344DdDarFh9buFAZs5ujeA==",
|
|
||||||
"dependencies": {
|
|
||||||
"@jsep-plugin/assignment": "^1.3.0",
|
|
||||||
"@jsep-plugin/regex": "^1.0.4",
|
|
||||||
"jsep": "^1.4.0"
|
|
||||||
},
|
|
||||||
"bin": {
|
|
||||||
"jsonpath": "bin/jsonpath-cli.js",
|
|
||||||
"jsonpath-plus": "bin/jsonpath-cli.js"
|
|
||||||
},
|
|
||||||
"engines": {
|
|
||||||
"node": ">=18.0.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/jsonwebtoken": {
|
"node_modules/jsonwebtoken": {
|
||||||
"version": "9.0.2",
|
"version": "9.0.2",
|
||||||
"resolved": "https://registry.npmjs.org/jsonwebtoken/-/jsonwebtoken-9.0.2.tgz",
|
"resolved": "https://registry.npmjs.org/jsonwebtoken/-/jsonwebtoken-9.0.2.tgz",
|
||||||
@@ -3248,11 +3013,6 @@
|
|||||||
"resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
|
"resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
|
||||||
"integrity": "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg=="
|
"integrity": "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg=="
|
||||||
},
|
},
|
||||||
"node_modules/lodash.clonedeep": {
|
|
||||||
"version": "4.5.0",
|
|
||||||
"resolved": "https://registry.npmjs.org/lodash.clonedeep/-/lodash.clonedeep-4.5.0.tgz",
|
|
||||||
"integrity": "sha512-H5ZhCF25riFd9uB5UCkVKo61m3S/xZk1x4wA6yp/L3RFP6Z/eHH1ymQcGLo7J3GMPfm0V/7m1tryHuGVxpqEBQ=="
|
|
||||||
},
|
|
||||||
"node_modules/lodash.defaults": {
|
"node_modules/lodash.defaults": {
|
||||||
"version": "4.2.0",
|
"version": "4.2.0",
|
||||||
"resolved": "https://registry.npmjs.org/lodash.defaults/-/lodash.defaults-4.2.0.tgz",
|
"resolved": "https://registry.npmjs.org/lodash.defaults/-/lodash.defaults-4.2.0.tgz",
|
||||||
@@ -3702,14 +3462,6 @@
|
|||||||
"url": "https://github.com/fb55/nth-check?sponsor=1"
|
"url": "https://github.com/fb55/nth-check?sponsor=1"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/oauth4webapi": {
|
|
||||||
"version": "3.8.3",
|
|
||||||
"resolved": "https://registry.npmjs.org/oauth4webapi/-/oauth4webapi-3.8.3.tgz",
|
|
||||||
"integrity": "sha512-pQ5BsX3QRTgnt5HxgHwgunIRaDXBdkT23tf8dfzmtTIL2LTpdmxgbpbBm0VgFWAIDlezQvQCTgnVIUmHupXHxw==",
|
|
||||||
"funding": {
|
|
||||||
"url": "https://github.com/sponsors/panva"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/object-assign": {
|
"node_modules/object-assign": {
|
||||||
"version": "4.1.1",
|
"version": "4.1.1",
|
||||||
"resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz",
|
"resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz",
|
||||||
@@ -3748,18 +3500,6 @@
|
|||||||
"wrappy": "1"
|
"wrappy": "1"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/openid-client": {
|
|
||||||
"version": "6.8.1",
|
|
||||||
"resolved": "https://registry.npmjs.org/openid-client/-/openid-client-6.8.1.tgz",
|
|
||||||
"integrity": "sha512-VoYT6enBo6Vj2j3Q5Ec0AezS+9YGzQo1f5Xc42lreMGlfP4ljiXPKVDvCADh+XHCV/bqPu/wWSiCVXbJKvrODw==",
|
|
||||||
"dependencies": {
|
|
||||||
"jose": "^6.1.0",
|
|
||||||
"oauth4webapi": "^3.8.2"
|
|
||||||
},
|
|
||||||
"funding": {
|
|
||||||
"url": "https://github.com/sponsors/panva"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/pac-proxy-agent": {
|
"node_modules/pac-proxy-agent": {
|
||||||
"version": "7.2.0",
|
"version": "7.2.0",
|
||||||
"resolved": "https://registry.npmjs.org/pac-proxy-agent/-/pac-proxy-agent-7.2.0.tgz",
|
"resolved": "https://registry.npmjs.org/pac-proxy-agent/-/pac-proxy-agent-7.2.0.tgz",
|
||||||
@@ -4676,11 +4416,6 @@
|
|||||||
"url": "https://github.com/privatenumber/resolve-pkg-maps?sponsor=1"
|
"url": "https://github.com/privatenumber/resolve-pkg-maps?sponsor=1"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/rfc4648": {
|
|
||||||
"version": "1.5.4",
|
|
||||||
"resolved": "https://registry.npmjs.org/rfc4648/-/rfc4648-1.5.4.tgz",
|
|
||||||
"integrity": "sha512-rRg/6Lb+IGfJqO05HZkN50UtY7K/JhxJag1kP23+zyMfrvoB0B7RWv06MbOzoc79RgCdNTiUaNsTT1AJZ7Z+cg=="
|
|
||||||
},
|
|
||||||
"node_modules/rimraf": {
|
"node_modules/rimraf": {
|
||||||
"version": "3.0.2",
|
"version": "3.0.2",
|
||||||
"resolved": "https://registry.npmjs.org/rimraf/-/rimraf-3.0.2.tgz",
|
"resolved": "https://registry.npmjs.org/rimraf/-/rimraf-3.0.2.tgz",
|
||||||
@@ -5111,14 +4846,6 @@
|
|||||||
"node": ">= 0.8"
|
"node": ">= 0.8"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"node_modules/stream-buffers": {
|
|
||||||
"version": "3.0.3",
|
|
||||||
"resolved": "https://registry.npmjs.org/stream-buffers/-/stream-buffers-3.0.3.tgz",
|
|
||||||
"integrity": "sha512-pqMqwQCso0PBJt2PQmDO0cFj0lyqmiwOMiMSkVtRokl7e+ZTRYgDHKnuZNbqjiJXgsg4nuqtD/zxuo9KqTp0Yw==",
|
|
||||||
"engines": {
|
|
||||||
"node": ">= 0.10.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/streamx": {
|
"node_modules/streamx": {
|
||||||
"version": "2.23.0",
|
"version": "2.23.0",
|
||||||
"resolved": "https://registry.npmjs.org/streamx/-/streamx-2.23.0.tgz",
|
"resolved": "https://registry.npmjs.org/streamx/-/streamx-2.23.0.tgz",
|
||||||
@@ -5338,7 +5065,8 @@
|
|||||||
"node_modules/undici-types": {
|
"node_modules/undici-types": {
|
||||||
"version": "6.21.0",
|
"version": "6.21.0",
|
||||||
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz",
|
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz",
|
||||||
"integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ=="
|
"integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ==",
|
||||||
|
"devOptional": true
|
||||||
},
|
},
|
||||||
"node_modules/universalify": {
|
"node_modules/universalify": {
|
||||||
"version": "2.0.1",
|
"version": "2.0.1",
|
||||||
@@ -5361,14 +5089,6 @@
|
|||||||
"resolved": "https://registry.npmjs.org/urlpattern-polyfill/-/urlpattern-polyfill-10.0.0.tgz",
|
"resolved": "https://registry.npmjs.org/urlpattern-polyfill/-/urlpattern-polyfill-10.0.0.tgz",
|
||||||
"integrity": "sha512-H/A06tKD7sS1O1X2SshBVeA5FLycRpjqiBeqGKmBwBDBy28EnRjORxTNe269KSSr5un5qyWi1iL61wLxpd+ZOg=="
|
"integrity": "sha512-H/A06tKD7sS1O1X2SshBVeA5FLycRpjqiBeqGKmBwBDBy28EnRjORxTNe269KSSr5un5qyWi1iL61wLxpd+ZOg=="
|
||||||
},
|
},
|
||||||
"node_modules/user-agents": {
|
|
||||||
"version": "1.1.669",
|
|
||||||
"resolved": "https://registry.npmjs.org/user-agents/-/user-agents-1.1.669.tgz",
|
|
||||||
"integrity": "sha512-pbIzG+AOqCaIpySKJ4IAm1l0VyE4jMnK4y1thV8lm8PYxI+7X5uWcppOK7zY79TCKKTAnJH3/4gaVIZHsjrmJA==",
|
|
||||||
"dependencies": {
|
|
||||||
"lodash.clonedeep": "^4.5.0"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"node_modules/util": {
|
"node_modules/util": {
|
||||||
"version": "0.12.5",
|
"version": "0.12.5",
|
||||||
"resolved": "https://registry.npmjs.org/util/-/util-0.12.5.tgz",
|
"resolved": "https://registry.npmjs.org/util/-/util-0.12.5.tgz",
|
||||||
|
|||||||
@@ -22,7 +22,6 @@
|
|||||||
"seed:dt:cities:bulk": "tsx src/scripts/seed-dt-cities-bulk.ts"
|
"seed:dt:cities:bulk": "tsx src/scripts/seed-dt-cities-bulk.ts"
|
||||||
},
|
},
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
"@kubernetes/client-node": "^1.4.0",
|
|
||||||
"@types/bcryptjs": "^3.0.0",
|
"@types/bcryptjs": "^3.0.0",
|
||||||
"axios": "^1.6.2",
|
"axios": "^1.6.2",
|
||||||
"bcrypt": "^5.1.1",
|
"bcrypt": "^5.1.1",
|
||||||
@@ -49,7 +48,6 @@
|
|||||||
"puppeteer-extra-plugin-stealth": "^2.11.2",
|
"puppeteer-extra-plugin-stealth": "^2.11.2",
|
||||||
"sharp": "^0.32.0",
|
"sharp": "^0.32.0",
|
||||||
"socks-proxy-agent": "^8.0.2",
|
"socks-proxy-agent": "^8.0.2",
|
||||||
"user-agents": "^1.1.669",
|
|
||||||
"uuid": "^9.0.1",
|
"uuid": "^9.0.1",
|
||||||
"zod": "^3.22.4"
|
"zod": "^3.22.4"
|
||||||
},
|
},
|
||||||
|
|||||||
@@ -172,9 +172,6 @@ export async function runFullDiscovery(
|
|||||||
console.log(`Errors: ${totalErrors}`);
|
console.log(`Errors: ${totalErrors}`);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Track new dispensary IDs for task chaining
|
|
||||||
let newDispensaryIds: number[] = [];
|
|
||||||
|
|
||||||
// Step 4: Auto-validate and promote discovered locations
|
// Step 4: Auto-validate and promote discovered locations
|
||||||
if (!dryRun && totalLocationsUpserted > 0) {
|
if (!dryRun && totalLocationsUpserted > 0) {
|
||||||
console.log('\n[Discovery] Step 4: Auto-promoting discovered locations...');
|
console.log('\n[Discovery] Step 4: Auto-promoting discovered locations...');
|
||||||
@@ -183,13 +180,6 @@ export async function runFullDiscovery(
|
|||||||
console.log(` Created: ${promotionResult.created} new dispensaries`);
|
console.log(` Created: ${promotionResult.created} new dispensaries`);
|
||||||
console.log(` Updated: ${promotionResult.updated} existing dispensaries`);
|
console.log(` Updated: ${promotionResult.updated} existing dispensaries`);
|
||||||
console.log(` Rejected: ${promotionResult.rejected} (validation failed)`);
|
console.log(` Rejected: ${promotionResult.rejected} (validation failed)`);
|
||||||
|
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Capture new IDs for task chaining
|
|
||||||
newDispensaryIds = promotionResult.newDispensaryIds;
|
|
||||||
if (newDispensaryIds.length > 0) {
|
|
||||||
console.log(` New store IDs for crawl: [${newDispensaryIds.join(', ')}]`);
|
|
||||||
}
|
|
||||||
|
|
||||||
if (promotionResult.rejectedRecords.length > 0) {
|
if (promotionResult.rejectedRecords.length > 0) {
|
||||||
console.log(` Rejection reasons:`);
|
console.log(` Rejection reasons:`);
|
||||||
promotionResult.rejectedRecords.slice(0, 5).forEach(r => {
|
promotionResult.rejectedRecords.slice(0, 5).forEach(r => {
|
||||||
@@ -224,8 +214,6 @@ export async function runFullDiscovery(
|
|||||||
totalLocationsFound,
|
totalLocationsFound,
|
||||||
totalLocationsUpserted,
|
totalLocationsUpserted,
|
||||||
durationMs,
|
durationMs,
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Return new IDs for task chaining
|
|
||||||
newDispensaryIds,
|
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -127,8 +127,6 @@ export interface PromotionSummary {
|
|||||||
errors: string[];
|
errors: string[];
|
||||||
}>;
|
}>;
|
||||||
durationMs: number;
|
durationMs: number;
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Track new dispensary IDs for task chaining
|
|
||||||
newDispensaryIds: number[];
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -471,8 +469,6 @@ export async function promoteDiscoveredLocations(
|
|||||||
|
|
||||||
const results: PromotionResult[] = [];
|
const results: PromotionResult[] = [];
|
||||||
const rejectedRecords: PromotionSummary['rejectedRecords'] = [];
|
const rejectedRecords: PromotionSummary['rejectedRecords'] = [];
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Track new dispensary IDs for task chaining
|
|
||||||
const newDispensaryIds: number[] = [];
|
|
||||||
let created = 0;
|
let created = 0;
|
||||||
let updated = 0;
|
let updated = 0;
|
||||||
let skipped = 0;
|
let skipped = 0;
|
||||||
@@ -529,8 +525,6 @@ export async function promoteDiscoveredLocations(
|
|||||||
|
|
||||||
if (promotionResult.action === 'created') {
|
if (promotionResult.action === 'created') {
|
||||||
created++;
|
created++;
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Track new IDs for task chaining
|
|
||||||
newDispensaryIds.push(promotionResult.dispensaryId);
|
|
||||||
} else {
|
} else {
|
||||||
updated++;
|
updated++;
|
||||||
}
|
}
|
||||||
@@ -554,8 +548,6 @@ export async function promoteDiscoveredLocations(
|
|||||||
results,
|
results,
|
||||||
rejectedRecords,
|
rejectedRecords,
|
||||||
durationMs: Date.now() - startTime,
|
durationMs: Date.now() - startTime,
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Return new IDs for task chaining
|
|
||||||
newDispensaryIds,
|
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -211,8 +211,6 @@ export interface FullDiscoveryResult {
|
|||||||
totalLocationsFound: number;
|
totalLocationsFound: number;
|
||||||
totalLocationsUpserted: number;
|
totalLocationsUpserted: number;
|
||||||
durationMs: number;
|
durationMs: number;
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Track new dispensary IDs for task chaining
|
|
||||||
newDispensaryIds?: number[];
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
|
|||||||
@@ -6,8 +6,6 @@ import { initializeMinio, isMinioEnabled } from './utils/minio';
|
|||||||
import { initializeImageStorage } from './utils/image-storage';
|
import { initializeImageStorage } from './utils/image-storage';
|
||||||
import { logger } from './services/logger';
|
import { logger } from './services/logger';
|
||||||
import { cleanupOrphanedJobs } from './services/proxyTestQueue';
|
import { cleanupOrphanedJobs } from './services/proxyTestQueue';
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Database-driven task scheduler
|
|
||||||
import { taskScheduler } from './services/task-scheduler';
|
|
||||||
import { runAutoMigrations } from './db/auto-migrate';
|
import { runAutoMigrations } from './db/auto-migrate';
|
||||||
import { getPool } from './db/pool';
|
import { getPool } from './db/pool';
|
||||||
import healthRoutes from './routes/health';
|
import healthRoutes from './routes/health';
|
||||||
@@ -144,8 +142,6 @@ import seoRoutes from './routes/seo';
|
|||||||
import priceAnalyticsRoutes from './routes/price-analytics';
|
import priceAnalyticsRoutes from './routes/price-analytics';
|
||||||
import tasksRoutes from './routes/tasks';
|
import tasksRoutes from './routes/tasks';
|
||||||
import workerRegistryRoutes from './routes/worker-registry';
|
import workerRegistryRoutes from './routes/worker-registry';
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Raw payload access API
|
|
||||||
import payloadsRoutes from './routes/payloads';
|
|
||||||
|
|
||||||
// Mark requests from trusted domains (cannaiq.co, findagram.co, findadispo.com)
|
// Mark requests from trusted domains (cannaiq.co, findagram.co, findadispo.com)
|
||||||
// These domains can access the API without authentication
|
// These domains can access the API without authentication
|
||||||
@@ -226,10 +222,6 @@ console.log('[Tasks] Routes registered at /api/tasks');
|
|||||||
app.use('/api/worker-registry', workerRegistryRoutes);
|
app.use('/api/worker-registry', workerRegistryRoutes);
|
||||||
console.log('[WorkerRegistry] Routes registered at /api/worker-registry');
|
console.log('[WorkerRegistry] Routes registered at /api/worker-registry');
|
||||||
|
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Raw payload access API
|
|
||||||
app.use('/api/payloads', payloadsRoutes);
|
|
||||||
console.log('[Payloads] Routes registered at /api/payloads');
|
|
||||||
|
|
||||||
// Phase 3: Analytics V2 - Enhanced analytics with rec/med state segmentation
|
// Phase 3: Analytics V2 - Enhanced analytics with rec/med state segmentation
|
||||||
try {
|
try {
|
||||||
const analyticsV2Router = createAnalyticsV2Router(getPool());
|
const analyticsV2Router = createAnalyticsV2Router(getPool());
|
||||||
@@ -334,17 +326,6 @@ async function startServer() {
|
|||||||
// Clean up any orphaned proxy test jobs from previous server runs
|
// Clean up any orphaned proxy test jobs from previous server runs
|
||||||
await cleanupOrphanedJobs();
|
await cleanupOrphanedJobs();
|
||||||
|
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Start database-driven task scheduler
|
|
||||||
// This replaces node-cron - schedules are stored in DB and survive restarts
|
|
||||||
// Uses SELECT FOR UPDATE SKIP LOCKED for multi-replica safety
|
|
||||||
try {
|
|
||||||
await taskScheduler.start();
|
|
||||||
logger.info('system', 'Task scheduler started');
|
|
||||||
} catch (err: any) {
|
|
||||||
// Non-fatal - scheduler can recover on next poll
|
|
||||||
logger.warn('system', `Task scheduler startup warning: ${err.message}`);
|
|
||||||
}
|
|
||||||
|
|
||||||
app.listen(PORT, () => {
|
app.listen(PORT, () => {
|
||||||
logger.info('system', `Server running on port ${PORT}`);
|
logger.info('system', `Server running on port ${PORT}`);
|
||||||
console.log(`🚀 Server running on port ${PORT}`);
|
console.log(`🚀 Server running on port ${PORT}`);
|
||||||
|
|||||||
@@ -702,10 +702,12 @@ export class StateQueryService {
|
|||||||
async getNationalSummary(): Promise<NationalSummary> {
|
async getNationalSummary(): Promise<NationalSummary> {
|
||||||
const stateMetrics = await this.getAllStateMetrics();
|
const stateMetrics = await this.getAllStateMetrics();
|
||||||
|
|
||||||
// Get all states count and aggregate metrics
|
|
||||||
const result = await this.pool.query(`
|
const result = await this.pool.query(`
|
||||||
SELECT
|
SELECT
|
||||||
COUNT(DISTINCT s.code) AS total_states,
|
COUNT(DISTINCT s.code) AS total_states,
|
||||||
|
COUNT(DISTINCT CASE WHEN EXISTS (
|
||||||
|
SELECT 1 FROM dispensaries d WHERE d.state = s.code AND d.menu_type IS NOT NULL
|
||||||
|
) THEN s.code END) AS active_states,
|
||||||
(SELECT COUNT(*) FROM dispensaries WHERE state IS NOT NULL) AS total_stores,
|
(SELECT COUNT(*) FROM dispensaries WHERE state IS NOT NULL) AS total_stores,
|
||||||
(SELECT COUNT(*) FROM store_products sp
|
(SELECT COUNT(*) FROM store_products sp
|
||||||
JOIN dispensaries d ON sp.dispensary_id = d.id
|
JOIN dispensaries d ON sp.dispensary_id = d.id
|
||||||
@@ -723,7 +725,7 @@ export class StateQueryService {
|
|||||||
|
|
||||||
return {
|
return {
|
||||||
totalStates: parseInt(data.total_states),
|
totalStates: parseInt(data.total_states),
|
||||||
activeStates: parseInt(data.total_states), // Same as totalStates - all states shown
|
activeStates: parseInt(data.active_states),
|
||||||
totalStores: parseInt(data.total_stores),
|
totalStores: parseInt(data.total_stores),
|
||||||
totalProducts: parseInt(data.total_products),
|
totalProducts: parseInt(data.total_products),
|
||||||
totalBrands: parseInt(data.total_brands),
|
totalBrands: parseInt(data.total_brands),
|
||||||
|
|||||||
@@ -5,35 +5,22 @@
|
|||||||
*
|
*
|
||||||
* DO NOT MODIFY THIS FILE WITHOUT EXPLICIT AUTHORIZATION.
|
* DO NOT MODIFY THIS FILE WITHOUT EXPLICIT AUTHORIZATION.
|
||||||
*
|
*
|
||||||
* Updated: 2025-12-10 per workflow-12102025.md
|
* This is the canonical HTTP client for all Dutchie communication.
|
||||||
*
|
* All Dutchie workers (Alice, Bella, etc.) MUST use this client.
|
||||||
* KEY BEHAVIORS (per workflow-12102025.md):
|
|
||||||
* 1. startSession() gets identity from PROXY LOCATION, not task params
|
|
||||||
* 2. On 403: immediately get new IP + new fingerprint, then retry
|
|
||||||
* 3. After 3 consecutive 403s on same proxy → disable it (burned)
|
|
||||||
* 4. Language is always English (en-US)
|
|
||||||
*
|
*
|
||||||
* IMPLEMENTATION:
|
* IMPLEMENTATION:
|
||||||
* - Uses curl via child_process.execSync (bypasses TLS fingerprinting)
|
* - Uses curl via child_process.execSync (bypasses TLS fingerprinting)
|
||||||
* - NO Puppeteer, NO axios, NO fetch
|
* - NO Puppeteer, NO axios, NO fetch
|
||||||
* - Uses intoli/user-agents via CrawlRotator for realistic fingerprints
|
* - Fingerprint rotation on 403
|
||||||
* - Residential IP compatible
|
* - Residential IP compatible
|
||||||
*
|
*
|
||||||
* USAGE:
|
* USAGE:
|
||||||
* import { curlPost, curlGet, executeGraphQL, startSession } from '@dutchie/client';
|
* import { curlPost, curlGet, executeGraphQL } from '@dutchie/client';
|
||||||
*
|
*
|
||||||
* ============================================================
|
* ============================================================
|
||||||
*/
|
*/
|
||||||
|
|
||||||
import { execSync } from 'child_process';
|
import { execSync } from 'child_process';
|
||||||
import {
|
|
||||||
buildOrderedHeaders,
|
|
||||||
buildRefererFromMenuUrl,
|
|
||||||
getCurlBinary,
|
|
||||||
isCurlImpersonateAvailable,
|
|
||||||
HeaderContext,
|
|
||||||
BrowserType,
|
|
||||||
} from '../../services/http-fingerprint';
|
|
||||||
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
// TYPES
|
// TYPES
|
||||||
@@ -45,8 +32,6 @@ export interface CurlResponse {
|
|||||||
error?: string;
|
error?: string;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Per workflow-12102025.md: fingerprint comes from CrawlRotator's BrowserFingerprint
|
|
||||||
// We keep a simplified interface here for header building
|
|
||||||
export interface Fingerprint {
|
export interface Fingerprint {
|
||||||
userAgent: string;
|
userAgent: string;
|
||||||
acceptLanguage: string;
|
acceptLanguage: string;
|
||||||
@@ -72,13 +57,15 @@ export const DUTCHIE_CONFIG = {
|
|||||||
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
// PROXY SUPPORT
|
// PROXY SUPPORT
|
||||||
// Per workflow-12102025.md:
|
// ============================================================
|
||||||
// - On 403: recordBlock() → increment consecutive_403_count
|
// Integrates with the CrawlRotator system from proxy-rotator.ts
|
||||||
// - After 3 consecutive 403s → proxy disabled
|
// On 403 errors:
|
||||||
// - Immediately rotate to new IP + new fingerprint on 403
|
// 1. Record failure on current proxy
|
||||||
|
// 2. Rotate to next proxy
|
||||||
|
// 3. Retry with new proxy
|
||||||
// ============================================================
|
// ============================================================
|
||||||
|
|
||||||
import type { CrawlRotator, BrowserFingerprint } from '../../services/crawl-rotator';
|
import type { CrawlRotator, Proxy } from '../../services/crawl-rotator';
|
||||||
|
|
||||||
let currentProxy: string | null = null;
|
let currentProxy: string | null = null;
|
||||||
let crawlRotator: CrawlRotator | null = null;
|
let crawlRotator: CrawlRotator | null = null;
|
||||||
@@ -105,12 +92,13 @@ export function getProxy(): string | null {
|
|||||||
|
|
||||||
/**
|
/**
|
||||||
* Set CrawlRotator for proxy rotation on 403s
|
* Set CrawlRotator for proxy rotation on 403s
|
||||||
* Per workflow-12102025.md: enables automatic rotation when blocked
|
* This enables automatic proxy rotation when blocked
|
||||||
*/
|
*/
|
||||||
export function setCrawlRotator(rotator: CrawlRotator | null): void {
|
export function setCrawlRotator(rotator: CrawlRotator | null): void {
|
||||||
crawlRotator = rotator;
|
crawlRotator = rotator;
|
||||||
if (rotator) {
|
if (rotator) {
|
||||||
console.log('[Dutchie Client] CrawlRotator attached - proxy rotation enabled');
|
console.log('[Dutchie Client] CrawlRotator attached - proxy rotation enabled');
|
||||||
|
// Set initial proxy from rotator
|
||||||
const proxy = rotator.proxy.getCurrent();
|
const proxy = rotator.proxy.getCurrent();
|
||||||
if (proxy) {
|
if (proxy) {
|
||||||
currentProxy = rotator.proxy.getProxyUrl(proxy);
|
currentProxy = rotator.proxy.getProxyUrl(proxy);
|
||||||
@@ -127,41 +115,30 @@ export function getCrawlRotator(): CrawlRotator | null {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Handle 403 block - per workflow-12102025.md:
|
* Rotate to next proxy (called on 403)
|
||||||
* 1. Record block on current proxy (increments consecutive_403_count)
|
|
||||||
* 2. Immediately rotate to new proxy (new IP)
|
|
||||||
* 3. Rotate fingerprint
|
|
||||||
* Returns false if no more proxies available
|
|
||||||
*/
|
*/
|
||||||
async function handle403Block(): Promise<boolean> {
|
async function rotateProxyOn403(error?: string): Promise<boolean> {
|
||||||
if (!crawlRotator) {
|
if (!crawlRotator) {
|
||||||
console.warn('[Dutchie Client] No CrawlRotator - cannot handle 403');
|
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Per workflow-12102025.md: record block (tracks consecutive 403s)
|
// Record failure on current proxy
|
||||||
const wasDisabled = await crawlRotator.recordBlock();
|
await crawlRotator.recordFailure(error || '403 Forbidden');
|
||||||
if (wasDisabled) {
|
|
||||||
console.log('[Dutchie Client] Current proxy was disabled (3 consecutive 403s)');
|
|
||||||
}
|
|
||||||
|
|
||||||
// Per workflow-12102025.md: immediately get new IP + new fingerprint
|
|
||||||
const { proxy: nextProxy, fingerprint } = crawlRotator.rotateBoth();
|
|
||||||
|
|
||||||
|
// Rotate to next proxy
|
||||||
|
const nextProxy = crawlRotator.rotateProxy();
|
||||||
if (nextProxy) {
|
if (nextProxy) {
|
||||||
currentProxy = crawlRotator.proxy.getProxyUrl(nextProxy);
|
currentProxy = crawlRotator.proxy.getProxyUrl(nextProxy);
|
||||||
console.log(`[Dutchie Client] Rotated to new proxy: ${currentProxy.replace(/:[^:@]+@/, ':***@')}`);
|
console.log(`[Dutchie Client] Rotated proxy: ${currentProxy.replace(/:[^:@]+@/, ':***@')}`);
|
||||||
console.log(`[Dutchie Client] New fingerprint: ${fingerprint.userAgent.slice(0, 50)}...`);
|
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
|
|
||||||
console.error('[Dutchie Client] No more proxies available!');
|
console.warn('[Dutchie Client] No more proxies available');
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
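Review note: the removed (left-hand) side tracks consecutive 403s per proxy. `recordBlock()` increments a counter, three blocks in a row disable the proxy, and a success resets the count. The sketch below illustrates that bookkeeping only. `createBlockTracker` and the `BlockTracker` interface are hypothetical names, not the `CrawlRotator` implementation, whose internals this diff does not show.

```typescript
// Hypothetical stand-in for per-proxy 403 bookkeeping; all names are illustrative.
interface BlockTracker {
  recordBlock(): boolean;  // returns true once the proxy counts as disabled
  recordSuccess(): void;   // resets the consecutive-403 counter
  isDisabled(): boolean;
}

function createBlockTracker(threshold: number = 3): BlockTracker {
  let consecutive403 = 0;
  let disabled = false;
  return {
    recordBlock(): boolean {
      consecutive403 += 1;
      // Assumption: the proxy is disabled when the counter reaches the threshold.
      if (consecutive403 >= threshold) disabled = true;
      return disabled;
    },
    recordSuccess(): void {
      consecutive403 = 0;
    },
    isDisabled(): boolean {
      return disabled;
    },
  };
}
```

With the new (right-hand) side calling `recordFailure(...)` instead, whether a counter like this still exists depends on `CrawlRotator`, which is outside this hunk.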
||||||
/**
|
/**
|
||||||
* Record success on current proxy
|
* Record success on current proxy
|
||||||
* Per workflow-12102025.md: resets consecutive_403_count
|
|
||||||
*/
|
*/
|
||||||
async function recordProxySuccess(responseTimeMs?: number): Promise<void> {
|
async function recordProxySuccess(responseTimeMs?: number): Promise<void> {
|
||||||
if (crawlRotator) {
|
if (crawlRotator) {
|
||||||
@@ -185,69 +162,163 @@ export const GRAPHQL_HASHES = {
|
|||||||
GetAllCitiesByState: 'ae547a0466ace5a48f91e55bf6699eacd87e3a42841560f0c0eabed5a0a920e6',
|
GetAllCitiesByState: 'ae547a0466ace5a48f91e55bf6699eacd87e3a42841560f0c0eabed5a0a920e6',
|
||||||
};
|
};
|
||||||
|
|
||||||
|
// ============================================================
|
||||||
|
// FINGERPRINTS - Browser profiles for anti-detect
|
||||||
|
// ============================================================
|
||||||
|
|
||||||
|
const FINGERPRINTS: Fingerprint[] = [
|
||||||
|
// Chrome Windows (latest) - typical residential user, use first
|
||||||
|
{
|
||||||
|
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
|
||||||
|
acceptLanguage: 'en-US,en;q=0.9',
|
||||||
|
secChUa: '"Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"',
|
||||||
|
secChUaPlatform: '"Windows"',
|
||||||
|
secChUaMobile: '?0',
|
||||||
|
},
|
||||||
|
// Chrome Mac (latest)
|
||||||
|
{
|
||||||
|
userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
|
||||||
|
acceptLanguage: 'en-US,en;q=0.9',
|
||||||
|
secChUa: '"Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"',
|
||||||
|
secChUaPlatform: '"macOS"',
|
||||||
|
secChUaMobile: '?0',
|
||||||
|
},
|
||||||
|
// Chrome Windows (120)
|
||||||
|
{
|
||||||
|
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
|
||||||
|
acceptLanguage: 'en-US,en;q=0.9',
|
||||||
|
secChUa: '"Chromium";v="120", "Google Chrome";v="120", "Not-A.Brand";v="99"',
|
||||||
|
secChUaPlatform: '"Windows"',
|
||||||
|
secChUaMobile: '?0',
|
||||||
|
},
|
||||||
|
// Firefox Windows
|
||||||
|
{
|
||||||
|
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:133.0) Gecko/20100101 Firefox/133.0',
|
||||||
|
acceptLanguage: 'en-US,en;q=0.5',
|
||||||
|
},
|
||||||
|
// Safari Mac
|
||||||
|
{
|
||||||
|
userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15',
|
||||||
|
acceptLanguage: 'en-US,en;q=0.9',
|
||||||
|
},
|
||||||
|
// Edge Windows
|
||||||
|
{
|
||||||
|
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0',
|
||||||
|
acceptLanguage: 'en-US,en;q=0.9',
|
||||||
|
secChUa: '"Microsoft Edge";v="131", "Chromium";v="131", "Not_A Brand";v="24"',
|
||||||
|
secChUaPlatform: '"Windows"',
|
||||||
|
secChUaMobile: '?0',
|
||||||
|
},
|
||||||
|
];
|
||||||
|
|
||||||
|
let currentFingerprintIndex = 0;
|
||||||
|
|
||||||
|
// Forward declaration for session (actual CrawlSession interface defined later)
|
||||||
|
let currentSession: {
|
||||||
|
sessionId: string;
|
||||||
|
fingerprint: Fingerprint;
|
||||||
|
proxyUrl: string | null;
|
||||||
|
stateCode?: string;
|
||||||
|
timezone?: string;
|
||||||
|
startedAt: Date;
|
||||||
|
} | null = null;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Get current fingerprint - returns session fingerprint if active, otherwise default
|
||||||
|
*/
|
||||||
|
export function getFingerprint(): Fingerprint {
|
||||||
|
// Use session fingerprint if a session is active
|
||||||
|
if (currentSession) {
|
||||||
|
return currentSession.fingerprint;
|
||||||
|
}
|
||||||
|
return FINGERPRINTS[currentFingerprintIndex];
|
||||||
|
}
|
||||||
|
|
||||||
|
export function rotateFingerprint(): Fingerprint {
|
||||||
|
currentFingerprintIndex = (currentFingerprintIndex + 1) % FINGERPRINTS.length;
|
||||||
|
const fp = FINGERPRINTS[currentFingerprintIndex];
|
||||||
|
console.log(`[Dutchie Client] Rotated to fingerprint: ${fp.userAgent.slice(0, 50)}...`);
|
||||||
|
return fp;
|
||||||
|
}
|
||||||
|
|
||||||
|
export function resetFingerprint(): void {
|
||||||
|
currentFingerprintIndex = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Get a random fingerprint from the pool
|
||||||
|
*/
|
||||||
|
export function getRandomFingerprint(): Fingerprint {
|
||||||
|
const index = Math.floor(Math.random() * FINGERPRINTS.length);
|
||||||
|
return FINGERPRINTS[index];
|
||||||
|
}
|
||||||
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
// SESSION MANAGEMENT
|
// SESSION MANAGEMENT
|
||||||
// Per workflow-12102025.md:
|
// Per-session fingerprint rotation for stealth
|
||||||
// - Session identity comes from PROXY LOCATION
|
|
||||||
// - NOT from task params (no stateCode/timezone params)
|
|
||||||
// - Language is always English
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
|
|
||||||
export interface CrawlSession {
|
export interface CrawlSession {
|
||||||
sessionId: string;
|
sessionId: string;
|
||||||
fingerprint: BrowserFingerprint;
|
fingerprint: Fingerprint;
|
||||||
proxyUrl: string | null;
|
proxyUrl: string | null;
|
||||||
proxyTimezone?: string;
|
stateCode?: string;
|
||||||
proxyState?: string;
|
timezone?: string;
|
||||||
startedAt: Date;
|
startedAt: Date;
|
||||||
// Per workflow-12102025.md: Dynamic Referer per dispensary
|
|
||||||
menuUrl?: string;
|
|
||||||
referer: string;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
let currentSession: CrawlSession | null = null;
|
// Note: currentSession variable declared earlier in file for proper scoping
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Start a new crawl session
|
* Timezone to Accept-Language mapping
|
||||||
*
|
* US timezones all use en-US but this can be extended for international
|
||||||
* Per workflow-12102025.md:
|
|
||||||
* - NO state/timezone params - identity comes from proxy location
|
|
||||||
* - Gets fingerprint from CrawlRotator (uses intoli/user-agents)
|
|
||||||
* - Language is always English (en-US)
|
|
||||||
* - Dynamic Referer per dispensary (from menuUrl)
|
|
||||||
*
|
|
||||||
* @param menuUrl - The dispensary's menu URL for dynamic Referer header
|
|
||||||
*/
|
*/
|
||||||
export function startSession(menuUrl?: string): CrawlSession {
|
const TIMEZONE_TO_LOCALE: Record<string, string> = {
|
||||||
if (!crawlRotator) {
|
'America/Phoenix': 'en-US,en;q=0.9',
|
||||||
throw new Error('[Dutchie Client] Cannot start session without CrawlRotator');
|
'America/Los_Angeles': 'en-US,en;q=0.9',
|
||||||
|
'America/Denver': 'en-US,en;q=0.9',
|
||||||
|
'America/Chicago': 'en-US,en;q=0.9',
|
||||||
|
'America/New_York': 'en-US,en;q=0.9',
|
||||||
|
'America/Detroit': 'en-US,en;q=0.9',
|
||||||
|
'America/Anchorage': 'en-US,en;q=0.9',
|
||||||
|
'Pacific/Honolulu': 'en-US,en;q=0.9',
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Get Accept-Language header for a given timezone
|
||||||
|
*/
|
||||||
|
export function getLocaleForTimezone(timezone?: string): string {
|
||||||
|
if (!timezone) return 'en-US,en;q=0.9';
|
||||||
|
return TIMEZONE_TO_LOCALE[timezone] || 'en-US,en;q=0.9';
|
||||||
}
|
}
|
||||||
|
|
||||||
// Per workflow-12102025.md: get identity from proxy location
|
/**
|
||||||
const proxyLocation = crawlRotator.getProxyLocation();
|
* Start a new crawl session with a random fingerprint
|
||||||
const fingerprint = crawlRotator.userAgent.getCurrent();
|
* Call this before crawling a store to get a fresh identity
|
||||||
|
*/
|
||||||
|
export function startSession(stateCode?: string, timezone?: string): CrawlSession {
|
||||||
|
const baseFp = getRandomFingerprint();
|
||||||
|
|
||||||
// Per workflow-12102025.md: Dynamic Referer per dispensary
|
// Override Accept-Language based on timezone for geographic consistency
|
||||||
const referer = buildRefererFromMenuUrl(menuUrl);
|
const fingerprint: Fingerprint = {
|
||||||
|
...baseFp,
|
||||||
|
acceptLanguage: getLocaleForTimezone(timezone),
|
||||||
|
};
|
||||||
|
|
||||||
currentSession = {
|
currentSession = {
|
||||||
sessionId: `session_${Date.now()}_${Math.random().toString(36).slice(2, 8)}`,
|
sessionId: `session_${Date.now()}_${Math.random().toString(36).slice(2, 8)}`,
|
||||||
fingerprint,
|
fingerprint,
|
||||||
proxyUrl: currentProxy,
|
proxyUrl: currentProxy,
|
||||||
proxyTimezone: proxyLocation?.timezone,
|
stateCode,
|
||||||
proxyState: proxyLocation?.state,
|
timezone,
|
||||||
startedAt: new Date(),
|
startedAt: new Date(),
|
||||||
menuUrl,
|
|
||||||
referer,
|
|
||||||
};
|
};
|
||||||
|
|
||||||
console.log(`[Dutchie Client] Started session ${currentSession.sessionId}`);
|
console.log(`[Dutchie Client] Started session ${currentSession.sessionId}`);
|
||||||
console.log(`[Dutchie Client] Browser: ${fingerprint.browserName} (${fingerprint.deviceCategory})`);
|
console.log(`[Dutchie Client] Fingerprint: ${fingerprint.userAgent.slice(0, 50)}...`);
|
||||||
console.log(`[Dutchie Client] DNT: ${fingerprint.httpFingerprint.hasDNT ? 'enabled' : 'disabled'}`);
|
console.log(`[Dutchie Client] Accept-Language: ${fingerprint.acceptLanguage}`);
|
||||||
console.log(`[Dutchie Client] TLS: ${fingerprint.httpFingerprint.curlImpersonateBinary}`);
|
if (timezone) {
|
||||||
console.log(`[Dutchie Client] Referer: ${referer}`);
|
console.log(`[Dutchie Client] Timezone: ${timezone}`);
|
||||||
if (proxyLocation?.timezone) {
|
|
||||||
console.log(`[Dutchie Client] Proxy: ${proxyLocation.state || 'unknown'} (${proxyLocation.timezone})`);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
return currentSession;
|
return currentSession;
|
||||||
@@ -276,80 +347,48 @@ export function getCurrentSession(): CrawlSession | null {
|
|||||||
// ============================================================
|
// ============================================================
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Per workflow-12102025.md: Build headers using HTTP fingerprint system
|
* Build headers for Dutchie requests
|
||||||
* Returns headers in browser-specific order with all natural variations
|
|
||||||
*/
|
*/
|
||||||
export function buildHeaders(isPost: boolean, contentLength?: number): { headers: Record<string, string>; orderedHeaders: string[] } {
|
export function buildHeaders(refererPath: string, fingerprint?: Fingerprint): Record<string, string> {
|
||||||
if (!currentSession || !crawlRotator) {
|
const fp = fingerprint || getFingerprint();
|
||||||
throw new Error('[Dutchie Client] Cannot build headers without active session');
|
const refererUrl = `https://dutchie.com${refererPath}`;
|
||||||
}
|
|
||||||
|
|
||||||
const fp = currentSession.fingerprint;
|
const headers: Record<string, string> = {
|
||||||
const httpFp = fp.httpFingerprint;
|
'accept': 'application/json, text/plain, */*',
|
||||||
|
'accept-language': fp.acceptLanguage,
|
||||||
// Per workflow-12102025.md: Build context for ordered headers
|
'content-type': 'application/json',
|
||||||
const context: HeaderContext = {
|
'origin': 'https://dutchie.com',
|
||||||
userAgent: fp.userAgent,
|
'referer': refererUrl,
|
||||||
secChUa: fp.secChUa,
|
'user-agent': fp.userAgent,
|
||||||
secChUaPlatform: fp.secChUaPlatform,
|
'apollographql-client-name': 'Marketplace (production)',
|
||||||
secChUaMobile: fp.secChUaMobile,
|
|
||||||
referer: currentSession.referer,
|
|
||||||
isPost,
|
|
||||||
contentLength,
|
|
||||||
};
|
};
|
||||||
|
|
||||||
// Per workflow-12102025.md: Get ordered headers from HTTP fingerprint service
|
if (fp.secChUa) {
|
||||||
return buildOrderedHeaders(httpFp, context);
|
headers['sec-ch-ua'] = fp.secChUa;
|
||||||
|
headers['sec-ch-ua-mobile'] = fp.secChUaMobile || '?0';
|
||||||
|
headers['sec-ch-ua-platform'] = fp.secChUaPlatform || '"Windows"';
|
||||||
|
headers['sec-fetch-dest'] = 'empty';
|
||||||
|
headers['sec-fetch-mode'] = 'cors';
|
||||||
|
headers['sec-fetch-site'] = 'same-site';
|
||||||
|
}
|
||||||
|
|
||||||
|
return headers;
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Per workflow-12102025.md: Get curl binary for current session's browser
|
* Execute HTTP POST using curl (bypasses TLS fingerprinting)
|
||||||
* Uses curl-impersonate for TLS fingerprint matching
|
|
||||||
*/
|
*/
|
||||||
function getCurlBinaryForSession(): string {
|
export function curlPost(url: string, body: any, headers: Record<string, string>, timeout = 30000): CurlResponse {
|
||||||
if (!currentSession) {
|
const filteredHeaders = Object.entries(headers)
|
||||||
return 'curl'; // Fallback to standard curl
|
.filter(([k]) => k.toLowerCase() !== 'accept-encoding')
|
||||||
}
|
.map(([k, v]) => `-H '${k}: ${v}'`)
|
||||||
|
|
||||||
const browserType = currentSession.fingerprint.browserName as BrowserType;
|
|
||||||
|
|
||||||
// Per workflow-12102025.md: Check if curl-impersonate is available
|
|
||||||
if (isCurlImpersonateAvailable(browserType)) {
|
|
||||||
return getCurlBinary(browserType);
|
|
||||||
}
|
|
||||||
|
|
||||||
// Fallback to standard curl with warning
|
|
||||||
console.warn(`[Dutchie Client] curl-impersonate not available for ${browserType}, using standard curl`);
|
|
||||||
return 'curl';
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Per workflow-12102025.md: Execute HTTP POST using curl/curl-impersonate
|
|
||||||
* - Uses browser-specific TLS fingerprint via curl-impersonate
|
|
||||||
* - Headers sent in browser-specific order
|
|
||||||
* - Dynamic Referer per dispensary
|
|
||||||
*/
|
|
||||||
export function curlPost(url: string, body: any, timeout = 30000): CurlResponse {
|
|
||||||
const bodyJson = JSON.stringify(body);
|
|
||||||
|
|
||||||
// Per workflow-12102025.md: Build ordered headers for POST request
|
|
||||||
const { headers, orderedHeaders } = buildHeaders(true, bodyJson.length);
|
|
||||||
|
|
||||||
// Per workflow-12102025.md: Build header args in browser-specific order
|
|
||||||
const headerArgs = orderedHeaders
|
|
||||||
.filter(h => h !== 'Host' && h !== 'Content-Length') // curl handles these
|
|
||||||
.map(h => `-H '${h}: ${headers[h]}'`)
|
|
||||||
.join(' ');
|
.join(' ');
|
||||||
|
|
||||||
const bodyEscaped = bodyJson.replace(/'/g, "'\\''");
|
const bodyJson = JSON.stringify(body).replace(/'/g, "'\\''");
|
||||||
const timeoutSec = Math.ceil(timeout / 1000);
|
const timeoutSec = Math.ceil(timeout / 1000);
|
||||||
const separator = '___HTTP_STATUS___';
|
const separator = '___HTTP_STATUS___';
|
||||||
const proxyArg = getProxyArg();
|
const proxyArg = getProxyArg();
|
||||||
|
const cmd = `curl -s --compressed ${proxyArg} -w '${separator}%{http_code}' --max-time ${timeoutSec} ${filteredHeaders} -d '${bodyJson}' '${url}'`;
|
||||||
// Per workflow-12102025.md: Use curl-impersonate for TLS fingerprint matching
|
|
||||||
const curlBinary = getCurlBinaryForSession();
|
|
||||||
|
|
||||||
const cmd = `${curlBinary} -s --compressed ${proxyArg} -w '${separator}%{http_code}' --max-time ${timeoutSec} ${headerArgs} -d '${bodyEscaped}' '${url}'`;
|
|
||||||
|
|
||||||
try {
|
try {
|
||||||
const output = execSync(cmd, {
|
const output = execSync(cmd, {
|
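Review note: both sides of this hunk interpolate header values into a shell command inside single quotes, but only the request body has its single quotes escaped. A header value containing `'` would end the quoting early. A minimal sketch of quoting every interpolated value the same way, assuming hypothetical helpers `shellQuote` and `buildHeaderArgs` that are not part of this diff:

```typescript
// Wrap a value in single quotes for a POSIX shell, escaping embedded single quotes
// with the same '\'' idiom the diff already applies to the JSON body.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Build the -H arguments, skipping accept-encoding as the diff does
// (curl's --compressed flag sets that header itself).
function buildHeaderArgs(headers: Record<string, string>): string {
  return Object.keys(headers)
    .filter((name) => name.toLowerCase() !== 'accept-encoding')
    .map((name) => '-H ' + shellQuote(name + ': ' + headers[name]))
    .join(' ');
}
```

Passing an argument array to Node's `child_process.execFileSync` would avoid the shell altogether. That is a larger change than this diff makes.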
||||||
@@ -388,29 +427,19 @@ export function curlPost(url: string, body: any, timeout = 30000): CurlResponse
|
|||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Per workflow-12102025.md: Execute HTTP GET using curl/curl-impersonate
|
* Execute HTTP GET using curl (bypasses TLS fingerprinting)
|
||||||
* - Uses browser-specific TLS fingerprint via curl-impersonate
|
* Returns HTML or JSON depending on response content-type
|
||||||
* - Headers sent in browser-specific order
|
|
||||||
* - Dynamic Referer per dispensary
|
|
||||||
*/
|
*/
|
||||||
export function curlGet(url: string, timeout = 30000): CurlResponse {
|
export function curlGet(url: string, headers: Record<string, string>, timeout = 30000): CurlResponse {
|
||||||
// Per workflow-12102025.md: Build ordered headers for GET request
|
const filteredHeaders = Object.entries(headers)
|
||||||
const { headers, orderedHeaders } = buildHeaders(false);
|
.filter(([k]) => k.toLowerCase() !== 'accept-encoding')
|
||||||
|
.map(([k, v]) => `-H '${k}: ${v}'`)
|
||||||
// Per workflow-12102025.md: Build header args in browser-specific order
|
|
||||||
const headerArgs = orderedHeaders
|
|
||||||
.filter(h => h !== 'Host' && h !== 'Content-Length') // curl handles these
|
|
||||||
.map(h => `-H '${h}: ${headers[h]}'`)
|
|
||||||
.join(' ');
|
.join(' ');
|
||||||
|
|
||||||
const timeoutSec = Math.ceil(timeout / 1000);
|
const timeoutSec = Math.ceil(timeout / 1000);
|
||||||
const separator = '___HTTP_STATUS___';
|
const separator = '___HTTP_STATUS___';
|
||||||
const proxyArg = getProxyArg();
|
const proxyArg = getProxyArg();
|
||||||
|
const cmd = `curl -s --compressed ${proxyArg} -w '${separator}%{http_code}' --max-time ${timeoutSec} ${filteredHeaders} '${url}'`;
|
||||||
// Per workflow-12102025.md: Use curl-impersonate for TLS fingerprint matching
|
|
||||||
const curlBinary = getCurlBinaryForSession();
|
|
||||||
|
|
||||||
const cmd = `${curlBinary} -s --compressed ${proxyArg} -w '${separator}%{http_code}' --max-time ${timeoutSec} ${headerArgs} '${url}'`;
|
|
||||||
|
|
||||||
try {
|
try {
|
||||||
const output = execSync(cmd, {
|
const output = execSync(cmd, {
|
||||||
@@ -430,6 +459,7 @@ export function curlGet(url: string, timeout = 30000): CurlResponse {
|
|||||||
const responseBody = output.slice(0, separatorIndex);
|
const responseBody = output.slice(0, separatorIndex);
|
||||||
const statusCode = parseInt(output.slice(separatorIndex + separator.length).trim(), 10);
|
const statusCode = parseInt(output.slice(separatorIndex + separator.length).trim(), 10);
|
||||||
|
|
||||||
|
// Try to parse as JSON, otherwise return as string (HTML)
|
||||||
try {
|
try {
|
||||||
return { status: statusCode, data: JSON.parse(responseBody) };
|
return { status: statusCode, data: JSON.parse(responseBody) };
|
||||||
} catch {
|
} catch {
|
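Review note: the response handling here relies on curl's `-w` option appending `___HTTP_STATUS___<code>` after the body, then slicing at the separator and trying `JSON.parse` before falling back to the raw string. The sketch below shows that split in isolation. `parseCurlOutput` is a hypothetical name, and the use of `lastIndexOf` is an assumption, since the line that computes `separatorIndex` is outside the visible hunk.

```typescript
interface ParsedCurlOutput {
  status: number;  // 0 when no separator was found
  data: unknown;   // parsed JSON, or the raw body (e.g. HTML) when parsing fails
}

function parseCurlOutput(output: string, separator: string = '___HTTP_STATUS___'): ParsedCurlOutput {
  // Assumption: take the last occurrence, so a body that happens to contain
  // the separator text is not truncated.
  const separatorIndex = output.lastIndexOf(separator);
  if (separatorIndex === -1) {
    return { status: 0, data: output };
  }
  const responseBody = output.slice(0, separatorIndex);
  const statusCode = parseInt(output.slice(separatorIndex + separator.length).trim(), 10);
  try {
    return { status: statusCode, data: JSON.parse(responseBody) };
  } catch {
    return { status: statusCode, data: responseBody };
  }
}
```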
||||||
@@ -446,22 +476,16 @@ export function curlGet(url: string, timeout = 30000): CurlResponse {
|
|||||||
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
// GRAPHQL EXECUTION
|
// GRAPHQL EXECUTION
|
||||||
// Per workflow-12102025.md:
|
|
||||||
// - On 403: immediately rotate IP + fingerprint (no delay first)
|
|
||||||
// - Then retry
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
|
|
||||||
export interface ExecuteGraphQLOptions {
|
export interface ExecuteGraphQLOptions {
|
||||||
maxRetries?: number;
|
maxRetries?: number;
|
||||||
retryOn403?: boolean;
|
retryOn403?: boolean;
|
||||||
cName?: string;
|
cName?: string; // Optional - used for Referer header, defaults to 'cities'
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Per workflow-12102025.md: Execute GraphQL query with curl/curl-impersonate
|
* Execute GraphQL query with curl (bypasses TLS fingerprinting)
|
||||||
* - Uses browser-specific TLS fingerprint
|
|
||||||
* - Headers in browser-specific order
|
|
||||||
* - On 403: immediately rotate IP + fingerprint, then retry
|
|
||||||
*/
|
*/
|
||||||
export async function executeGraphQL(
|
export async function executeGraphQL(
|
||||||
operationName: string,
|
operationName: string,
|
||||||
@@ -469,12 +493,7 @@ export async function executeGraphQL(
|
|||||||
hash: string,
|
hash: string,
|
||||||
options: ExecuteGraphQLOptions
|
options: ExecuteGraphQLOptions
|
||||||
): Promise<any> {
|
): Promise<any> {
|
||||||
const { maxRetries = 3, retryOn403 = true } = options;
|
const { maxRetries = 3, retryOn403 = true, cName = 'cities' } = options;
|
||||||
|
|
||||||
// Per workflow-12102025.md: Session must be active for requests
|
|
||||||
if (!currentSession) {
|
|
||||||
throw new Error('[Dutchie Client] Cannot execute GraphQL without active session - call startSession() first');
|
|
||||||
}
|
|
||||||
|
|
||||||
const body = {
|
const body = {
|
||||||
operationName,
|
operationName,
|
||||||
@@ -488,14 +507,14 @@ export async function executeGraphQL(
|
|||||||
let attempt = 0;
|
let attempt = 0;
|
||||||
|
|
||||||
while (attempt <= maxRetries) {
|
while (attempt <= maxRetries) {
|
||||||
|
const fingerprint = getFingerprint();
|
||||||
|
const headers = buildHeaders(`/embedded-menu/${cName}`, fingerprint);
|
||||||
|
|
||||||
console.log(`[Dutchie Client] curl POST ${operationName} (attempt ${attempt + 1}/${maxRetries + 1})`);
|
console.log(`[Dutchie Client] curl POST ${operationName} (attempt ${attempt + 1}/${maxRetries + 1})`);
|
||||||
|
|
||||||
const startTime = Date.now();
|
const response = curlPost(DUTCHIE_CONFIG.graphqlEndpoint, body, headers, DUTCHIE_CONFIG.timeout);
|
||||||
// Per workflow-12102025.md: curlPost now uses ordered headers and curl-impersonate
|
|
||||||
const response = curlPost(DUTCHIE_CONFIG.graphqlEndpoint, body, DUTCHIE_CONFIG.timeout);
|
|
||||||
const responseTime = Date.now() - startTime;
|
|
||||||
|
|
||||||
console.log(`[Dutchie Client] Response status: ${response.status} (${responseTime}ms)`);
|
console.log(`[Dutchie Client] Response status: ${response.status}`);
|
||||||
|
|
||||||
if (response.error) {
|
if (response.error) {
|
||||||
console.error(`[Dutchie Client] curl error: ${response.error}`);
|
console.error(`[Dutchie Client] curl error: ${response.error}`);
|
||||||
@@ -508,9 +527,6 @@ export async function executeGraphQL(
|
|||||||
}
|
}
|
||||||
|
|
||||||
if (response.status === 200) {
|
if (response.status === 200) {
|
||||||
// Per workflow-12102025.md: success resets consecutive 403 count
|
|
||||||
await recordProxySuccess(responseTime);
|
|
||||||
|
|
||||||
if (response.data?.errors?.length > 0) {
|
if (response.data?.errors?.length > 0) {
|
||||||
console.warn(`[Dutchie Client] GraphQL errors: ${JSON.stringify(response.data.errors[0])}`);
|
console.warn(`[Dutchie Client] GraphQL errors: ${JSON.stringify(response.data.errors[0])}`);
|
||||||
}
|
}
|
||||||
@@ -518,20 +534,11 @@ export async function executeGraphQL(
|
|||||||
}
|
}
|
||||||
|
|
||||||
if (response.status === 403 && retryOn403) {
|
if (response.status === 403 && retryOn403) {
|
||||||
// Per workflow-12102025.md: immediately rotate IP + fingerprint
|
console.warn(`[Dutchie Client] 403 blocked - rotating proxy and fingerprint...`);
|
||||||
console.warn(`[Dutchie Client] 403 blocked - immediately rotating proxy + fingerprint...`);
|
await rotateProxyOn403('403 Forbidden on GraphQL');
|
||||||
const hasMoreProxies = await handle403Block();
|
rotateFingerprint();
|
||||||
|
|
||||||
if (!hasMoreProxies) {
|
|
||||||
throw new Error('All proxies exhausted - no more IPs available');
|
|
||||||
}
|
|
||||||
|
|
||||||
// Per workflow-12102025.md: Update session referer after rotation
|
|
||||||
currentSession.referer = buildRefererFromMenuUrl(currentSession.menuUrl);
|
|
||||||
|
|
||||||
attempt++;
|
attempt++;
|
||||||
// Per workflow-12102025.md: small backoff after rotation
|
await sleep(1000 * attempt);
|
||||||
await sleep(500);
|
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
|
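Review note: the new side retries a 403 by recording the failure, rotating proxy and fingerprint, and sleeping `1000 * attempt` ms, but it no longer checks whether a proxy was actually available (the removed side threw once proxies were exhausted). A minimal sketch of the loop with that check kept, under stated assumptions: `requestWithRotation`, `RetryDeps`, and the injected `wait` callback are hypothetical, the request is synchronous like the `execSync`-based helpers in this file, and any non-403 status is returned as-is because the handling of other statuses is not visible here.

```typescript
interface SketchResponse {
  status: number;
  data?: unknown;
}

interface RetryDeps {
  request: () => SketchResponse;  // one attempt with the current proxy and fingerprint
  rotate: () => boolean;          // hypothetical: false when no proxy is left
  wait: (ms: number) => void;     // injected so the sketch can be tested without sleeping
}

function requestWithRotation(deps: RetryDeps, maxRetries: number = 3): SketchResponse {
  let attempt = 0;
  let last: SketchResponse = { status: 0 };
  while (attempt <= maxRetries) {
    last = deps.request();
    if (last.status !== 403) return last;
    // Stop early instead of retrying through a proxy pool that is already empty.
    if (!deps.rotate()) break;
    attempt += 1;
    // Linear backoff, matching `sleep(1000 * attempt)` on the new side of the diff.
    deps.wait(1000 * attempt);
  }
  return last;
}
```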
||||||
@@ -560,10 +567,8 @@ export interface FetchPageOptions {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Per workflow-12102025.md: Fetch HTML page from Dutchie
|
* Fetch HTML page from Dutchie (for city pages, dispensary pages, etc.)
|
||||||
* - Uses browser-specific TLS fingerprint
|
* Returns raw HTML string
|
||||||
* - Headers in browser-specific order
|
|
||||||
* - Same 403 handling as GraphQL
|
|
||||||
*/
|
*/
|
||||||
export async function fetchPage(
|
export async function fetchPage(
|
||||||
path: string,
|
path: string,
|
||||||
@@ -572,22 +577,32 @@ export async function fetchPage(
|
|||||||
const { maxRetries = 3, retryOn403 = true } = options;
|
const { maxRetries = 3, retryOn403 = true } = options;
|
||||||
const url = `${DUTCHIE_CONFIG.baseUrl}${path}`;
|
const url = `${DUTCHIE_CONFIG.baseUrl}${path}`;
|
||||||
|
|
||||||
// Per workflow-12102025.md: Session must be active for requests
|
|
||||||
if (!currentSession) {
|
|
||||||
throw new Error('[Dutchie Client] Cannot fetch page without active session - call startSession() first');
|
|
||||||
}
|
|
||||||
|
|
||||||
let attempt = 0;
|
let attempt = 0;
|
||||||
|
|
||||||
while (attempt <= maxRetries) {
|
while (attempt <= maxRetries) {
|
||||||
// Per workflow-12102025.md: curlGet now uses ordered headers and curl-impersonate
|
const fingerprint = getFingerprint();
|
||||||
|
const headers: Record<string, string> = {
|
||||||
|
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
|
||||||
|
'accept-language': fingerprint.acceptLanguage,
|
||||||
|
'user-agent': fingerprint.userAgent,
|
||||||
|
};
|
||||||
|
|
||||||
|
if (fingerprint.secChUa) {
|
||||||
|
headers['sec-ch-ua'] = fingerprint.secChUa;
|
||||||
|
headers['sec-ch-ua-mobile'] = fingerprint.secChUaMobile || '?0';
|
||||||
|
headers['sec-ch-ua-platform'] = fingerprint.secChUaPlatform || '"Windows"';
|
||||||
|
headers['sec-fetch-dest'] = 'document';
|
||||||
|
headers['sec-fetch-mode'] = 'navigate';
|
||||||
|
headers['sec-fetch-site'] = 'none';
|
||||||
|
headers['sec-fetch-user'] = '?1';
|
||||||
|
headers['upgrade-insecure-requests'] = '1';
|
||||||
|
}
|
||||||
|
|
||||||
console.log(`[Dutchie Client] curl GET ${path} (attempt ${attempt + 1}/${maxRetries + 1})`);
|
console.log(`[Dutchie Client] curl GET ${path} (attempt ${attempt + 1}/${maxRetries + 1})`);
|
||||||
|
|
||||||
const startTime = Date.now();
|
const response = curlGet(url, headers, DUTCHIE_CONFIG.timeout);
|
||||||
const response = curlGet(url, DUTCHIE_CONFIG.timeout);
|
|
||||||
const responseTime = Date.now() - startTime;
|
|
||||||
|
|
||||||
console.log(`[Dutchie Client] Response status: ${response.status} (${responseTime}ms)`);
|
console.log(`[Dutchie Client] Response status: ${response.status}`);
|
||||||
|
|
||||||
if (response.error) {
|
if (response.error) {
|
||||||
console.error(`[Dutchie Client] curl error: ${response.error}`);
|
console.error(`[Dutchie Client] curl error: ${response.error}`);
|
||||||
@@ -599,26 +614,15 @@ export async function fetchPage(
|
|||||||
}
|
}
|
||||||
|
|
||||||
if (response.status === 200) {
|
if (response.status === 200) {
|
||||||
// Per workflow-12102025.md: success resets consecutive 403 count
|
|
||||||
await recordProxySuccess(responseTime);
|
|
||||||
return { html: response.data, status: response.status };
|
return { html: response.data, status: response.status };
|
||||||
}
|
}
|
||||||
|
|
||||||
if (response.status === 403 && retryOn403) {
|
if (response.status === 403 && retryOn403) {
|
||||||
// Per workflow-12102025.md: immediately rotate IP + fingerprint
|
console.warn(`[Dutchie Client] 403 blocked - rotating proxy and fingerprint...`);
|
||||||
console.warn(`[Dutchie Client] 403 blocked - immediately rotating proxy + fingerprint...`);
|
await rotateProxyOn403('403 Forbidden on page fetch');
|
||||||
const hasMoreProxies = await handle403Block();
|
rotateFingerprint();
|
||||||
|
|
||||||
if (!hasMoreProxies) {
|
|
||||||
throw new Error('All proxies exhausted - no more IPs available');
|
|
||||||
}
|
|
||||||
|
|
||||||
// Per workflow-12102025.md: Update session after rotation
|
|
||||||
currentSession.referer = buildRefererFromMenuUrl(currentSession.menuUrl);
|
|
||||||
|
|
||||||
attempt++;
|
attempt++;
|
||||||
// Per workflow-12102025.md: small backoff after rotation
|
await sleep(1000 * attempt);
|
||||||
await sleep(500);
|
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -6,17 +6,22 @@
|
|||||||
*/
|
*/
|
||||||
|
|
||||||
export {
|
export {
|
||||||
// HTTP Client (per workflow-12102025.md: uses curl-impersonate + ordered headers)
|
// HTTP Client
|
||||||
curlPost,
|
curlPost,
|
||||||
curlGet,
|
curlGet,
|
||||||
executeGraphQL,
|
executeGraphQL,
|
||||||
fetchPage,
|
fetchPage,
|
||||||
extractNextData,
|
extractNextData,
|
||||||
|
|
||||||
// Headers (per workflow-12102025.md: browser-specific ordering)
|
// Headers & Fingerprints
|
||||||
buildHeaders,
|
buildHeaders,
|
||||||
|
getFingerprint,
|
||||||
|
rotateFingerprint,
|
||||||
|
resetFingerprint,
|
||||||
|
getRandomFingerprint,
|
||||||
|
getLocaleForTimezone,
|
||||||
|
|
||||||
// Session Management (per workflow-12102025.md: menuUrl for dynamic Referer)
|
// Session Management (per-store fingerprint rotation)
|
||||||
startSession,
|
startSession,
|
||||||
endSession,
|
endSession,
|
||||||
getCurrentSession,
|
getCurrentSession,
|
||||||
|
|||||||
@@ -14,25 +14,13 @@ router.use(authMiddleware);
|
|||||||
/**
|
/**
|
||||||
* GET /api/admin/intelligence/brands
|
* GET /api/admin/intelligence/brands
|
||||||
* List all brands with state presence, store counts, and pricing
|
* List all brands with state presence, store counts, and pricing
|
||||||
* Query params:
|
|
||||||
* - state: Filter by state (e.g., "AZ")
|
|
||||||
* - limit: Max results (default 500)
|
|
||||||
* - offset: Pagination offset
|
|
||||||
*/
|
*/
|
||||||
router.get('/brands', async (req: Request, res: Response) => {
|
router.get('/brands', async (req: Request, res: Response) => {
|
||||||
try {
|
try {
|
||||||
const { limit = '500', offset = '0', state } = req.query;
|
const { limit = '500', offset = '0' } = req.query;
|
||||||
const limitNum = Math.min(parseInt(limit as string, 10), 1000);
|
const limitNum = Math.min(parseInt(limit as string, 10), 1000);
|
||||||
const offsetNum = parseInt(offset as string, 10);
|
const offsetNum = parseInt(offset as string, 10);
|
||||||
|
|
||||||
// Build WHERE clause based on state filter
|
|
||||||
let stateFilter = '';
|
|
||||||
const params: any[] = [limitNum, offsetNum];
|
|
||||||
if (state && state !== 'all') {
|
|
||||||
stateFilter = 'AND d.state = $3';
|
|
||||||
params.push(state);
|
|
||||||
}
|
|
||||||
|
|
||||||
const { rows } = await pool.query(`
|
const { rows } = await pool.query(`
|
||||||
SELECT
|
SELECT
|
||||||
sp.brand_name_raw as brand_name,
|
sp.brand_name_raw as brand_name,
|
||||||
@@ -44,26 +32,17 @@ router.get('/brands', async (req: Request, res: Response) => {
|
|||||||
FROM store_products sp
|
FROM store_products sp
|
||||||
JOIN dispensaries d ON sp.dispensary_id = d.id
|
JOIN dispensaries d ON sp.dispensary_id = d.id
|
||||||
WHERE sp.brand_name_raw IS NOT NULL AND sp.brand_name_raw != ''
|
WHERE sp.brand_name_raw IS NOT NULL AND sp.brand_name_raw != ''
|
||||||
${stateFilter}
|
|
||||||
GROUP BY sp.brand_name_raw
|
GROUP BY sp.brand_name_raw
|
||||||
ORDER BY store_count DESC, sku_count DESC
|
ORDER BY store_count DESC, sku_count DESC
|
||||||
LIMIT $1 OFFSET $2
|
LIMIT $1 OFFSET $2
|
||||||
`, params);
|
`, [limitNum, offsetNum]);
|
||||||
|
|
||||||
// Get total count with same state filter
|
// Get total count
|
||||||
const countParams: any[] = [];
|
|
||||||
let countStateFilter = '';
|
|
||||||
if (state && state !== 'all') {
|
|
||||||
countStateFilter = 'AND d.state = $1';
|
|
||||||
countParams.push(state);
|
|
||||||
}
|
|
||||||
const { rows: countRows } = await pool.query(`
|
const { rows: countRows } = await pool.query(`
|
||||||
SELECT COUNT(DISTINCT sp.brand_name_raw) as total
|
SELECT COUNT(DISTINCT brand_name_raw) as total
|
||||||
FROM store_products sp
|
FROM store_products
|
||||||
JOIN dispensaries d ON sp.dispensary_id = d.id
|
WHERE brand_name_raw IS NOT NULL AND brand_name_raw != ''
|
||||||
WHERE sp.brand_name_raw IS NOT NULL AND sp.brand_name_raw != ''
|
`);
|
||||||
${countStateFilter}
|
|
||||||
`, countParams);
|
|
||||||
|
|
||||||
res.json({
|
res.json({
|
||||||
brands: rows.map((r: any) => ({
|
brands: rows.map((r: any) => ({
|
||||||
@@ -168,42 +147,10 @@ router.get('/brands/:brandName/penetration', async (req: Request, res: Response)
|
|||||||
/**
|
/**
|
||||||
* GET /api/admin/intelligence/pricing
|
* GET /api/admin/intelligence/pricing
|
||||||
* Get pricing analytics by category
|
* Get pricing analytics by category
|
||||||
* Query params:
|
|
||||||
* - state: Filter by state (e.g., "AZ")
|
|
||||||
*/
|
*/
|
||||||
router.get('/pricing', async (req: Request, res: Response) => {
|
router.get('/pricing', async (req: Request, res: Response) => {
|
||||||
try {
|
try {
|
||||||
const { state } = req.query;
|
const { rows: categoryRows } = await pool.query(`
|
||||||
|
|
||||||
// Build WHERE clause based on state filter
|
|
||||||
let stateFilter = '';
|
|
||||||
const categoryParams: any[] = [];
|
|
||||||
const stateQueryParams: any[] = [];
|
|
||||||
const overallParams: any[] = [];
|
|
||||||
|
|
||||||
if (state && state !== 'all') {
|
|
||||||
stateFilter = 'AND d.state = $1';
|
|
||||||
categoryParams.push(state);
|
|
||||||
overallParams.push(state);
|
|
||||||
}
|
|
||||||
|
|
||||||
// Category pricing with optional state filter
|
|
||||||
const categoryQuery = state && state !== 'all'
|
|
||||||
? `
|
|
||||||
SELECT
|
|
||||||
sp.category_raw as category,
|
|
||||||
ROUND(AVG(sp.price_rec)::numeric, 2) as avg_price,
|
|
||||||
MIN(sp.price_rec) as min_price,
|
|
||||||
MAX(sp.price_rec) as max_price,
|
|
||||||
ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_rec)::numeric, 2) as median_price,
|
|
||||||
COUNT(*) as product_count
|
|
||||||
FROM store_products sp
|
|
||||||
JOIN dispensaries d ON sp.dispensary_id = d.id
|
|
||||||
WHERE sp.category_raw IS NOT NULL AND sp.price_rec > 0 ${stateFilter}
|
|
||||||
GROUP BY sp.category_raw
|
|
||||||
ORDER BY product_count DESC
|
|
||||||
`
|
|
||||||
: `
|
|
||||||
SELECT
|
SELECT
|
||||||
sp.category_raw as category,
|
sp.category_raw as category,
|
||||||
ROUND(AVG(sp.price_rec)::numeric, 2) as avg_price,
|
ROUND(AVG(sp.price_rec)::numeric, 2) as avg_price,
|
||||||
@@ -215,11 +162,8 @@ router.get('/pricing', async (req: Request, res: Response) => {
|
|||||||
WHERE sp.category_raw IS NOT NULL AND sp.price_rec > 0
|
WHERE sp.category_raw IS NOT NULL AND sp.price_rec > 0
|
||||||
GROUP BY sp.category_raw
|
GROUP BY sp.category_raw
|
||||||
ORDER BY product_count DESC
|
ORDER BY product_count DESC
|
||||||
`;
|
`);
|
||||||
|
|
||||||
const { rows: categoryRows } = await pool.query(categoryQuery, categoryParams);
|
|
||||||
|
|
||||||
// State pricing
|
|
||||||
const { rows: stateRows } = await pool.query(`
|
const { rows: stateRows } = await pool.query(`
|
||||||
SELECT
|
SELECT
|
||||||
d.state,
|
d.state,
|
||||||
@@ -234,31 +178,6 @@ router.get('/pricing', async (req: Request, res: Response) => {
|
|||||||
ORDER BY avg_price DESC
|
ORDER BY avg_price DESC
|
||||||
`);
|
`);
|
||||||
|
|
||||||
// Overall stats with optional state filter
|
|
||||||
const overallQuery = state && state !== 'all'
|
|
||||||
? `
|
|
||||||
SELECT
|
|
||||||
ROUND(AVG(sp.price_rec)::numeric, 2) as avg_price,
|
|
||||||
MIN(sp.price_rec) as min_price,
|
|
||||||
MAX(sp.price_rec) as max_price,
|
|
||||||
COUNT(*) as total_products
|
|
||||||
FROM store_products sp
|
|
||||||
JOIN dispensaries d ON sp.dispensary_id = d.id
|
|
||||||
WHERE sp.price_rec > 0 ${stateFilter}
|
|
||||||
`
|
|
||||||
: `
|
|
||||||
SELECT
|
|
||||||
ROUND(AVG(sp.price_rec)::numeric, 2) as avg_price,
|
|
||||||
MIN(sp.price_rec) as min_price,
|
|
||||||
MAX(sp.price_rec) as max_price,
|
|
||||||
COUNT(*) as total_products
|
|
||||||
FROM store_products sp
|
|
||||||
WHERE sp.price_rec > 0
|
|
||||||
`;
|
|
||||||
|
|
||||||
const { rows: overallRows } = await pool.query(overallQuery, overallParams);
|
|
||||||
const overall = overallRows[0];
|
|
||||||
|
|
||||||
res.json({
|
res.json({
|
||||||
byCategory: categoryRows.map((r: any) => ({
|
byCategory: categoryRows.map((r: any) => ({
|
||||||
category: r.category,
|
category: r.category,
|
||||||
@@ -275,12 +194,6 @@ router.get('/pricing', async (req: Request, res: Response) => {
|
|||||||
maxPrice: r.max_price ? parseFloat(r.max_price) : null,
|
maxPrice: r.max_price ? parseFloat(r.max_price) : null,
|
||||||
productCount: parseInt(r.product_count, 10),
|
productCount: parseInt(r.product_count, 10),
|
||||||
})),
|
})),
|
||||||
overall: {
|
|
||||||
avgPrice: overall?.avg_price ? parseFloat(overall.avg_price) : null,
|
|
||||||
minPrice: overall?.min_price ? parseFloat(overall.min_price) : null,
|
|
||||||
maxPrice: overall?.max_price ? parseFloat(overall.max_price) : null,
|
|
||||||
totalProducts: parseInt(overall?.total_products || '0', 10),
|
|
||||||
},
|
|
||||||
});
|
});
|
||||||
} catch (error: any) {
|
} catch (error: any) {
|
||||||
console.error('[Intelligence] Error fetching pricing:', error.message);
|
console.error('[Intelligence] Error fetching pricing:', error.message);
|
||||||
@@ -291,23 +204,9 @@ router.get('/pricing', async (req: Request, res: Response) => {
|
|||||||
/**
|
/**
|
||||||
* GET /api/admin/intelligence/stores
|
* GET /api/admin/intelligence/stores
|
||||||
* Get store intelligence summary
|
* Get store intelligence summary
|
||||||
* Query params:
|
|
||||||
* - state: Filter by state (e.g., "AZ")
|
|
||||||
* - limit: Max results (default 200)
|
|
||||||
*/
|
*/
|
||||||
router.get('/stores', async (req: Request, res: Response) => {
|
router.get('/stores', async (req: Request, res: Response) => {
|
||||||
try {
|
try {
|
||||||
const { state, limit = '200' } = req.query;
|
|
||||||
const limitNum = Math.min(parseInt(limit as string, 10), 500);
|
|
||||||
|
|
||||||
// Build WHERE clause based on state filter
|
|
||||||
let stateFilter = '';
|
|
||||||
const params: any[] = [limitNum];
|
|
||||||
if (state && state !== 'all') {
|
|
||||||
stateFilter = 'AND d.state = $2';
|
|
||||||
params.push(state);
|
|
||||||
}
|
|
||||||
|
|
||||||
const { rows: storeRows } = await pool.query(`
|
const { rows: storeRows } = await pool.query(`
|
||||||
SELECT
|
SELECT
|
||||||
d.id,
|
d.id,
|
||||||
@@ -317,22 +216,17 @@ router.get('/stores', async (req: Request, res: Response) => {
|
|||||||
d.state,
|
d.state,
|
||||||
d.menu_type,
|
d.menu_type,
|
||||||
d.crawl_enabled,
|
d.crawl_enabled,
|
||||||
c.name as chain_name,
|
COUNT(DISTINCT sp.id) as product_count,
|
||||||
COUNT(DISTINCT sp.id) as sku_count,
|
|
||||||
COUNT(DISTINCT sp.brand_name_raw) as brand_count,
|
COUNT(DISTINCT sp.brand_name_raw) as brand_count,
|
||||||
ROUND(AVG(sp.price_rec)::numeric, 2) as avg_price,
|
ROUND(AVG(sp.price_rec)::numeric, 2) as avg_price,
|
||||||
MAX(sp.updated_at) as last_crawl,
|
MAX(sp.updated_at) as last_product_update
|
||||||
(SELECT COUNT(*) FROM store_product_snapshots sps
|
|
||||||
WHERE sps.store_product_id IN (SELECT id FROM store_products WHERE dispensary_id = d.id)) as snapshot_count
|
|
||||||
FROM dispensaries d
|
FROM dispensaries d
|
||||||
LEFT JOIN store_products sp ON sp.dispensary_id = d.id
|
LEFT JOIN store_products sp ON sp.dispensary_id = d.id
|
||||||
LEFT JOIN chains c ON d.chain_id = c.id
|
WHERE d.state IS NOT NULL
|
||||||
WHERE d.state IS NOT NULL AND d.crawl_enabled = true
|
GROUP BY d.id, d.name, d.dba_name, d.city, d.state, d.menu_type, d.crawl_enabled
|
||||||
${stateFilter}
|
ORDER BY product_count DESC
|
||||||
GROUP BY d.id, d.name, d.dba_name, d.city, d.state, d.menu_type, d.crawl_enabled, c.name
|
LIMIT 200
|
||||||
ORDER BY sku_count DESC
|
`);
|
||||||
LIMIT $1
|
|
||||||
`, params);
|
|
||||||
|
|
||||||
res.json({
|
res.json({
|
||||||
stores: storeRows.map((r: any) => ({
|
stores: storeRows.map((r: any) => ({
|
||||||
@@ -343,13 +237,10 @@ router.get('/stores', async (req: Request, res: Response) => {
|
|||||||
state: r.state,
|
state: r.state,
|
||||||
menuType: r.menu_type,
|
menuType: r.menu_type,
|
||||||
crawlEnabled: r.crawl_enabled,
|
crawlEnabled: r.crawl_enabled,
|
||||||
chainName: r.chain_name || null,
|
productCount: parseInt(r.product_count || '0', 10),
|
||||||
skuCount: parseInt(r.sku_count || '0', 10),
|
|
||||||
snapshotCount: parseInt(r.snapshot_count || '0', 10),
|
|
||||||
brandCount: parseInt(r.brand_count || '0', 10),
|
brandCount: parseInt(r.brand_count || '0', 10),
|
||||||
avgPrice: r.avg_price ? parseFloat(r.avg_price) : null,
|
avgPrice: r.avg_price ? parseFloat(r.avg_price) : null,
|
||||||
lastCrawl: r.last_crawl,
|
lastProductUpdate: r.last_product_update,
|
||||||
crawlFrequencyHours: 4, // Default crawl frequency
|
|
||||||
})),
|
})),
|
||||||
total: storeRows.length,
|
total: storeRows.length,
|
||||||
});
|
});
|
||||||
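Review note: on the side of this file that accepts a `state` query parameter, each route hand-builds an `AND d.state = $n` fragment, and the placeholder index differs per query (`$3` for the `/brands` list, `$1` for its count query, `$2` for `/stores`). A small helper could keep the clause and its bound value together. The sketch below is an assumption-laden illustration: `buildStateFilter` and `SqlFilter` are hypothetical names that do not appear in the diff, and narrowing the input to a plain string is a design choice of this sketch rather than something the routes do.

```typescript
// Hypothetical helper; not part of the diff.
interface SqlFilter {
  clause: string;   // SQL fragment to append to an existing WHERE clause
  params: string[]; // values to append to the query's parameter array
}

// Returns an "AND d.state = $n" fragment plus its bound value,
// or an empty filter when state is missing, empty, non-string, or 'all'.
function buildStateFilter(state: unknown, placeholderIndex: number): SqlFilter {
  if (typeof state !== 'string' || state === '' || state === 'all') {
    return { clause: '', params: [] };
  }
  // Only the placeholder index is interpolated; the value stays a bound parameter.
  return { clause: `AND d.state = $${placeholderIndex}`, params: [state] };
}

// Example: the list query already binds limit ($1) and offset ($2),
// so the state value becomes $3; the count query has no other params.
const listFilter = buildStateFilter('AZ', 3);
const listParams: Array<string | number> = [500, 0, ...listFilter.params];
const countFilter = buildStateFilter('AZ', 1);
```

Because the user-supplied value never enters the SQL string, the fragment can be interpolated into the query template the same way `${stateFilter}` is in the hunks above.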
|
|||||||
@@ -543,9 +543,6 @@ router.post('/bulk-priority', async (req: Request, res: Response) => {
|
|||||||
|
|
||||||
/**
|
/**
|
||||||
* POST /api/job-queue/enqueue - Add a new job to the queue
|
* POST /api/job-queue/enqueue - Add a new job to the queue
|
||||||
*
|
|
||||||
* 2024-12-10: Rewired to use worker_tasks via taskService.
|
|
||||||
* Legacy dispensary_crawl_jobs code commented out below.
|
|
||||||
*/
|
*/
|
||||||
router.post('/enqueue', async (req: Request, res: Response) => {
|
router.post('/enqueue', async (req: Request, res: Response) => {
|
||||||
try {
|
try {
|
||||||
@@ -555,59 +552,6 @@ router.post('/enqueue', async (req: Request, res: Response) => {
|
|||||||
return res.status(400).json({ success: false, error: 'dispensary_id is required' });
|
return res.status(400).json({ success: false, error: 'dispensary_id is required' });
|
||||||
}
|
}
|
||||||
|
|
||||||
// 2024-12-10: Map legacy job_type to new task role
|
|
||||||
const roleMap: Record<string, string> = {
|
|
||||||
'dutchie_product_crawl': 'product_refresh',
|
|
||||||
'menu_detection': 'entry_point_discovery',
|
|
||||||
'menu_detection_single': 'entry_point_discovery',
|
|
||||||
'product_discovery': 'product_discovery',
|
|
||||||
'store_discovery': 'store_discovery',
|
|
||||||
};
|
|
||||||
const role = roleMap[job_type] || 'product_refresh';
|
|
||||||
|
|
||||||
// 2024-12-10: Use taskService to create task in worker_tasks table
|
|
||||||
const { taskService } = await import('../tasks/task-service');
|
|
||||||
|
|
||||||
// Check if task already pending for this dispensary
|
|
||||||
const existingTasks = await taskService.listTasks({
|
|
||||||
dispensary_id,
|
|
||||||
role: role as any,
|
|
||||||
status: ['pending', 'claimed', 'running'],
|
|
||||||
limit: 1,
|
|
||||||
});
|
|
||||||
|
|
||||||
if (existingTasks.length > 0) {
|
|
||||||
return res.json({
|
|
||||||
success: true,
|
|
||||||
task_id: existingTasks[0].id,
|
|
||||||
message: 'Task already queued'
|
|
||||||
});
|
|
||||||
}
|
|
||||||
|
|
||||||
const task = await taskService.createTask({
|
|
||||||
role: role as any,
|
|
||||||
dispensary_id,
|
|
||||||
priority,
|
|
||||||
});
|
|
||||||
|
|
||||||
res.json({ success: true, task_id: task.id, message: 'Task enqueued' });
|
|
||||||
} catch (error: any) {
|
|
||||||
console.error('[JobQueue] Error enqueuing task:', error);
|
|
||||||
res.status(500).json({ success: false, error: error.message });
|
|
||||||
}
|
|
||||||
});
|
|
||||||
|
|
||||||
/*
|
|
||||||
* LEGACY CODE - 2024-12-10: Commented out, was using orphaned dispensary_crawl_jobs table
|
|
||||||
*
|
|
||||||
router.post('/enqueue', async (req: Request, res: Response) => {
|
|
||||||
try {
|
|
||||||
const { dispensary_id, job_type = 'dutchie_product_crawl', priority = 0 } = req.body;
|
|
||||||
|
|
||||||
if (!dispensary_id) {
|
|
||||||
return res.status(400).json({ success: false, error: 'dispensary_id is required' });
|
|
||||||
}
|
|
||||||
|
|
||||||
// Check if job already pending for this dispensary
|
// Check if job already pending for this dispensary
|
||||||
const existing = await pool.query(`
|
const existing = await pool.query(`
|
||||||
SELECT id FROM dispensary_crawl_jobs
|
SELECT id FROM dispensary_crawl_jobs
|
||||||
@@ -641,7 +585,6 @@ router.post('/enqueue', async (req: Request, res: Response) => {
|
|||||||
res.status(500).json({ success: false, error: error.message });
|
res.status(500).json({ success: false, error: error.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
*/
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* POST /api/job-queue/pause - Pause queue processing
|
* POST /api/job-queue/pause - Pause queue processing
|
||||||
@@ -669,8 +612,6 @@ router.get('/paused', async (_req: Request, res: Response) => {
|
|||||||
/**
|
/**
|
||||||
* POST /api/job-queue/enqueue-batch - Queue multiple dispensaries at once
|
* POST /api/job-queue/enqueue-batch - Queue multiple dispensaries at once
|
||||||
* Body: { dispensary_ids: number[], job_type?: string, priority?: number }
|
* Body: { dispensary_ids: number[], job_type?: string, priority?: number }
|
||||||
*
|
|
||||||
* 2024-12-10: Rewired to use worker_tasks via taskService.
|
|
||||||
*/
|
*/
|
||||||
router.post('/enqueue-batch', async (req: Request, res: Response) => {
|
router.post('/enqueue-batch', async (req: Request, res: Response) => {
|
||||||
try {
|
try {
|
||||||
@@ -684,30 +625,35 @@ router.post('/enqueue-batch', async (req: Request, res: Response) => {
|
|||||||
return res.status(400).json({ success: false, error: 'Maximum 500 dispensaries per batch' });
|
return res.status(400).json({ success: false, error: 'Maximum 500 dispensaries per batch' });
|
||||||
}
|
}
|
||||||
|
|
||||||
// 2024-12-10: Map legacy job_type to new task role
|
// Insert jobs, skipping duplicates
|
||||||
const roleMap: Record<string, string> = {
|
const { rows } = await pool.query(`
|
||||||
'dutchie_product_crawl': 'product_refresh',
|
INSERT INTO dispensary_crawl_jobs (dispensary_id, job_type, priority, trigger_type, status, created_at)
|
||||||
'menu_detection': 'entry_point_discovery',
|
SELECT
|
||||||
'product_discovery': 'product_discovery',
|
d.id,
|
||||||
};
|
$2::text,
|
||||||
const role = roleMap[job_type] || 'product_refresh';
|
$3::integer,
|
||||||
|
'api_batch',
|
||||||
// 2024-12-10: Use taskService to create tasks in worker_tasks table
|
'pending',
|
||||||
const { taskService } = await import('../tasks/task-service');
|
NOW()
|
||||||
|
FROM dispensaries d
|
||||||
const tasks = dispensary_ids.map(dispensary_id => ({
|
WHERE d.id = ANY($1::int[])
|
||||||
role: role as any,
|
AND d.crawl_enabled = true
|
||||||
dispensary_id,
|
AND d.platform_dispensary_id IS NOT NULL
|
||||||
priority,
|
AND NOT EXISTS (
|
||||||
}));
|
SELECT 1 FROM dispensary_crawl_jobs cj
|
||||||
|
WHERE cj.dispensary_id = d.id
|
||||||
const createdCount = await taskService.createTasks(tasks);
|
AND cj.job_type = $2::text
|
||||||
|
AND cj.status IN ('pending', 'running')
|
||||||
|
)
|
||||||
|
RETURNING id, dispensary_id
|
||||||
|
`, [dispensary_ids, job_type, priority]);
|
||||||
|
|
||||||
res.json({
|
res.json({
|
||||||
success: true,
|
success: true,
|
||||||
queued: createdCount,
|
queued: rows.length,
|
||||||
requested: dispensary_ids.length,
|
requested: dispensary_ids.length,
|
||||||
message: `Queued ${createdCount} of ${dispensary_ids.length} dispensaries`
|
job_ids: rows.map(r => r.id),
|
||||||
|
message: `Queued ${rows.length} of ${dispensary_ids.length} dispensaries`
|
||||||
});
|
});
|
||||||
} catch (error: any) {
|
} catch (error: any) {
|
||||||
console.error('[JobQueue] Error batch enqueuing:', error);
|
console.error('[JobQueue] Error batch enqueuing:', error);
|
||||||
@@ -718,8 +664,6 @@ router.post('/enqueue-batch', async (req: Request, res: Response) => {
|
|||||||
/**
|
/**
|
||||||
* POST /api/job-queue/enqueue-state - Queue all crawl-enabled dispensaries for a state
|
* POST /api/job-queue/enqueue-state - Queue all crawl-enabled dispensaries for a state
|
||||||
* Body: { state_code: string, job_type?: string, priority?: number, limit?: number }
|
* Body: { state_code: string, job_type?: string, priority?: number, limit?: number }
|
||||||
*
|
|
||||||
* 2024-12-10: Rewired to use worker_tasks via taskService.
|
|
||||||
*/
|
*/
|
||||||
router.post('/enqueue-state', async (req: Request, res: Response) => {
|
router.post('/enqueue-state', async (req: Request, res: Response) => {
|
||||||
try {
|
try {
|
||||||
@@ -729,55 +673,52 @@ router.post('/enqueue-state', async (req: Request, res: Response) => {
|
|||||||
return res.status(400).json({ success: false, error: 'state_code is required (e.g., "AZ")' });
|
return res.status(400).json({ success: false, error: 'state_code is required (e.g., "AZ")' });
|
||||||
}
|
}
|
||||||
|
|
||||||
// 2024-12-10: Map legacy job_type to new task role
|
// Get state_id and queue jobs
|
||||||
const roleMap: Record<string, string> = {
|
const { rows } = await pool.query(`
|
||||||
'dutchie_product_crawl': 'product_refresh',
|
WITH target_state AS (
|
||||||
'menu_detection': 'entry_point_discovery',
|
SELECT id FROM states WHERE code = $1
|
||||||
'product_discovery': 'product_discovery',
|
)
|
||||||
};
|
INSERT INTO dispensary_crawl_jobs (dispensary_id, job_type, priority, trigger_type, status, created_at)
|
||||||
const role = roleMap[job_type] || 'product_refresh';
|
SELECT
|
||||||
|
d.id,
|
||||||
// Get dispensary IDs for the state
|
$2::text,
|
||||||
const dispensaryResult = await pool.query(`
|
$3::integer,
|
||||||
SELECT d.id
|
'api_state',
|
||||||
FROM dispensaries d
|
'pending',
|
||||||
JOIN states s ON s.id = d.state_id
|
NOW()
|
||||||
WHERE s.code = $1
|
FROM dispensaries d, target_state
|
||||||
|
WHERE d.state_id = target_state.id
|
||||||
AND d.crawl_enabled = true
|
AND d.crawl_enabled = true
|
||||||
AND d.platform_dispensary_id IS NOT NULL
|
AND d.platform_dispensary_id IS NOT NULL
|
||||||
LIMIT $2
|
AND NOT EXISTS (
|
||||||
`, [state_code.toUpperCase(), limit]);
|
SELECT 1 FROM dispensary_crawl_jobs cj
|
||||||
|
WHERE cj.dispensary_id = d.id
|
||||||
const dispensary_ids = dispensaryResult.rows.map((r: any) => r.id);
|
AND cj.job_type = $2::text
|
||||||
|
AND cj.status IN ('pending', 'running')
|
||||||
// 2024-12-10: Use taskService to create tasks in worker_tasks table
|
)
|
||||||
const { taskService } = await import('../tasks/task-service');
|
LIMIT $4::integer
|
||||||
|
RETURNING id, dispensary_id
|
||||||
const tasks = dispensary_ids.map((dispensary_id: number) => ({
|
`, [state_code.toUpperCase(), job_type, priority, limit]);
|
||||||
role: role as any,
|
|
||||||
dispensary_id,
|
|
||||||
priority,
|
|
||||||
}));
|
|
||||||
|
|
||||||
const createdCount = await taskService.createTasks(tasks);
|
|
||||||
|
|
||||||
// Get total available count
|
// Get total available count
|
||||||
const countResult = await pool.query(`
|
const countResult = await pool.query(`
|
||||||
|
WITH target_state AS (
|
||||||
|
SELECT id FROM states WHERE code = $1
|
||||||
|
)
|
||||||
SELECT COUNT(*) as total
|
SELECT COUNT(*) as total
|
||||||
FROM dispensaries d
|
FROM dispensaries d, target_state
|
||||||
JOIN states s ON s.id = d.state_id
|
WHERE d.state_id = target_state.id
|
||||||
WHERE s.code = $1
|
|
||||||
AND d.crawl_enabled = true
|
AND d.crawl_enabled = true
|
||||||
AND d.platform_dispensary_id IS NOT NULL
|
AND d.platform_dispensary_id IS NOT NULL
|
||||||
`, [state_code.toUpperCase()]);
|
`, [state_code.toUpperCase()]);
|
||||||
|
|
||||||
res.json({
|
res.json({
|
||||||
success: true,
|
success: true,
|
||||||
queued: createdCount,
|
queued: rows.length,
|
||||||
total_available: parseInt(countResult.rows[0].total),
|
total_available: parseInt(countResult.rows[0].total),
|
||||||
state: state_code.toUpperCase(),
|
state: state_code.toUpperCase(),
|
||||||
role,
|
job_type,
|
||||||
message: `Queued ${createdCount} dispensaries for ${state_code.toUpperCase()}`
|
message: `Queued ${rows.length} dispensaries for ${state_code.toUpperCase()}`
|
||||||
});
|
});
|
||||||
} catch (error: any) {
|
} catch (error: any) {
|
||||||
console.error('[JobQueue] Error enqueuing state:', error);
|
console.error('[JobQueue] Error enqueuing state:', error);
|
||||||
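Review note: the `taskService` side of this file declares `roleMap` three times, and the copies differ: `/enqueue` lists five job types, while `/enqueue-batch` and `/enqueue-state` omit `menu_detection_single` and `store_discovery`. A shared lookup would avoid that drift. The following is a sketch only: `TaskRole`, `ROLE_BY_JOB_TYPE`, and `mapJobTypeToRole` are hypothetical names, and the role strings are copied from the `/enqueue` hunk rather than from the task service's own type definitions, which this diff does not show.

```typescript
// Role strings copied from the roleMap in the /enqueue hunk; the alias is hypothetical.
type TaskRole =
  | 'product_refresh'
  | 'entry_point_discovery'
  | 'product_discovery'
  | 'store_discovery';

// Single source of truth for the legacy job_type -> task role mapping.
const ROLE_BY_JOB_TYPE: Readonly<Record<string, TaskRole>> = {
  dutchie_product_crawl: 'product_refresh',
  menu_detection: 'entry_point_discovery',
  menu_detection_single: 'entry_point_discovery',
  product_discovery: 'product_discovery',
  store_discovery: 'store_discovery',
};

// Falls back to 'product_refresh' for unknown or missing job types,
// matching the `roleMap[job_type] || 'product_refresh'` expression in the diff.
function mapJobTypeToRole(jobType: string | undefined): TaskRole {
  if (
    jobType !== undefined &&
    Object.prototype.hasOwnProperty.call(ROLE_BY_JOB_TYPE, jobType)
  ) {
    return ROLE_BY_JOB_TYPE[jobType];
  }
  return 'product_refresh';
}
```

The own-property check is an addition of this sketch: it keeps inherited keys such as `toString` from being looked up on the object when `job_type` comes straight from a request body.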
|
|||||||
@@ -78,14 +78,14 @@ router.get('/metrics', async (_req: Request, res: Response) => {
|
|||||||
|
|
||||||
/**
|
/**
|
||||||
* GET /api/admin/orchestrator/states
|
* GET /api/admin/orchestrator/states
|
||||||
* Returns array of states with at least one crawl-enabled dispensary
|
* Returns array of states with at least one known dispensary
|
||||||
*/
|
*/
|
||||||
router.get('/states', async (_req: Request, res: Response) => {
|
router.get('/states', async (_req: Request, res: Response) => {
|
||||||
try {
|
try {
|
||||||
const { rows } = await pool.query(`
|
const { rows } = await pool.query(`
|
||||||
SELECT DISTINCT state, COUNT(*) as store_count
|
SELECT DISTINCT state, COUNT(*) as store_count
|
||||||
FROM dispensaries
|
FROM dispensaries
|
||||||
WHERE state IS NOT NULL AND crawl_enabled = true
|
WHERE state IS NOT NULL
|
||||||
GROUP BY state
|
GROUP BY state
|
||||||
ORDER BY state
|
ORDER BY state
|
||||||
`);
|
`);
|
||||||
|
|||||||
@@ -1,334 +0,0 @@
|
|||||||
/**
|
|
||||||
* Payload Routes
|
|
||||||
*
|
|
||||||
* Per TASK_WORKFLOW_2024-12-10.md: API access to raw crawl payloads.
|
|
||||||
*
|
|
||||||
* Endpoints:
|
|
||||||
* - GET /api/payloads - List payload metadata (paginated)
|
|
||||||
* - GET /api/payloads/:id - Get payload metadata by ID
|
|
||||||
* - GET /api/payloads/:id/data - Get full payload JSON
|
|
||||||
* - GET /api/payloads/store/:dispensaryId - List payloads for a store
|
|
||||||
* - GET /api/payloads/store/:dispensaryId/latest - Get latest payload for a store
|
|
||||||
* - GET /api/payloads/store/:dispensaryId/diff - Diff two payloads
|
|
||||||
*/
|
|
||||||
|
|
||||||
import { Router, Request, Response } from 'express';
|
|
||||||
import { getPool } from '../db/pool';
|
|
||||||
import {
|
|
||||||
loadRawPayloadById,
|
|
||||||
getLatestPayload,
|
|
||||||
getRecentPayloads,
|
|
||||||
listPayloadMetadata,
|
|
||||||
} from '../utils/payload-storage';
|
|
||||||
import { Pool } from 'pg';
|
|
||||||
|
|
||||||
const router = Router();
|
|
||||||
|
|
||||||
// Get pool instance for queries
|
|
||||||
const getDbPool = (): Pool => getPool() as unknown as Pool;
|
|
||||||
|
|
||||||
/**
|
|
||||||
* GET /api/payloads
|
|
||||||
* List payload metadata (paginated)
|
|
||||||
*/
|
|
||||||
router.get('/', async (req: Request, res: Response) => {
|
|
||||||
try {
|
|
||||||
const pool = getDbPool();
|
|
||||||
const limit = Math.min(parseInt(req.query.limit as string) || 50, 100);
|
|
||||||
const offset = parseInt(req.query.offset as string) || 0;
|
|
||||||
const dispensaryId = req.query.dispensary_id ? parseInt(req.query.dispensary_id as string) : undefined;
|
|
||||||
|
|
||||||
const payloads = await listPayloadMetadata(pool, {
|
|
||||||
dispensaryId,
|
|
||||||
limit,
|
|
||||||
offset,
|
|
||||||
});
|
|
||||||
|
|
||||||
res.json({
|
|
||||||
success: true,
|
|
||||||
payloads,
|
|
||||||
pagination: { limit, offset },
|
|
||||||
});
|
|
||||||
} catch (error: any) {
|
|
||||||
console.error('[Payloads] List error:', error.message);
|
|
||||||
res.status(500).json({ success: false, error: error.message });
|
|
||||||
}
|
|
||||||
});
|
|
||||||
|
|
||||||
/**
|
|
||||||
* GET /api/payloads/:id
|
|
||||||
* Get payload metadata by ID
|
|
||||||
*/
|
|
||||||
router.get('/:id', async (req: Request, res: Response) => {
|
|
||||||
try {
|
|
||||||
const pool = getDbPool();
|
|
||||||
const id = parseInt(req.params.id);
|
|
||||||
|
|
||||||
const result = await pool.query(`
|
|
||||||
SELECT
|
|
||||||
p.id,
|
|
||||||
p.dispensary_id,
|
|
||||||
p.crawl_run_id,
|
|
||||||
p.storage_path,
|
|
||||||
p.product_count,
|
|
||||||
p.size_bytes,
|
|
||||||
p.size_bytes_raw,
|
|
||||||
p.fetched_at,
|
|
||||||
p.processed_at,
|
|
||||||
p.checksum_sha256,
|
|
||||||
d.name as dispensary_name
|
|
||||||
FROM raw_crawl_payloads p
|
|
||||||
LEFT JOIN dispensaries d ON d.id = p.dispensary_id
|
|
||||||
WHERE p.id = $1
|
|
||||||
`, [id]);
|
|
||||||
|
|
||||||
if (result.rows.length === 0) {
|
|
||||||
return res.status(404).json({ success: false, error: 'Payload not found' });
|
|
||||||
}
|
|
||||||
|
|
||||||
res.json({
|
|
||||||
success: true,
|
|
||||||
payload: result.rows[0],
|
|
||||||
});
|
|
||||||
} catch (error: any) {
|
|
||||||
console.error('[Payloads] Get error:', error.message);
|
|
||||||
res.status(500).json({ success: false, error: error.message });
|
|
||||||
}
|
|
||||||
});
|
|
||||||
|
|
||||||
/**
|
|
||||||
* GET /api/payloads/:id/data
|
|
||||||
* Get full payload JSON (decompressed from disk)
|
|
||||||
*/
|
|
||||||
router.get('/:id/data', async (req: Request, res: Response) => {
|
|
||||||
try {
|
|
||||||
const pool = getDbPool();
|
|
||||||
const id = parseInt(req.params.id);
|
|
||||||
|
|
||||||
const result = await loadRawPayloadById(pool, id);
|
|
||||||
|
|
||||||
if (!result) {
|
|
||||||
return res.status(404).json({ success: false, error: 'Payload not found' });
|
|
||||||
}
|
|
||||||
|
|
||||||
res.json({
|
|
||||||
success: true,
|
|
||||||
metadata: result.metadata,
|
|
||||||
data: result.payload,
|
|
||||||
});
|
|
||||||
} catch (error: any) {
|
|
||||||
console.error('[Payloads] Get data error:', error.message);
|
|
||||||
res.status(500).json({ success: false, error: error.message });
|
|
||||||
}
|
|
||||||
});
|
|
||||||
|
|
||||||
/**
|
|
||||||
* GET /api/payloads/store/:dispensaryId
|
|
||||||
* List payloads for a specific store
|
|
||||||
*/
|
|
||||||
router.get('/store/:dispensaryId', async (req: Request, res: Response) => {
|
|
||||||
try {
|
|
||||||
const pool = getDbPool();
|
|
||||||
const dispensaryId = parseInt(req.params.dispensaryId);
|
|
||||||
const limit = Math.min(parseInt(req.query.limit as string) || 20, 100);
|
|
||||||
const offset = parseInt(req.query.offset as string) || 0;
|
|
||||||
|
|
||||||
const payloads = await listPayloadMetadata(pool, {
|
|
||||||
dispensaryId,
|
|
||||||
limit,
|
|
||||||
offset,
|
|
||||||
});
|
|
||||||
|
|
||||||
res.json({
|
|
||||||
success: true,
|
|
||||||
dispensaryId,
|
|
||||||
payloads,
|
|
||||||
pagination: { limit, offset },
|
|
||||||
});
|
|
||||||
} catch (error: any) {
|
|
||||||
console.error('[Payloads] Store list error:', error.message);
|
|
||||||
res.status(500).json({ success: false, error: error.message });
|
|
||||||
}
|
|
||||||
});
|
|
||||||
|
|
||||||
/**
|
|
||||||
* GET /api/payloads/store/:dispensaryId/latest
|
|
||||||
* Get the latest payload for a store (with full data)
|
|
||||||
*/
|
|
||||||
router.get('/store/:dispensaryId/latest', async (req: Request, res: Response) => {
|
|
||||||
try {
|
|
||||||
const pool = getDbPool();
|
|
||||||
const dispensaryId = parseInt(req.params.dispensaryId);
|
|
||||||
|
|
||||||
const result = await getLatestPayload(pool, dispensaryId);
|
|
||||||
|
|
||||||
if (!result) {
|
|
||||||
return res.status(404).json({
|
|
||||||
success: false,
|
|
||||||
error: `No payloads found for dispensary ${dispensaryId}`,
|
|
||||||
});
|
|
||||||
}
|
|
||||||
|
|
||||||
res.json({
|
|
||||||
success: true,
|
|
||||||
metadata: result.metadata,
|
|
||||||
data: result.payload,
|
|
||||||
});
|
|
||||||
} catch (error: any) {
|
|
||||||
console.error('[Payloads] Latest error:', error.message);
|
|
||||||
res.status(500).json({ success: false, error: error.message });
|
|
||||||
}
|
|
||||||
});
|
|
||||||
|
|
||||||
-/**
- * GET /api/payloads/store/:dispensaryId/diff
- * Compare two payloads for a store
- *
- * Query params:
- * - from: payload ID (older)
- * - to: payload ID (newer) - optional, defaults to latest
- */
-router.get('/store/:dispensaryId/diff', async (req: Request, res: Response) => {
-  try {
-    const pool = getDbPool();
-    const dispensaryId = parseInt(req.params.dispensaryId);
-    const fromId = req.query.from ? parseInt(req.query.from as string) : undefined;
-    const toId = req.query.to ? parseInt(req.query.to as string) : undefined;
-
-    let fromPayload: any;
-    let toPayload: any;
-
-    if (fromId && toId) {
-      // Load specific payloads
-      const [from, to] = await Promise.all([
-        loadRawPayloadById(pool, fromId),
-        loadRawPayloadById(pool, toId),
-      ]);
-      fromPayload = from;
-      toPayload = to;
-    } else {
-      // Load two most recent
-      const recent = await getRecentPayloads(pool, dispensaryId, 2);
-      if (recent.length < 2) {
-        return res.status(400).json({
-          success: false,
-          error: 'Need at least 2 payloads to diff. Only found ' + recent.length,
-        });
-      }
-      toPayload = recent[0]; // Most recent
-      fromPayload = recent[1]; // Previous
-    }
-
-    if (!fromPayload || !toPayload) {
-      return res.status(404).json({ success: false, error: 'One or both payloads not found' });
-    }
-
-    // Build product maps by ID
-    const fromProducts = new Map<string, any>();
-    const toProducts = new Map<string, any>();
-
-    for (const p of fromPayload.payload.products || []) {
-      const id = p._id || p.id;
-      if (id) fromProducts.set(id, p);
-    }
-
-    for (const p of toPayload.payload.products || []) {
-      const id = p._id || p.id;
-      if (id) toProducts.set(id, p);
-    }
-
-    // Find differences
-    const added: any[] = [];
-    const removed: any[] = [];
-    const priceChanges: any[] = [];
-    const stockChanges: any[] = [];
-
-    // Products in "to" but not in "from" = added
-    for (const [id, product] of toProducts) {
-      if (!fromProducts.has(id)) {
-        added.push({
-          id,
-          name: product.name,
-          brand: product.brand?.name,
-          price: product.Prices?.[0]?.price,
-        });
-      }
-    }
-
-    // Products in "from" but not in "to" = removed
-    for (const [id, product] of fromProducts) {
-      if (!toProducts.has(id)) {
-        removed.push({
-          id,
-          name: product.name,
-          brand: product.brand?.name,
-          price: product.Prices?.[0]?.price,
-        });
-      }
-    }
-
-    // Products in both - check for changes
-    for (const [id, toProduct] of toProducts) {
-      const fromProduct = fromProducts.get(id);
-      if (!fromProduct) continue;
-
-      const fromPrice = fromProduct.Prices?.[0]?.price;
-      const toPrice = toProduct.Prices?.[0]?.price;
-
-      if (fromPrice !== toPrice) {
-        priceChanges.push({
-          id,
-          name: toProduct.name,
-          brand: toProduct.brand?.name,
-          oldPrice: fromPrice,
-          newPrice: toPrice,
-          change: toPrice && fromPrice ? toPrice - fromPrice : null,
-        });
-      }
-
-      const fromStock = fromProduct.Status || fromProduct.status;
-      const toStock = toProduct.Status || toProduct.status;
-
-      if (fromStock !== toStock) {
-        stockChanges.push({
-          id,
-          name: toProduct.name,
-          brand: toProduct.brand?.name,
-          oldStatus: fromStock,
-          newStatus: toStock,
-        });
-      }
-    }
-
-    res.json({
-      success: true,
-      from: {
-        id: fromPayload.metadata.id,
-        fetchedAt: fromPayload.metadata.fetchedAt,
-        productCount: fromPayload.metadata.productCount,
-      },
-      to: {
-        id: toPayload.metadata.id,
-        fetchedAt: toPayload.metadata.fetchedAt,
-        productCount: toPayload.metadata.productCount,
-      },
-      diff: {
-        added: added.length,
-        removed: removed.length,
-        priceChanges: priceChanges.length,
-        stockChanges: stockChanges.length,
-      },
-      details: {
-        added,
-        removed,
-        priceChanges,
-        stockChanges,
-      },
-    });
-  } catch (error: any) {
-    console.error('[Payloads] Diff error:', error.message);
-    res.status(500).json({ success: false, error: error.message });
-  }
-});
-
-export default router;
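The `/diff` handler above reduces to a comparison of two ID-keyed maps. Below is a minimal standalone sketch of that comparison, assuming a flattened product shape. The `Product` and `ProductDiff` types and the `diffProducts` name are hypothetical and do not appear in the repository; real payloads read the price from `Prices[0].price` and the status from `Status`/`status`. Unlike the handler, this sketch compares prices against `undefined` explicitly, so a price of 0 still produces a numeric `change`.

```typescript
// Hypothetical, simplified product shape (assumption for illustration only).
interface Product {
  id: string;
  name: string;
  price?: number;
  status?: string;
}

interface ProductDiff {
  added: Product[];
  removed: Product[];
  priceChanges: { id: string; oldPrice?: number; newPrice?: number; change: number | null }[];
  stockChanges: { id: string; oldStatus?: string; newStatus?: string }[];
}

// Compare two product snapshots by ID (hypothetical helper, not part of the repo).
function diffProducts(from: Product[], to: Product[]): ProductDiff {
  const fromById = new Map<string, Product>();
  from.forEach((p) => fromById.set(p.id, p));
  const toById = new Map<string, Product>();
  to.forEach((p) => toById.set(p.id, p));

  const diff: ProductDiff = { added: [], removed: [], priceChanges: [], stockChanges: [] };

  toById.forEach((next, id) => {
    const prev = fromById.get(id);
    if (!prev) {
      // Present in "to" but not in "from": added.
      diff.added.push(next);
      return;
    }
    if (prev.price !== next.price) {
      // Explicit undefined checks keep a price of 0 from being treated as missing.
      const change =
        prev.price !== undefined && next.price !== undefined ? next.price - prev.price : null;
      diff.priceChanges.push({ id, oldPrice: prev.price, newPrice: next.price, change });
    }
    if (prev.status !== next.status) {
      diff.stockChanges.push({ id, oldStatus: prev.status, newStatus: next.status });
    }
  });

  // Present in "from" but not in "to": removed.
  fromById.forEach((prev, id) => {
    if (!toById.has(id)) diff.removed.push(prev);
  });

  return diff;
}
```

Keeping the comparison in a pure function like this would let it be unit-tested without a database pool or an Express request.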
@@ -13,12 +13,6 @@ import {
   TaskFilter,
 } from '../tasks/task-service';
 import { pool } from '../db/pool';
-import {
-  isTaskPoolPaused,
-  pauseTaskPool,
-  resumeTaskPool,
-  getTaskPoolStatus,
-} from '../tasks/task-pool-state';
 
 const router = Router();
 
@@ -598,42 +592,4 @@ router.post('/migration/full-migrate', async (req: Request, res: Response) => {
   }
 });
 
-/**
- * GET /api/tasks/pool/status
- * Check if task pool is paused
- */
-router.get('/pool/status', async (_req: Request, res: Response) => {
-  const status = getTaskPoolStatus();
-  res.json({
-    success: true,
-    ...status,
-  });
-});
-
-/**
- * POST /api/tasks/pool/pause
- * Pause the task pool - workers won't pick up new tasks
- */
-router.post('/pool/pause', async (_req: Request, res: Response) => {
-  pauseTaskPool();
-  res.json({
-    success: true,
-    paused: true,
-    message: 'Task pool paused - workers will not pick up new tasks',
-  });
-});
-
-/**
- * POST /api/tasks/pool/resume
- * Resume the task pool - workers will pick up tasks again
- */
-router.post('/pool/resume', async (_req: Request, res: Response) => {
-  resumeTaskPool();
-  res.json({
-    success: true,
-    paused: false,
-    message: 'Task pool resumed - workers will pick up new tasks',
-  });
-});
-
 export default router;
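The pool routes above call four helpers imported from `'../tasks/task-pool-state'`, a module whose body is not part of this diff. As a rough illustration of what such a module might look like, here is a sketch built on an in-memory flag. Everything in it is an assumption: the `TaskPoolStatus` shape, the `pausedAt` field, and the choice of in-process state are hypothetical. An in-memory flag is only visible to the process that set it, so a deployment with several worker pods would likely need a shared store instead.

```typescript
// Hypothetical status shape; the real module's return type is not shown in the diff.
interface TaskPoolStatus {
  paused: boolean;
  pausedAt: Date | null;
}

// Module-level state (assumption: a single process owns the flag).
let paused = false;
let pausedAt: Date | null = null;

function pauseTaskPool(): void {
  // Keep the original pause timestamp if pause is called twice.
  if (!paused) {
    paused = true;
    pausedAt = new Date();
  }
}

function resumeTaskPool(): void {
  paused = false;
  pausedAt = null;
}

function isTaskPoolPaused(): boolean {
  return paused;
}

function getTaskPoolStatus(): TaskPoolStatus {
  return { paused, pausedAt };
}
```

A worker loop would check `isTaskPoolPaused()` before claiming its next task, which matches the "workers won't pick up new tasks" wording in the route comments.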
@@ -17,167 +17,13 @@
  * GET /api/monitor/jobs - Get recent job history
  * GET /api/monitor/active-jobs - Get currently running jobs
  * GET /api/monitor/summary - Get monitoring summary
- *
- * K8s Scaling (added 2024-12-10):
- * GET /api/workers/k8s/replicas - Get current replica count
- * POST /api/workers/k8s/scale - Scale worker replicas up/down
  */
 
 import { Router, Request, Response } from 'express';
 import { pool } from '../db/pool';
-import * as k8s from '@kubernetes/client-node';
 
 const router = Router();
 
-// ============================================================
-// K8S SCALING CONFIGURATION (added 2024-12-10)
-// Per TASK_WORKFLOW_2024-12-10.md: Admin can scale workers from UI
-// ============================================================
-
-const K8S_NAMESPACE = process.env.K8S_NAMESPACE || 'dispensary-scraper';
-const K8S_STATEFULSET_NAME = process.env.K8S_WORKER_STATEFULSET || 'scraper-worker';
-
-// Initialize K8s client - uses in-cluster config when running in K8s,
-// or kubeconfig when running locally
-let k8sAppsApi: k8s.AppsV1Api | null = null;
-
-function getK8sClient(): k8s.AppsV1Api | null {
-  if (k8sAppsApi) return k8sAppsApi;
-
-  try {
-    const kc = new k8s.KubeConfig();
-
-    // Try in-cluster config first (when running as a pod)
-    // Falls back to default kubeconfig (~/.kube/config) for local dev
-    try {
-      kc.loadFromCluster();
-    } catch {
-      kc.loadFromDefault();
-    }
-
-    k8sAppsApi = kc.makeApiClient(k8s.AppsV1Api);
-    return k8sAppsApi;
-  } catch (err: any) {
-    console.warn('[Workers] K8s client not available:', err.message);
-    return null;
-  }
-}
-
-// ============================================================
-// K8S SCALING ROUTES (added 2024-12-10)
-// Per TASK_WORKFLOW_2024-12-10.md: Admin can scale workers from UI
-// ============================================================
-
-/**
- * GET /api/workers/k8s/replicas - Get current worker replica count
- * Returns current and desired replica counts from the StatefulSet
- */
-router.get('/k8s/replicas', async (_req: Request, res: Response) => {
-  const client = getK8sClient();
-
-  if (!client) {
-    return res.status(503).json({
-      success: false,
-      error: 'K8s client not available (not running in cluster or no kubeconfig)',
-      replicas: null,
-    });
-  }
-
-  try {
-    const response = await client.readNamespacedStatefulSet({
-      name: K8S_STATEFULSET_NAME,
-      namespace: K8S_NAMESPACE,
-    });
-
-    const statefulSet = response;
-    res.json({
-      success: true,
-      replicas: {
-        current: statefulSet.status?.readyReplicas || 0,
-        desired: statefulSet.spec?.replicas || 0,
-        available: statefulSet.status?.availableReplicas || 0,
-        updated: statefulSet.status?.updatedReplicas || 0,
-      },
-      statefulset: K8S_STATEFULSET_NAME,
-      namespace: K8S_NAMESPACE,
-    });
-  } catch (err: any) {
-    console.error('[Workers] K8s replicas error:', err.body?.message || err.message);
-    res.status(500).json({
-      success: false,
-      error: err.body?.message || err.message,
-    });
-  }
-});
-
-/**
- * POST /api/workers/k8s/scale - Scale worker replicas
- * Body: { replicas: number } - desired replica count (1-20)
- */
-router.post('/k8s/scale', async (req: Request, res: Response) => {
-  const client = getK8sClient();
-
-  if (!client) {
-    return res.status(503).json({
-      success: false,
-      error: 'K8s client not available (not running in cluster or no kubeconfig)',
-    });
-  }
-
-  const { replicas } = req.body;
-
-  // Validate replica count
-  if (typeof replicas !== 'number' || replicas < 0 || replicas > 20) {
-    return res.status(400).json({
-      success: false,
-      error: 'replicas must be a number between 0 and 20',
-    });
-  }
-
-  try {
-    // Get current state first
-    const currentResponse = await client.readNamespacedStatefulSetScale({
-      name: K8S_STATEFULSET_NAME,
-      namespace: K8S_NAMESPACE,
-    });
-    const currentReplicas = currentResponse.spec?.replicas || 0;
-
-    // Update scale using replaceNamespacedStatefulSetScale
-    await client.replaceNamespacedStatefulSetScale({
-      name: K8S_STATEFULSET_NAME,
-      namespace: K8S_NAMESPACE,
-      body: {
-        apiVersion: 'autoscaling/v1',
-        kind: 'Scale',
-        metadata: {
-          name: K8S_STATEFULSET_NAME,
-          namespace: K8S_NAMESPACE,
-        },
-        spec: {
-          replicas: replicas,
-        },
-      },
-    });
-
-    console.log(`[Workers] Scaled ${K8S_STATEFULSET_NAME} from ${currentReplicas} to ${replicas} replicas`);
-
-    res.json({
-      success: true,
-      message: `Scaled from ${currentReplicas} to ${replicas} replicas`,
-      previous: currentReplicas,
-      desired: replicas,
-      statefulset: K8S_STATEFULSET_NAME,
-      namespace: K8S_NAMESPACE,
-    });
-  } catch (err: any) {
-    console.error('[Workers] K8s scale error:', err.body?.message || err.message);
-    res.status(500).json({
-      success: false,
-      error: err.body?.message || err.message,
-    });
-  }
-});
-
 // ============================================================
 // STATIC ROUTES (must come before parameterized routes)
 // ============================================================
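The `/k8s/scale` handler above validates `replicas` with a `typeof` check and a 0 to 20 range test, which would also accept fractional values such as 2.5. A stricter check could look roughly like the sketch below. The `validateReplicas` name, the `ReplicaCheck` type, and the two constants are hypothetical; the bounds are taken from the handler's validation (0 to 20), not from its doc comment, which says 1 to 20.

```typescript
// Bounds copied from the handler's validation; names are hypothetical.
const MIN_REPLICAS = 0;
const MAX_REPLICAS = 20;

type ReplicaCheck =
  | { ok: true; replicas: number }
  | { ok: false; error: string };

// Validate an untrusted request-body value before passing it to the K8s API.
function validateReplicas(input: unknown): ReplicaCheck {
  // Number.isInteger rejects NaN, Infinity, and fractional values.
  if (typeof input !== 'number' || !Number.isInteger(input)) {
    return { ok: false, error: 'replicas must be an integer' };
  }
  if (input < MIN_REPLICAS || input > MAX_REPLICAS) {
    return {
      ok: false,
      error: `replicas must be between ${MIN_REPLICAS} and ${MAX_REPLICAS}`,
    };
  }
  return { ok: true, replicas: input };
}
```

A route could return HTTP 400 with `error` when `ok` is false and only call the scale API otherwise.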
@@ -16,11 +16,10 @@ import {
   executeGraphQL,
   startSession,
   endSession,
-  setCrawlRotator,
+  getFingerprint,
   GRAPHQL_HASHES,
   DUTCHIE_CONFIG,
 } from '../platforms/dutchie';
-import { CrawlRotator } from '../services/crawl-rotator';
 
 dotenv.config();
 
@@ -109,27 +108,19 @@ async function main() {
 
   // ============================================================
   // STEP 2: Start stealth session
-  // Per workflow-12102025.md: Initialize CrawlRotator and start session with menuUrl
   // ============================================================
   console.log('┌─────────────────────────────────────────────────────────────┐');
   console.log('│ STEP 2: Start Stealth Session │');
   console.log('└─────────────────────────────────────────────────────────────┘');
 
-  // Per workflow-12102025.md: Initialize CrawlRotator (required for sessions)
-  const rotator = new CrawlRotator();
-  setCrawlRotator(rotator);
+  // Use Arizona timezone for this store
+  const session = startSession(disp.state || 'AZ', 'America/Phoenix');
 
-  // Per workflow-12102025.md: startSession takes menuUrl for dynamic Referer
-  const session = startSession(disp.menu_url);
-
-  const fp = session.fingerprint;
+  const fp = getFingerprint();
   console.log(` Session ID: ${session.sessionId}`);
-  console.log(` Browser: ${fp.browserName} (${fp.deviceCategory})`);
   console.log(` User-Agent: ${fp.userAgent.slice(0, 60)}...`);
   console.log(` Accept-Language: ${fp.acceptLanguage}`);
-  console.log(` Referer: ${session.referer}`);
-  console.log(` DNT: ${fp.httpFingerprint.hasDNT ? 'enabled' : 'disabled'}`);
-  console.log(` TLS: ${fp.httpFingerprint.curlImpersonateBinary}`);
+  console.log(` Sec-CH-UA: ${fp.secChUa || '(not set)'}`);
   console.log('');
 
   // ============================================================
@@ -1,10 +1,10 @@
 /**
  * Test script for stealth session management
  *
- * Per workflow-12102025.md:
- * - Tests HTTP fingerprinting (browser-specific headers + ordering)
- * - Tests UA generation (device distribution, browser filtering)
- * - Tests dynamic Referer per dispensary
+ * Tests:
+ * 1. Per-session fingerprint rotation
+ * 2. Geographic consistency (timezone → Accept-Language)
+ * 3. Proxy location loading from database
  *
  * Usage:
  * npx tsx src/scripts/test-stealth-session.ts
@@ -14,142 +14,104 @@ import {
   startSession,
   endSession,
   getCurrentSession,
+  getFingerprint,
+  getRandomFingerprint,
+  getLocaleForTimezone,
   buildHeaders,
-  setCrawlRotator,
 } from '../platforms/dutchie';
 
-import { CrawlRotator } from '../services/crawl-rotator';
-import {
-  generateHTTPFingerprint,
-  buildRefererFromMenuUrl,
-  BrowserType,
-} from '../services/http-fingerprint';
-
 console.log('='.repeat(60));
-console.log('STEALTH SESSION TEST (per workflow-12102025.md)');
+console.log('STEALTH SESSION TEST');
 console.log('='.repeat(60));
 
-// Initialize CrawlRotator (required for sessions)
-console.log('\n[Setup] Initializing CrawlRotator...');
-const rotator = new CrawlRotator();
-setCrawlRotator(rotator);
-console.log(' CrawlRotator initialized');
-
-// Test 1: HTTP Fingerprint Generation
-console.log('\n[Test 1] HTTP Fingerprint Generation:');
-const browsers: BrowserType[] = ['Chrome', 'Firefox', 'Safari', 'Edge'];
-
-for (const browser of browsers) {
-  const httpFp = generateHTTPFingerprint(browser);
-  console.log(` ${browser}:`);
-  console.log(` TLS binary: ${httpFp.curlImpersonateBinary}`);
-  console.log(` DNT: ${httpFp.hasDNT ? 'enabled' : 'disabled'}`);
-  console.log(` Header order: ${httpFp.headerOrder.slice(0, 5).join(', ')}...`);
-}
-
-// Test 2: Dynamic Referer from menu URLs
-console.log('\n[Test 2] Dynamic Referer from Menu URLs:');
-const testUrls = [
-  'https://dutchie.com/embedded-menu/harvest-of-tempe',
-  'https://dutchie.com/dispensary/zen-leaf-mesa',
-  '/embedded-menu/deeply-rooted',
-  '/dispensary/curaleaf-phoenix',
-  null,
+// Test 1: Timezone to Locale mapping
+console.log('\n[Test 1] Timezone to Locale Mapping:');
+const testTimezones = [
+  'America/Phoenix',
+  'America/Los_Angeles',
+  'America/New_York',
+  'America/Chicago',
   undefined,
+  'Invalid/Timezone',
 ];
 
-for (const url of testUrls) {
-  const referer = buildRefererFromMenuUrl(url);
-  console.log(` ${url || '(null/undefined)'}`);
-  console.log(` → ${referer}`);
+for (const tz of testTimezones) {
+  const locale = getLocaleForTimezone(tz);
+  console.log(` ${tz || '(undefined)'} → ${locale}`);
 }
 
-// Test 3: Session with Dynamic Referer
-console.log('\n[Test 3] Session with Dynamic Referer:');
-const testMenuUrl = 'https://dutchie.com/dispensary/harvest-of-tempe';
-console.log(` Starting session with menuUrl: ${testMenuUrl}`);
+// Test 2: Random fingerprint selection
+console.log('\n[Test 2] Random Fingerprint Selection (5 samples):');
+for (let i = 0; i < 5; i++) {
+  const fp = getRandomFingerprint();
+  console.log(` ${i + 1}. ${fp.userAgent.slice(0, 60)}...`);
+}
 
-const session1 = startSession(testMenuUrl);
+// Test 3: Session Management
+console.log('\n[Test 3] Session Management:');
+
+// Before session - should use default fingerprint
+console.log(' Before session:');
+const beforeFp = getFingerprint();
+console.log(` getFingerprint(): ${beforeFp.userAgent.slice(0, 50)}...`);
+console.log(` getCurrentSession(): ${getCurrentSession()}`);
+
+// Start session with Arizona timezone
+console.log('\n Starting session (AZ, America/Phoenix):');
+const session1 = startSession('AZ', 'America/Phoenix');
 console.log(` Session ID: ${session1.sessionId}`);
-console.log(` Browser: ${session1.fingerprint.browserName}`);
-console.log(` Device: ${session1.fingerprint.deviceCategory}`);
-console.log(` Referer: ${session1.referer}`);
-console.log(` DNT: ${session1.fingerprint.httpFingerprint.hasDNT ? 'enabled' : 'disabled'}`);
-console.log(` TLS: ${session1.fingerprint.httpFingerprint.curlImpersonateBinary}`);
+console.log(` Fingerprint UA: ${session1.fingerprint.userAgent.slice(0, 50)}...`);
+console.log(` Accept-Language: ${session1.fingerprint.acceptLanguage}`);
+console.log(` Timezone: ${session1.timezone}`);
 
-// Test 4: Build Headers (browser-specific order)
-console.log('\n[Test 4] Build Headers (browser-specific order):');
-const { headers, orderedHeaders } = buildHeaders(true, 1000);
-console.log(` Headers built for ${session1.fingerprint.browserName}:`);
-console.log(` Order: ${orderedHeaders.join(' → ')}`);
-console.log(` Sample headers:`);
-console.log(` User-Agent: ${headers['User-Agent']?.slice(0, 50)}...`);
-console.log(` Accept: ${headers['Accept']}`);
-console.log(` Accept-Language: ${headers['Accept-Language']}`);
-console.log(` Referer: ${headers['Referer']}`);
-if (headers['sec-ch-ua']) {
-  console.log(` sec-ch-ua: ${headers['sec-ch-ua']}`);
-}
-if (headers['DNT']) {
-  console.log(` DNT: ${headers['DNT']}`);
-}
+// During session - should use session fingerprint
+console.log('\n During session:');
+const duringFp = getFingerprint();
+console.log(` getFingerprint(): ${duringFp.userAgent.slice(0, 50)}...`);
+console.log(` Same as session? ${duringFp.userAgent === session1.fingerprint.userAgent}`);
 
+// Test buildHeaders with session
+console.log('\n buildHeaders() during session:');
+const headers = buildHeaders('/embedded-menu/test-store');
+console.log(` User-Agent: ${headers['user-agent'].slice(0, 50)}...`);
+console.log(` Accept-Language: ${headers['accept-language']}`);
+console.log(` Origin: ${headers['origin']}`);
+console.log(` Referer: ${headers['referer']}`);
+
+// End session
+console.log('\n Ending session:');
 endSession();
+console.log(` getCurrentSession(): ${getCurrentSession()}`);
 
-// Test 5: Multiple Sessions (UA variety)
-console.log('\n[Test 5] Multiple Sessions (UA & fingerprint variety):');
-const sessions: {
-  browser: string;
-  device: string;
-  hasDNT: boolean;
-}[] = [];
-
+// Test 4: Multiple sessions should have different fingerprints
+console.log('\n[Test 4] Multiple Sessions (fingerprint variety):');
+const fingerprints: string[] = [];
 for (let i = 0; i < 10; i++) {
-  const session = startSession(`/dispensary/store-${i}`);
-  sessions.push({
-    browser: session.fingerprint.browserName,
-    device: session.fingerprint.deviceCategory,
-    hasDNT: session.fingerprint.httpFingerprint.hasDNT,
-  });
+  const session = startSession('CA', 'America/Los_Angeles');
+  fingerprints.push(session.fingerprint.userAgent);
   endSession();
 }
 
-// Count distribution
-const browserCounts: Record<string, number> = {};
-const deviceCounts: Record<string, number> = {};
-let dntCount = 0;
+const uniqueCount = new Set(fingerprints).size;
+console.log(` 10 sessions created, ${uniqueCount} unique fingerprints`);
+console.log(` Variety: ${uniqueCount >= 3 ? '✅ Good' : '⚠️ Low - may need more fingerprint options'}`);
 
-for (const s of sessions) {
-  browserCounts[s.browser] = (browserCounts[s.browser] || 0) + 1;
-  deviceCounts[s.device] = (deviceCounts[s.device] || 0) + 1;
-  if (s.hasDNT) dntCount++;
-}
+// Test 5: Geographic consistency check
+console.log('\n[Test 5] Geographic Consistency:');
+const geoTests = [
+  { state: 'AZ', tz: 'America/Phoenix' },
+  { state: 'CA', tz: 'America/Los_Angeles' },
+  { state: 'NY', tz: 'America/New_York' },
+  { state: 'IL', tz: 'America/Chicago' },
+];
 
-console.log(` 10 sessions created:`);
-console.log(` Browsers: ${JSON.stringify(browserCounts)}`);
-console.log(` Devices: ${JSON.stringify(deviceCounts)}`);
-console.log(` DNT enabled: ${dntCount}/10 (expected ~30%)`);
-
-// Test 6: Device distribution check (per workflow-12102025.md: 62/36/2)
-console.log('\n[Test 6] Device Distribution (larger sample):');
-const deviceSamples: string[] = [];
-
-for (let i = 0; i < 100; i++) {
-  const session = startSession();
-  deviceSamples.push(session.fingerprint.deviceCategory);
+for (const { state, tz } of geoTests) {
+  const session = startSession(state, tz);
+  const consistent = session.fingerprint.acceptLanguage.includes('en-US');
+  console.log(` ${state} (${tz}): Accept-Language=${session.fingerprint.acceptLanguage} ${consistent ? '✅' : '❌'}`);
   endSession();
 }
 
-const mobileCount = deviceSamples.filter(d => d === 'mobile').length;
-const desktopCount = deviceSamples.filter(d => d === 'desktop').length;
-const tabletCount = deviceSamples.filter(d => d === 'tablet').length;
-
-console.log(` 100 sessions (expected: 62% mobile, 36% desktop, 2% tablet):`);
-console.log(` Mobile: ${mobileCount}%`);
-console.log(` Desktop: ${desktopCount}%`);
-console.log(` Tablet: ${tabletCount}%`);
-console.log(` Distribution: ${Math.abs(mobileCount - 62) < 15 && Math.abs(desktopCount - 36) < 15 ? '✅ Reasonable' : '⚠️ Off target'}`);
-
 console.log('\n' + '='.repeat(60));
 console.log('TEST COMPLETE');
 console.log('='.repeat(60));
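Test 6 in the script above expects sessions to land near a 62% mobile / 36% desktop / 2% tablet split, but the sampling code itself is not shown here. One common way to get such a split is cumulative-weight sampling, sketched below. The `pickDeviceCategory` function, the `DeviceCategory` type, and the injectable `rng` parameter are hypothetical and not taken from the repository; only the three weights come from the source.

```typescript
type DeviceCategory = 'mobile' | 'desktop' | 'tablet';

// Weights from the expected distribution in Test 6 (62/36/2).
const DEVICE_WEIGHTS: [DeviceCategory, number][] = [
  ['mobile', 62],
  ['desktop', 36],
  ['tablet', 2],
];

// Pick a category with probability proportional to its weight.
// rng is injectable so tests can be deterministic; it must return a value in [0, 1).
function pickDeviceCategory(rng: () => number = Math.random): DeviceCategory {
  let total = 0;
  for (const [, weight] of DEVICE_WEIGHTS) total += weight;

  // Walk the cumulative weights until the roll falls inside a bucket.
  let roll = rng() * total;
  for (const [category, weight] of DEVICE_WEIGHTS) {
    if (roll < weight) return category;
    roll -= weight;
  }
  // Fallback for floating-point edge cases: return the last category.
  return DEVICE_WEIGHTS[DEVICE_WEIGHTS.length - 1][0];
}
```

Passing a fixed `rng` makes the boundaries easy to check without the statistical tolerance (plus or minus 15 points) that the removed test relied on.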
@@ -1,53 +1,49 @@
 /**
  * Crawl Rotator - Proxy & User Agent Rotation for Crawlers
  *
- * Updated: 2025-12-10 per workflow-12102025.md
- *
- * KEY BEHAVIORS (per workflow-12102025.md):
- * 1. Task determines WHAT work to do, proxy determines SESSION IDENTITY
- * 2. Proxy location (timezone) sets Accept-Language headers (always English)
- * 3. On 403: immediately get new IP, new fingerprint, retry
- * 4. After 3 consecutive 403s on same proxy with different fingerprints → disable proxy
- *
- * USER-AGENT GENERATION (per workflow-12102025.md):
- * - Device distribution: Mobile 62%, Desktop 36%, Tablet 2%
- * - Browser whitelist: Chrome, Safari, Edge, Firefox only
- * - UA sticks until IP rotates
- * - Failure = alert admin + stop crawl (no fallback)
- *
- * Uses intoli/user-agents for realistic UA generation with daily-updated data.
+ * Manages rotation of proxies and user agents to avoid blocks.
+ * Used by platform-specific crawlers (Dutchie, Jane, etc.)
  *
  * Canonical location: src/services/crawl-rotator.ts
  */
 
 import { Pool } from 'pg';
-import UserAgent from 'user-agents';
-import {
-  HTTPFingerprint,
-  generateHTTPFingerprint,
-  BrowserType,
-} from './http-fingerprint';
 
 // ============================================================
-// UA CONSTANTS (per workflow-12102025.md)
+// USER AGENT CONFIGURATION
 // ============================================================
 
 /**
- * Per workflow-12102025.md: Device category distribution (hardcoded)
- * Mobile: 62%, Desktop: 36%, Tablet: 2%
+ * Modern browser user agents (Chrome, Firefox, Safari, Edge on various platforms)
+ * Updated: 2024
  */
-const DEVICE_WEIGHTS = {
-  mobile: 62,
-  desktop: 36,
-  tablet: 2,
-} as const;
+export const USER_AGENTS = [
+  // Chrome on Windows
+  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
+  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36',
+  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36',
 
-/**
- * Per workflow-12102025.md: Browser whitelist
- * Only Chrome (67%), Safari (20%), Edge (6%), Firefox (3%)
- * Samsung Internet, Opera, and other niche browsers are filtered out
- */
-const ALLOWED_BROWSERS = ['Chrome', 'Safari', 'Edge', 'Firefox'] as const;
+  // Chrome on macOS
+  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
+  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36',
+
+  // Firefox on Windows
+  'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
+  'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0',
+
+  // Firefox on macOS
+  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:121.0) Gecko/20100101 Firefox/121.0',
+
+  // Safari on macOS
+  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15',
+  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15',
+
+  // Edge on Windows
+  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0',
+
+  // Chrome on Linux
+  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
+];
 
 // ============================================================
 // PROXY TYPES
@@ -65,13 +61,8 @@ export interface Proxy {
   failureCount: number;
   successCount: number;
   avgResponseTimeMs: number | null;
-  maxConnections: number;
-  /**
-   * Per workflow-12102025.md: Track consecutive 403s with different fingerprints.
-   * After 3 consecutive 403s → disable proxy (it's burned).
-   */
-  consecutive403Count: number;
-  // Location info - determines session headers per workflow-12102025.md
+  maxConnections: number; // Number of concurrent connections allowed (for rotating proxies)
+  // Location info (if known)
   city?: string;
   state?: string;
   country?: string;
@@ -86,40 +77,6 @@ export interface ProxyStats {
   avgSuccessRate: number;
 }
 
-// ============================================================
-// FINGERPRINT TYPE
-// Per workflow-12102025.md: Full browser fingerprint from user-agents
-// ============================================================
-
-export interface BrowserFingerprint {
-  userAgent: string;
-  platform: string;
-  screenWidth: number;
-  screenHeight: number;
-  viewportWidth: number;
-  viewportHeight: number;
-  deviceCategory: string;
-  browserName: string; // Per workflow-12102025.md: for session logging
-  // Derived headers for anti-detect
-  acceptLanguage: string;
-  secChUa?: string;
-  secChUaPlatform?: string;
-  secChUaMobile?: string;
-  // Per workflow-12102025.md: HTTP Fingerprinting section
-  httpFingerprint: HTTPFingerprint;
-}
-
-/**
- * Per workflow-12102025.md: Session log entry for debugging blocked sessions
- */
-export interface UASessionLog {
-  deviceCategory: string;
-  browserName: string;
-  userAgent: string;
-  proxyIp: string | null;
-  sessionStartedAt: Date;
-}
-
 // ============================================================
 // PROXY ROTATOR CLASS
 // ============================================================
@@ -134,6 +91,9 @@ export class ProxyRotator {
     this.pool = pool || null;
   }
 
+  /**
+   * Initialize with database pool
|
||||||
|
*/
|
||||||
setPool(pool: Pool): void {
|
setPool(pool: Pool): void {
|
||||||
this.pool = pool;
|
this.pool = pool;
|
||||||
}
|
}
|
||||||
@@ -162,7 +122,6 @@ export class ProxyRotator {
|
|||||||
0 as "successCount",
|
0 as "successCount",
|
||||||
response_time_ms as "avgResponseTimeMs",
|
response_time_ms as "avgResponseTimeMs",
|
||||||
COALESCE(max_connections, 1) as "maxConnections",
|
COALESCE(max_connections, 1) as "maxConnections",
|
||||||
COALESCE(consecutive_403_count, 0) as "consecutive403Count",
|
|
||||||
city,
|
city,
|
||||||
state,
|
state,
|
||||||
country,
|
country,
|
||||||
@@ -175,9 +134,11 @@ export class ProxyRotator {
|
|||||||
|
|
||||||
this.proxies = result.rows;
|
this.proxies = result.rows;
|
||||||
|
|
||||||
|
// Calculate total concurrent capacity
|
||||||
const totalCapacity = this.proxies.reduce((sum, p) => sum + p.maxConnections, 0);
|
const totalCapacity = this.proxies.reduce((sum, p) => sum + p.maxConnections, 0);
|
||||||
console.log(`[ProxyRotator] Loaded ${this.proxies.length} active proxies (${totalCapacity} max concurrent connections)`);
|
console.log(`[ProxyRotator] Loaded ${this.proxies.length} active proxies (${totalCapacity} max concurrent connections)`);
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
|
// Table might not exist - that's okay
|
||||||
console.warn(`[ProxyRotator] Could not load proxies: ${error}`);
|
console.warn(`[ProxyRotator] Could not load proxies: ${error}`);
|
||||||
this.proxies = [];
|
this.proxies = [];
|
||||||
}
|
}
|
||||||
@@ -189,6 +150,7 @@ export class ProxyRotator {
|
|||||||
getNext(): Proxy | null {
|
getNext(): Proxy | null {
|
||||||
if (this.proxies.length === 0) return null;
|
if (this.proxies.length === 0) return null;
|
||||||
|
|
||||||
|
// Round-robin rotation
|
||||||
this.currentIndex = (this.currentIndex + 1) % this.proxies.length;
|
this.currentIndex = (this.currentIndex + 1) % this.proxies.length;
|
||||||
this.lastRotation = new Date();
|
this.lastRotation = new Date();
|
||||||
|
|
||||||
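The `getNext()` hunk above advances `currentIndex` with a modulo wrap. A minimal sketch of that round-robin step follows; `RoundRobin` is a hypothetical stand-in, not the project's `ProxyRotator`, and the starting index of `-1` is an assumption chosen so the first call returns the first item.

```typescript
// Hypothetical minimal rotator; not the project's ProxyRotator.
class RoundRobin<T> {
  // Assumption: start before the first element so next() yields items[0] first.
  private index = -1;

  constructor(private readonly items: T[]) {}

  // Advance the cursor and wrap around at the end of the list.
  next(): T | null {
    if (this.items.length === 0) return null;
    this.index = (this.index + 1) % this.items.length;
    return this.items[this.index];
  }
}

const rr = new RoundRobin(['a', 'b', 'c']);
const picked = [rr.next(), rr.next(), rr.next(), rr.next()];
console.log(picked);
```

The fourth call wraps back to the first item, and an empty list returns `null`, mirroring the `this.proxies.length === 0` guard in the diff.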
@@ -223,68 +185,23 @@ export class ProxyRotator {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Mark proxy as blocked (403 received)
|
* Mark proxy as failed (temporarily remove from rotation)
|
||||||
* Per workflow-12102025.md:
|
|
||||||
* - Increment consecutive_403_count
|
|
||||||
* - After 3 consecutive 403s with different fingerprints → disable proxy
|
|
||||||
* - This is separate from general failures (timeouts, etc.)
|
|
||||||
*/
|
|
||||||
async markBlocked(proxyId: number): Promise<boolean> {
|
|
||||||
const proxy = this.proxies.find(p => p.id === proxyId);
|
|
||||||
let shouldDisable = false;
|
|
||||||
|
|
||||||
if (proxy) {
|
|
||||||
proxy.consecutive403Count++;
|
|
||||||
|
|
||||||
// Per workflow-12102025.md: 3 consecutive 403s → proxy is burned
|
|
||||||
if (proxy.consecutive403Count >= 3) {
|
|
||||||
proxy.isActive = false;
|
|
||||||
this.proxies = this.proxies.filter(p => p.id !== proxyId);
|
|
||||||
console.log(`[ProxyRotator] Proxy ${proxyId} DISABLED after ${proxy.consecutive403Count} consecutive 403s (burned)`);
|
|
||||||
shouldDisable = true;
|
|
||||||
} else {
|
|
||||||
console.log(`[ProxyRotator] Proxy ${proxyId} blocked (403 #${proxy.consecutive403Count}/3)`);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Update database
|
|
||||||
if (this.pool) {
|
|
||||||
try {
|
|
||||||
await this.pool.query(`
|
|
||||||
UPDATE proxies
|
|
||||||
SET
|
|
||||||
consecutive_403_count = COALESCE(consecutive_403_count, 0) + 1,
|
|
||||||
last_failure_at = NOW(),
|
|
||||||
test_result = '403 Forbidden',
|
|
||||||
active = CASE WHEN COALESCE(consecutive_403_count, 0) >= 2 THEN false ELSE active END,
|
|
||||||
updated_at = NOW()
|
|
||||||
WHERE id = $1
|
|
||||||
`, [proxyId]);
|
|
||||||
} catch (err) {
|
|
||||||
console.error(`[ProxyRotator] Failed to update proxy ${proxyId}:`, err);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
return shouldDisable;
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Mark proxy as failed (general error - timeout, connection error, etc.)
|
|
||||||
* Separate from 403 blocking per workflow-12102025.md
|
|
||||||
*/
|
*/
|
||||||
async markFailed(proxyId: number, error?: string): Promise<void> {
|
async markFailed(proxyId: number, error?: string): Promise<void> {
|
||||||
|
// Update in-memory
|
||||||
const proxy = this.proxies.find(p => p.id === proxyId);
|
const proxy = this.proxies.find(p => p.id === proxyId);
|
||||||
if (proxy) {
|
if (proxy) {
|
||||||
proxy.failureCount++;
|
proxy.failureCount++;
|
||||||
|
|
||||||
// Deactivate if too many general failures
|
// Deactivate if too many failures
|
||||||
if (proxy.failureCount >= 5) {
|
if (proxy.failureCount >= 5) {
|
||||||
proxy.isActive = false;
|
proxy.isActive = false;
|
||||||
this.proxies = this.proxies.filter(p => p.id !== proxyId);
|
this.proxies = this.proxies.filter(p => p.id !== proxyId);
|
||||||
console.log(`[ProxyRotator] Proxy ${proxyId} deactivated after ${proxy.failureCount} general failures`);
|
console.log(`[ProxyRotator] Proxy ${proxyId} deactivated after ${proxy.failureCount} failures`);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Update database
|
||||||
if (this.pool) {
|
if (this.pool) {
|
||||||
try {
|
try {
|
||||||
await this.pool.query(`
|
await this.pool.query(`
|
||||||
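The removed `markBlocked` logic in the hunk above disables a proxy after three consecutive 403 responses, and the removed `markSuccess` line resets that counter. The sketch below models only the in-memory part of the rule; `TrackedProxy`, `BURN_THRESHOLD`, `recordBlock`, and `recordSuccess` are hypothetical names, and the database update is left out.

```typescript
// Hypothetical in-memory model of the removed 403 rule; no database access.
interface TrackedProxy {
  id: number;
  consecutive403Count: number;
  isActive: boolean;
}

// Threshold taken from the removed comments ("3 consecutive 403s").
const BURN_THRESHOLD = 3;

// Returns true when this 403 is the one that disables the proxy.
function recordBlock(proxy: TrackedProxy): boolean {
  proxy.consecutive403Count += 1;
  if (proxy.consecutive403Count >= BURN_THRESHOLD) {
    proxy.isActive = false;
    return true;
  }
  return false;
}

// A success clears the streak, so only consecutive 403s count.
function recordSuccess(proxy: TrackedProxy): void {
  proxy.consecutive403Count = 0;
}

const p: TrackedProxy = { id: 1, consecutive403Count: 0, isActive: true };
const results: boolean[] = [recordBlock(p), recordBlock(p)];
recordSuccess(p); // streak broken after two 403s
results.push(recordBlock(p), recordBlock(p), recordBlock(p));
console.log(results, p.isActive);
```

In the removed SQL, the `CASE` compares the column against `2` rather than `3`. That is consistent with this sketch, because an `UPDATE` evaluates its `SET` expressions against the row's pre-update values.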
@@ -303,22 +220,23 @@ export class ProxyRotator {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Mark proxy as successful - resets consecutive 403 count
|
* Mark proxy as successful
|
||||||
* Per workflow-12102025.md: successful request clears the 403 counter
|
|
||||||
*/
|
*/
|
||||||
async markSuccess(proxyId: number, responseTimeMs?: number): Promise<void> {
|
async markSuccess(proxyId: number, responseTimeMs?: number): Promise<void> {
|
||||||
|
// Update in-memory
|
||||||
const proxy = this.proxies.find(p => p.id === proxyId);
|
const proxy = this.proxies.find(p => p.id === proxyId);
|
||||||
if (proxy) {
|
if (proxy) {
|
||||||
proxy.successCount++;
|
proxy.successCount++;
|
||||||
proxy.consecutive403Count = 0; // Reset on success per workflow-12102025.md
|
|
||||||
proxy.lastUsedAt = new Date();
|
proxy.lastUsedAt = new Date();
|
||||||
if (responseTimeMs !== undefined) {
|
if (responseTimeMs !== undefined) {
|
||||||
|
// Rolling average
|
||||||
proxy.avgResponseTimeMs = proxy.avgResponseTimeMs
|
proxy.avgResponseTimeMs = proxy.avgResponseTimeMs
|
||||||
? (proxy.avgResponseTimeMs * 0.8) + (responseTimeMs * 0.2)
|
? (proxy.avgResponseTimeMs * 0.8) + (responseTimeMs * 0.2)
|
||||||
: responseTimeMs;
|
: responseTimeMs;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Update database
|
||||||
if (this.pool) {
|
if (this.pool) {
|
||||||
try {
|
try {
|
||||||
await this.pool.query(`
|
await this.pool.query(`
|
||||||
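The `markSuccess` hunk above keeps a rolling average of response time with an 0.8/0.2 weighting, which is an exponential moving average. A minimal sketch follows; `updateAverage` is a hypothetical helper, and the explicit `null` check is an assumption that differs slightly from the diff, which uses a truthiness test and would therefore treat a stored average of `0` as "no data".

```typescript
// Weights matching the 0.8 / 0.2 split shown in the diff.
const OLD_WEIGHT = 0.8;
const NEW_WEIGHT = 0.2;

// Hypothetical helper: fold one new sample into the running average.
function updateAverage(previous: number | null, sampleMs: number): number {
  // The first sample seeds the average.
  if (previous === null) return sampleMs;
  return previous * OLD_WEIGHT + sampleMs * NEW_WEIGHT;
}

let avg: number | null = null;
for (const sample of [100, 200, 200]) {
  avg = updateAverage(avg, sample);
}
console.log(avg);
```

With samples of 100, 200, and 200 ms the average moves from 100 to about 120 and then to about 136, so a single slow response shifts the figure by only a fifth of the difference.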
@@ -326,7 +244,6 @@ export class ProxyRotator {
|
|||||||
SET
|
SET
|
||||||
last_tested_at = NOW(),
|
last_tested_at = NOW(),
|
||||||
test_result = 'success',
|
test_result = 'success',
|
||||||
consecutive_403_count = 0,
|
|
||||||
response_time_ms = CASE
|
response_time_ms = CASE
|
||||||
WHEN response_time_ms IS NULL THEN $2
|
WHEN response_time_ms IS NULL THEN $2
|
||||||
ELSE (response_time_ms * 0.8 + $2 * 0.2)::integer
|
ELSE (response_time_ms * 0.8 + $2 * 0.2)::integer
|
||||||
@@ -355,8 +272,8 @@ export class ProxyRotator {
|
|||||||
*/
|
*/
|
||||||
getStats(): ProxyStats {
|
getStats(): ProxyStats {
|
||||||
const totalProxies = this.proxies.length;
|
const totalProxies = this.proxies.length;
|
||||||
const activeProxies = this.proxies.reduce((sum, p) => sum + p.maxConnections, 0);
|
const activeProxies = this.proxies.reduce((sum, p) => sum + p.maxConnections, 0); // Total concurrent capacity
|
||||||
const blockedProxies = this.proxies.filter(p => p.failureCount >= 5 || p.consecutive403Count >= 3).length;
|
const blockedProxies = this.proxies.filter(p => p.failureCount >= 5).length;
|
||||||
|
|
||||||
const successRates = this.proxies
|
const successRates = this.proxies
|
||||||
.filter(p => p.successCount + p.failureCount > 0)
|
.filter(p => p.successCount + p.failureCount > 0)
|
||||||
@@ -368,12 +285,15 @@ export class ProxyRotator {
|
|||||||
|
|
||||||
return {
|
return {
|
||||||
totalProxies,
|
totalProxies,
|
||||||
activeProxies,
|
activeProxies, // Total concurrent capacity across all proxies
|
||||||
blockedProxies,
|
blockedProxies,
|
||||||
avgSuccessRate,
|
avgSuccessRate,
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Check if proxy pool has available proxies
|
||||||
|
*/
|
||||||
hasAvailableProxies(): boolean {
|
hasAvailableProxies(): boolean {
|
||||||
return this.proxies.length > 0;
|
return this.proxies.length > 0;
|
||||||
}
|
}
|
||||||
@@ -381,194 +301,53 @@ export class ProxyRotator {
|
|||||||
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
// USER AGENT ROTATOR CLASS
|
// USER AGENT ROTATOR CLASS
|
||||||
// Per workflow-12102025.md: Uses intoli/user-agents for realistic fingerprints
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
|
|
||||||
export class UserAgentRotator {
|
export class UserAgentRotator {
|
||||||
private currentFingerprint: BrowserFingerprint | null = null;
|
private userAgents: string[];
|
||||||
private sessionLog: UASessionLog | null = null;
|
private currentIndex: number = 0;
|
||||||
|
private lastRotation: Date = new Date();
|
||||||
|
|
||||||
constructor() {
|
constructor(userAgents: string[] = USER_AGENTS) {
|
||||||
// Per workflow-12102025.md: Initialize with first fingerprint
|
this.userAgents = userAgents;
|
||||||
this.rotate();
|
// Start at random index to avoid patterns
|
||||||
|
this.currentIndex = Math.floor(Math.random() * userAgents.length);
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Per workflow-12102025.md: Roll device category based on distribution
|
* Get next user agent in rotation
|
||||||
* Mobile: 62%, Desktop: 36%, Tablet: 2%
|
|
||||||
*/
|
*/
|
||||||
private rollDeviceCategory(): 'mobile' | 'desktop' | 'tablet' {
|
getNext(): string {
|
||||||
const roll = Math.random() * 100;
|
this.currentIndex = (this.currentIndex + 1) % this.userAgents.length;
|
||||||
if (roll < DEVICE_WEIGHTS.mobile) {
|
this.lastRotation = new Date();
|
||||||
return 'mobile';
|
return this.userAgents[this.currentIndex];
|
||||||
} else if (roll < DEVICE_WEIGHTS.mobile + DEVICE_WEIGHTS.desktop) {
|
|
||||||
return 'desktop';
|
|
||||||
} else {
|
|
||||||
return 'tablet';
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Per workflow-12102025.md: Extract browser name from UA string
|
* Get current user agent without rotating
|
||||||
*/
|
*/
|
||||||
private extractBrowserName(userAgent: string): string {
|
getCurrent(): string {
|
||||||
if (userAgent.includes('Edg/')) return 'Edge';
|
return this.userAgents[this.currentIndex];
|
||||||
if (userAgent.includes('Firefox/')) return 'Firefox';
|
|
||||||
if (userAgent.includes('Safari/') && !userAgent.includes('Chrome/')) return 'Safari';
|
|
||||||
if (userAgent.includes('Chrome/')) return 'Chrome';
|
|
||||||
return 'Unknown';
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Per workflow-12102025.md: Check if browser is in whitelist
|
* Get a random user agent
|
||||||
*/
|
*/
|
||||||
private isAllowedBrowser(userAgent: string): boolean {
|
getRandom(): string {
|
||||||
const browserName = this.extractBrowserName(userAgent);
|
const index = Math.floor(Math.random() * this.userAgents.length);
|
||||||
return ALLOWED_BROWSERS.includes(browserName as typeof ALLOWED_BROWSERS[number]);
|
return this.userAgents[index];
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Generate a new random fingerprint
|
* Get total available user agents
|
||||||
* Per workflow-12102025.md:
|
|
||||||
* - Roll device category (62/36/2)
|
|
||||||
* - Filter to top 4 browsers only
|
|
||||||
* - Failure = alert admin + stop (no fallback)
|
|
||||||
*/
|
*/
|
||||||
rotate(proxyIp?: string): BrowserFingerprint {
|
|
||||||
// Per workflow-12102025.md: Roll device category
|
|
||||||
const deviceCategory = this.rollDeviceCategory();
|
|
||||||
|
|
||||||
// Per workflow-12102025.md: Generate UA filtered to device category
|
|
||||||
const generator = new UserAgent({ deviceCategory });
|
|
||||||
|
|
||||||
// Per workflow-12102025.md: Try to get an allowed browser (max 50 attempts)
|
|
||||||
let ua: ReturnType<typeof generator>;
|
|
||||||
let attempts = 0;
|
|
||||||
const maxAttempts = 50;
|
|
||||||
|
|
||||||
do {
|
|
||||||
ua = generator();
|
|
||||||
attempts++;
|
|
||||||
} while (!this.isAllowedBrowser(ua.data.userAgent) && attempts < maxAttempts);
|
|
||||||
|
|
||||||
// Per workflow-12102025.md: If we can't get allowed browser, this is a failure
|
|
||||||
if (!this.isAllowedBrowser(ua.data.userAgent)) {
|
|
||||||
const errorMsg = `[UserAgentRotator] CRITICAL: Failed to generate allowed browser after ${maxAttempts} attempts. Device: ${deviceCategory}. Last UA: ${ua.data.userAgent}`;
|
|
||||||
console.error(errorMsg);
|
|
||||||
// Per workflow-12102025.md: Alert admin + stop crawl
|
|
||||||
// TODO: Post alert to admin dashboard
|
|
||||||
throw new Error(errorMsg);
|
|
||||||
}
|
|
||||||
|
|
||||||
const data = ua.data;
|
|
||||||
const browserName = this.extractBrowserName(data.userAgent);
|
|
||||||
|
|
||||||
// Build sec-ch-ua headers from user agent string
|
|
||||||
const secChUa = this.buildSecChUa(data.userAgent, deviceCategory);
|
|
||||||
|
|
||||||
// Per workflow-12102025.md: HTTP Fingerprinting - generate full HTTP fingerprint
|
|
||||||
const httpFingerprint = generateHTTPFingerprint(browserName as BrowserType);
|
|
||||||
|
|
||||||
this.currentFingerprint = {
|
|
||||||
userAgent: data.userAgent,
|
|
||||||
platform: data.platform,
|
|
||||||
screenWidth: data.screenWidth,
|
|
||||||
screenHeight: data.screenHeight,
|
|
||||||
viewportWidth: data.viewportWidth,
|
|
||||||
viewportHeight: data.viewportHeight,
|
|
||||||
deviceCategory: data.deviceCategory,
|
|
||||||
browserName, // Per workflow-12102025.md: for session logging
|
|
||||||
// Per workflow-12102025.md: always English
|
|
||||||
acceptLanguage: 'en-US,en;q=0.9',
|
|
||||||
...secChUa,
|
|
||||||
// Per workflow-12102025.md: HTTP Fingerprinting section
|
|
||||||
httpFingerprint,
|
|
||||||
};
|
|
||||||
|
|
||||||
// Per workflow-12102025.md: Log session data
|
|
||||||
this.sessionLog = {
|
|
||||||
deviceCategory,
|
|
||||||
browserName,
|
|
||||||
userAgent: data.userAgent,
|
|
||||||
proxyIp: proxyIp || null,
|
|
||||||
sessionStartedAt: new Date(),
|
|
||||||
};
|
|
||||||
|
|
||||||
console.log(`[UserAgentRotator] New fingerprint: device=${deviceCategory}, browser=${browserName}, UA=${data.userAgent.slice(0, 50)}...`);
|
|
||||||
return this.currentFingerprint;
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Get current fingerprint without rotating
|
|
||||||
*/
|
|
||||||
getCurrent(): BrowserFingerprint {
|
|
||||||
if (!this.currentFingerprint) {
|
|
||||||
return this.rotate();
|
|
||||||
}
|
|
||||||
return this.currentFingerprint;
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Get a random fingerprint (rotates and returns)
|
|
||||||
*/
|
|
||||||
getRandom(proxyIp?: string): BrowserFingerprint {
|
|
||||||
return this.rotate(proxyIp);
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Per workflow-12102025.md: Get session log for debugging
|
|
||||||
*/
|
|
||||||
getSessionLog(): UASessionLog | null {
|
|
||||||
return this.sessionLog;
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Build sec-ch-ua headers from user agent string
|
|
||||||
* Per workflow-12102025.md: Include mobile indicator based on device category
|
|
||||||
*/
|
|
||||||
private buildSecChUa(userAgent: string, deviceCategory: string): { secChUa?: string; secChUaPlatform?: string; secChUaMobile?: string } {
|
|
||||||
const isMobile = deviceCategory === 'mobile' || deviceCategory === 'tablet';
|
|
||||||
|
|
||||||
// Extract Chrome version if present
|
|
||||||
const chromeMatch = userAgent.match(/Chrome\/(\d+)/);
|
|
||||||
const edgeMatch = userAgent.match(/Edg\/(\d+)/);
|
|
||||||
|
|
||||||
if (edgeMatch) {
|
|
||||||
const version = edgeMatch[1];
|
|
||||||
return {
|
|
||||||
secChUa: `"Microsoft Edge";v="${version}", "Chromium";v="${version}", "Not_A Brand";v="24"`,
|
|
||||||
secChUaPlatform: userAgent.includes('Windows') ? '"Windows"' : userAgent.includes('Android') ? '"Android"' : '"macOS"',
|
|
||||||
secChUaMobile: isMobile ? '?1' : '?0',
|
|
||||||
};
|
|
||||||
}
|
|
||||||
|
|
||||||
if (chromeMatch) {
|
|
||||||
const version = chromeMatch[1];
|
|
||||||
let platform = '"Linux"';
|
|
||||||
if (userAgent.includes('Windows')) platform = '"Windows"';
|
|
||||||
else if (userAgent.includes('Mac')) platform = '"macOS"';
|
|
||||||
else if (userAgent.includes('Android')) platform = '"Android"';
|
|
||||||
else if (userAgent.includes('iPhone') || userAgent.includes('iPad')) platform = '"iOS"';
|
|
||||||
|
|
||||||
return {
|
|
||||||
secChUa: `"Google Chrome";v="${version}", "Chromium";v="${version}", "Not_A Brand";v="24"`,
|
|
||||||
secChUaPlatform: platform,
|
|
||||||
secChUaMobile: isMobile ? '?1' : '?0',
|
|
||||||
};
|
|
||||||
}
|
|
||||||
|
|
||||||
// Firefox/Safari don't send sec-ch-ua
|
|
||||||
return {};
|
|
||||||
}
|
|
||||||
|
|
||||||
getCount(): number {
|
getCount(): number {
|
||||||
return 1; // user-agents generates dynamically
|
return this.userAgents.length;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
// COMBINED ROTATOR
|
// COMBINED ROTATOR (for convenience)
|
||||||
// Per workflow-12102025.md: Coordinates proxy + fingerprint rotation
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
|
|
||||||
export class CrawlRotator {
|
export class CrawlRotator {
|
||||||
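The removed `rollDeviceCategory` method in the hunk above picks a device category with a 62/36/2 split. `DEVICE_WEIGHTS` is not shown in this part of the diff, so the sketch below assumes percentage weights taken from the removed doc comment; `WEIGHTS` and `rollDevice` are hypothetical names, and the injectable `random` parameter is added only to make the roll testable.

```typescript
type DeviceCategory = 'mobile' | 'desktop' | 'tablet';

// Assumed weights (percentages summing to 100), per the removed doc comment.
const WEIGHTS: Record<DeviceCategory, number> = { mobile: 62, desktop: 36, tablet: 2 };

// `random` is injectable so the roll can be exercised deterministically.
function rollDevice(random: () => number = Math.random): DeviceCategory {
  const roll = random() * 100;
  if (roll < WEIGHTS.mobile) return 'mobile';
  if (roll < WEIGHTS.mobile + WEIGHTS.desktop) return 'desktop';
  return 'tablet';
}

const rolls = [rollDevice(() => 0.1), rollDevice(() => 0.7), rollDevice(() => 0.99)];
console.log(rolls);
```

Comparing the roll against cumulative thresholds (62, then 98) is what turns the three weights into a single draw. The right-hand side of the diff replaces this with plain rotation over a fixed `USER_AGENTS` list.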
@@ -580,51 +359,49 @@ export class CrawlRotator {
|
|||||||
this.userAgent = new UserAgentRotator();
|
this.userAgent = new UserAgentRotator();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Initialize rotator (load proxies from DB)
|
||||||
|
*/
|
||||||
async initialize(): Promise<void> {
|
async initialize(): Promise<void> {
|
||||||
await this.proxy.loadProxies();
|
await this.proxy.loadProxies();
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Rotate proxy only (get new IP)
|
* Rotate proxy only
|
||||||
*/
|
*/
|
||||||
rotateProxy(): Proxy | null {
|
rotateProxy(): Proxy | null {
|
||||||
return this.proxy.getNext();
|
return this.proxy.getNext();
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Rotate fingerprint only (new UA, screen size, etc.)
|
* Rotate user agent only
|
||||||
*/
|
*/
|
||||||
rotateFingerprint(): BrowserFingerprint {
|
rotateUserAgent(): string {
|
||||||
return this.userAgent.rotate();
|
return this.userAgent.getNext();
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Rotate both proxy and fingerprint
|
* Rotate both proxy and user agent
|
||||||
* Per workflow-12102025.md: called on 403 for fresh identity
|
|
||||||
* Passes proxy IP to UA rotation for session logging
|
|
||||||
*/
|
*/
|
||||||
rotateBoth(): { proxy: Proxy | null; fingerprint: BrowserFingerprint } {
|
rotateBoth(): { proxy: Proxy | null; userAgent: string } {
|
||||||
const proxy = this.proxy.getNext();
|
|
||||||
const proxyIp = proxy ? proxy.host : undefined;
|
|
||||||
return {
|
return {
|
||||||
proxy,
|
proxy: this.proxy.getNext(),
|
||||||
fingerprint: this.userAgent.rotate(proxyIp),
|
userAgent: this.userAgent.getNext(),
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Get current proxy and fingerprint without rotating
|
* Get current proxy and user agent without rotating
|
||||||
*/
|
*/
|
||||||
getCurrent(): { proxy: Proxy | null; fingerprint: BrowserFingerprint } {
|
getCurrent(): { proxy: Proxy | null; userAgent: string } {
|
||||||
return {
|
return {
|
||||||
proxy: this.proxy.getCurrent(),
|
proxy: this.proxy.getCurrent(),
|
||||||
fingerprint: this.userAgent.getCurrent(),
|
userAgent: this.userAgent.getCurrent(),
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Record success for current proxy
|
* Record success for current proxy
|
||||||
* Per workflow-12102025.md: resets consecutive 403 count
|
|
||||||
*/
|
*/
|
||||||
async recordSuccess(responseTimeMs?: number): Promise<void> {
|
async recordSuccess(responseTimeMs?: number): Promise<void> {
|
||||||
const current = this.proxy.getCurrent();
|
const current = this.proxy.getCurrent();
|
||||||
@@ -634,20 +411,7 @@ export class CrawlRotator {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Record 403 block for current proxy
|
* Record failure for current proxy
|
||||||
* Per workflow-12102025.md: increments consecutive_403_count, disables after 3
|
|
||||||
* Returns true if proxy was disabled
|
|
||||||
*/
|
|
||||||
async recordBlock(): Promise<boolean> {
|
|
||||||
const current = this.proxy.getCurrent();
|
|
||||||
if (current) {
|
|
||||||
return await this.proxy.markBlocked(current.id);
|
|
||||||
}
|
|
||||||
return false;
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Record general failure (not 403)
|
|
||||||
*/
|
*/
|
||||||
async recordFailure(error?: string): Promise<void> {
|
async recordFailure(error?: string): Promise<void> {
|
||||||
const current = this.proxy.getCurrent();
|
const current = this.proxy.getCurrent();
|
||||||
@@ -657,13 +421,14 @@ export class CrawlRotator {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Get current proxy location info
|
* Get current proxy location info (for reporting)
|
||||||
* Per workflow-12102025.md: proxy location determines session headers
|
* Note: For rotating proxies (like IPRoyal), the actual exit location varies per request
|
||||||
*/
|
*/
|
||||||
getProxyLocation(): { city?: string; state?: string; country?: string; timezone?: string; isRotating: boolean } | null {
|
getProxyLocation(): { city?: string; state?: string; country?: string; timezone?: string; isRotating: boolean } | null {
|
||||||
const current = this.proxy.getCurrent();
|
const current = this.proxy.getCurrent();
|
||||||
if (!current) return null;
|
if (!current) return null;
|
||||||
|
|
||||||
|
// Check if this is a rotating proxy (max_connections > 1 usually indicates rotating)
|
||||||
const isRotating = current.maxConnections > 1;
|
const isRotating = current.maxConnections > 1;
|
||||||
|
|
||||||
return {
|
return {
|
||||||
@@ -674,15 +439,6 @@ export class CrawlRotator {
|
|||||||
isRotating
|
isRotating
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
|
||||||
* Get timezone from current proxy
|
|
||||||
* Per workflow-12102025.md: used for Accept-Language header
|
|
||||||
*/
|
|
||||||
getProxyTimezone(): string | undefined {
|
|
||||||
const current = this.proxy.getCurrent();
|
|
||||||
return current?.timezone;
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
|
|||||||
@@ -1,315 +0,0 @@
|
|||||||
/**
|
|
||||||
* HTTP Fingerprinting Service
|
|
||||||
*
|
|
||||||
* Per workflow-12102025.md - HTTP Fingerprinting section:
|
|
||||||
* - Full header set per browser type
|
|
||||||
* - Browser-specific header ordering
|
|
||||||
* - Natural randomization (DNT, Accept quality)
|
|
||||||
* - Dynamic Referer per dispensary
|
|
||||||
*
|
|
||||||
* Canonical location: src/services/http-fingerprint.ts
|
|
||||||
*/
|
|
||||||
|
|
||||||
// ============================================================
|
|
||||||
// TYPES
|
|
||||||
// ============================================================
|
|
||||||
|
|
||||||
export type BrowserType = 'Chrome' | 'Firefox' | 'Safari' | 'Edge';
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Per workflow-12102025.md: Full HTTP fingerprint for a session
|
|
||||||
*/
|
|
||||||
export interface HTTPFingerprint {
|
|
||||||
browserType: BrowserType;
|
|
||||||
headers: Record<string, string>;
|
|
||||||
headerOrder: string[];
|
|
||||||
curlImpersonateBinary: string;
|
|
||||||
hasDNT: boolean;
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Per workflow-12102025.md: Context for building headers
|
|
||||||
*/
|
|
||||||
export interface HeaderContext {
|
|
||||||
userAgent: string;
|
|
||||||
secChUa?: string;
|
|
||||||
secChUaPlatform?: string;
|
|
||||||
secChUaMobile?: string;
|
|
||||||
referer: string;
|
|
||||||
isPost: boolean;
|
|
||||||
contentLength?: number;
|
|
||||||
}
|
|
||||||
|
|
||||||
// ============================================================
|
|
||||||
// CONSTANTS (per workflow-12102025.md)
|
|
||||||
// ============================================================
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Per workflow-12102025.md: DNT header distribution (~30% of users)
|
|
||||||
*/
|
|
||||||
const DNT_PROBABILITY = 0.30;
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Per workflow-12102025.md: Accept header variations for natural traffic
|
|
||||||
*/
|
|
||||||
const ACCEPT_VARIATIONS = [
|
|
||||||
'application/json, text/plain, */*',
|
|
||||||
'application/json,text/plain,*/*',
|
|
||||||
'*/*',
|
|
||||||
];
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Per workflow-12102025.md: Accept-Language variations
|
|
||||||
*/
|
|
||||||
const ACCEPT_LANGUAGE_VARIATIONS = [
|
|
||||||
'en-US,en;q=0.9',
|
|
||||||
'en-US,en;q=0.8',
|
|
||||||
'en-US;q=0.9,en;q=0.8',
|
|
||||||
];
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Per workflow-12102025.md: curl-impersonate binaries per browser
|
|
||||||
*/
|
|
||||||
const CURL_IMPERSONATE_BINARIES: Record<BrowserType, string> = {
|
|
||||||
Chrome: 'curl_chrome131',
|
|
||||||
Edge: 'curl_chrome131', // Edge uses Chromium
|
|
||||||
Firefox: 'curl_ff133',
|
|
||||||
Safari: 'curl_safari17',
|
|
||||||
};
|
|
||||||
|
|
||||||
// ============================================================
|
|
||||||
// HEADER ORDERING (per workflow-12102025.md)
|
|
||||||
// ============================================================
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Per workflow-12102025.md: Chrome header order for GraphQL requests
|
|
||||||
*/
|
|
||||||
const CHROME_HEADER_ORDER = [
|
|
||||||
'Host',
|
|
||||||
'Connection',
|
|
||||||
'Content-Length',
|
|
||||||
'sec-ch-ua',
|
|
||||||
'DNT',
|
|
||||||
'sec-ch-ua-mobile',
|
|
||||||
'User-Agent',
|
|
||||||
'sec-ch-ua-platform',
|
|
||||||
'Content-Type',
|
|
||||||
'Accept',
|
|
||||||
'Origin',
|
|
||||||
'sec-fetch-site',
|
|
||||||
'sec-fetch-mode',
|
|
||||||
'sec-fetch-dest',
|
|
||||||
'Referer',
|
|
||||||
'Accept-Encoding',
|
|
||||||
'Accept-Language',
|
|
||||||
];
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Per workflow-12102025.md: Firefox header order for GraphQL requests
|
|
||||||
*/
|
|
||||||
const FIREFOX_HEADER_ORDER = [
|
|
||||||
'Host',
|
|
||||||
'User-Agent',
|
|
||||||
'Accept',
|
|
||||||
'Accept-Language',
|
|
||||||
'Accept-Encoding',
|
|
||||||
'Content-Type',
|
|
||||||
'Content-Length',
|
|
||||||
'Origin',
|
|
||||||
'DNT',
|
|
||||||
'Connection',
|
|
||||||
'Referer',
|
|
||||||
'sec-fetch-dest',
|
|
||||||
'sec-fetch-mode',
|
|
||||||
'sec-fetch-site',
|
|
||||||
];
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Per workflow-12102025.md: Safari header order for GraphQL requests
|
|
||||||
*/
|
|
||||||
const SAFARI_HEADER_ORDER = [
|
|
||||||
'Host',
|
|
||||||
'Connection',
|
|
||||||
'Content-Length',
|
|
||||||
'Accept',
|
|
||||||
'User-Agent',
|
|
||||||
'Content-Type',
|
|
||||||
'Origin',
|
|
||||||
'Referer',
|
|
||||||
'Accept-Encoding',
|
|
||||||
'Accept-Language',
|
|
||||||
];
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Per workflow-12102025.md: Edge uses Chrome order (Chromium-based)
|
|
||||||
*/
|
|
||||||
const HEADER_ORDERS: Record<BrowserType, string[]> = {
|
|
||||||
Chrome: CHROME_HEADER_ORDER,
|
|
||||||
Edge: CHROME_HEADER_ORDER,
|
|
||||||
Firefox: FIREFOX_HEADER_ORDER,
|
|
||||||
Safari: SAFARI_HEADER_ORDER,
|
|
||||||
};
|
|
||||||
|
|
||||||
// ============================================================
|
|
||||||
// FINGERPRINT GENERATION
|
|
||||||
// ============================================================
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Per workflow-12102025.md: Generate HTTP fingerprint for a session
|
|
||||||
* Randomization is done once per session for consistency
|
|
||||||
*/
|
|
||||||
export function generateHTTPFingerprint(browserType: BrowserType): HTTPFingerprint {
|
|
||||||
// Per workflow-12102025.md: DNT randomized per session (~30%)
|
|
||||||
const hasDNT = Math.random() < DNT_PROBABILITY;
|
|
||||||
|
|
||||||
return {
|
|
||||||
browserType,
|
|
||||||
headers: {}, // Built dynamically per request
|
|
||||||
headerOrder: HEADER_ORDERS[browserType],
|
|
||||||
curlImpersonateBinary: CURL_IMPERSONATE_BINARIES[browserType],
|
|
||||||
hasDNT,
|
|
||||||
};
|
|
||||||
}

/**
 * Per workflow-12102025.md: Build complete headers for a request
 * Returns headers in browser-specific order
 */
export function buildOrderedHeaders(
  fingerprint: HTTPFingerprint,
  context: HeaderContext
): { headers: Record<string, string>; orderedHeaders: string[] } {
  const { browserType, hasDNT, headerOrder } = fingerprint;
  const { userAgent, secChUa, secChUaPlatform, secChUaMobile, referer, isPost, contentLength } = context;

  // Per workflow-12102025.md: Natural randomization for Accept
  const accept = ACCEPT_VARIATIONS[Math.floor(Math.random() * ACCEPT_VARIATIONS.length)];
  const acceptLanguage = ACCEPT_LANGUAGE_VARIATIONS[Math.floor(Math.random() * ACCEPT_LANGUAGE_VARIATIONS.length)];

  // Build all possible headers
  const allHeaders: Record<string, string> = {
    'Connection': 'keep-alive',
    'User-Agent': userAgent,
    'Accept': accept,
    'Accept-Language': acceptLanguage,
    'Accept-Encoding': 'gzip, deflate, br',
  };

  // Per workflow-12102025.md: POST-only headers
  if (isPost) {
    allHeaders['Content-Type'] = 'application/json';
    allHeaders['Origin'] = 'https://dutchie.com';
    if (contentLength !== undefined) {
      allHeaders['Content-Length'] = String(contentLength);
    }
  }

  // Per workflow-12102025.md: Dynamic Referer per dispensary
  allHeaders['Referer'] = referer;

  // Per workflow-12102025.md: DNT randomized per session
  if (hasDNT) {
    allHeaders['DNT'] = '1';
  }

  // Per workflow-12102025.md: Chromium-only headers (Chrome, Edge)
  if (browserType === 'Chrome' || browserType === 'Edge') {
    if (secChUa) allHeaders['sec-ch-ua'] = secChUa;
    if (secChUaMobile) allHeaders['sec-ch-ua-mobile'] = secChUaMobile;
    if (secChUaPlatform) allHeaders['sec-ch-ua-platform'] = secChUaPlatform;
    allHeaders['sec-fetch-site'] = 'same-origin';
    allHeaders['sec-fetch-mode'] = 'cors';
    allHeaders['sec-fetch-dest'] = 'empty';
  }

  // Per workflow-12102025.md: Firefox has sec-fetch but no sec-ch
  if (browserType === 'Firefox') {
    allHeaders['sec-fetch-site'] = 'same-origin';
    allHeaders['sec-fetch-mode'] = 'cors';
    allHeaders['sec-fetch-dest'] = 'empty';
  }

  // Per workflow-12102025.md: Safari has no sec-* headers

  // Filter to only headers that exist and order them
  const orderedHeaders: string[] = [];
  const headers: Record<string, string> = {};

  for (const headerName of headerOrder) {
    if (allHeaders[headerName]) {
      orderedHeaders.push(headerName);
      headers[headerName] = allHeaders[headerName];
    }
  }

  return { headers, orderedHeaders };
}
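
The final filter-and-order step of buildOrderedHeaders can be sketched on its own. `EXAMPLE_ORDER` and `orderHeaders` are hypothetical names, and the order list is made up for illustration; the real per-browser lists are defined earlier in the module.

```typescript
// Hypothetical order list for illustration only.
const EXAMPLE_ORDER: string[] = ['Host', 'Connection', 'User-Agent', 'Accept', 'Referer', 'DNT'];

function orderHeaders(
  all: Record<string, string>,
  order: string[]
): { headers: Record<string, string>; orderedHeaders: string[] } {
  const headers: Record<string, string> = {};
  const orderedHeaders: string[] = [];
  for (const name of order) {
    const value = all[name];
    // Names missing from `all` (or set to an empty string) are skipped.
    if (value) {
      orderedHeaders.push(name);
      headers[name] = value;
    }
  }
  return { headers, orderedHeaders };
}

const out = orderHeaders(
  { Referer: 'https://dutchie.com/', Accept: '*/*', 'User-Agent': 'example-agent', 'X-Extra': 'dropped' },
  EXAMPLE_ORDER
);
console.log(out.orderedHeaders); // [ 'User-Agent', 'Accept', 'Referer' ]
```

Because the loop walks the order list rather than the header map, any header whose name is absent from the order list (here `X-Extra`) is silently dropped. The same holds for the function above, so each order list has to name every header the request should carry.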

/**
 * Per workflow-12102025.md: Build curl command arguments for headers
 * Headers are added in browser-specific order
 */
export function buildCurlHeaderArgs(
  fingerprint: HTTPFingerprint,
  context: HeaderContext
): string[] {
  const { headers, orderedHeaders } = buildOrderedHeaders(fingerprint, context);

  const args: string[] = [];
  for (const headerName of orderedHeaders) {
    // Skip Host and Content-Length - curl handles these
    if (headerName === 'Host' || headerName === 'Content-Length') continue;
    args.push('-H', `${headerName}: ${headers[headerName]}`);
  }

  return args;
}

/**
 * Per workflow-12102025.md: Extract Referer from dispensary menu_url
 */
export function buildRefererFromMenuUrl(menuUrl: string | null | undefined): string {
  if (!menuUrl) {
    return 'https://dutchie.com/';
  }

  // Extract slug from menu_url
  // Formats: /embedded-menu/<slug> or /dispensary/<slug> or full URL
  let slug: string | null = null;

  const embeddedMatch = menuUrl.match(/\/embedded-menu\/([^/?]+)/);
  const dispensaryMatch = menuUrl.match(/\/dispensary\/([^/?]+)/);

  if (embeddedMatch) {
    slug = embeddedMatch[1];
  } else if (dispensaryMatch) {
    slug = dispensaryMatch[1];
  }

  if (slug) {
    return `https://dutchie.com/dispensary/${slug}`;
  }

  return 'https://dutchie.com/';
}
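
A few sample inputs make the slug extraction in buildRefererFromMenuUrl concrete. The sketch below is a compact restatement under a hypothetical name, `refererFromMenuUrl`, using one combined pattern instead of two; the `example-shop` slug is invented. It behaves the same as the function above except when a URL contains both path forms, where it takes whichever appears first rather than preferring `/embedded-menu/`.

```typescript
// Compact restatement for illustration: one combined pattern instead of two separate matches.
function refererFromMenuUrl(menuUrl: string | null | undefined): string {
  const fallback = 'https://dutchie.com/';
  if (!menuUrl) return fallback;
  // Capture the path segment after either prefix, stopping at '/' or '?'.
  const match = menuUrl.match(/\/(?:embedded-menu|dispensary)\/([^/?]+)/);
  return match ? `https://dutchie.com/dispensary/${match[1]}` : fallback;
}

console.log(refererFromMenuUrl('https://dutchie.com/embedded-menu/example-shop?menuType=rec'));
// https://dutchie.com/dispensary/example-shop
console.log(refererFromMenuUrl('/dispensary/example-shop/products'));
// https://dutchie.com/dispensary/example-shop
console.log(refererFromMenuUrl(null));
// https://dutchie.com/
```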

/**
 * Per workflow-12102025.md: Get curl-impersonate binary for browser
 */
export function getCurlBinary(browserType: BrowserType): string {
  return CURL_IMPERSONATE_BINARIES[browserType];
}

/**
 * Per workflow-12102025.md: Check if curl-impersonate is available
 */
export function isCurlImpersonateAvailable(browserType: BrowserType): boolean {
  const binary = CURL_IMPERSONATE_BINARIES[browserType];
  try {
    const { execSync } = require('child_process');
    execSync(`which ${binary}`, { stdio: 'ignore' });
    return true;
  } catch {
    return false;
  }
}
@@ -1,38 +1,116 @@
-/**
- * LEGACY SCHEDULER - DEPRECATED 2024-12-10
- *
- * DO NOT USE THIS FILE.
- *
- * Per TASK_WORKFLOW_2024-12-10.md:
- * This node-cron scheduler has been replaced by the database-driven
- * task scheduler in src/services/task-scheduler.ts
- *
- * The new scheduler:
- * - Stores schedules in PostgreSQL (survives restarts)
- * - Uses SELECT FOR UPDATE SKIP LOCKED (multi-replica safe)
- * - Creates tasks in worker_tasks table (processed by task-worker.ts)
- *
- * This file is kept for reference only. All exports are no-ops.
- * Legacy code has been removed - see git history for original implementation.
- */
-
-// 2024-12-10: All functions are now no-ops
+import cron from 'node-cron';
+import { pool } from '../db/pool';
+import { scrapeStore, scrapeCategory } from '../scraper-v2';
+
+let scheduledJobs: cron.ScheduledTask[] = [];
+
+async function getSettings(): Promise<{
+  scrapeIntervalHours: number;
+  scrapeSpecialsTime: string;
+}> {
+  const result = await pool.query(`
+    SELECT key, value FROM settings
+    WHERE key IN ('scrape_interval_hours', 'scrape_specials_time')
+  `);
+
+  const settings: Record<string, string> = {};
+  result.rows.forEach((row: { key: string; value: string }) => {
+    settings[row.key] = row.value;
+  });
+
+  return {
+    scrapeIntervalHours: parseInt(settings.scrape_interval_hours || '4'),
+    scrapeSpecialsTime: settings.scrape_specials_time || '00:01'
+  };
+}
+
+async function scrapeAllStores(): Promise<void> {
+  console.log('🔄 Starting scheduled scrape for all stores...');
+
+  const result = await pool.query(`
+    SELECT id, name FROM stores WHERE active = true AND scrape_enabled = true
+  `);
+
+  for (const store of result.rows) {
+    try {
+      console.log(`Scraping store: ${store.name}`);
+      await scrapeStore(store.id);
+    } catch (error) {
+      console.error(`Failed to scrape store ${store.name}:`, error);
+    }
+  }
+
+  console.log('✅ Scheduled scrape completed');
+}
+
+async function scrapeSpecials(): Promise<void> {
+  console.log('🌟 Starting scheduled specials scrape...');
+
+  const result = await pool.query(`
+    SELECT s.id, s.name, c.id as category_id
+    FROM stores s
+    JOIN categories c ON c.store_id = s.id
+    WHERE s.active = true AND s.scrape_enabled = true
+    AND c.slug = 'specials' AND c.scrape_enabled = true
+  `);
+
+  for (const row of result.rows) {
+    try {
+      console.log(`Scraping specials for: ${row.name}`);
+      await scrapeCategory(row.id, row.category_id);
+    } catch (error) {
+      console.error(`Failed to scrape specials for ${row.name}:`, error);
+    }
+  }
+
+  console.log('✅ Specials scrape completed');
+}
+
 export async function startScheduler(): Promise<void> {
-  console.warn('[DEPRECATED] startScheduler() called - use taskScheduler from task-scheduler.ts instead');
+  // Stop any existing jobs
+  stopScheduler();
+
+  const settings = await getSettings();
+
+  // Schedule regular store scrapes (every N hours)
+  const scrapeIntervalCron = `0 */${settings.scrapeIntervalHours} * * *`;
+  const storeJob = cron.schedule(scrapeIntervalCron, scrapeAllStores);
+  scheduledJobs.push(storeJob);
+  console.log(`📅 Scheduled store scraping: every ${settings.scrapeIntervalHours} hours`);
+
+  // Schedule specials scraping (daily at specified time)
+  const [hours, minutes] = settings.scrapeSpecialsTime.split(':');
+  const specialsCron = `${minutes} ${hours} * * *`;
+  const specialsJob = cron.schedule(specialsCron, scrapeSpecials);
+  scheduledJobs.push(specialsJob);
+  console.log(`📅 Scheduled specials scraping: daily at ${settings.scrapeSpecialsTime}`);
+
+  // Initial scrape on startup (after 10 seconds)
+  setTimeout(() => {
+    console.log('🚀 Running initial scrape...');
+    scrapeAllStores().catch(console.error);
+  }, 10000);
 }
 
 export function stopScheduler(): void {
-  console.warn('[DEPRECATED] stopScheduler() called - use taskScheduler from task-scheduler.ts instead');
+  scheduledJobs.forEach(job => job.stop());
+  scheduledJobs = [];
+  console.log('🛑 Scheduler stopped');
 }
 
 export async function restartScheduler(): Promise<void> {
-  console.warn('[DEPRECATED] restartScheduler() called - use taskScheduler from task-scheduler.ts instead');
+  console.log('🔄 Restarting scheduler...');
+  stopScheduler();
+  await startScheduler();
 }
 
-export async function triggerStoreScrape(_storeId: number): Promise<void> {
-  console.warn('[DEPRECATED] triggerStoreScrape() called - use taskService.createTask() instead');
+// Manual trigger functions for admin
+export async function triggerStoreScrape(storeId: number): Promise<void> {
+  console.log(`🔧 Manual scrape triggered for store ID: ${storeId}`);
+  await scrapeStore(storeId);
 }
 
 export async function triggerAllStoresScrape(): Promise<void> {
-  console.warn('[DEPRECATED] triggerAllStoresScrape() called - use taskScheduler.triggerSchedule() instead');
+  console.log('🔧 Manual scrape triggered for all stores');
+  await scrapeAllStores();
 }
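
The node-cron version of startScheduler builds two cron expressions from settings strings without validating them. A minimal sketch of the same construction with input checks follows. The helper names `intervalCron` and `dailyCron` are hypothetical, and the validation is an addition that the diff itself does not contain.

```typescript
// Hypothetical helper names; the inputs mirror the two values returned by getSettings().
function intervalCron(hours: number): string {
  if (!Number.isInteger(hours) || hours < 1 || hours > 23) {
    throw new RangeError(`interval must be an integer between 1 and 23, got ${hours}`);
  }
  // Minute 0 of every hour whose number is divisible by `hours`.
  return `0 */${hours} * * *`;
}

function dailyCron(time: string): string {
  const match = /^(\d{1,2}):(\d{2})$/.exec(time);
  if (!match) throw new Error(`expected HH:MM, got "${time}"`);
  const hours = Number(match[1]);
  const minutes = Number(match[2]);
  if (hours > 23 || minutes > 59) throw new RangeError(`time out of range: "${time}"`);
  // Number() drops leading zeros, so "00:01" becomes "1 0 * * *".
  return `${minutes} ${hours} * * *`;
}

console.log(intervalCron(4)); // 0 */4 * * *
console.log(dailyCron('00:01')); // 1 0 * * *
```

One property of the step syntax is worth keeping in mind: `*/N` in the hour field restarts at midnight, so an interval that does not divide 24 (5, for example) gives runs at 0, 5, 10, 15 and 20, followed by a 4-hour gap.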
@@ -1,375 +0,0 @@
/**
 * Database-Driven Task Scheduler
 *
 * Per TASK_WORKFLOW_2024-12-10.md:
 * - Schedules stored in DB (survives restarts)
 * - Uses SELECT FOR UPDATE to prevent duplicate execution across replicas
 * - Polls every 60s to check if schedules are due
 * - Generates tasks into worker_tasks table for task-worker.ts to process
 *
 * 2024-12-10: Created to replace legacy node-cron scheduler
 */

import { pool } from '../db/pool';
import { taskService, TaskRole } from '../tasks/task-service';

// Per TASK_WORKFLOW_2024-12-10.md: Poll interval for checking schedules
const POLL_INTERVAL_MS = 60_000; // 60 seconds

interface TaskSchedule {
  id: number;
  name: string;
  role: TaskRole;
  enabled: boolean;
  interval_hours: number;
  last_run_at: Date | null;
  next_run_at: Date | null;
  state_code: string | null;
  priority: number;
}

class TaskScheduler {
  private pollTimer: NodeJS.Timeout | null = null;
  private isRunning = false;

  /**
   * Start the scheduler
   * Per TASK_WORKFLOW_2024-12-10.md: Called on API server startup
   */
  async start(): Promise<void> {
    if (this.isRunning) {
      console.log('[TaskScheduler] Already running');
      return;
    }

    console.log('[TaskScheduler] Starting database-driven scheduler...');
    this.isRunning = true;

    // Per TASK_WORKFLOW_2024-12-10.md: On startup, recover stale tasks
    try {
      const recovered = await taskService.recoverStaleTasks(10);
      if (recovered > 0) {
        console.log(`[TaskScheduler] Recovered ${recovered} stale tasks from dead workers`);
      }
    } catch (err: any) {
      console.error('[TaskScheduler] Failed to recover stale tasks:', err.message);
    }

    // Per TASK_WORKFLOW_2024-12-10.md: Ensure default schedules exist
    await this.ensureDefaultSchedules();

    // Per TASK_WORKFLOW_2024-12-10.md: Check immediately on startup
    await this.checkAndRunDueSchedules();

    // Per TASK_WORKFLOW_2024-12-10.md: Then poll every 60 seconds
    this.pollTimer = setInterval(async () => {
      await this.checkAndRunDueSchedules();
    }, POLL_INTERVAL_MS);

    console.log('[TaskScheduler] Started - polling every 60s');
  }

  /**
   * Stop the scheduler
   */
  stop(): void {
    if (this.pollTimer) {
      clearInterval(this.pollTimer);
      this.pollTimer = null;
    }
    this.isRunning = false;
    console.log('[TaskScheduler] Stopped');
  }

  /**
   * Ensure default schedules exist in the database
   * Per TASK_WORKFLOW_2024-12-10.md: Creates schedules if they don't exist
   */
  private async ensureDefaultSchedules(): Promise<void> {
    // Per TASK_WORKFLOW_2024-12-10.md: Default schedules for task generation
    // NOTE: payload_fetch replaces direct product_refresh - it chains to product_refresh
    const defaults = [
      {
        name: 'payload_fetch_all',
        role: 'payload_fetch' as TaskRole,
        interval_hours: 4,
        priority: 0,
        description: 'Fetch payloads from Dutchie API for all crawl-enabled stores every 4 hours. Chains to product_refresh.',
      },
      {
        name: 'store_discovery_dutchie',
        role: 'store_discovery' as TaskRole,
        interval_hours: 24,
        priority: 5,
        description: 'Discover new Dutchie stores daily',
      },
      {
        name: 'analytics_refresh',
        role: 'analytics_refresh' as TaskRole,
        interval_hours: 6,
        priority: 0,
        description: 'Refresh analytics materialized views every 6 hours',
      },
    ];

    for (const sched of defaults) {
      try {
        await pool.query(`
          INSERT INTO task_schedules (name, role, interval_hours, priority, description, enabled, next_run_at)
          VALUES ($1, $2, $3, $4, $5, true, NOW())
          ON CONFLICT (name) DO NOTHING
        `, [sched.name, sched.role, sched.interval_hours, sched.priority, sched.description]);
      } catch (err: any) {
        // Table may not exist yet - will be created by migration
        if (!err.message.includes('does not exist')) {
          console.error(`[TaskScheduler] Failed to create default schedule ${sched.name}:`, err.message);
        }
      }
    }
  }

  /**
   * Check for and run any due schedules
   * Per TASK_WORKFLOW_2024-12-10.md: Uses SELECT FOR UPDATE SKIP LOCKED to prevent duplicates
   */
  private async checkAndRunDueSchedules(): Promise<void> {
    const client = await pool.connect();

    try {
      await client.query('BEGIN');

      // Per TASK_WORKFLOW_2024-12-10.md: Atomic claim of due schedules
      const result = await client.query<TaskSchedule>(`
        SELECT *
        FROM task_schedules
        WHERE enabled = true
          AND (next_run_at IS NULL OR next_run_at <= NOW())
        FOR UPDATE SKIP LOCKED
      `);

      for (const schedule of result.rows) {
        console.log(`[TaskScheduler] Running schedule: ${schedule.name} (${schedule.role})`);

        try {
          const tasksCreated = await this.executeSchedule(schedule);
          console.log(`[TaskScheduler] Schedule ${schedule.name} created ${tasksCreated} tasks`);

          // Per TASK_WORKFLOW_2024-12-10.md: Update last_run_at and calculate next_run_at
          await client.query(`
            UPDATE task_schedules
            SET
              last_run_at = NOW(),
              next_run_at = NOW() + ($1 || ' hours')::interval,
              last_task_count = $2,
              updated_at = NOW()
            WHERE id = $3
          `, [schedule.interval_hours, tasksCreated, schedule.id]);

        } catch (err: any) {
          console.error(`[TaskScheduler] Schedule ${schedule.name} failed:`, err.message);

          // Still update next_run_at to prevent infinite retry loop
          await client.query(`
            UPDATE task_schedules
            SET
              next_run_at = NOW() + ($1 || ' hours')::interval,
              last_error = $2,
              updated_at = NOW()
            WHERE id = $3
          `, [schedule.interval_hours, err.message, schedule.id]);
        }
      }

      await client.query('COMMIT');
    } catch (err: any) {
      await client.query('ROLLBACK');
      console.error('[TaskScheduler] Failed to check schedules:', err.message);
    } finally {
      client.release();
    }
  }

  /**
   * Execute a schedule and create tasks
   * Per TASK_WORKFLOW_2024-12-10.md: Different logic per role
   */
  private async executeSchedule(schedule: TaskSchedule): Promise<number> {
    switch (schedule.role) {
      case 'payload_fetch':
        // Per TASK_WORKFLOW_2024-12-10.md: payload_fetch replaces direct product_refresh
        return this.generatePayloadFetchTasks(schedule);

      case 'product_refresh':
        // Legacy - kept for manual triggers, but scheduled crawls use payload_fetch
        return this.generatePayloadFetchTasks(schedule);

      case 'store_discovery':
        return this.generateStoreDiscoveryTasks(schedule);

      case 'analytics_refresh':
        return this.generateAnalyticsRefreshTasks(schedule);

      default:
        console.warn(`[TaskScheduler] Unknown role: ${schedule.role}`);
        return 0;
    }
  }

  /**
   * Generate payload_fetch tasks for stores that need crawling
   * Per TASK_WORKFLOW_2024-12-10.md: payload_fetch hits API, saves to disk, chains to product_refresh
   */
  private async generatePayloadFetchTasks(schedule: TaskSchedule): Promise<number> {
    // Per TASK_WORKFLOW_2024-12-10.md: Find stores needing refresh
    const result = await pool.query(`
      SELECT d.id
      FROM dispensaries d
      WHERE d.crawl_enabled = true
        AND d.platform_dispensary_id IS NOT NULL
        -- No pending/running payload_fetch or product_refresh task already
        AND NOT EXISTS (
          SELECT 1 FROM worker_tasks t
          WHERE t.dispensary_id = d.id
            AND t.role IN ('payload_fetch', 'product_refresh')
            AND t.status IN ('pending', 'claimed', 'running')
        )
        -- Never fetched OR last fetch > interval ago
        AND (
          d.last_fetch_at IS NULL
          OR d.last_fetch_at < NOW() - ($1 || ' hours')::interval
        )
        ${schedule.state_code ? 'AND d.state_id = (SELECT id FROM states WHERE code = $2)' : ''}
    `, schedule.state_code ? [schedule.interval_hours, schedule.state_code] : [schedule.interval_hours]);

    const dispensaryIds = result.rows.map((r: { id: number }) => r.id);

    if (dispensaryIds.length === 0) {
      return 0;
    }

    // Per TASK_WORKFLOW_2024-12-10.md: Create payload_fetch tasks (they chain to product_refresh)
    const tasks = dispensaryIds.map((id: number) => ({
      role: 'payload_fetch' as TaskRole,
      dispensary_id: id,
      priority: schedule.priority,
    }));

    return taskService.createTasks(tasks);
  }

  /**
   * Generate store_discovery tasks
   * Per TASK_WORKFLOW_2024-12-10.md: One task per platform
   */
  private async generateStoreDiscoveryTasks(schedule: TaskSchedule): Promise<number> {
    // Check if discovery task already pending
    const existing = await taskService.listTasks({
      role: 'store_discovery',
      status: ['pending', 'claimed', 'running'],
      limit: 1,
    });

    if (existing.length > 0) {
      console.log('[TaskScheduler] Store discovery task already pending, skipping');
      return 0;
    }

    await taskService.createTask({
      role: 'store_discovery',
      platform: 'dutchie',
      priority: schedule.priority,
    });

    return 1;
  }

  /**
   * Generate analytics_refresh tasks
   * Per TASK_WORKFLOW_2024-12-10.md: Single task to refresh all MVs
   */
  private async generateAnalyticsRefreshTasks(schedule: TaskSchedule): Promise<number> {
    // Check if analytics task already pending
    const existing = await taskService.listTasks({
      role: 'analytics_refresh',
      status: ['pending', 'claimed', 'running'],
      limit: 1,
    });

    if (existing.length > 0) {
      console.log('[TaskScheduler] Analytics refresh task already pending, skipping');
      return 0;
    }

    await taskService.createTask({
      role: 'analytics_refresh',
      priority: schedule.priority,
    });

    return 1;
  }

  /**
   * Get all schedules for dashboard display
   */
  async getSchedules(): Promise<TaskSchedule[]> {
    try {
      const result = await pool.query(`
        SELECT * FROM task_schedules ORDER BY name
      `);
      return result.rows as TaskSchedule[];
    } catch {
      return [];
    }
  }

  /**
   * Update a schedule
   */
  async updateSchedule(id: number, updates: Partial<TaskSchedule>): Promise<void> {
    const setClauses: string[] = [];
    const values: any[] = [];
    let paramIndex = 1;

    if (updates.enabled !== undefined) {
      setClauses.push(`enabled = $${paramIndex++}`);
      values.push(updates.enabled);
    }
    if (updates.interval_hours !== undefined) {
      setClauses.push(`interval_hours = $${paramIndex++}`);
      values.push(updates.interval_hours);
    }
    if (updates.priority !== undefined) {
      setClauses.push(`priority = $${paramIndex++}`);
      values.push(updates.priority);
    }

    if (setClauses.length === 0) return;

    setClauses.push('updated_at = NOW()');
    values.push(id);

    await pool.query(`
      UPDATE task_schedules
      SET ${setClauses.join(', ')}
      WHERE id = $${paramIndex}
    `, values);
  }

  /**
   * Trigger a schedule to run immediately
   */
  async triggerSchedule(id: number): Promise<number> {
    const result = await pool.query(`
      SELECT * FROM task_schedules WHERE id = $1
    `, [id]);

    if (result.rows.length === 0) {
      throw new Error(`Schedule ${id} not found`);
    }

    return this.executeSchedule(result.rows[0] as TaskSchedule);
  }
}
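
The due check and the next-run calculation that checkAndRunDueSchedules performs in SQL can be mirrored in memory to make the timing rule easier to see. This is a sketch under assumptions: `ScheduleTiming`, `isDue` and `nextRunAt` are hypothetical names, and the in-memory form leaves out the row locking that the real query relies on.

```typescript
// Hypothetical in-memory shape; only the fields the due check needs.
interface ScheduleTiming {
  enabled: boolean;
  interval_hours: number;
  next_run_at: Date | null;
}

const MS_PER_HOUR = 60 * 60 * 1000;

// Mirrors: enabled = true AND (next_run_at IS NULL OR next_run_at <= NOW())
function isDue(s: ScheduleTiming, now: Date): boolean {
  return s.enabled && (s.next_run_at === null || s.next_run_at.getTime() <= now.getTime());
}

// Mirrors: NOW() + interval_hours, measured from the run itself, not from the previous next_run_at.
function nextRunAt(s: ScheduleTiming, now: Date): Date {
  return new Date(now.getTime() + s.interval_hours * MS_PER_HOUR);
}

const now = new Date('2024-12-10T12:00:00Z');
const sched: ScheduleTiming = {
  enabled: true,
  interval_hours: 4,
  next_run_at: new Date('2024-12-10T11:59:00Z'),
};
console.log(isDue(sched, now)); // true
console.log(nextRunAt(sched, now).toISOString()); // 2024-12-10T16:00:00.000Z
```

Because the next run is computed from the moment the schedule actually executes, each cycle can slip later by up to one poll interval. In the example the schedule was due at 11:59 but the next run lands at 16:00, not 15:59.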

// Per TASK_WORKFLOW_2024-12-10.md: Singleton instance
export const taskScheduler = new TaskScheduler();
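
The dynamic SET clause in updateSchedule follows a common pattern: collect clauses and values side by side so the `$n` placeholders always match their positions. A minimal sketch of that pattern as a pure function follows. `ScheduleUpdates` and `buildScheduleUpdate` are hypothetical names; the function returns the SQL text and values instead of running a query.

```typescript
// Hypothetical subset of updatable columns, matching the three the source method handles.
interface ScheduleUpdates {
  enabled?: boolean;
  interval_hours?: number;
  priority?: number;
}

function buildScheduleUpdate(
  id: number,
  updates: ScheduleUpdates
): { sql: string; values: Array<boolean | number> } | null {
  // Column names come from this fixed list, never from caller input.
  const columns = ['enabled', 'interval_hours', 'priority'] as const;
  const setClauses: string[] = [];
  const values: Array<boolean | number> = [];

  for (const column of columns) {
    const value = updates[column];
    if (value !== undefined) {
      values.push(value);
      // The placeholder index is the value's 1-based position in the array.
      setClauses.push(`${column} = $${values.length}`);
    }
  }

  if (setClauses.length === 0) return null; // nothing to update

  setClauses.push('updated_at = NOW()');
  values.push(id);
  const sql = `UPDATE task_schedules SET ${setClauses.join(', ')} WHERE id = $${values.length}`;
  return { sql, values };
}

const built = buildScheduleUpdate(7, { interval_hours: 6, enabled: false });
console.log(built?.sql);
// UPDATE task_schedules SET enabled = $1, interval_hours = $2, updated_at = NOW() WHERE id = $3
console.log(built?.values); // [ false, 6, 7 ]
```

Deriving each placeholder from `values.length` removes the separate counter, so the clause list and the parameter array cannot drift apart.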
@@ -94,8 +94,7 @@ export async function handleEntryPointDiscovery(ctx: TaskContext): Promise<TaskR
   // ============================================================
   // STEP 3: Start stealth session
   // ============================================================
-  // Per workflow-12102025.md: session identity comes from proxy location, not task params
-  const session = startSession();
+  const session = startSession(dispensary.state || 'AZ', 'America/Phoenix');
   console.log(`[EntryPointDiscovery] Session started: ${session.sessionId}`);
 
   try {
@@ -1,221 +0,0 @@
/**
 * Payload Fetch Handler
 *
 * Per TASK_WORKFLOW_2024-12-10.md: Separates API fetch from data processing.
 *
 * This handler ONLY:
 * 1. Hits Dutchie GraphQL API
 * 2. Saves raw payload to filesystem (gzipped)
 * 3. Records metadata in raw_crawl_payloads table
 * 4. Queues a product_refresh task to process the payload
 *
 * Benefits of separation:
 * - Retry-friendly: If normalize fails, re-run refresh without re-crawling
 * - Faster refreshes: Local file read vs network call
 * - Replay-able: Run refresh against any historical payload
 * - Less API pressure: Only this role hits Dutchie
 */

import { TaskContext, TaskResult } from '../task-worker';
import {
  executeGraphQL,
  startSession,
  endSession,
  GRAPHQL_HASHES,
  DUTCHIE_CONFIG,
} from '../../platforms/dutchie';
import { saveRawPayload } from '../../utils/payload-storage';
import { taskService } from '../task-service';
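
The handler that follows pages through the GraphQL results until it has everything the first page reported or a short page arrives. The stop conditions can be sketched in isolation with a generic helper. `Page`, `fetchAllPages` and the fake fetcher are hypothetical; the sketch leaves out the inter-page delay, heartbeats and session handling that the real loop performs.

```typescript
// Hypothetical page shape and fetcher signature; the real handler calls the platform's GraphQL client.
interface Page<T> {
  items: T[];
  totalCount?: number;
}

async function fetchAllPages<T>(
  fetchPage: (page: number) => Promise<Page<T> | null>,
  perPage: number,
  maxPages: number
): Promise<T[]> {
  const all: T[] = [];
  let totalCount = 0;

  for (let page = 0; page < maxPages; page++) {
    const data = await fetchPage(page);
    if (!data) {
      // An empty first response is an error; later ones just end the loop.
      if (page === 0) throw new Error('No data returned for first page');
      break;
    }

    all.push(...data.items);
    // Only the first page's reported total is trusted.
    if (page === 0) totalCount = data.totalCount || data.items.length;

    // Stop once everything reported is collected, or on a short page.
    if (all.length >= totalCount || data.items.length < perPage) break;
  }
  return all;
}

// Fake fetcher: 5 items served 2 per page.
const source = [1, 2, 3, 4, 5];
const fakeFetch = async (page: number): Promise<Page<number>> => ({
  items: source.slice(page * 2, page * 2 + 2),
  totalCount: source.length,
});

fetchAllPages(fakeFetch, 2, 10).then((items) => console.log(items)); // [ 1, 2, 3, 4, 5 ]
```

The `maxPages` bound matters even with both stop conditions in place: if the server keeps returning full pages while reporting an inflated total, the bound is the only thing that ends the loop.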

export async function handlePayloadFetch(ctx: TaskContext): Promise<TaskResult> {
  const { pool, task } = ctx;
  const dispensaryId = task.dispensary_id;

  if (!dispensaryId) {
    return { success: false, error: 'No dispensary_id specified for payload_fetch task' };
  }

  try {
    // ============================================================
    // STEP 1: Load dispensary info
    // ============================================================
    const dispResult = await pool.query(`
      SELECT
        id, name, platform_dispensary_id, menu_url, menu_type, city, state
      FROM dispensaries
      WHERE id = $1 AND crawl_enabled = true
    `, [dispensaryId]);

    if (dispResult.rows.length === 0) {
      return { success: false, error: `Dispensary ${dispensaryId} not found or not crawl_enabled` };
    }

    const dispensary = dispResult.rows[0];
    const platformId = dispensary.platform_dispensary_id;

    if (!platformId) {
      return { success: false, error: `Dispensary ${dispensaryId} has no platform_dispensary_id` };
    }

    // Extract cName from menu_url
    const cNameMatch = dispensary.menu_url?.match(/\/(?:embedded-menu|dispensary)\/([^/?]+)/);
    const cName = cNameMatch ? cNameMatch[1] : 'dispensary';

    console.log(`[PayloadFetch] Starting fetch for ${dispensary.name} (ID: ${dispensaryId})`);
    console.log(`[PayloadFetch] Platform ID: ${platformId}, cName: ${cName}`);

    // ============================================================
    // STEP 2: Start stealth session
    // ============================================================
    const session = startSession();
    console.log(`[PayloadFetch] Session started: ${session.sessionId}`);

    await ctx.heartbeat();

    // ============================================================
    // STEP 3: Fetch products via GraphQL (Status: 'All')
    // ============================================================
    const allProducts: any[] = [];
    let page = 0;
    let totalCount = 0;
    const perPage = DUTCHIE_CONFIG.perPage;
    const maxPages = DUTCHIE_CONFIG.maxPages;

    try {
      while (page < maxPages) {
        const variables = {
          includeEnterpriseSpecials: false,
          productsFilter: {
            dispensaryId: platformId,
            pricingType: 'rec',
            Status: 'All',
            types: [],
            useCache: false,
            isDefaultSort: true,
            sortBy: 'popularSortIdx',
            sortDirection: 1,
            bypassOnlineThresholds: true,
            isKioskMenu: false,
            removeProductsBelowOptionThresholds: false,
          },
          page,
          perPage,
        };

        console.log(`[PayloadFetch] Fetching page ${page + 1}...`);

        const result = await executeGraphQL(
          'FilteredProducts',
          variables,
          GRAPHQL_HASHES.FilteredProducts,
          { cName, maxRetries: 3 }
        );

        const data = result?.data?.filteredProducts;
        if (!data || !data.products) {
          if (page === 0) {
            throw new Error('No product data returned from GraphQL');
          }
          break;
        }

        const products = data.products;
        allProducts.push(...products);

        if (page === 0) {
          totalCount = data.queryInfo?.totalCount || products.length;
          console.log(`[PayloadFetch] Total products reported: ${totalCount}`);
        }

        if (allProducts.length >= totalCount || products.length < perPage) {
          break;
        }

        page++;

        if (page < maxPages) {
          await new Promise(r => setTimeout(r, DUTCHIE_CONFIG.pageDelayMs));
        }

        if (page % 5 === 0) {
          await ctx.heartbeat();
        }
      }

      console.log(`[PayloadFetch] Fetched ${allProducts.length} products in ${page + 1} pages`);

    } finally {
      endSession();
    }
|
|
||||||
|
|
||||||
if (allProducts.length === 0) {
|
|
||||||
return {
|
|
||||||
success: false,
|
|
||||||
error: 'No products returned from GraphQL',
|
|
||||||
productsProcessed: 0,
|
|
||||||
};
|
|
||||||
}
|
|
||||||
|
|
||||||
await ctx.heartbeat();
|
|
||||||
|
|
||||||
// ============================================================
|
|
||||||
// STEP 4: Save raw payload to filesystem
|
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Metadata/Payload separation
|
|
||||||
// ============================================================
|
|
||||||
const rawPayload = {
|
|
||||||
dispensaryId,
|
|
||||||
platformId,
|
|
||||||
cName,
|
|
||||||
fetchedAt: new Date().toISOString(),
|
|
||||||
productCount: allProducts.length,
|
|
||||||
products: allProducts,
|
|
||||||
};
|
|
||||||
|
|
||||||
const payloadResult = await saveRawPayload(
|
|
||||||
pool,
|
|
||||||
dispensaryId,
|
|
||||||
rawPayload,
|
|
||||||
null, // crawl_run_id - not using crawl_runs in new system
|
|
||||||
allProducts.length
|
|
||||||
);
|
|
||||||
|
|
||||||
console.log(`[PayloadFetch] Saved payload #${payloadResult.id} (${(payloadResult.sizeBytes / 1024).toFixed(1)}KB)`);
|
|
||||||
|
|
||||||
// ============================================================
|
|
||||||
// STEP 5: Update dispensary last_fetch_at
|
|
||||||
// ============================================================
|
|
||||||
await pool.query(`
|
|
||||||
UPDATE dispensaries
|
|
||||||
SET last_fetch_at = NOW()
|
|
||||||
WHERE id = $1
|
|
||||||
`, [dispensaryId]);
|
|
||||||
|
|
||||||
// ============================================================
|
|
||||||
// STEP 6: Queue product_refresh task to process the payload
|
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Task chaining
|
|
||||||
// ============================================================
|
|
||||||
await taskService.createTask({
|
|
||||||
role: 'product_refresh',
|
|
||||||
dispensary_id: dispensaryId,
|
|
||||||
priority: task.priority || 0,
|
|
||||||
payload: { payload_id: payloadResult.id },
|
|
||||||
});
|
|
||||||
|
|
||||||
console.log(`[PayloadFetch] Queued product_refresh task for payload #${payloadResult.id}`);
|
|
||||||
|
|
||||||
return {
|
|
||||||
success: true,
|
|
||||||
payloadId: payloadResult.id,
|
|
||||||
productCount: allProducts.length,
|
|
||||||
sizeBytes: payloadResult.sizeBytes,
|
|
||||||
};
|
|
||||||
|
|
||||||
} catch (error: unknown) {
|
|
||||||
const errorMessage = error instanceof Error ? error.message : 'Unknown error';
|
|
||||||
console.error(`[PayloadFetch] Error for dispensary ${dispensaryId}:`, errorMessage);
|
|
||||||
return {
|
|
||||||
success: false,
|
|
||||||
error: errorMessage,
|
|
||||||
};
|
|
||||||
}
|
|
||||||
}
|
|
||||||
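Review note on the pagination loop in the removed handler: it stops when the reported total is reached, when a short page comes back, or when `maxPages` is hit. The sketch below models that control flow so the three exit conditions are easy to check in isolation. It is only an illustration under stated assumptions: `collectAllPages`, `PageResult`, and the synchronous `fetchPage` callback are hypothetical stand-ins for the async `executeGraphQL` call, and it counts fetched pages explicitly, because the `${page + 1} pages` log line above appears to report one page too many when the loop ends by reaching `maxPages`.

```typescript
// Hypothetical shapes; the real handler reads data.products and data.queryInfo.totalCount.
interface PageResult<T> {
  products: T[];
  totalCount?: number;
}

interface FetchSummary<T> {
  products: T[];
  pagesFetched: number;
}

// Synchronous stand-in for the async GraphQL call, to keep the control flow testable.
function collectAllPages<T>(
  fetchPage: (page: number) => PageResult<T> | null,
  perPage: number,
  maxPages: number,
): FetchSummary<T> {
  const products: T[] = [];
  let totalCount = 0;
  let pagesFetched = 0;

  for (let page = 0; page < maxPages; page++) {
    const data = fetchPage(page);
    if (!data) {
      // An empty first page is treated as a hard failure, later ones as end of data.
      if (page === 0) throw new Error('No product data returned');
      break;
    }

    pagesFetched++;
    products.push(...data.products);

    if (page === 0) {
      // Fall back to the first page size when no total is reported.
      totalCount = data.totalCount || data.products.length;
    }

    // Stop once everything is collected or a short page signals the end.
    if (products.length >= totalCount || data.products.length < perPage) break;
  }

  return { products, pagesFetched };
}
```

Counting pages as they are fetched keeps the log message accurate for every exit path, whichever condition ends the loop.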
@@ -1,31 +1,16 @@
 /**
  * Product Discovery Handler
  *
- * Per TASK_WORKFLOW_2024-12-10.md: Initial product fetch for newly discovered stores.
- *
- * Flow:
- * 1. Triggered after store_discovery promotes a new dispensary
- * 2. Chains to payload_fetch to get initial product data
- * 3. payload_fetch chains to product_refresh for DB upsert
- *
- * Chaining:
- *   store_discovery → (newStoreIds) → product_discovery → payload_fetch → product_refresh
+ * Initial product fetch for stores that have 0 products.
+ * Same logic as product_resync, but for initial discovery.
  */
 
 import { TaskContext, TaskResult } from '../task-worker';
-import { handlePayloadFetch } from './payload-fetch';
+import { handleProductRefresh } from './product-refresh';
 
 export async function handleProductDiscovery(ctx: TaskContext): Promise<TaskResult> {
-  const { task } = ctx;
-  const dispensaryId = task.dispensary_id;
-
-  if (!dispensaryId) {
-    return { success: false, error: 'No dispensary_id provided' };
-  }
-
-  console.log(`[ProductDiscovery] Starting initial product discovery for dispensary ${dispensaryId}`);
-
-  // Per TASK_WORKFLOW_2024-12-10.md: Chain to payload_fetch for API → disk
-  // payload_fetch will then chain to product_refresh for disk → DB
-  return handlePayloadFetch(ctx);
+  // Product discovery is essentially the same as refresh for the first time
+  // The main difference is in when this task is triggered (new store vs scheduled)
+  console.log(`[ProductDiscovery] Starting initial product fetch for dispensary ${ctx.task.dispensary_id}`);
+  return handleProductRefresh(ctx);
 }
@@ -1,32 +1,33 @@
 /**
  * Product Refresh Handler
  *
- * Per TASK_WORKFLOW_2024-12-10.md: Processes a locally-stored payload.
- *
- * This handler reads from the filesystem (NOT the Dutchie API).
- * The payload_fetch handler is responsible for API calls.
+ * Re-crawls a store to capture price/stock changes using the GraphQL pipeline.
  *
  * Flow:
- * 1. Load payload from filesystem (by payload_id or latest for dispensary)
- * 2. Normalize data via DutchieNormalizer
- * 3. Upsert to store_products and store_product_snapshots
- * 4. Track missing products (increment consecutive_misses, mark OOS at 3)
- * 5. Download new product images
- *
- * Benefits of separation:
- * - Retry-friendly: If this fails, re-run without re-crawling
- * - Replay-able: Run against any historical payload
- * - Faster: Local file read vs network call
+ * 1. Load dispensary info from database
+ * 2. Start stealth session (fingerprint + optional proxy)
+ * 3. Fetch products via GraphQL (Status: 'All')
+ * 4. Normalize data via DutchieNormalizer
+ * 5. Upsert to store_products and store_product_snapshots
+ * 6. Track missing products (increment consecutive_misses, mark OOS at 3)
+ * 7. Download new product images
+ * 8. End session
  */
 
 import { TaskContext, TaskResult } from '../task-worker';
+import {
+  executeGraphQL,
+  startSession,
+  endSession,
+  GRAPHQL_HASHES,
+  DUTCHIE_CONFIG,
+} from '../../platforms/dutchie';
 import { DutchieNormalizer } from '../../hydration/normalizers/dutchie';
 import {
   upsertStoreProducts,
   createStoreProductSnapshots,
   downloadProductImages,
 } from '../../hydration/canonical-upsert';
-import { loadRawPayloadById, getLatestPayload } from '../../utils/payload-storage';
 
 const normalizer = new DutchieNormalizer();
 
@@ -46,76 +47,129 @@ export async function handleProductRefresh(ctx: TaskContext): Promise<TaskResult
       SELECT
         id, name, platform_dispensary_id, menu_url, menu_type, city, state
       FROM dispensaries
-      WHERE id = $1
+      WHERE id = $1 AND crawl_enabled = true
     `, [dispensaryId]);
 
     if (dispResult.rows.length === 0) {
-      return { success: false, error: `Dispensary ${dispensaryId} not found` };
+      return { success: false, error: `Dispensary ${dispensaryId} not found or not crawl_enabled` };
     }
 
     const dispensary = dispResult.rows[0];
+    const platformId = dispensary.platform_dispensary_id;
 
-    // Extract cName from menu_url for image storage context
+    if (!platformId) {
+      return { success: false, error: `Dispensary ${dispensaryId} has no platform_dispensary_id` };
+    }
+
+    // Extract cName from menu_url
     const cNameMatch = dispensary.menu_url?.match(/\/(?:embedded-menu|dispensary)\/([^/?]+)/);
     const cName = cNameMatch ? cNameMatch[1] : 'dispensary';
 
-    console.log(`[ProductRefresh] Starting refresh for ${dispensary.name} (ID: ${dispensaryId})`);
+    console.log(`[ProductResync] Starting crawl for ${dispensary.name} (ID: ${dispensaryId})`);
+    console.log(`[ProductResync] Platform ID: ${platformId}, cName: ${cName}`);
+
+    // ============================================================
+    // STEP 2: Start stealth session
+    // ============================================================
+    const session = startSession(dispensary.state || 'AZ', 'America/Phoenix');
+    console.log(`[ProductResync] Session started: ${session.sessionId}`);
 
     await ctx.heartbeat();
 
     // ============================================================
-    // STEP 2: Load payload from filesystem
-    // Per TASK_WORKFLOW_2024-12-10.md: Read local payload, not API
+    // STEP 3: Fetch products via GraphQL (Status: 'All')
     // ============================================================
-    let payloadData: any;
-    let payloadId: number;
+    const allProducts: any[] = [];
+    let page = 0;
+    let totalCount = 0;
+    const perPage = DUTCHIE_CONFIG.perPage;
+    const maxPages = DUTCHIE_CONFIG.maxPages;
 
-    // Check if specific payload_id was provided (from task chaining)
-    const taskPayload = task.payload as { payload_id?: number } | null;
+    try {
+      while (page < maxPages) {
+        const variables = {
+          includeEnterpriseSpecials: false,
+          productsFilter: {
+            dispensaryId: platformId,
+            pricingType: 'rec',
+            Status: 'All',
+            types: [],
+            useCache: false,
+            isDefaultSort: true,
+            sortBy: 'popularSortIdx',
+            sortDirection: 1,
+            bypassOnlineThresholds: true,
+            isKioskMenu: false,
+            removeProductsBelowOptionThresholds: false,
+          },
+          page,
+          perPage,
+        };
 
-    if (taskPayload?.payload_id) {
-      // Load specific payload (from payload_fetch chaining)
-      const result = await loadRawPayloadById(pool, taskPayload.payload_id);
-      if (!result) {
-        return { success: false, error: `Payload ${taskPayload.payload_id} not found` };
-      }
-      payloadData = result.payload;
-      payloadId = result.metadata.id;
-      console.log(`[ProductRefresh] Loaded specific payload #${payloadId}`);
-    } else {
-      // Load latest payload for this dispensary
-      const result = await getLatestPayload(pool, dispensaryId);
-      if (!result) {
-        return { success: false, error: `No payload found for dispensary ${dispensaryId}` };
-      }
-      payloadData = result.payload;
-      payloadId = result.metadata.id;
-      console.log(`[ProductRefresh] Loaded latest payload #${payloadId} (${result.metadata.fetchedAt})`);
-    }
+        console.log(`[ProductResync] Fetching page ${page + 1}...`);
+
+        const result = await executeGraphQL(
+          'FilteredProducts',
+          variables,
+          GRAPHQL_HASHES.FilteredProducts,
+          { cName, maxRetries: 3 }
+        );
+
+        const data = result?.data?.filteredProducts;
+        if (!data || !data.products) {
+          if (page === 0) {
+            throw new Error('No product data returned from GraphQL');
+          }
+          break;
+        }
 
-    const allProducts = payloadData.products || [];
+        const products = data.products;
+        allProducts.push(...products);
+
+        if (page === 0) {
+          totalCount = data.queryInfo?.totalCount || products.length;
+          console.log(`[ProductResync] Total products reported: ${totalCount}`);
+        }
+
+        if (allProducts.length >= totalCount || products.length < perPage) {
+          break;
+        }
+
+        page++;
+
+        if (page < maxPages) {
+          await new Promise(r => setTimeout(r, DUTCHIE_CONFIG.pageDelayMs));
+        }
+
+        if (page % 5 === 0) {
+          await ctx.heartbeat();
+        }
+      }
+
+      console.log(`[ProductResync] Fetched ${allProducts.length} products in ${page + 1} pages`);
+
+    } finally {
+      endSession();
+    }
 
     if (allProducts.length === 0) {
       return {
         success: false,
-        error: 'Payload contains no products',
-        payloadId,
+        error: 'No products returned from GraphQL',
         productsProcessed: 0,
       };
     }
 
-    console.log(`[ProductRefresh] Processing ${allProducts.length} products from payload #${payloadId}`);
-
     await ctx.heartbeat();
 
     // ============================================================
-    // STEP 3: Normalize data
+    // STEP 4: Normalize data
     // ============================================================
-    console.log(`[ProductRefresh] Normalizing ${allProducts.length} products...`);
+    console.log(`[ProductResync] Normalizing ${allProducts.length} products...`);
 
     // Build RawPayload for the normalizer
     const rawPayload = {
-      id: `refresh-${dispensaryId}-${Date.now()}`,
+      id: `resync-${dispensaryId}-${Date.now()}`,
       dispensary_id: dispensaryId,
       crawl_run_id: null,
       platform: 'dutchie',
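Review note on the `cName` extraction in the hunk above: the regular expression takes the path segment that follows `/embedded-menu/` or `/dispensary/` and falls back to the literal `'dispensary'` when nothing matches. A minimal sketch of that behavior as a standalone helper follows. The function name `extractCName` and the example URLs are hypothetical; only the regular expression and the fallback value come from the diff.

```typescript
// Hypothetical helper wrapping the regex used in the handler.
function extractCName(menuUrl: string | null | undefined, fallback = 'dispensary'): string {
  // Capture the slug after /embedded-menu/ or /dispensary/, stopping at "/" or "?".
  const match = menuUrl?.match(/\/(?:embedded-menu|dispensary)\/([^/?]+)/);
  return match ? match[1] : fallback;
}

// Illustrative inputs only; these URLs are made up for the example.
const examples: Array<string | null> = [
  'https://example.com/embedded-menu/some-store?menuType=rec',
  'https://example.com/dispensary/green-leaf/products',
  'https://example.com/stores/abc',
  null,
];

for (const url of examples) {
  console.log(url, '=>', extractCName(url));
}
```

Because the fallback is a plain string, a store whose `menu_url` is missing or has an unexpected shape would still be crawled with `cName` set to `'dispensary'`. Whether that is the desired behavior is worth confirming.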
@@ -135,26 +189,25 @@ export async function handleProductRefresh(ctx: TaskContext): Promise<TaskResult
     const normalizationResult = normalizer.normalize(rawPayload);
 
     if (normalizationResult.errors.length > 0) {
-      console.warn(`[ProductRefresh] Normalization warnings: ${normalizationResult.errors.map(e => e.message).join(', ')}`);
+      console.warn(`[ProductResync] Normalization warnings: ${normalizationResult.errors.map(e => e.message).join(', ')}`);
     }
 
     if (normalizationResult.products.length === 0) {
       return {
         success: false,
         error: 'Normalization produced no products',
-        payloadId,
         productsProcessed: 0,
       };
     }
 
-    console.log(`[ProductRefresh] Normalized ${normalizationResult.products.length} products`);
+    console.log(`[ProductResync] Normalized ${normalizationResult.products.length} products`);
 
     await ctx.heartbeat();
 
     // ============================================================
-    // STEP 4: Upsert to canonical tables
+    // STEP 5: Upsert to canonical tables
     // ============================================================
-    console.log(`[ProductRefresh] Upserting to store_products...`);
+    console.log(`[ProductResync] Upserting to store_products...`);
 
     const upsertResult = await upsertStoreProducts(
       pool,
@@ -163,12 +216,12 @@ export async function handleProductRefresh(ctx: TaskContext): Promise<TaskResult
       normalizationResult.availability
     );
 
-    console.log(`[ProductRefresh] Upserted: ${upsertResult.upserted} (${upsertResult.new} new, ${upsertResult.updated} updated)`);
+    console.log(`[ProductResync] Upserted: ${upsertResult.upserted} (${upsertResult.new} new, ${upsertResult.updated} updated)`);
 
     await ctx.heartbeat();
 
     // Create snapshots
-    console.log(`[ProductRefresh] Creating snapshots...`);
+    console.log(`[ProductResync] Creating snapshots...`);
 
     const snapshotsResult = await createStoreProductSnapshots(
       pool,
@@ -179,12 +232,12 @@ export async function handleProductRefresh(ctx: TaskContext): Promise<TaskResult
       null // No crawl_run_id in new system
     );
 
-    console.log(`[ProductRefresh] Created ${snapshotsResult.created} snapshots`);
+    console.log(`[ProductResync] Created ${snapshotsResult.created} snapshots`);
 
     await ctx.heartbeat();
 
     // ============================================================
-    // STEP 5: Track missing products (consecutive_misses logic)
+    // STEP 6: Track missing products (consecutive_misses logic)
     // - Products in feed: reset consecutive_misses to 0
     // - Products not in feed: increment consecutive_misses
     // - At 3 consecutive misses: mark as OOS
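Review note on the `consecutive_misses` rules listed in the comment above: the SQL that applies them falls outside the visible hunks, so the sketch below only models the three stated rules in memory. Everything in it is an assumption made for illustration: the `TrackedProduct` shape, the `applyFeed` helper, and the choice to leave `inStock` untouched for products that are present in the feed (the upsert step presumably owns that field).

```typescript
// Hypothetical in-memory model of a store_products row.
interface TrackedProduct {
  id: string;
  consecutiveMisses: number;
  inStock: boolean;
}

// Threshold taken from the comment: mark OOS at 3 consecutive misses.
const OOS_THRESHOLD = 3;

function applyFeed(products: TrackedProduct[], seenIds: Set<string>): TrackedProduct[] {
  return products.map((p) => {
    if (seenIds.has(p.id)) {
      // Product appeared in the feed: reset the miss counter.
      return { ...p, consecutiveMisses: 0 };
    }
    // Product absent from the feed: count the miss, flag OOS at the threshold.
    const misses = p.consecutiveMisses + 1;
    return {
      ...p,
      consecutiveMisses: misses,
      inStock: misses >= OOS_THRESHOLD ? false : p.inStock,
    };
  });
}
```

Waiting for three misses rather than one means a single partial or failed crawl does not flip a whole menu to out of stock.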
@@ -217,7 +270,7 @@ export async function handleProductRefresh(ctx: TaskContext): Promise<TaskResult
 
     const incrementedCount = incrementResult.rowCount || 0;
     if (incrementedCount > 0) {
-      console.log(`[ProductRefresh] Incremented consecutive_misses for ${incrementedCount} products`);
+      console.log(`[ProductResync] Incremented consecutive_misses for ${incrementedCount} products`);
     }
 
     // Mark as OOS any products that hit 3 consecutive misses
@@ -233,16 +286,16 @@ export async function handleProductRefresh(ctx: TaskContext): Promise<TaskResult
 
     const markedOosCount = oosResult.rowCount || 0;
     if (markedOosCount > 0) {
-      console.log(`[ProductRefresh] Marked ${markedOosCount} products as OOS (3+ consecutive misses)`);
+      console.log(`[ProductResync] Marked ${markedOosCount} products as OOS (3+ consecutive misses)`);
     }
 
     await ctx.heartbeat();
 
     // ============================================================
-    // STEP 6: Download images for new products
+    // STEP 7: Download images for new products
     // ============================================================
     if (upsertResult.productsNeedingImages.length > 0) {
-      console.log(`[ProductRefresh] Downloading images for ${upsertResult.productsNeedingImages.length} products...`);
+      console.log(`[ProductResync] Downloading images for ${upsertResult.productsNeedingImages.length} products...`);
 
       try {
         const dispensaryContext = {
@@ -256,12 +309,12 @@ export async function handleProductRefresh(ctx: TaskContext): Promise<TaskResult
         );
       } catch (imgError: any) {
         // Image download errors shouldn't fail the whole task
-        console.warn(`[ProductRefresh] Image download error (non-fatal): ${imgError.message}`);
+        console.warn(`[ProductResync] Image download error (non-fatal): ${imgError.message}`);
       }
     }
 
     // ============================================================
-    // STEP 7: Update dispensary last_crawl_at
+    // STEP 8: Update dispensary last_crawl_at
     // ============================================================
     await pool.query(`
       UPDATE dispensaries
@@ -269,20 +322,10 @@ export async function handleProductRefresh(ctx: TaskContext): Promise<TaskResult
       WHERE id = $1
     `, [dispensaryId]);
 
-    // ============================================================
-    // STEP 8: Mark payload as processed
-    // ============================================================
-    await pool.query(`
-      UPDATE raw_crawl_payloads
-      SET processed_at = NOW()
-      WHERE id = $1
-    `, [payloadId]);
-
-    console.log(`[ProductRefresh] Completed ${dispensary.name}`);
+    console.log(`[ProductResync] Completed ${dispensary.name}`);
 
     return {
       success: true,
-      payloadId,
       productsProcessed: normalizationResult.products.length,
       snapshotsCreated: snapshotsResult.created,
       newProducts: upsertResult.new,
@@ -292,7 +335,7 @@ export async function handleProductRefresh(ctx: TaskContext): Promise<TaskResult
 
   } catch (error: unknown) {
     const errorMessage = error instanceof Error ? error.message : 'Unknown error';
-    console.error(`[ProductRefresh] Error for dispensary ${dispensaryId}:`, errorMessage);
+    console.error(`[ProductResync] Error for dispensary ${dispensaryId}:`, errorMessage);
     return {
       success: false,
       error: errorMessage,
@@ -1,16 +1,8 @@
 /**
  * Store Discovery Handler
  *
- * Per TASK_WORKFLOW_2024-12-10.md: Discovers new stores and returns their IDs for task chaining.
- *
- * Flow:
- * 1. For each active state, run Dutchie discovery
- * 2. Discover locations via GraphQL
- * 3. Auto-promote valid locations to dispensaries table
- * 4. Return newStoreIds[] for chaining to payload_fetch
- *
- * Chaining:
- *   store_discovery → (returns newStoreIds) → payload_fetch → product_refresh
+ * Discovers new stores by crawling location APIs and adding them
+ * to discovery_locations table.
  */
 
 import { TaskContext, TaskResult } from '../task-worker';
@@ -18,25 +10,23 @@ import { discoverState } from '../../discovery';
 
 export async function handleStoreDiscovery(ctx: TaskContext): Promise<TaskResult> {
   const { pool, task } = ctx;
-  const platform = task.platform || 'dutchie';
+  const platform = task.platform || 'default';
 
   console.log(`[StoreDiscovery] Starting discovery for platform: ${platform}`);
 
   try {
     // Get states to discover
     const statesResult = await pool.query(`
-      SELECT code FROM states WHERE is_active = true ORDER BY code
+      SELECT code FROM states WHERE active = true ORDER BY code
     `);
     const stateCodes = statesResult.rows.map(r => r.code);
 
     if (stateCodes.length === 0) {
-      return { success: true, storesDiscovered: 0, newStoreIds: [], message: 'No active states to discover' };
+      return { success: true, storesDiscovered: 0, message: 'No active states to discover' };
     }
 
     let totalDiscovered = 0;
     let totalPromoted = 0;
-    // Per TASK_WORKFLOW_2024-12-10.md: Collect all new store IDs for task chaining
-    const allNewStoreIds: number[] = [];
 
     // Run discovery for each state
     for (const stateCode of stateCodes) {
@@ -49,13 +39,6 @@ export async function handleStoreDiscovery(ctx: TaskContext): Promise<TaskResult
         const result = await discoverState(pool, stateCode);
         totalDiscovered += result.totalLocationsFound || 0;
         totalPromoted += result.totalLocationsUpserted || 0;
-
-        // Per TASK_WORKFLOW_2024-12-10.md: Collect new IDs for chaining
-        if (result.newDispensaryIds && result.newDispensaryIds.length > 0) {
-          allNewStoreIds.push(...result.newDispensaryIds);
-          console.log(`[StoreDiscovery] ${stateCode}: ${result.newDispensaryIds.length} new stores`);
-        }
-
         console.log(`[StoreDiscovery] ${stateCode}: found ${result.totalLocationsFound}, upserted ${result.totalLocationsUpserted}`);
       } catch (error: unknown) {
         const errorMessage = error instanceof Error ? error.message : 'Unknown error';
@@ -64,15 +47,13 @@ export async function handleStoreDiscovery(ctx: TaskContext): Promise<TaskResult
       }
     }
 
-    console.log(`[StoreDiscovery] Complete: ${totalDiscovered} discovered, ${totalPromoted} promoted, ${allNewStoreIds.length} new stores`);
+    console.log(`[StoreDiscovery] Complete: ${totalDiscovered} discovered, ${totalPromoted} promoted`);
 
     return {
       success: true,
       storesDiscovered: totalDiscovered,
       storesPromoted: totalPromoted,
       statesProcessed: stateCodes.length,
-      // Per TASK_WORKFLOW_2024-12-10.md: Return new IDs for task chaining
-      newStoreIds: allNewStoreIds,
     };
   } catch (error: unknown) {
     const errorMessage = error instanceof Error ? error.message : 'Unknown error';
@@ -80,7 +61,6 @@ export async function handleStoreDiscovery(ctx: TaskContext): Promise<TaskResult
     return {
       success: false,
       error: errorMessage,
-      newStoreIds: [],
     };
   }
 }
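Review note on what this file's diff removes: the old handler accumulated `newDispensaryIds` across states and returned them as `newStoreIds` for task chaining, while the new one keeps only the two counters. The sketch below shows the old aggregation as a pure function, which may be useful if the chaining is ever reintroduced. `summarizeDiscovery` and both interfaces are hypothetical; only the field names `totalLocationsFound`, `totalLocationsUpserted`, and `newDispensaryIds` are taken from the diff.

```typescript
// Hypothetical per-state result shape, using the field names seen in the diff.
interface StateDiscoveryResult {
  totalLocationsFound?: number;
  totalLocationsUpserted?: number;
  newDispensaryIds?: number[];
}

interface DiscoverySummary {
  storesDiscovered: number;
  storesPromoted: number;
  newStoreIds: number[];
}

function summarizeDiscovery(results: StateDiscoveryResult[]): DiscoverySummary {
  const summary: DiscoverySummary = { storesDiscovered: 0, storesPromoted: 0, newStoreIds: [] };

  for (const result of results) {
    // Missing counters are treated as zero, as in the handler.
    summary.storesDiscovered += result.totalLocationsFound || 0;
    summary.storesPromoted += result.totalLocationsUpserted || 0;

    // Collect IDs of newly promoted stores so a follow-up task could be queued per store.
    if (result.newDispensaryIds && result.newDispensaryIds.length > 0) {
      summary.newStoreIds.push(...result.newDispensaryIds);
    }
  }

  return summary;
}
```

With `newStoreIds` gone from the result, nothing in the visible hunks of this diff schedules a first product fetch for a newly promoted store, so that trigger presumably has to come from elsewhere (for example a scheduler).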
@@ -1,35 +0,0 @@
-/**
- * Task Pool State
- *
- * Shared state for task pool pause/resume functionality.
- * This is kept separate to avoid circular dependencies between
- * task-service.ts and routes/tasks.ts.
- *
- * State is in-memory and resets on server restart.
- * By default, the pool is OPEN (not paused).
- */
-
-let taskPoolPaused = false;
-
-export function isTaskPoolPaused(): boolean {
-  return taskPoolPaused;
-}
-
-export function pauseTaskPool(): void {
-  taskPoolPaused = true;
-  console.log('[TaskPool] Task pool PAUSED - workers will not pick up new tasks');
-}
-
-export function resumeTaskPool(): void {
-  taskPoolPaused = false;
-  console.log('[TaskPool] Task pool RESUMED - workers can pick up tasks');
-}
-
-export function getTaskPoolStatus(): { paused: boolean; message: string } {
-  return {
-    paused: taskPoolPaused,
-    message: taskPoolPaused
-      ? 'Task pool is paused - workers will not pick up new tasks'
-      : 'Task pool is open - workers are picking up tasks',
-  };
-}
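Review note on the removed pause/resume module: its header says a paused pool means workers do not pick up new tasks, but the claim path that consulted the flag is not part of the visible hunks. The sketch below shows one way such a gate could be applied in front of a claim. It is an assumption-laden illustration: `TaskPoolGate` and `claimNextTask` are hypothetical, the queue is an in-memory array rather than the `worker_tasks` table, and a class replaces the module-level variable so the example stays self-contained.

```typescript
// Hypothetical gate mirroring the removed module's in-memory flag (open by default).
class TaskPoolGate {
  private paused = false;

  pause(): void {
    this.paused = true;
  }

  resume(): void {
    this.paused = false;
  }

  isPaused(): boolean {
    return this.paused;
  }
}

// Returns the next queued item, or null when the pool is paused or the queue is empty.
function claimNextTask<T>(queue: T[], gate: TaskPoolGate): T | null {
  if (gate.isPaused()) {
    return null; // Paused: leave the queue untouched.
  }
  const next = queue.shift();
  return next === undefined ? null : next;
}
```

As the removed header comment points out, this kind of state lives in process memory, so it resets on restart and is not shared between separate server processes.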
@@ -9,7 +9,6 @@
  */
 
 import { pool } from '../db/pool';
-import { isTaskPoolPaused } from './task-pool-state';
 
 // Helper to check if a table exists
 async function tableExists(tableName: string): Promise<boolean> {
@@ -22,15 +21,11 @@ async function tableExists(tableName: string): Promise<boolean> {
   return result.rows[0].exists;
 }
 
-// Per TASK_WORKFLOW_2024-12-10.md: Task roles
-// payload_fetch: Hits Dutchie API, saves raw payload to filesystem
-// product_refresh: Reads local payload, normalizes, upserts to DB
 export type TaskRole =
   | 'store_discovery'
   | 'entry_point_discovery'
   | 'product_discovery'
-  | 'payload_fetch'      // NEW: Fetches from API, saves to disk
-  | 'product_refresh'    // CHANGED: Now reads from local payload
+  | 'product_refresh'
   | 'analytics_refresh';
 
 export type TaskStatus =
@@ -60,7 +55,6 @@ export interface WorkerTask {
   error_message: string | null;
   retry_count: number;
   max_retries: number;
-  payload: Record<string, unknown> | null; // Per TASK_WORKFLOW_2024-12-10.md: Task chaining data
   created_at: Date;
   updated_at: Date;
 }
@@ -71,7 +65,6 @@ export interface CreateTaskParams {
   platform?: string;
   priority?: number;
   scheduled_for?: Date;
-  payload?: Record<string, unknown>; // Per TASK_WORKFLOW_2024-12-10.md: For task chaining data
 }
 
 export interface CapacityMetrics {
@@ -103,8 +96,8 @@ class TaskService {
    */
   async createTask(params: CreateTaskParams): Promise<WorkerTask> {
     const result = await pool.query(
-      `INSERT INTO worker_tasks (role, dispensary_id, platform, priority, scheduled_for, payload)
-       VALUES ($1, $2, $3, $4, $5, $6)
+      `INSERT INTO worker_tasks (role, dispensary_id, platform, priority, scheduled_for)
+       VALUES ($1, $2, $3, $4, $5)
        RETURNING *`,
       [
         params.role,
@@ -112,7 +105,6 @@ class TaskService {
         params.platform ?? null,
         params.priority ?? 0,
         params.scheduled_for ?? null,
-        params.payload ? JSON.stringify(params.payload) : null,
       ]
     );
     return result.rows[0] as WorkerTask;
@@ -150,14 +142,8 @@ class TaskService {
|
|||||||
/**
|
/**
|
||||||
* Claim a task atomically for a worker
|
* Claim a task atomically for a worker
|
||||||
* If role is null, claims ANY available task (role-agnostic worker)
|
* If role is null, claims ANY available task (role-agnostic worker)
|
||||||
* Returns null if task pool is paused.
|
|
||||||
*/
|
*/
|
||||||
async claimTask(role: TaskRole | null, workerId: string): Promise<WorkerTask | null> {
|
async claimTask(role: TaskRole | null, workerId: string): Promise<WorkerTask | null> {
|
||||||
// Check if task pool is paused - don't claim any tasks
|
|
||||||
if (isTaskPoolPaused()) {
|
|
||||||
return null;
|
|
||||||
}
|
|
||||||
|
|
||||||
if (role) {
|
if (role) {
|
||||||
// Role-specific claiming - use the SQL function
|
// Role-specific claiming - use the SQL function
|
||||||
const result = await pool.query(
|
const result = await pool.query(
|
||||||
@@ -415,17 +401,6 @@ class TaskService {
|
|||||||
/**
|
/**
|
||||||
* Chain next task after completion
|
* Chain next task after completion
|
||||||
* Called automatically when a task completes successfully
|
* Called automatically when a task completes successfully
|
||||||
*
|
|
||||||
* Per TASK_WORKFLOW_2024-12-10.md: Task chaining flow:
|
|
||||||
*
|
|
||||||
* Discovery flow (new stores):
|
|
||||||
* store_discovery → product_discovery → payload_fetch → product_refresh
|
|
||||||
*
|
|
||||||
* Scheduled flow (existing stores):
|
|
||||||
* payload_fetch → product_refresh
|
|
||||||
*
|
|
||||||
* Note: entry_point_discovery is deprecated since platform_dispensary_id
|
|
||||||
* is now resolved during store promotion.
|
|
||||||
*/
|
*/
|
||||||
async chainNextTask(completedTask: WorkerTask): Promise<WorkerTask | null> {
|
async chainNextTask(completedTask: WorkerTask): Promise<WorkerTask | null> {
|
||||||
if (completedTask.status !== 'completed') {
|
if (completedTask.status !== 'completed') {
|
||||||
@@ -434,14 +409,12 @@ class TaskService {
|
|||||||
|
|
||||||
switch (completedTask.role) {
|
switch (completedTask.role) {
|
||||||
case 'store_discovery': {
|
case 'store_discovery': {
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: New stores discovered -> create product_discovery tasks
|
// New stores discovered -> create entry_point_discovery tasks
|
||||||
// Skip entry_point_discovery since platform_dispensary_id is set during promotion
|
|
||||||
const newStoreIds = (completedTask.result as { newStoreIds?: number[] })?.newStoreIds;
|
const newStoreIds = (completedTask.result as { newStoreIds?: number[] })?.newStoreIds;
|
||||||
if (newStoreIds && newStoreIds.length > 0) {
|
if (newStoreIds && newStoreIds.length > 0) {
|
||||||
console.log(`[TaskService] Chaining ${newStoreIds.length} product_discovery tasks for new stores`);
|
|
||||||
for (const storeId of newStoreIds) {
|
for (const storeId of newStoreIds) {
|
||||||
await this.createTask({
|
await this.createTask({
|
||||||
role: 'product_discovery',
|
role: 'entry_point_discovery',
|
||||||
dispensary_id: storeId,
|
dispensary_id: storeId,
|
||||||
platform: completedTask.platform ?? undefined,
|
platform: completedTask.platform ?? undefined,
|
||||||
priority: 10, // High priority for new stores
|
priority: 10, // High priority for new stores
|
||||||
@@ -452,8 +425,7 @@ class TaskService {
|
|||||||
}
|
}
|
||||||
|
|
||||||
case 'entry_point_discovery': {
|
case 'entry_point_discovery': {
|
||||||
// DEPRECATED: Entry point resolution now happens during store promotion
|
// Entry point resolved -> create product_discovery task
|
||||||
// Kept for backward compatibility with any in-flight tasks
|
|
||||||
const success = (completedTask.result as { success?: boolean })?.success;
|
const success = (completedTask.result as { success?: boolean })?.success;
|
||||||
if (success && completedTask.dispensary_id) {
|
if (success && completedTask.dispensary_id) {
|
||||||
return this.createTask({
|
return this.createTask({
|
||||||
@@ -467,15 +439,8 @@ class TaskService {
|
|||||||
}
|
}
|
||||||
|
|
||||||
case 'product_discovery': {
|
case 'product_discovery': {
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Product discovery chains internally to payload_fetch
|
// Product discovery done -> store is now ready for regular resync
|
||||||
// No external chaining needed - handleProductDiscovery calls handlePayloadFetch directly
|
// No immediate chaining needed; will be picked up by daily batch generation
|
||||||
break;
|
|
||||||
}
|
|
||||||
|
|
||||||
case 'payload_fetch': {
|
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: payload_fetch chains to product_refresh
|
|
||||||
// This is handled internally by the payload_fetch handler via taskService.createTask
|
|
||||||
// No external chaining needed here
|
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
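Review note: the two flows named in the doc comment on the `payload_fetch` side of this hunk (discovery: store_discovery, product_discovery, payload_fetch, product_refresh; scheduled: payload_fetch, product_refresh) can be written down as a lookup table. This is only a sketch of the ordering: `NEXT_ROLE` and `chainFrom` are hypothetical names, and in the code under review some hops are performed inside the handlers rather than by `chainNextTask`.

```typescript
type TaskRole =
  | 'store_discovery'
  | 'entry_point_discovery'
  | 'product_discovery'
  | 'payload_fetch'
  | 'product_refresh'
  | 'analytics_refresh';

// Follow-up role for each completed role; null means the chain ends there.
const NEXT_ROLE: Record<TaskRole, TaskRole | null> = {
  store_discovery: 'product_discovery',
  entry_point_discovery: 'product_discovery', // deprecated path, kept for in-flight tasks
  product_discovery: 'payload_fetch',
  payload_fetch: 'product_refresh',
  product_refresh: null,
  analytics_refresh: null,
};

// Walks the table from a starting role and returns the full chain of roles.
function chainFrom(start: TaskRole): TaskRole[] {
  const chain: TaskRole[] = [start];
  let next = NEXT_ROLE[start];
  while (next !== null) {
    chain.push(next);
    next = NEXT_ROLE[next];
  }
  return chain;
}
```

Typing the table as `Record<TaskRole, ...>` means that adding or removing a role from the union forces the table to be updated, which is the kind of drift this hunk has to handle by hand.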
|
|||||||
@@ -52,8 +52,6 @@ import { CrawlRotator } from '../services/crawl-rotator';
|
|||||||
import { setCrawlRotator } from '../platforms/dutchie';
|
import { setCrawlRotator } from '../platforms/dutchie';
|
||||||
|
|
||||||
// Task handlers by role
|
// Task handlers by role
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: payload_fetch and product_refresh are now separate
|
|
||||||
import { handlePayloadFetch } from './handlers/payload-fetch';
|
|
||||||
import { handleProductRefresh } from './handlers/product-refresh';
|
import { handleProductRefresh } from './handlers/product-refresh';
|
||||||
import { handleProductDiscovery } from './handlers/product-discovery';
|
import { handleProductDiscovery } from './handlers/product-discovery';
|
||||||
import { handleStoreDiscovery } from './handlers/store-discovery';
|
import { handleStoreDiscovery } from './handlers/store-discovery';
|
||||||
@@ -82,12 +80,8 @@ export interface TaskResult {
|
|||||||
|
|
||||||
type TaskHandler = (ctx: TaskContext) => Promise<TaskResult>;
|
type TaskHandler = (ctx: TaskContext) => Promise<TaskResult>;
|
||||||
|
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Handler registry
|
|
||||||
// payload_fetch: Fetches from Dutchie API, saves to disk, chains to product_refresh
|
|
||||||
// product_refresh: Reads local payload, normalizes, upserts to DB
|
|
||||||
const TASK_HANDLERS: Record<TaskRole, TaskHandler> = {
|
const TASK_HANDLERS: Record<TaskRole, TaskHandler> = {
|
||||||
payload_fetch: handlePayloadFetch, // NEW: API fetch -> disk
|
product_refresh: handleProductRefresh,
|
||||||
product_refresh: handleProductRefresh, // CHANGED: disk -> DB
|
|
||||||
product_discovery: handleProductDiscovery,
|
product_discovery: handleProductDiscovery,
|
||||||
store_discovery: handleStoreDiscovery,
|
store_discovery: handleStoreDiscovery,
|
||||||
entry_point_discovery: handleEntryPointDiscovery,
|
entry_point_discovery: handleEntryPointDiscovery,
|
||||||
@@ -116,80 +110,23 @@ export class TaskWorker {
|
|||||||
* Initialize stealth systems (proxy rotation, fingerprints)
|
* Initialize stealth systems (proxy rotation, fingerprints)
|
||||||
* Called once on worker startup before processing any tasks.
|
* Called once on worker startup before processing any tasks.
|
||||||
*
|
*
|
||||||
* IMPORTANT: Proxies are REQUIRED. Workers will wait until proxies are available.
|
* IMPORTANT: Proxies are REQUIRED. Workers will fail to start if no proxies available.
|
||||||
* Workers listen for PostgreSQL NOTIFY 'proxy_added' to wake up immediately when proxies are added.
|
|
||||||
*/
|
*/
|
||||||
private async initializeStealth(): Promise<void> {
|
private async initializeStealth(): Promise<void> {
|
||||||
const MAX_WAIT_MINUTES = 60;
|
|
||||||
const POLL_INTERVAL_MS = 30000; // 30 seconds fallback polling
|
|
||||||
const maxAttempts = (MAX_WAIT_MINUTES * 60 * 1000) / POLL_INTERVAL_MS;
|
|
||||||
let attempts = 0;
|
|
||||||
let notifyClient: any = null;
|
|
||||||
|
|
||||||
// Set up PostgreSQL LISTEN for proxy notifications
|
|
||||||
try {
|
|
||||||
notifyClient = await this.pool.connect();
|
|
||||||
await notifyClient.query('LISTEN proxy_added');
|
|
||||||
console.log(`[TaskWorker] Listening for proxy_added notifications...`);
|
|
||||||
} catch (err: any) {
|
|
||||||
console.log(`[TaskWorker] Could not set up LISTEN (will poll): ${err.message}`);
|
|
||||||
}
|
|
||||||
|
|
||||||
// Create a promise that resolves when notified
|
|
||||||
let notifyResolve: (() => void) | null = null;
|
|
||||||
if (notifyClient) {
|
|
||||||
notifyClient.on('notification', (msg: any) => {
|
|
||||||
if (msg.channel === 'proxy_added') {
|
|
||||||
console.log(`[TaskWorker] Received proxy_added notification!`);
|
|
||||||
if (notifyResolve) notifyResolve();
|
|
||||||
}
|
|
||||||
});
|
|
||||||
}
|
|
||||||
|
|
||||||
try {
|
|
||||||
while (attempts < maxAttempts) {
|
|
||||||
try {
|
|
||||||
// Load proxies from database
|
// Load proxies from database
|
||||||
await this.crawlRotator.initialize();
|
await this.crawlRotator.initialize();
|
||||||
|
|
||||||
const stats = this.crawlRotator.proxy.getStats();
|
const stats = this.crawlRotator.proxy.getStats();
|
||||||
if (stats.activeProxies > 0) {
|
if (stats.activeProxies === 0) {
|
||||||
|
throw new Error('No active proxies available. Workers MUST use proxies for all requests. Add proxies to the database before starting workers.');
|
||||||
|
}
|
||||||
|
|
||||||
console.log(`[TaskWorker] Loaded ${stats.activeProxies} proxies (${stats.avgSuccessRate.toFixed(1)}% avg success rate)`);
|
console.log(`[TaskWorker] Loaded ${stats.activeProxies} proxies (${stats.avgSuccessRate.toFixed(1)}% avg success rate)`);
|
||||||
|
|
||||||
// Wire rotator to Dutchie client - proxies will be used for ALL requests
|
// Wire rotator to Dutchie client - proxies will be used for ALL requests
|
||||||
setCrawlRotator(this.crawlRotator);
|
setCrawlRotator(this.crawlRotator);
|
||||||
|
|
||||||
console.log(`[TaskWorker] Stealth initialized: ${this.crawlRotator.userAgent.getCount()} fingerprints, proxy REQUIRED for all requests`);
|
console.log(`[TaskWorker] Stealth initialized: ${this.crawlRotator.userAgent.getCount()} fingerprints, proxy REQUIRED for all requests`);
|
||||||
return;
|
|
||||||
}
|
|
||||||
|
|
||||||
attempts++;
|
|
||||||
console.log(`[TaskWorker] No active proxies available (attempt ${attempts}). Waiting for proxies...`);
|
|
||||||
|
|
||||||
// Wait for either notification or timeout
|
|
||||||
await new Promise<void>((resolve) => {
|
|
||||||
notifyResolve = resolve;
|
|
||||||
setTimeout(resolve, POLL_INTERVAL_MS);
|
|
||||||
});
|
|
||||||
} catch (error: any) {
|
|
||||||
attempts++;
|
|
||||||
console.log(`[TaskWorker] Error loading proxies (attempt ${attempts}): ${error.message}. Retrying...`);
|
|
||||||
await this.sleep(POLL_INTERVAL_MS);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
throw new Error(`No active proxies available after waiting ${MAX_WAIT_MINUTES} minutes. Add proxies to the database.`);
|
|
||||||
} finally {
|
|
||||||
// Clean up LISTEN connection
|
|
||||||
if (notifyClient) {
|
|
||||||
try {
|
|
||||||
await notifyClient.query('UNLISTEN proxy_added');
|
|
||||||
notifyClient.release();
|
|
||||||
} catch {
|
|
||||||
// Ignore cleanup errors
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
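Review note: the waiting variant of `initializeStealth` combines a notification with a fallback poll. The sketch below isolates that "wake on event or after a timeout" step, using a plain Node `EventEmitter` as a stand-in for the PostgreSQL client that emits `notification` events. `maxAttempts` and `waitForEventOrTimeout` are hypothetical helper names; unlike the loop in the hunk, this version clears the timer and the listener once either side wins.

```typescript
import { EventEmitter } from 'events';

const MAX_WAIT_MINUTES = 60;
const POLL_INTERVAL_MS = 30000;

// 60 min * 60 s * 1000 ms / 30000 ms per poll = 120 attempts.
function maxAttempts(waitMinutes: number, pollMs: number): number {
  return Math.floor((waitMinutes * 60 * 1000) / pollMs);
}

// Resolves with 'notified' when `event` fires on the emitter,
// or with 'timeout' once `ms` milliseconds have passed, whichever is first.
function waitForEventOrTimeout(
  emitter: EventEmitter,
  event: string,
  ms: number
): Promise<'notified' | 'timeout'> {
  return new Promise((resolve) => {
    const onEvent = (): void => {
      clearTimeout(timer);
      resolve('notified');
    };
    const timer = setTimeout(() => {
      emitter.off(event, onEvent);
      resolve('timeout');
    }, ms);
    emitter.once(event, onEvent);
  });
}
```

A caller would await `waitForEventOrTimeout(...)` inside the retry loop and re-check the proxy count after it resolves, regardless of which outcome occurred.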
@@ -477,13 +414,11 @@ export class TaskWorker {
|
|||||||
async function main(): Promise<void> {
|
async function main(): Promise<void> {
|
||||||
const role = process.env.WORKER_ROLE as TaskRole | undefined;
|
const role = process.env.WORKER_ROLE as TaskRole | undefined;
|
||||||
|
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Valid task roles
|
|
||||||
const validRoles: TaskRole[] = [
|
const validRoles: TaskRole[] = [
|
||||||
'store_discovery',
|
'store_discovery',
|
||||||
'entry_point_discovery',
|
'entry_point_discovery',
|
||||||
'product_discovery',
|
'product_discovery',
|
||||||
'payload_fetch', // NEW: Fetches from API, saves to disk
|
'product_refresh',
|
||||||
'product_refresh', // CHANGED: Reads from disk, processes to DB
|
|
||||||
'analytics_refresh',
|
'analytics_refresh',
|
||||||
];
|
];
|
||||||
|
|
||||||
|
|||||||
49
backend/src/types/user-agents.d.ts
vendored
@@ -1,49 +0,0 @@
/**
 * Type declarations for user-agents npm package
 * Per workflow-12102025.md: Used for realistic UA generation with market-share weighting
 */

declare module 'user-agents' {
  interface UserAgentData {
    userAgent: string;
    platform: string;
    screenWidth: number;
    screenHeight: number;
    viewportWidth: number;
    viewportHeight: number;
    deviceCategory: 'desktop' | 'mobile' | 'tablet';
    appName: string;
    connection?: {
      downlink: number;
      effectiveType: string;
      rtt: number;
    };
  }

  interface UserAgentOptions {
    deviceCategory?: 'desktop' | 'mobile' | 'tablet';
    platform?: RegExp | string;
    screenWidth?: RegExp | { min?: number; max?: number };
    screenHeight?: RegExp | { min?: number; max?: number };
  }

  interface UserAgentInstance {
    data: UserAgentData;
    toString(): string;
    random(): UserAgentInstance;
  }

  class UserAgent {
    constructor(options?: UserAgentOptions | UserAgentOptions[]);
    data: UserAgentData;
    toString(): string;
    random(): UserAgentInstance;
  }

  // Make it callable
  interface UserAgent {
    (): UserAgentInstance;
  }

  export default UserAgent;
}
@@ -1,406 +0,0 @@
|
|||||||
/**
|
|
||||||
* Payload Storage Utility
|
|
||||||
*
|
|
||||||
* Per TASK_WORKFLOW_2024-12-10.md: Store raw GraphQL payloads for historical analysis.
|
|
||||||
*
|
|
||||||
* Design Pattern: Metadata/Payload Separation
|
|
||||||
* - Metadata in PostgreSQL (raw_crawl_payloads table): Small, indexed, queryable
|
|
||||||
* - Payload on filesystem: Gzipped JSON at storage_path
|
|
||||||
*
|
|
||||||
* Storage structure:
|
|
||||||
* /storage/payloads/{year}/{month}/{day}/store_{dispensary_id}_{timestamp}.json.gz
|
|
||||||
*
|
|
||||||
* Benefits:
|
|
||||||
* - Compare any two crawls to see what changed
|
|
||||||
* - Replay/re-normalize historical data if logic changes
|
|
||||||
* - Debug issues by seeing exactly what the API returned
|
|
||||||
* - DB stays small, backups stay fast
|
|
||||||
* - ~90% compression (1.5MB -> 150KB per crawl)
|
|
||||||
*/
|
|
||||||
|
|
||||||
import * as fs from 'fs';
|
|
||||||
import * as path from 'path';
|
|
||||||
import * as zlib from 'zlib';
|
|
||||||
import { promisify } from 'util';
|
|
||||||
import { Pool } from 'pg';
|
|
||||||
import * as crypto from 'crypto';
|
|
||||||
|
|
||||||
const gzip = promisify(zlib.gzip);
|
|
||||||
const gunzip = promisify(zlib.gunzip);
|
|
||||||
|
|
||||||
// Base path for payload storage (matches image storage pattern)
|
|
||||||
const PAYLOAD_BASE_PATH = process.env.PAYLOAD_STORAGE_PATH || './storage/payloads';
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Result from saving a payload
|
|
||||||
*/
|
|
||||||
export interface SavePayloadResult {
|
|
||||||
id: number;
|
|
||||||
storagePath: string;
|
|
||||||
sizeBytes: number;
|
|
||||||
sizeBytesRaw: number;
|
|
||||||
checksum: string;
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Result from loading a payload
|
|
||||||
*/
|
|
||||||
export interface LoadPayloadResult {
|
|
||||||
payload: any;
|
|
||||||
metadata: {
|
|
||||||
id: number;
|
|
||||||
dispensaryId: number;
|
|
||||||
crawlRunId: number | null;
|
|
||||||
productCount: number;
|
|
||||||
fetchedAt: Date;
|
|
||||||
storagePath: string;
|
|
||||||
};
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Generate storage path for a payload
|
|
||||||
*
|
|
||||||
* Format: /storage/payloads/{year}/{month}/{day}/store_{dispensary_id}_{timestamp}.json.gz
|
|
||||||
*/
|
|
||||||
function generateStoragePath(dispensaryId: number, timestamp: Date): string {
|
|
||||||
const year = timestamp.getFullYear();
|
|
||||||
const month = String(timestamp.getMonth() + 1).padStart(2, '0');
|
|
||||||
const day = String(timestamp.getDate()).padStart(2, '0');
|
|
||||||
const ts = timestamp.getTime();
|
|
||||||
|
|
||||||
return path.join(
|
|
||||||
PAYLOAD_BASE_PATH,
|
|
||||||
String(year),
|
|
||||||
month,
|
|
||||||
day,
|
|
||||||
`store_${dispensaryId}_${ts}.json.gz`
|
|
||||||
);
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Ensure directory exists for a file path
|
|
||||||
*/
|
|
||||||
async function ensureDir(filePath: string): Promise<void> {
|
|
||||||
const dir = path.dirname(filePath);
|
|
||||||
await fs.promises.mkdir(dir, { recursive: true });
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Calculate SHA256 checksum of data
|
|
||||||
*/
|
|
||||||
function calculateChecksum(data: Buffer): string {
|
|
||||||
return crypto.createHash('sha256').update(data).digest('hex');
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Save a raw crawl payload to filesystem and record metadata in DB
|
|
||||||
*
|
|
||||||
* @param pool - Database connection pool
|
|
||||||
* @param dispensaryId - ID of the dispensary
|
|
||||||
* @param payload - Raw JSON payload from GraphQL
|
|
||||||
* @param crawlRunId - Optional crawl_run ID for linking
|
|
||||||
* @param productCount - Number of products in payload
|
|
||||||
* @returns SavePayloadResult with file info and DB record ID
|
|
||||||
*/
|
|
||||||
export async function saveRawPayload(
|
|
||||||
pool: Pool,
|
|
||||||
dispensaryId: number,
|
|
||||||
payload: any,
|
|
||||||
crawlRunId: number | null = null,
|
|
||||||
productCount: number = 0
|
|
||||||
): Promise<SavePayloadResult> {
|
|
||||||
const timestamp = new Date();
|
|
||||||
const storagePath = generateStoragePath(dispensaryId, timestamp);
|
|
||||||
|
|
||||||
// Serialize and compress
|
|
||||||
const jsonStr = JSON.stringify(payload);
|
|
||||||
const rawSize = Buffer.byteLength(jsonStr, 'utf8');
|
|
||||||
const compressed = await gzip(Buffer.from(jsonStr, 'utf8'));
|
|
||||||
const compressedSize = compressed.length;
|
|
||||||
const checksum = calculateChecksum(compressed);
|
|
||||||
|
|
||||||
// Write to filesystem
|
|
||||||
await ensureDir(storagePath);
|
|
||||||
await fs.promises.writeFile(storagePath, compressed);
|
|
||||||
|
|
||||||
// Record metadata in DB
|
|
||||||
const result = await pool.query(`
|
|
||||||
INSERT INTO raw_crawl_payloads (
|
|
||||||
crawl_run_id,
|
|
||||||
dispensary_id,
|
|
||||||
storage_path,
|
|
||||||
product_count,
|
|
||||||
size_bytes,
|
|
||||||
size_bytes_raw,
|
|
||||||
fetched_at,
|
|
||||||
checksum_sha256
|
|
||||||
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
|
|
||||||
RETURNING id
|
|
||||||
`, [
|
|
||||||
crawlRunId,
|
|
||||||
dispensaryId,
|
|
||||||
storagePath,
|
|
||||||
productCount,
|
|
||||||
compressedSize,
|
|
||||||
rawSize,
|
|
||||||
timestamp,
|
|
||||||
checksum
|
|
||||||
]);
|
|
||||||
|
|
||||||
console.log(`[PayloadStorage] Saved payload for store ${dispensaryId}: ${storagePath} (${(compressedSize / 1024).toFixed(1)}KB compressed, ${(rawSize / 1024).toFixed(1)}KB raw)`);
|
|
||||||
|
|
||||||
return {
|
|
||||||
id: result.rows[0].id,
|
|
||||||
storagePath,
|
|
||||||
sizeBytes: compressedSize,
|
|
||||||
sizeBytesRaw: rawSize,
|
|
||||||
checksum
|
|
||||||
};
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Load a raw payload from filesystem by metadata ID
|
|
||||||
*
|
|
||||||
* @param pool - Database connection pool
|
|
||||||
* @param payloadId - ID from raw_crawl_payloads table
|
|
||||||
* @returns LoadPayloadResult with parsed payload and metadata
|
|
||||||
*/
|
|
||||||
export async function loadRawPayloadById(
|
|
||||||
pool: Pool,
|
|
||||||
payloadId: number
|
|
||||||
): Promise<LoadPayloadResult | null> {
|
|
||||||
const result = await pool.query(`
|
|
||||||
SELECT id, dispensary_id, crawl_run_id, storage_path, product_count, fetched_at
|
|
||||||
FROM raw_crawl_payloads
|
|
||||||
WHERE id = $1
|
|
||||||
`, [payloadId]);
|
|
||||||
|
|
||||||
if (result.rows.length === 0) {
|
|
||||||
return null;
|
|
||||||
}
|
|
||||||
|
|
||||||
const row = result.rows[0];
|
|
||||||
const payload = await loadPayloadFromPath(row.storage_path);
|
|
||||||
|
|
||||||
return {
|
|
||||||
payload,
|
|
||||||
metadata: {
|
|
||||||
id: row.id,
|
|
||||||
dispensaryId: row.dispensary_id,
|
|
||||||
crawlRunId: row.crawl_run_id,
|
|
||||||
productCount: row.product_count,
|
|
||||||
fetchedAt: row.fetched_at,
|
|
||||||
storagePath: row.storage_path
|
|
||||||
}
|
|
||||||
};
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Load a raw payload directly from filesystem path
|
|
||||||
*
|
|
||||||
* @param storagePath - Path to gzipped JSON file
|
|
||||||
* @returns Parsed JSON payload
|
|
||||||
*/
|
|
||||||
export async function loadPayloadFromPath(storagePath: string): Promise<any> {
|
|
||||||
const compressed = await fs.promises.readFile(storagePath);
|
|
||||||
const decompressed = await gunzip(compressed);
|
|
||||||
return JSON.parse(decompressed.toString('utf8'));
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Get the latest payload for a dispensary
|
|
||||||
*
|
|
||||||
* @param pool - Database connection pool
|
|
||||||
* @param dispensaryId - ID of the dispensary
|
|
||||||
* @returns LoadPayloadResult or null if none exists
|
|
||||||
*/
|
|
||||||
export async function getLatestPayload(
|
|
||||||
pool: Pool,
|
|
||||||
dispensaryId: number
|
|
||||||
): Promise<LoadPayloadResult | null> {
|
|
||||||
const result = await pool.query(`
|
|
||||||
SELECT id, dispensary_id, crawl_run_id, storage_path, product_count, fetched_at
|
|
||||||
FROM raw_crawl_payloads
|
|
||||||
WHERE dispensary_id = $1
|
|
||||||
ORDER BY fetched_at DESC
|
|
||||||
LIMIT 1
|
|
||||||
`, [dispensaryId]);
|
|
||||||
|
|
||||||
if (result.rows.length === 0) {
|
|
||||||
return null;
|
|
||||||
}
|
|
||||||
|
|
||||||
const row = result.rows[0];
|
|
||||||
const payload = await loadPayloadFromPath(row.storage_path);
|
|
||||||
|
|
||||||
return {
|
|
||||||
payload,
|
|
||||||
metadata: {
|
|
||||||
id: row.id,
|
|
||||||
dispensaryId: row.dispensary_id,
|
|
||||||
crawlRunId: row.crawl_run_id,
|
|
||||||
productCount: row.product_count,
|
|
||||||
fetchedAt: row.fetched_at,
|
|
||||||
storagePath: row.storage_path
|
|
||||||
}
|
|
||||||
};
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Get two payloads for comparison (latest and previous, or by IDs)
|
|
||||||
*
|
|
||||||
* @param pool - Database connection pool
|
|
||||||
* @param dispensaryId - ID of the dispensary
|
|
||||||
* @param limit - Number of recent payloads to retrieve (default 2)
|
|
||||||
* @returns Array of LoadPayloadResult, most recent first
|
|
||||||
*/
|
|
||||||
export async function getRecentPayloads(
|
|
||||||
pool: Pool,
|
|
||||||
dispensaryId: number,
|
|
||||||
limit: number = 2
|
|
||||||
): Promise<LoadPayloadResult[]> {
|
|
||||||
const result = await pool.query(`
|
|
||||||
SELECT id, dispensary_id, crawl_run_id, storage_path, product_count, fetched_at
|
|
||||||
FROM raw_crawl_payloads
|
|
||||||
WHERE dispensary_id = $1
|
|
||||||
ORDER BY fetched_at DESC
|
|
||||||
LIMIT $2
|
|
||||||
`, [dispensaryId, limit]);
|
|
||||||
|
|
||||||
const payloads: LoadPayloadResult[] = [];
|
|
||||||
|
|
||||||
for (const row of result.rows) {
|
|
||||||
const payload = await loadPayloadFromPath(row.storage_path);
|
|
||||||
payloads.push({
|
|
||||||
payload,
|
|
||||||
metadata: {
|
|
||||||
id: row.id,
|
|
||||||
dispensaryId: row.dispensary_id,
|
|
||||||
crawlRunId: row.crawl_run_id,
|
|
||||||
productCount: row.product_count,
|
|
||||||
fetchedAt: row.fetched_at,
|
|
||||||
storagePath: row.storage_path
|
|
||||||
}
|
|
||||||
});
|
|
||||||
}
|
|
||||||
|
|
||||||
return payloads;
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* List payload metadata without loading files (for browsing/pagination)
|
|
||||||
*
|
|
||||||
* @param pool - Database connection pool
|
|
||||||
* @param options - Query options
|
|
||||||
* @returns Array of metadata rows
|
|
||||||
*/
|
|
||||||
export async function listPayloadMetadata(
|
|
||||||
pool: Pool,
|
|
||||||
options: {
|
|
||||||
dispensaryId?: number;
|
|
||||||
startDate?: Date;
|
|
||||||
endDate?: Date;
|
|
||||||
limit?: number;
|
|
||||||
offset?: number;
|
|
||||||
} = {}
|
|
||||||
): Promise<Array<{
|
|
||||||
id: number;
|
|
||||||
dispensaryId: number;
|
|
||||||
crawlRunId: number | null;
|
|
||||||
storagePath: string;
|
|
||||||
productCount: number;
|
|
||||||
sizeBytes: number;
|
|
||||||
sizeBytesRaw: number;
|
|
||||||
fetchedAt: Date;
|
|
||||||
}>> {
|
|
||||||
const conditions: string[] = [];
|
|
||||||
const params: any[] = [];
|
|
||||||
let paramIndex = 1;
|
|
||||||
|
|
||||||
if (options.dispensaryId) {
|
|
||||||
conditions.push(`dispensary_id = $${paramIndex++}`);
|
|
||||||
params.push(options.dispensaryId);
|
|
||||||
}
|
|
||||||
|
|
||||||
if (options.startDate) {
|
|
||||||
conditions.push(`fetched_at >= $${paramIndex++}`);
|
|
||||||
params.push(options.startDate);
|
|
||||||
}
|
|
||||||
|
|
||||||
if (options.endDate) {
|
|
||||||
conditions.push(`fetched_at <= $${paramIndex++}`);
|
|
||||||
params.push(options.endDate);
|
|
||||||
}
|
|
||||||
|
|
||||||
const whereClause = conditions.length > 0 ? `WHERE ${conditions.join(' AND ')}` : '';
|
|
||||||
const limit = options.limit || 50;
|
|
||||||
const offset = options.offset || 0;
|
|
||||||
|
|
||||||
params.push(limit, offset);
|
|
||||||
|
|
||||||
const result = await pool.query(`
|
|
||||||
SELECT
|
|
||||||
id,
|
|
||||||
dispensary_id,
|
|
||||||
crawl_run_id,
|
|
||||||
storage_path,
|
|
||||||
product_count,
|
|
||||||
size_bytes,
|
|
||||||
size_bytes_raw,
|
|
||||||
fetched_at
|
|
||||||
FROM raw_crawl_payloads
|
|
||||||
${whereClause}
|
|
||||||
ORDER BY fetched_at DESC
|
|
||||||
LIMIT $${paramIndex++} OFFSET $${paramIndex}
|
|
||||||
`, params);
|
|
||||||
|
|
||||||
return result.rows.map(row => ({
|
|
||||||
id: row.id,
|
|
||||||
dispensaryId: row.dispensary_id,
|
|
||||||
crawlRunId: row.crawl_run_id,
|
|
||||||
storagePath: row.storage_path,
|
|
||||||
productCount: row.product_count,
|
|
||||||
sizeBytes: row.size_bytes,
|
|
||||||
sizeBytesRaw: row.size_bytes_raw,
|
|
||||||
fetchedAt: row.fetched_at
|
|
||||||
}));
|
|
||||||
}
|
|
||||||
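Review note: `listPayloadMetadata` builds its WHERE clause by keeping a running `$n` placeholder index next to the values array. The sketch below shows the same technique as a pure function that can be checked without a database. `buildListQuery` and `ListOptions` are hypothetical names, the column list is shortened, and the `!== undefined` / `??` checks are a deliberate variation so that a `dispensaryId` or `offset` of 0 is not silently dropped.

```typescript
interface ListOptions {
  dispensaryId?: number;
  startDate?: Date;
  endDate?: Date;
  limit?: number;
  offset?: number;
}

function buildListQuery(options: ListOptions = {}): { text: string; values: unknown[] } {
  const conditions: string[] = [];
  const values: unknown[] = [];

  // The value is pushed first, so values.length is that value's $n position.
  const add = (sql: string, value: unknown): void => {
    values.push(value);
    conditions.push(`${sql} $${values.length}`);
  };

  if (options.dispensaryId !== undefined) add('dispensary_id =', options.dispensaryId);
  if (options.startDate) add('fetched_at >=', options.startDate);
  if (options.endDate) add('fetched_at <=', options.endDate);

  const where = conditions.length > 0 ? `WHERE ${conditions.join(' AND ')}` : '';

  // LIMIT and OFFSET always take the last two placeholder positions.
  values.push(options.limit ?? 50, options.offset ?? 0);

  const text = [
    'SELECT id, dispensary_id, storage_path, fetched_at',
    'FROM raw_crawl_payloads',
    where,
    'ORDER BY fetched_at DESC',
    `LIMIT $${values.length - 1} OFFSET $${values.length}`,
  ]
    .filter(Boolean)
    .join(' ');

  return { text, values };
}
```

The returned `text` and `values` pair is in the shape a parameterized query call expects, so user-supplied filters never get interpolated into the SQL string itself.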
|
|
||||||
/**
|
|
||||||
* Delete old payloads (for retention policy)
|
|
||||||
*
|
|
||||||
* @param pool - Database connection pool
|
|
||||||
* @param olderThan - Delete payloads older than this date
|
|
||||||
* @returns Number of payloads deleted
|
|
||||||
*/
|
|
||||||
export async function deleteOldPayloads(
|
|
||||||
pool: Pool,
|
|
||||||
olderThan: Date
|
|
||||||
): Promise<number> {
|
|
||||||
// Get paths first
|
|
||||||
const result = await pool.query(`
|
|
||||||
SELECT id, storage_path FROM raw_crawl_payloads
|
|
||||||
WHERE fetched_at < $1
|
|
||||||
`, [olderThan]);
|
|
||||||
|
|
||||||
// Delete files
|
|
||||||
for (const row of result.rows) {
|
|
||||||
try {
|
|
||||||
await fs.promises.unlink(row.storage_path);
|
|
||||||
} catch (err: any) {
|
|
||||||
if (err.code !== 'ENOENT') {
|
|
||||||
console.warn(`[PayloadStorage] Failed to delete ${row.storage_path}: ${err.message}`);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Delete DB records
|
|
||||||
await pool.query(`
|
|
||||||
DELETE FROM raw_crawl_payloads
|
|
||||||
WHERE fetched_at < $1
|
|
||||||
`, [olderThan]);
|
|
||||||
|
|
||||||
console.log(`[PayloadStorage] Deleted ${result.rows.length} payloads older than ${olderThan.toISOString()}`);
|
|
||||||
|
|
||||||
return result.rows.length;
|
|
||||||
}
|
|
||||||
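Review note: the file removed above stores each payload as gzipped JSON under a dated path and records a SHA-256 of the compressed bytes. The sketch below reproduces that round trip in memory with Node's built-in `zlib`, `crypto` and `path` modules. `storagePathFor`, `pack` and `unpack` are hypothetical names, and the checksum verification on load is an addition for illustration: the removed `loadPayloadFromPath` reads and decompresses without checking it.

```typescript
import * as zlib from 'zlib';
import * as crypto from 'crypto';
import * as path from 'path';

// Mirrors the {year}/{month}/{day}/store_{dispensary_id}_{timestamp}.json.gz layout.
function storagePathFor(base: string, dispensaryId: number, at: Date): string {
  const month = String(at.getMonth() + 1).padStart(2, '0');
  const day = String(at.getDate()).padStart(2, '0');
  return path.join(
    base,
    String(at.getFullYear()),
    month,
    day,
    `store_${dispensaryId}_${at.getTime()}.json.gz`
  );
}

// Serializes and gzips a payload, returning the bytes plus size and checksum metadata.
function pack(payload: unknown): { compressed: Buffer; rawSize: number; checksum: string } {
  const json = JSON.stringify(payload);
  const compressed = zlib.gzipSync(Buffer.from(json, 'utf8'));
  const checksum = crypto.createHash('sha256').update(compressed).digest('hex');
  return { compressed, rawSize: Buffer.byteLength(json, 'utf8'), checksum };
}

// Verifies the checksum of the compressed bytes, then gunzips and parses them.
function unpack(compressed: Buffer, expectedChecksum: string): unknown {
  const actual = crypto.createHash('sha256').update(compressed).digest('hex');
  if (actual !== expectedChecksum) {
    throw new Error('checksum mismatch');
  }
  return JSON.parse(zlib.gunzipSync(compressed).toString('utf8'));
}
```

Hashing the compressed bytes, as the removed code does, lets a stored file be verified without decompressing it first.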
4
cannaiq/dist/index.html
vendored
@@ -7,8 +7,8 @@
|
|||||||
<title>CannaIQ - Cannabis Menu Intelligence Platform</title>
|
<title>CannaIQ - Cannabis Menu Intelligence Platform</title>
|
||||||
<meta name="description" content="CannaIQ provides real-time cannabis dispensary menu data, product tracking, and analytics for dispensaries across Arizona." />
|
<meta name="description" content="CannaIQ provides real-time cannabis dispensary menu data, product tracking, and analytics for dispensaries across Arizona." />
|
||||||
<meta name="keywords" content="cannabis, dispensary, menu, products, analytics, Arizona" />
|
<meta name="keywords" content="cannabis, dispensary, menu, products, analytics, Arizona" />
|
||||||
<script type="module" crossorigin src="/assets/index-BXmp5CSY.js"></script>
|
<script type="module" crossorigin src="/assets/index-BML8-px1.js"></script>
|
||||||
<link rel="stylesheet" crossorigin href="/assets/index-4959QN4j.css">
|
<link rel="stylesheet" crossorigin href="/assets/index-B2gR-58G.css">
|
||||||
</head>
|
</head>
|
||||||
<body>
|
<body>
|
||||||
<div id="root"></div>
|
<div id="root"></div>
|
||||||
|
|||||||
@@ -1518,11 +1518,10 @@ class ApiClient {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Intelligence API
|
// Intelligence API
|
||||||
async getIntelligenceBrands(params?: { limit?: number; offset?: number; state?: string }) {
|
async getIntelligenceBrands(params?: { limit?: number; offset?: number }) {
|
||||||
const searchParams = new URLSearchParams();
|
const searchParams = new URLSearchParams();
|
||||||
if (params?.limit) searchParams.append('limit', params.limit.toString());
|
if (params?.limit) searchParams.append('limit', params.limit.toString());
|
||||||
if (params?.offset) searchParams.append('offset', params.offset.toString());
|
if (params?.offset) searchParams.append('offset', params.offset.toString());
|
||||||
if (params?.state) searchParams.append('state', params.state);
|
|
||||||
const queryString = searchParams.toString() ? `?${searchParams.toString()}` : '';
|
const queryString = searchParams.toString() ? `?${searchParams.toString()}` : '';
|
||||||
return this.request<{
|
return this.request<{
|
||||||
brands: Array<{
|
brands: Array<{
|
||||||
@@ -1537,10 +1536,7 @@ class ApiClient {
|
|||||||
}>(`/api/admin/intelligence/brands${queryString}`);
|
}>(`/api/admin/intelligence/brands${queryString}`);
|
||||||
}
|
}
|
||||||
|
|
||||||
async getIntelligencePricing(params?: { state?: string }) {
|
async getIntelligencePricing() {
|
||||||
const searchParams = new URLSearchParams();
|
|
||||||
if (params?.state) searchParams.append('state', params.state);
|
|
||||||
const queryString = searchParams.toString() ? `?${searchParams.toString()}` : '';
|
|
||||||
return this.request<{
|
return this.request<{
|
||||||
byCategory: Array<{
|
byCategory: Array<{
|
||||||
category: string;
|
category: string;
|
||||||
@@ -1556,7 +1552,7 @@ class ApiClient {
|
|||||||
maxPrice: number;
|
maxPrice: number;
|
||||||
totalProducts: number;
|
totalProducts: number;
|
||||||
};
|
};
|
||||||
}>(`/api/admin/intelligence/pricing${queryString}`);
|
}>('/api/admin/intelligence/pricing');
|
||||||
}
|
}
|
||||||
|
|
||||||
async getIntelligenceStoreActivity(params?: { state?: string; chainId?: number; limit?: number }) {
|
async getIntelligenceStoreActivity(params?: { state?: string; chainId?: number; limit?: number }) {
|
||||||
@@ -2888,27 +2884,6 @@ class ApiClient {
|
|||||||
`/api/tasks/store/${dispensaryId}/active`
|
`/api/tasks/store/${dispensaryId}/active`
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Task Pool Control
|
|
||||||
async getTaskPoolStatus() {
|
|
||||||
return this.request<{ success: boolean; paused: boolean; message: string }>(
|
|
||||||
'/api/tasks/pool/status'
|
|
||||||
);
|
|
||||||
}
|
|
||||||
|
|
||||||
async pauseTaskPool() {
|
|
||||||
return this.request<{ success: boolean; paused: boolean; message: string }>(
|
|
||||||
'/api/tasks/pool/pause',
|
|
||||||
{ method: 'POST' }
|
|
||||||
);
|
|
||||||
}
|
|
||||||
|
|
||||||
async resumeTaskPool() {
|
|
||||||
return this.request<{ success: boolean; paused: boolean; message: string }>(
|
|
||||||
'/api/tasks/pool/resume',
|
|
||||||
{ method: 'POST' }
|
|
||||||
);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
export const api = new ApiClient(API_URL);
|
export const api = new ApiClient(API_URL);
|
||||||
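Review note on the `api.ts` hunks above: both sides of the diff build the query string by appending each optional parameter by hand, guarded by a truthiness check such as `if (params?.limit)`. The following is a minimal sketch of a generic helper, assuming it is acceptable to send `0` values (a truthiness check would drop `offset: 0`). The name `buildQueryString` and the `QueryValue` type are hypothetical and do not appear in the repository.

```typescript
// Hypothetical helper, not part of the ApiClient shown in the diff.
type QueryValue = string | number | undefined | null;

function buildQueryString(params: Record<string, QueryValue> = {}): string {
  const searchParams = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    // Skip only missing or empty values, so that 0 (e.g. offset: 0) is still sent.
    if (value !== undefined && value !== null && value !== '') {
      searchParams.append(key, String(value));
    }
  }
  const query = searchParams.toString();
  // Same convention as the diff: a leading "?" only when there is something to send.
  return query ? `?${query}` : '';
}
```

With such a helper, a method like `getIntelligenceBrands` could pass its `params` object straight through, so adding or removing a filter such as `state` would only touch the type signature.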
|
|||||||
@@ -3,7 +3,6 @@ import { useNavigate } from 'react-router-dom';
|
|||||||
import { Layout } from '../components/Layout';
|
import { Layout } from '../components/Layout';
|
||||||
import { api } from '../lib/api';
|
import { api } from '../lib/api';
|
||||||
import { trackProductClick } from '../lib/analytics';
|
import { trackProductClick } from '../lib/analytics';
|
||||||
import { useStateFilter } from '../hooks/useStateFilter';
|
|
||||||
import {
|
import {
|
||||||
Building2,
|
Building2,
|
||||||
MapPin,
|
MapPin,
|
||||||
@@ -13,7 +12,6 @@ import {
|
|||||||
Search,
|
Search,
|
||||||
TrendingUp,
|
TrendingUp,
|
||||||
BarChart3,
|
BarChart3,
|
||||||
ChevronDown,
|
|
||||||
} from 'lucide-react';
|
} from 'lucide-react';
|
||||||
|
|
||||||
interface BrandData {
|
interface BrandData {
|
||||||
@@ -27,8 +25,6 @@ interface BrandData {
|
|||||||
|
|
||||||
export function IntelligenceBrands() {
|
export function IntelligenceBrands() {
|
||||||
const navigate = useNavigate();
|
const navigate = useNavigate();
|
||||||
const { selectedState, setSelectedState, stateParam, stateLabel, isAllStates } = useStateFilter();
|
|
||||||
const [availableStates, setAvailableStates] = useState<string[]>([]);
|
|
||||||
const [brands, setBrands] = useState<BrandData[]>([]);
|
const [brands, setBrands] = useState<BrandData[]>([]);
|
||||||
const [loading, setLoading] = useState(true);
|
const [loading, setLoading] = useState(true);
|
||||||
const [searchTerm, setSearchTerm] = useState('');
|
const [searchTerm, setSearchTerm] = useState('');
|
||||||
@@ -36,19 +32,12 @@ export function IntelligenceBrands() {
|
|||||||
|
|
||||||
useEffect(() => {
|
useEffect(() => {
|
||||||
loadBrands();
|
loadBrands();
|
||||||
}, [stateParam]);
|
|
||||||
|
|
||||||
useEffect(() => {
|
|
||||||
// Load available states
|
|
||||||
api.getOrchestratorStates().then(data => {
|
|
||||||
setAvailableStates(data.states?.map((s: any) => s.state) || []);
|
|
||||||
}).catch(console.error);
|
|
||||||
}, []);
|
}, []);
|
||||||
|
|
||||||
const loadBrands = async () => {
|
const loadBrands = async () => {
|
||||||
try {
|
try {
|
||||||
setLoading(true);
|
setLoading(true);
|
||||||
const data = await api.getIntelligenceBrands({ limit: 500, state: stateParam });
|
const data = await api.getIntelligenceBrands({ limit: 500 });
|
||||||
setBrands(data.brands || []);
|
setBrands(data.brands || []);
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
console.error('Failed to load brands:', error);
|
console.error('Failed to load brands:', error);
|
||||||
@@ -180,33 +169,10 @@ export function IntelligenceBrands() {
|
|||||||
|
|
||||||
{/* Top Brands Chart */}
|
{/* Top Brands Chart */}
|
||||||
<div className="bg-white rounded-lg border border-gray-200 p-4">
|
<div className="bg-white rounded-lg border border-gray-200 p-4">
|
||||||
<div className="flex items-center justify-between mb-4">
|
<h3 className="text-lg font-semibold text-gray-900 mb-4 flex items-center gap-2">
|
||||||
<h3 className="text-lg font-semibold text-gray-900 flex items-center gap-2">
|
|
||||||
<BarChart3 className="w-5 h-5 text-blue-500" />
|
<BarChart3 className="w-5 h-5 text-blue-500" />
|
||||||
Top 10 Brands by Store Count
|
Top 10 Brands by Store Count
|
||||||
</h3>
|
</h3>
|
||||||
<div className="dropdown dropdown-end">
|
|
||||||
<button tabIndex={0} className="btn btn-sm btn-outline gap-2">
|
|
||||||
{stateLabel}
|
|
||||||
<ChevronDown className="w-4 h-4" />
|
|
||||||
</button>
|
|
||||||
<ul tabIndex={0} className="dropdown-content z-[1] menu p-2 shadow bg-base-100 rounded-box w-40 max-h-60 overflow-y-auto">
|
|
||||||
<li>
|
|
||||||
<a onClick={() => setSelectedState(null)} className={isAllStates ? 'active' : ''}>
|
|
||||||
All States
|
|
||||||
</a>
|
|
||||||
</li>
|
|
||||||
<li className="divider"></li>
|
|
||||||
{availableStates.map((state) => (
|
|
||||||
<li key={state}>
|
|
||||||
<a onClick={() => setSelectedState(state)} className={selectedState === state ? 'active' : ''}>
|
|
||||||
{state}
|
|
||||||
</a>
|
|
||||||
</li>
|
|
||||||
))}
|
|
||||||
</ul>
|
|
||||||
</div>
|
|
||||||
</div>
|
|
||||||
<div className="space-y-2">
|
<div className="space-y-2">
|
||||||
{topBrands.map((brand, idx) => (
|
{topBrands.map((brand, idx) => (
|
||||||
<div key={brand.brandName} className="flex items-center gap-3">
|
<div key={brand.brandName} className="flex items-center gap-3">
|
||||||
|
|||||||
@@ -2,7 +2,6 @@ import { useEffect, useState } from 'react';
|
|||||||
import { useNavigate } from 'react-router-dom';
|
import { useNavigate } from 'react-router-dom';
|
||||||
import { Layout } from '../components/Layout';
|
import { Layout } from '../components/Layout';
|
||||||
import { api } from '../lib/api';
|
import { api } from '../lib/api';
|
||||||
import { useStateFilter } from '../hooks/useStateFilter';
|
|
||||||
import {
|
import {
|
||||||
DollarSign,
|
DollarSign,
|
||||||
Building2,
|
Building2,
|
||||||
@@ -12,7 +11,6 @@ import {
|
|||||||
TrendingUp,
|
TrendingUp,
|
||||||
TrendingDown,
|
TrendingDown,
|
||||||
BarChart3,
|
BarChart3,
|
||||||
ChevronDown,
|
|
||||||
} from 'lucide-react';
|
} from 'lucide-react';
|
||||||
|
|
||||||
interface CategoryPricing {
|
interface CategoryPricing {
|
||||||
@@ -33,27 +31,18 @@ interface OverallPricing {
|
|||||||
|
|
||||||
export function IntelligencePricing() {
|
export function IntelligencePricing() {
|
||||||
const navigate = useNavigate();
|
const navigate = useNavigate();
|
||||||
const { selectedState, setSelectedState, stateParam, stateLabel, isAllStates } = useStateFilter();
|
|
||||||
const [availableStates, setAvailableStates] = useState<string[]>([]);
|
|
||||||
const [categories, setCategories] = useState<CategoryPricing[]>([]);
|
const [categories, setCategories] = useState<CategoryPricing[]>([]);
|
||||||
const [overall, setOverall] = useState<OverallPricing | null>(null);
|
const [overall, setOverall] = useState<OverallPricing | null>(null);
|
||||||
const [loading, setLoading] = useState(true);
|
const [loading, setLoading] = useState(true);
|
||||||
|
|
||||||
useEffect(() => {
|
useEffect(() => {
|
||||||
loadPricing();
|
loadPricing();
|
||||||
}, [stateParam]);
|
|
||||||
|
|
||||||
useEffect(() => {
|
|
||||||
// Load available states
|
|
||||||
api.getOrchestratorStates().then(data => {
|
|
||||||
setAvailableStates(data.states?.map((s: any) => s.state) || []);
|
|
||||||
}).catch(console.error);
|
|
||||||
}, []);
|
}, []);
|
||||||
|
|
||||||
const loadPricing = async () => {
|
const loadPricing = async () => {
|
||||||
try {
|
try {
|
||||||
setLoading(true);
|
setLoading(true);
|
||||||
const data = await api.getIntelligencePricing({ state: stateParam });
|
const data = await api.getIntelligencePricing();
|
||||||
setCategories(data.byCategory || []);
|
setCategories(data.byCategory || []);
|
||||||
setOverall(data.overall || null);
|
setOverall(data.overall || null);
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
@@ -95,27 +84,6 @@ export function IntelligencePricing() {
|
|||||||
</p>
|
</p>
|
||||||
</div>
|
</div>
|
||||||
<div className="flex gap-2">
|
<div className="flex gap-2">
|
||||||
<div className="dropdown dropdown-end">
|
|
||||||
<button tabIndex={0} className="btn btn-sm btn-outline gap-2">
|
|
||||||
{stateLabel}
|
|
||||||
<ChevronDown className="w-4 h-4" />
|
|
||||||
</button>
|
|
||||||
<ul tabIndex={0} className="dropdown-content z-[1] menu p-2 shadow bg-base-100 rounded-box w-40 max-h-60 overflow-y-auto">
|
|
||||||
<li>
|
|
||||||
<a onClick={() => setSelectedState(null)} className={isAllStates ? 'active' : ''}>
|
|
||||||
All States
|
|
||||||
</a>
|
|
||||||
</li>
|
|
||||||
<li className="divider"></li>
|
|
||||||
{availableStates.map((state) => (
|
|
||||||
<li key={state}>
|
|
||||||
<a onClick={() => setSelectedState(state)} className={selectedState === state ? 'active' : ''}>
|
|
||||||
{state}
|
|
||||||
</a>
|
|
||||||
</li>
|
|
||||||
))}
|
|
||||||
</ul>
|
|
||||||
</div>
|
|
||||||
<button
|
<button
|
||||||
onClick={() => navigate('/admin/intelligence/brands')}
|
onClick={() => navigate('/admin/intelligence/brands')}
|
||||||
className="btn btn-sm btn-outline gap-1"
|
className="btn btn-sm btn-outline gap-1"
|
||||||
@@ -182,7 +150,7 @@ export function IntelligencePricing() {
|
|||||||
<div>
|
<div>
|
||||||
<p className="text-sm text-gray-500">Products Priced</p>
|
<p className="text-sm text-gray-500">Products Priced</p>
|
||||||
<p className="text-2xl font-bold">
|
<p className="text-2xl font-bold">
|
||||||
{(overall.totalProducts || 0).toLocaleString()}
|
{overall.totalProducts.toLocaleString()}
|
||||||
</p>
|
</p>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
@@ -196,29 +164,43 @@ export function IntelligencePricing() {
|
|||||||
<BarChart3 className="w-5 h-5 text-green-500" />
|
<BarChart3 className="w-5 h-5 text-green-500" />
|
||||||
Average Price by Category
|
Average Price by Category
|
||||||
</h3>
|
</h3>
|
||||||
<div className="space-y-2">
|
<div className="space-y-3">
|
||||||
{sortedCategories.slice(0, 12).map((cat) => {
|
{sortedCategories.map((cat) => (
|
||||||
const maxPrice = Math.max(...sortedCategories.map(c => c.avgPrice || 0), 1);
|
|
||||||
const barWidth = Math.min(((cat.avgPrice || 0) / maxPrice) * 100, 100);
|
|
||||||
return (
|
|
||||||
<div key={cat.category} className="flex items-center gap-3">
|
<div key={cat.category} className="flex items-center gap-3">
|
||||||
<span className="text-sm font-medium w-28 truncate shrink-0" title={cat.category}>
|
<span className="text-sm font-medium w-32 truncate" title={cat.category}>
|
||||||
{cat.category || 'Unknown'}
|
{cat.category || 'Unknown'}
|
||||||
</span>
|
</span>
|
||||||
<div className="flex-1 min-w-0">
|
<div className="flex-1 relative">
|
||||||
<div className="bg-gray-100 rounded h-5 overflow-hidden">
|
{/* Price range bar */}
|
||||||
|
<div className="bg-gray-100 rounded-full h-6 relative">
|
||||||
|
{/* Min-Max range */}
|
||||||
<div
|
<div
|
||||||
className="bg-gradient-to-r from-emerald-400 to-emerald-500 h-5 rounded transition-all"
|
className="absolute top-0 h-6 bg-blue-100 rounded-full"
|
||||||
style={{ width: `${barWidth}%` }}
|
style={{
|
||||||
|
left: `${(cat.minPrice / (overall?.maxPrice || 100)) * 100}%`,
|
||||||
|
width: `${((cat.maxPrice - cat.minPrice) / (overall?.maxPrice || 100)) * 100}%`,
|
||||||
|
}}
|
||||||
|
/>
|
||||||
|
{/* Average marker */}
|
||||||
|
<div
|
||||||
|
className="absolute top-0 h-6 w-1 bg-green-500 rounded"
|
||||||
|
style={{ left: `${(cat.avgPrice / (overall?.maxPrice || 100)) * 100}%` }}
|
||||||
/>
|
/>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
<span className="text-sm font-mono font-semibold text-emerald-600 w-16 text-right shrink-0">
|
<div className="flex gap-4 text-xs w-48">
|
||||||
{formatPrice(cat.avgPrice)}
|
<span className="text-gray-500">
|
||||||
|
Min: <span className="text-blue-600 font-mono">{formatPrice(cat.minPrice)}</span>
|
||||||
|
</span>
|
||||||
|
<span className="text-gray-500">
|
||||||
|
Avg: <span className="text-green-600 font-mono font-bold">{formatPrice(cat.avgPrice)}</span>
|
||||||
|
</span>
|
||||||
|
<span className="text-gray-500">
|
||||||
|
Max: <span className="text-orange-600 font-mono">{formatPrice(cat.maxPrice)}</span>
|
||||||
</span>
|
</span>
|
||||||
</div>
|
</div>
|
||||||
);
|
</div>
|
||||||
})}
|
))}
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
@@ -254,7 +236,7 @@ export function IntelligencePricing() {
|
|||||||
<span className="font-medium">{cat.category || 'Unknown'}</span>
|
<span className="font-medium">{cat.category || 'Unknown'}</span>
|
||||||
</td>
|
</td>
|
||||||
<td className="text-center">
|
<td className="text-center">
|
||||||
<span className="font-mono">{(cat.productCount || 0).toLocaleString()}</span>
|
<span className="font-mono">{cat.productCount.toLocaleString()}</span>
|
||||||
</td>
|
</td>
|
||||||
<td className="text-right">
|
<td className="text-right">
|
||||||
<span className="font-mono text-blue-600">{formatPrice(cat.minPrice)}</span>
|
<span className="font-mono text-blue-600">{formatPrice(cat.minPrice)}</span>
|
||||||
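Review note on the "Average Price by Category" hunk above: one side of the diff positions a min-max range bar and an average marker as percentages of `overall?.maxPrice || 100`. The sketch below isolates that arithmetic so it can be checked outside JSX. It assumes prices are non-negative numbers, and it adds clamping to the 0-100 range, which the diff itself does not do. `rangeBarGeometry`, `PriceRange`, and `BarGeometry` are hypothetical names.

```typescript
// Hypothetical types and helper, for illustration only.
interface PriceRange {
  minPrice: number;
  avgPrice: number;
  maxPrice: number;
}

interface BarGeometry {
  leftPct: number;  // left offset of the min-max range
  widthPct: number; // width of the min-max range
  avgPct: number;   // left offset of the average marker
}

const clampPct = (value: number): number => Math.min(Math.max(value, 0), 100);

function rangeBarGeometry(cat: PriceRange, scaleMax: number): BarGeometry {
  // Fall back to 100 when no usable scale is known, mirroring `overall?.maxPrice || 100`.
  const scale = scaleMax > 0 ? scaleMax : 100;
  const leftPct = clampPct((cat.minPrice / scale) * 100);
  const rightPct = clampPct((cat.maxPrice / scale) * 100);
  return {
    leftPct,
    widthPct: Math.max(rightPct - leftPct, 0),
    avgPct: clampPct((cat.avgPrice / scale) * 100),
  };
}
```

The clamping matters mainly in the fallback case: if `overall` is missing and a category's `maxPrice` exceeds 100, the unclamped width would push the bar past its container.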
|
|||||||
@@ -47,11 +47,10 @@ export function IntelligenceStores() {
|
|||||||
state: stateParam,
|
state: stateParam,
|
||||||
limit: 500,
|
limit: 500,
|
||||||
});
|
});
|
||||||
const storeList = data.stores || [];
|
setStores(data.stores || []);
|
||||||
setStores(storeList);
|
|
||||||
|
|
||||||
// Extract unique states from response for dropdown counts
|
// Extract unique states from response for dropdown counts
|
||||||
const uniqueStates = [...new Set(storeList.map((s: StoreActivity) => s.state))].filter(Boolean).sort() as string[];
|
const uniqueStates = [...new Set(data.stores.map((s: StoreActivity) => s.state))].sort();
|
||||||
setLocalStates(uniqueStates);
|
setLocalStates(uniqueStates);
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
console.error('Failed to load stores:', error);
|
console.error('Failed to load stores:', error);
|
||||||
@@ -98,12 +97,12 @@ export function IntelligenceStores() {
|
|||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Calculate stats with null safety
|
// Calculate stats
|
||||||
const totalSKUs = stores.reduce((sum, s) => sum + (s.skuCount || 0), 0);
|
const totalSKUs = stores.reduce((sum, s) => sum + s.skuCount, 0);
|
||||||
const totalSnapshots = stores.reduce((sum, s) => sum + (s.snapshotCount || 0), 0);
|
const totalSnapshots = stores.reduce((sum, s) => sum + s.snapshotCount, 0);
|
||||||
const storesWithFrequency = stores.filter(s => s.crawlFrequencyHours != null);
|
const avgFrequency = stores.filter(s => s.crawlFrequencyHours).length > 0
|
||||||
const avgFrequency = storesWithFrequency.length > 0
|
? stores.filter(s => s.crawlFrequencyHours).reduce((sum, s) => sum + (s.crawlFrequencyHours || 0), 0) /
|
||||||
? storesWithFrequency.reduce((sum, s) => sum + (s.crawlFrequencyHours || 0), 0) / storesWithFrequency.length
|
stores.filter(s => s.crawlFrequencyHours).length
|
||||||
: 0;
|
: 0;
|
||||||
|
|
||||||
return (
|
return (
|
||||||
@@ -263,10 +262,10 @@ export function IntelligenceStores() {
|
|||||||
)}
|
)}
|
||||||
</td>
|
</td>
|
||||||
<td className="text-center">
|
<td className="text-center">
|
||||||
<span className="font-mono">{(store.skuCount || 0).toLocaleString()}</span>
|
<span className="font-mono">{store.skuCount.toLocaleString()}</span>
|
||||||
</td>
|
</td>
|
||||||
<td className="text-center">
|
<td className="text-center">
|
||||||
<span className="font-mono">{(store.snapshotCount || 0).toLocaleString()}</span>
|
<span className="font-mono">{store.snapshotCount.toLocaleString()}</span>
|
||||||
</td>
|
</td>
|
||||||
<td>
|
<td>
|
||||||
<span className={store.lastCrawl ? 'text-green-600' : 'text-gray-400'}>
|
<span className={store.lastCrawl ? 'text-green-600' : 'text-gray-400'}>
|
||||||
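Review note on the `IntelligenceStores` stats hunk above: the two sides differ in how they pick stores for the average crawl frequency. One filters with `crawlFrequencyHours != null`, the other with a plain truthiness check that also excludes a frequency of `0` and repeats the filter three times. A minimal sketch of the null-safe variant, filtering once, is shown below. `averageCrawlFrequency` and `StoreActivityLike` are hypothetical names, and the assumption that `0` is a legitimate value to average is mine, not the source's.

```typescript
// Hypothetical shape: only the field used here, not the full StoreActivity type.
interface StoreActivityLike {
  crawlFrequencyHours?: number | null;
}

function averageCrawlFrequency(stores: StoreActivityLike[]): number {
  // Keep only stores that report a frequency; null and undefined are dropped, 0 is kept.
  const known = stores
    .map((store) => store.crawlFrequencyHours)
    .filter((hours): hours is number => hours != null);
  if (known.length === 0) return 0; // avoid dividing by zero when nothing is known
  return known.reduce((sum, hours) => sum + hours, 0) / known.length;
}
```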
|
|||||||
@@ -8,6 +8,7 @@
|
|||||||
import { useState, useEffect } from 'react';
|
import { useState, useEffect } from 'react';
|
||||||
import { useNavigate } from 'react-router-dom';
|
import { useNavigate } from 'react-router-dom';
|
||||||
import { Layout } from '../components/Layout';
|
import { Layout } from '../components/Layout';
|
||||||
|
import { StateBadge } from '../components/StateSelector';
|
||||||
import { useStateStore } from '../store/stateStore';
|
import { useStateStore } from '../store/stateStore';
|
||||||
import { api } from '../lib/api';
|
import { api } from '../lib/api';
|
||||||
import {
|
import {
|
||||||
@@ -285,6 +286,7 @@ export default function NationalDashboard() {
|
|||||||
</p>
|
</p>
|
||||||
</div>
|
</div>
|
||||||
<div className="flex items-center gap-3">
|
<div className="flex items-center gap-3">
|
||||||
|
<StateBadge />
|
||||||
<button
|
<button
|
||||||
onClick={handleRefreshMetrics}
|
onClick={handleRefreshMetrics}
|
||||||
disabled={refreshing}
|
disabled={refreshing}
|
||||||
@@ -301,7 +303,7 @@ export default function NationalDashboard() {
|
|||||||
<>
|
<>
|
||||||
<div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-4 gap-4">
|
<div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-4 gap-4">
|
||||||
<MetricCard
|
<MetricCard
|
||||||
title="States"
|
title="Active States"
|
||||||
value={summary.activeStates}
|
value={summary.activeStates}
|
||||||
icon={Globe}
|
icon={Globe}
|
||||||
/>
|
/>
|
||||||
|
|||||||
@@ -14,9 +14,8 @@ import {
|
|||||||
ChevronUp,
|
ChevronUp,
|
||||||
Gauge,
|
Gauge,
|
||||||
Users,
|
Users,
|
||||||
Power,
|
Calendar,
|
||||||
Play,
|
Zap,
|
||||||
Square,
|
|
||||||
} from 'lucide-react';
|
} from 'lucide-react';
|
||||||
|
|
||||||
interface Task {
|
interface Task {
|
||||||
@@ -83,27 +82,6 @@ const STATUS_COLORS: Record<string, string> = {
|
|||||||
stale: 'bg-gray-100 text-gray-800',
|
stale: 'bg-gray-100 text-gray-800',
|
||||||
};
|
};
|
||||||
|
|
||||||
const getStatusIcon = (status: string, poolPaused: boolean): React.ReactNode => {
|
|
||||||
switch (status) {
|
|
||||||
case 'pending':
|
|
||||||
return <Clock className="w-4 h-4" />;
|
|
||||||
case 'claimed':
|
|
||||||
return <PlayCircle className="w-4 h-4" />;
|
|
||||||
case 'running':
|
|
||||||
// Don't spin when pool is paused
|
|
||||||
return <RefreshCw className={`w-4 h-4 ${!poolPaused ? 'animate-spin' : ''}`} />;
|
|
||||||
case 'completed':
|
|
||||||
return <CheckCircle2 className="w-4 h-4" />;
|
|
||||||
case 'failed':
|
|
||||||
return <XCircle className="w-4 h-4" />;
|
|
||||||
case 'stale':
|
|
||||||
return <AlertTriangle className="w-4 h-4" />;
|
|
||||||
default:
|
|
||||||
return null;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
// Static version for summary cards (always shows animation)
|
|
||||||
const STATUS_ICONS: Record<string, React.ReactNode> = {
|
const STATUS_ICONS: Record<string, React.ReactNode> = {
|
||||||
pending: <Clock className="w-4 h-4" />,
|
pending: <Clock className="w-4 h-4" />,
|
||||||
claimed: <PlayCircle className="w-4 h-4" />,
|
claimed: <PlayCircle className="w-4 h-4" />,
|
||||||
@@ -138,8 +116,6 @@ export default function TasksDashboard() {
|
|||||||
const [capacity, setCapacity] = useState<CapacityMetric[]>([]);
|
const [capacity, setCapacity] = useState<CapacityMetric[]>([]);
|
||||||
const [loading, setLoading] = useState(true);
|
const [loading, setLoading] = useState(true);
|
||||||
const [error, setError] = useState<string | null>(null);
|
const [error, setError] = useState<string | null>(null);
|
||||||
const [poolPaused, setPoolPaused] = useState(false);
|
|
||||||
const [poolLoading, setPoolLoading] = useState(false);
|
|
||||||
|
|
||||||
// Filters
|
// Filters
|
||||||
const [roleFilter, setRoleFilter] = useState<string>('');
|
const [roleFilter, setRoleFilter] = useState<string>('');
|
||||||
@@ -147,10 +123,13 @@ export default function TasksDashboard() {
|
|||||||
const [searchQuery, setSearchQuery] = useState('');
|
const [searchQuery, setSearchQuery] = useState('');
|
||||||
const [showCapacity, setShowCapacity] = useState(true);
|
const [showCapacity, setShowCapacity] = useState(true);
|
||||||
|
|
||||||
|
// Actions
|
||||||
|
const [actionLoading, setActionLoading] = useState(false);
|
||||||
|
const [actionMessage, setActionMessage] = useState<string | null>(null);
|
||||||
|
|
||||||
const fetchData = async () => {
|
const fetchData = async () => {
|
||||||
try {
|
try {
|
||||||
const [tasksRes, countsRes, capacityRes, poolStatus] = await Promise.all([
|
const [tasksRes, countsRes, capacityRes] = await Promise.all([
|
||||||
api.getTasks({
|
api.getTasks({
|
||||||
role: roleFilter || undefined,
|
role: roleFilter || undefined,
|
||||||
status: statusFilter || undefined,
|
status: statusFilter || undefined,
|
||||||
@@ -158,13 +137,11 @@ export default function TasksDashboard() {
|
|||||||
}),
|
}),
|
||||||
api.getTaskCounts(),
|
api.getTaskCounts(),
|
||||||
api.getTaskCapacity(),
|
api.getTaskCapacity(),
|
||||||
api.getTaskPoolStatus(),
|
|
||||||
]);
|
]);
|
||||||
|
|
||||||
setTasks(tasksRes.tasks || []);
|
setTasks(tasksRes.tasks || []);
|
||||||
setCounts(countsRes);
|
setCounts(countsRes);
|
||||||
setCapacity(capacityRes.metrics || []);
|
setCapacity(capacityRes.metrics || []);
|
||||||
setPoolPaused(poolStatus.paused);
|
|
||||||
setError(null);
|
setError(null);
|
||||||
} catch (err: any) {
|
} catch (err: any) {
|
||||||
setError(err.message || 'Failed to load tasks');
|
setError(err.message || 'Failed to load tasks');
|
||||||
@@ -173,28 +150,39 @@ export default function TasksDashboard() {
|
|||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
const togglePool = async () => {
|
useEffect(() => {
|
||||||
setPoolLoading(true);
|
fetchData();
|
||||||
|
const interval = setInterval(fetchData, 10000); // Refresh every 10 seconds
|
||||||
|
return () => clearInterval(interval);
|
||||||
|
}, [roleFilter, statusFilter]);
|
||||||
|
|
||||||
|
const handleGenerateResync = async () => {
|
||||||
|
setActionLoading(true);
|
||||||
try {
|
try {
|
||||||
if (poolPaused) {
|
const result = await api.generateResyncTasks();
|
||||||
await api.resumeTaskPool();
|
setActionMessage(`Generated ${result.tasks_created} resync tasks`);
|
||||||
setPoolPaused(false);
|
fetchData();
|
||||||
} else {
|
|
||||||
await api.pauseTaskPool();
|
|
||||||
setPoolPaused(true);
|
|
||||||
}
|
|
||||||
} catch (err: any) {
|
} catch (err: any) {
|
||||||
setError(err.message || 'Failed to toggle pool');
|
setActionMessage(`Error: ${err.message}`);
|
||||||
} finally {
|
} finally {
|
||||||
setPoolLoading(false);
|
setActionLoading(false);
|
||||||
|
setTimeout(() => setActionMessage(null), 5000);
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
useEffect(() => {
|
const handleRecoverStale = async () => {
|
||||||
|
setActionLoading(true);
|
||||||
|
try {
|
||||||
|
const result = await api.recoverStaleTasks();
|
||||||
|
setActionMessage(`Recovered ${result.tasks_recovered} stale tasks`);
|
||||||
fetchData();
|
fetchData();
|
||||||
const interval = setInterval(fetchData, 15000); // Auto-refresh every 15 seconds
|
} catch (err: any) {
|
||||||
return () => clearInterval(interval);
|
setActionMessage(`Error: ${err.message}`);
|
||||||
}, [roleFilter, statusFilter]);
|
} finally {
|
||||||
|
setActionLoading(false);
|
||||||
|
setTimeout(() => setActionMessage(null), 5000);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
const filteredTasks = tasks.filter((task) => {
|
const filteredTasks = tasks.filter((task) => {
|
||||||
if (searchQuery) {
|
if (searchQuery) {
|
||||||
@@ -237,32 +225,45 @@ export default function TasksDashboard() {
|
|||||||
</p>
|
</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div className="flex items-center gap-4">
|
<div className="flex gap-2">
|
||||||
{/* Pool Toggle */}
|
|
||||||
<button
|
<button
|
||||||
onClick={togglePool}
|
onClick={handleGenerateResync}
|
||||||
disabled={poolLoading}
|
disabled={actionLoading}
|
||||||
className={`flex items-center gap-2 px-4 py-2 rounded-lg font-medium transition-colors ${
|
className="flex items-center gap-2 px-4 py-2 bg-emerald-600 text-white rounded-lg hover:bg-emerald-700 disabled:opacity-50"
|
||||||
poolPaused
|
>
|
||||||
? 'bg-emerald-100 text-emerald-700 hover:bg-emerald-200'
|
<Calendar className="w-4 h-4" />
|
||||||
: 'bg-red-100 text-red-700 hover:bg-red-200'
|
Generate Resync
|
||||||
|
</button>
|
||||||
|
<button
|
||||||
|
onClick={handleRecoverStale}
|
||||||
|
disabled={actionLoading}
|
||||||
|
className="flex items-center gap-2 px-4 py-2 bg-gray-600 text-white rounded-lg hover:bg-gray-700 disabled:opacity-50"
|
||||||
|
>
|
||||||
|
<Zap className="w-4 h-4" />
|
||||||
|
Recover Stale
|
||||||
|
</button>
|
||||||
|
<button
|
||||||
|
onClick={fetchData}
|
||||||
|
className="flex items-center gap-2 px-4 py-2 bg-gray-100 text-gray-700 rounded-lg hover:bg-gray-200"
|
||||||
|
>
|
||||||
|
<RefreshCw className="w-4 h-4" />
|
||||||
|
Refresh
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{/* Action Message */}
|
||||||
|
{actionMessage && (
|
||||||
|
<div
|
||||||
|
className={`p-4 rounded-lg ${
|
||||||
|
actionMessage.startsWith('Error')
|
||||||
|
? 'bg-red-50 text-red-700'
|
||||||
|
: 'bg-green-50 text-green-700'
|
||||||
}`}
|
}`}
|
||||||
>
|
>
|
||||||
{poolPaused ? (
|
{actionMessage}
|
||||||
<>
|
</div>
|
||||||
<Play className={`w-5 h-5 ${poolLoading ? 'animate-pulse' : ''}`} />
|
|
||||||
Start Pool
|
|
||||||
</>
|
|
||||||
) : (
|
|
||||||
<>
|
|
||||||
<Square className={`w-5 h-5 ${poolLoading ? 'animate-pulse' : ''}`} />
|
|
||||||
Stop Pool
|
|
||||||
</>
|
|
||||||
)}
|
)}
|
||||||
</button>
|
|
||||||
<span className="text-sm text-gray-400">Auto-refreshes every 15s</span>
|
|
||||||
</div>
|
|
||||||
</div>
|
|
||||||
|
|
||||||
{error && (
|
{error && (
|
||||||
<div className="p-4 bg-red-50 text-red-700 rounded-lg">{error}</div>
|
<div className="p-4 bg-red-50 text-red-700 rounded-lg">{error}</div>
|
||||||
@@ -495,7 +496,7 @@ export default function TasksDashboard() {
|
|||||||
STATUS_COLORS[task.status]
|
STATUS_COLORS[task.status]
|
||||||
}`}
|
}`}
|
||||||
>
|
>
|
||||||
{getStatusIcon(task.status, poolPaused)}
|
{STATUS_ICONS[task.status]}
|
||||||
{task.status}
|
{task.status}
|
||||||
</span>
|
</span>
|
||||||
</td>
|
</td>
|
||||||
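Review note on the `TasksDashboard` hunks above: both sides refresh the dashboard by calling `fetchData()` once and then on a timer (10 s on one side, 15 s on the other), clearing the timer in the effect cleanup. The sketch below shows the same pattern outside React, assuming a synchronous callback for simplicity. `startPolling` is a hypothetical name and is not a function in the repository.

```typescript
// Hypothetical helper illustrating the "run now, then on an interval, then clean up" pattern.
type StopPolling = () => void;

function startPolling(task: () => void, intervalMs: number): StopPolling {
  task(); // immediate first run, like the fetchData() call at the top of the effect
  const timerId = setInterval(task, intervalMs);
  // The returned function plays the role of the useEffect cleanup.
  return () => clearInterval(timerId);
}
```

Inside a component, the equivalent would be `useEffect(() => startPolling(fetchData, 10000), [roleFilter, statusFilter])`, since the returned stop function has the shape React expects from an effect cleanup.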
|
|||||||
@@ -18,9 +18,6 @@ import {
|
|||||||
Server,
|
Server,
|
||||||
MapPin,
|
MapPin,
|
||||||
Trash2,
|
Trash2,
|
||||||
Plus,
|
|
||||||
Minus,
|
|
||||||
Loader2,
|
|
||||||
} from 'lucide-react';
|
} from 'lucide-react';
|
||||||
|
|
||||||
// Worker from registry
|
// Worker from registry
|
||||||
@@ -72,14 +69,6 @@ interface Task {
|
|||||||
worker_id: string | null;
|
worker_id: string | null;
|
||||||
}
|
}
|
||||||
|
|
||||||
// K8s replica info (added 2024-12-10)
|
|
||||||
interface K8sReplicas {
|
|
||||||
current: number;
|
|
||||||
desired: number;
|
|
||||||
available: number;
|
|
||||||
updated: number;
|
|
||||||
}
|
|
||||||
|
|
||||||
function formatRelativeTime(dateStr: string | null): string {
|
function formatRelativeTime(dateStr: string | null): string {
|
||||||
if (!dateStr) return '-';
|
if (!dateStr) return '-';
|
||||||
const date = new Date(dateStr);
|
const date = new Date(dateStr);
|
||||||
@@ -226,53 +215,10 @@ export function WorkersDashboard() {
|
|||||||
const [loading, setLoading] = useState(true);
|
const [loading, setLoading] = useState(true);
|
||||||
const [error, setError] = useState<string | null>(null);
|
const [error, setError] = useState<string | null>(null);
|
||||||
|
|
||||||
// K8s scaling state (added 2024-12-10)
|
|
||||||
const [k8sReplicas, setK8sReplicas] = useState<K8sReplicas | null>(null);
|
|
||||||
const [k8sError, setK8sError] = useState<string | null>(null);
|
|
||||||
const [scaling, setScaling] = useState(false);
|
|
||||||
const [targetReplicas, setTargetReplicas] = useState<number | null>(null);
|
|
||||||
|
|
||||||
// Pagination
|
// Pagination
|
||||||
const [page, setPage] = useState(0);
|
const [page, setPage] = useState(0);
|
||||||
const workersPerPage = 15;
|
const workersPerPage = 15;
|
||||||
|
|
||||||
// Fetch K8s replica count (added 2024-12-10)
|
|
||||||
const fetchK8sReplicas = useCallback(async () => {
|
|
||||||
try {
|
|
||||||
const res = await api.get('/api/workers/k8s/replicas');
|
|
||||||
if (res.data.success && res.data.replicas) {
|
|
||||||
setK8sReplicas(res.data.replicas);
|
|
||||||
if (targetReplicas === null) {
|
|
||||||
setTargetReplicas(res.data.replicas.desired);
|
|
||||||
}
|
|
||||||
setK8sError(null);
|
|
||||||
}
|
|
||||||
} catch (err: any) {
|
|
||||||
// K8s not available (local dev or no RBAC)
|
|
||||||
setK8sError(err.response?.data?.error || 'K8s not available');
|
|
||||||
setK8sReplicas(null);
|
|
||||||
}
|
|
||||||
}, [targetReplicas]);
|
|
||||||
|
|
||||||
// Scale workers (added 2024-12-10)
|
|
||||||
const handleScale = useCallback(async (replicas: number) => {
|
|
||||||
if (replicas < 0 || replicas > 20) return;
|
|
||||||
setScaling(true);
|
|
||||||
try {
|
|
||||||
const res = await api.post('/api/workers/k8s/scale', { replicas });
|
|
||||||
if (res.data.success) {
|
|
||||||
setTargetReplicas(replicas);
|
|
||||||
// Refresh after a short delay to see the change
|
|
||||||
setTimeout(fetchK8sReplicas, 1000);
|
|
||||||
}
|
|
||||||
} catch (err: any) {
|
|
||||||
console.error('Scale error:', err);
|
|
||||||
setK8sError(err.response?.data?.error || 'Failed to scale');
|
|
||||||
} finally {
|
|
||||||
setScaling(false);
|
|
||||||
}
|
|
||||||
}, [fetchK8sReplicas]);
|
|
||||||
|
|
||||||
const fetchData = useCallback(async () => {
|
const fetchData = useCallback(async () => {
|
||||||
try {
|
try {
|
||||||
// Fetch workers from registry
|
// Fetch workers from registry
|
||||||
@@ -315,14 +261,9 @@ export function WorkersDashboard() {
|
|||||||
|
|
||||||
useEffect(() => {
|
useEffect(() => {
|
||||||
fetchData();
|
fetchData();
|
||||||
fetchK8sReplicas(); // Added 2024-12-10
|
|
||||||
const interval = setInterval(fetchData, 5000);
|
const interval = setInterval(fetchData, 5000);
|
||||||
const k8sInterval = setInterval(fetchK8sReplicas, 10000); // K8s refresh every 10s
|
return () => clearInterval(interval);
|
||||||
return () => {
|
}, [fetchData]);
|
||||||
clearInterval(interval);
|
|
||||||
clearInterval(k8sInterval);
|
|
||||||
};
|
|
||||||
}, [fetchData, fetchK8sReplicas]);
|
|
||||||
|
|
||||||
// Paginated workers
|
// Paginated workers
|
||||||
const paginatedWorkers = workers.slice(
|
const paginatedWorkers = workers.slice(
|
||||||
@@ -389,68 +330,6 @@ export function WorkersDashboard() {
|
|||||||
</div>
|
</div>
|
||||||
)}
|
)}
|
||||||
|
|
||||||
{/* K8s Scaling Card (added 2024-12-10) */}
|
|
||||||
{k8sReplicas && (
|
|
||||||
<div className="bg-white rounded-lg border border-gray-200 p-4">
|
|
||||||
<div className="flex items-center justify-between">
|
|
||||||
<div className="flex items-center gap-3">
|
|
||||||
<div className="w-10 h-10 bg-purple-100 rounded-lg flex items-center justify-center">
|
|
||||||
<Server className="w-5 h-5 text-purple-600" />
|
|
||||||
</div>
|
|
||||||
<div>
|
|
||||||
<p className="text-sm text-gray-500">K8s Worker Pods</p>
|
|
||||||
<p className="text-xl font-semibold">
|
|
||||||
{k8sReplicas.current} / {k8sReplicas.desired}
|
|
||||||
{k8sReplicas.current !== k8sReplicas.desired && (
|
|
||||||
<span className="text-sm font-normal text-yellow-600 ml-2">scaling...</span>
|
|
||||||
)}
|
|
||||||
</p>
|
|
||||||
</div>
|
|
||||||
</div>
|
|
||||||
<div className="flex items-center gap-2">
|
|
||||||
<button
|
|
||||||
onClick={() => handleScale((targetReplicas || k8sReplicas.desired) - 1)}
|
|
||||||
disabled={scaling || (targetReplicas || k8sReplicas.desired) <= 0}
|
|
||||||
className="w-8 h-8 flex items-center justify-center bg-gray-100 text-gray-700 rounded-lg hover:bg-gray-200 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
|
|
||||||
title="Scale down"
|
|
||||||
>
|
|
||||||
<Minus className="w-4 h-4" />
|
|
||||||
</button>
|
|
||||||
<input
|
|
||||||
type="number"
|
|
||||||
min="0"
|
|
||||||
max="20"
|
|
||||||
value={targetReplicas ?? k8sReplicas.desired}
|
|
||||||
onChange={(e) => setTargetReplicas(Math.max(0, Math.min(20, parseInt(e.target.value) || 0)))}
|
|
||||||
onBlur={() => {
|
|
||||||
if (targetReplicas !== null && targetReplicas !== k8sReplicas.desired) {
|
|
||||||
handleScale(targetReplicas);
|
|
||||||
}
|
|
||||||
}}
|
|
||||||
onKeyDown={(e) => {
|
|
||||||
if (e.key === 'Enter' && targetReplicas !== null && targetReplicas !== k8sReplicas.desired) {
|
|
||||||
handleScale(targetReplicas);
|
|
||||||
}
|
|
||||||
}}
|
|
||||||
className="w-16 text-center border border-gray-300 rounded-lg px-2 py-1 text-lg font-semibold"
|
|
||||||
/>
|
|
||||||
<button
|
|
||||||
onClick={() => handleScale((targetReplicas || k8sReplicas.desired) + 1)}
|
|
||||||
disabled={scaling || (targetReplicas || k8sReplicas.desired) >= 20}
|
|
||||||
className="w-8 h-8 flex items-center justify-center bg-gray-100 text-gray-700 rounded-lg hover:bg-gray-200 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
|
|
||||||
title="Scale up"
|
|
||||||
>
|
|
||||||
<Plus className="w-4 h-4" />
|
|
||||||
</button>
|
|
||||||
{scaling && <Loader2 className="w-4 h-4 text-purple-600 animate-spin ml-2" />}
|
|
||||||
</div>
|
|
||||||
</div>
|
|
||||||
{k8sError && (
|
|
||||||
<p className="text-xs text-red-500 mt-2">{k8sError}</p>
|
|
||||||
)}
|
|
||||||
</div>
|
|
||||||
)}
|
|
||||||
|
|
||||||
{/* Stats Cards */}
|
{/* Stats Cards */}
|
||||||
<div className="grid grid-cols-5 gap-4">
|
<div className="grid grid-cols-5 gap-4">
|
||||||
<div className="bg-white rounded-lg border border-gray-200 p-4">
|
<div className="bg-white rounded-lg border border-gray-200 p-4">
|
||||||
|
|||||||
@@ -1,67 +1,4 @@
|
|||||||
# Task Worker Deployment
|
# Task Worker Pods
|
||||||
#
|
|
||||||
# Simple Deployment that runs task-worker.js to process tasks from worker_tasks queue.
|
|
||||||
# Workers pull tasks using DB-level locking (FOR UPDATE SKIP LOCKED).
|
|
||||||
#
|
|
||||||
# The worker will wait up to 60 minutes for active proxies to be added before failing.
|
|
||||||
# This allows deployment to succeed even if proxies aren't configured yet.
|
|
||||||
---
|
|
||||||
apiVersion: apps/v1
|
|
||||||
kind: Deployment
|
|
||||||
metadata:
|
|
||||||
name: scraper-worker
|
|
||||||
namespace: dispensary-scraper
|
|
||||||
spec:
|
|
||||||
replicas: 5
|
|
||||||
selector:
|
|
||||||
matchLabels:
|
|
||||||
app: scraper-worker
|
|
||||||
template:
|
|
||||||
metadata:
|
|
||||||
labels:
|
|
||||||
app: scraper-worker
|
|
||||||
spec:
|
|
||||||
imagePullSecrets:
|
|
||||||
- name: regcred
|
|
||||||
containers:
|
|
||||||
- name: worker
|
|
||||||
image: code.cannabrands.app/creationshop/dispensary-scraper:latest
|
|
||||||
command: ["node"]
|
|
||||||
args: ["dist/tasks/task-worker.js"]
|
|
||||||
envFrom:
|
|
||||||
- configMapRef:
|
|
||||||
name: scraper-config
|
|
||||||
- secretRef:
|
|
||||||
name: scraper-secrets
|
|
||||||
env:
|
|
||||||
- name: WORKER_MODE
|
|
||||||
value: "true"
|
|
||||||
- name: POD_NAME
|
|
||||||
valueFrom:
|
|
||||||
fieldRef:
|
|
||||||
fieldPath: metadata.name
|
|
||||||
resources:
|
|
||||||
requests:
|
|
||||||
memory: "256Mi"
|
|
||||||
cpu: "100m"
|
|
||||||
limits:
|
|
||||||
memory: "512Mi"
|
|
||||||
cpu: "500m"
|
|
||||||
livenessProbe:
|
|
||||||
exec:
|
|
||||||
command:
|
|
||||||
- /bin/sh
|
|
||||||
- -c
|
|
||||||
- "pgrep -f 'task-worker' > /dev/null"
|
|
||||||
initialDelaySeconds: 60
|
|
||||||
periodSeconds: 30
|
|
||||||
failureThreshold: 3
|
|
||||||
terminationGracePeriodSeconds: 60
|
|
||||||
---
|
|
||||||
# =============================================================================
|
|
||||||
# ALTERNATIVE: StatefulSet with multiple workers per pod (not currently used)
|
|
||||||
# =============================================================================
|
|
||||||
# Task Worker Pods (StatefulSet)
|
|
||||||
# Each pod runs 5 role-agnostic workers that pull tasks from worker_tasks queue.
|
# Each pod runs 5 role-agnostic workers that pull tasks from worker_tasks queue.
|
||||||
#
|
#
|
||||||
# Architecture:
|
# Architecture:
|
||||||
|
|||||||
@@ -1,365 +0,0 @@
|
|||||||
# Workflow Documentation - December 10, 2025
|
|
||||||
|
|
||||||
## Purpose
|
|
||||||
|
|
||||||
This document captures the intended behavior for the CannaiQ crawl system, specifically around proxy rotation, fingerprinting, and anti-detection.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Stealth & Anti-Detection Requirements
|
|
||||||
|
|
||||||
### 1. Task Determines Work, Proxy Determines Identity
|
|
||||||
|
|
||||||
The task payload contains:
|
|
||||||
- `dispensary_id` - which store to crawl
|
|
||||||
- `role` - what type of work (product_resync, entry_point_discovery, etc.)
|
|
||||||
|
|
||||||
The **proxy** determines the session identity:
|
|
||||||
- Proxy location (city, state, timezone) → sets Accept-Language and timezone headers
|
|
||||||
- Language is always English (`en-US`)
|
|
||||||
|
|
||||||
**Flow:**
|
|
||||||
```
Task claimed
  │
  └─► Get proxy from rotation
        │
        └─► Proxy has location (city, state, timezone)
              │
              └─► Build headers using proxy's timezone
                    - Accept-Language: en-US,en;q=0.9
                    - Timezone-consistent behavior
```
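
As a rough illustration of the flow above, the session identity could be derived from the proxy row like this. The `ProxyLocation` and `SessionIdentity` shapes and the `buildSessionIdentity` name are hypothetical; the real `Proxy` interface in `crawl-rotator.ts` may look different.

```typescript
// Hypothetical shapes: field names mirror the proxies table columns,
// not necessarily the real Proxy interface.
interface ProxyLocation {
  city?: string;
  state?: string;
  timezone?: string; // IANA name, e.g. "America/Phoenix"
}

interface SessionIdentity {
  acceptLanguage: string;
  timezone: string | undefined;
}

// Language is fixed per the spec; only the timezone follows the proxy.
const ACCEPT_LANGUAGE = "en-US,en;q=0.9";

function buildSessionIdentity(location: ProxyLocation): SessionIdentity {
  return { acceptLanguage: ACCEPT_LANGUAGE, timezone: location.timezone };
}
```

The task payload never appears in the signature, which mirrors the rule that the task determines the work and the proxy determines the identity.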
|
|
||||||
|
|
||||||
### 2. On 403 Block - Immediate Backoff
|
|
||||||
|
|
||||||
When a 403 is received:
|
|
||||||
|
|
||||||
1. **Immediately** stop using current IP
2. Get a new proxy (new IP)
3. Get a new UA/fingerprint
4. Retry the request
|
|
||||||
|
|
||||||
**Per-proxy failure tracking:**
|
|
||||||
- Track UA rotation attempts per proxy
|
|
||||||
- After 3 UA/fingerprint rotations on the same proxy → disable that proxy
|
|
||||||
- This means: if we rotate UA 3 times and still get 403, the proxy is burned
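
The retry behavior described in this section might be sketched as follows. The `Rotator` interface, the `requestWithRotation` name, and the `maxAttempts` cap are assumptions for illustration; `recordBlock()` and `rotateBoth()` are named elsewhere in this document, but their real signatures are not shown here.

```typescript
// Hypothetical interface: signatures are assumed, not taken from the codebase.
interface Rotator {
  recordBlock(): void;   // note a 403 against the current proxy
  rotateBoth(): boolean; // switch proxy and fingerprint; false if none left
}

type Fetcher = () => Promise<{ status: number }>;

async function requestWithRotation(
  fetchOnce: Fetcher,
  rotator: Rotator,
  maxAttempts = 5, // assumed cap, not stated in the spec
): Promise<{ status: number } | null> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchOnce();
    if (res.status !== 403) return res;
    // On 403: record the block, then rotate before retrying.
    rotator.recordBlock();
    if (!rotator.rotateBoth()) return null; // proxy pool exhausted
  }
  return null;
}
```

Injecting `fetchOnce` and `rotator` keeps the loop testable without network access.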
|
|
||||||
|
|
||||||
### 3. Fingerprint Rotation Rules
|
|
||||||
|
|
||||||
Each request uses:
|
|
||||||
- Proxy (IP)
|
|
||||||
- User-Agent
|
|
||||||
- sec-ch-ua headers (Client Hints)
|
|
||||||
- Accept-Language (from proxy location)
|
|
||||||
|
|
||||||
On 403:
|
|
||||||
1. Record failure on current proxy
2. Rotate to new proxy
3. Pick new random fingerprint
4. If same proxy fails 3 times with different fingerprints → disable proxy
|
|
||||||
|
|
||||||
### 4. Proxy Table Schema
|
|
||||||
|
|
||||||
```sql
CREATE TABLE proxies (
  id SERIAL PRIMARY KEY,
  host VARCHAR(255) NOT NULL,
  port INTEGER NOT NULL,
  username VARCHAR(100),
  password VARCHAR(100),
  protocol VARCHAR(10) DEFAULT 'http',
  active BOOLEAN DEFAULT true,

  -- Location (determines session headers)
  city VARCHAR(100),
  state VARCHAR(50),
  country VARCHAR(100),
  country_code VARCHAR(10),
  timezone VARCHAR(50),

  -- Health tracking
  failure_count INTEGER DEFAULT 0,
  consecutive_403_count INTEGER DEFAULT 0,  -- Track 403s specifically
  last_used_at TIMESTAMPTZ,
  last_failure_at TIMESTAMPTZ,
  last_error TEXT,

  -- Performance
  response_time_ms INTEGER,
  max_connections INTEGER DEFAULT 1
);
```
|
|
||||||
|
|
||||||
### 5. Failure Threshold
|
|
||||||
|
|
||||||
- **3 consecutive 403s** with different fingerprints → disable proxy
|
|
||||||
- Reset `consecutive_403_count` to 0 on successful request
|
|
||||||
- General `failure_count` tracks all errors (timeouts, connection errors, etc.)
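
A minimal in-memory sketch of these threshold rules, assuming a `ProxyHealth` object stands in for a row of the `proxies` table. Two points are assumptions rather than stated behavior: a 403 also increments the general failure count, and only a successful request (not a timeout) resets the 403 streak.

```typescript
const MAX_CONSECUTIVE_403 = 3;

// Hypothetical in-memory stand-in for a row in the proxies table.
interface ProxyHealth {
  active: boolean;
  failureCount: number;
  consecutive403Count: number;
}

function recordBlock(p: ProxyHealth): void {
  p.failureCount += 1; // assumption: 403s count as general failures too
  p.consecutive403Count += 1;
  if (p.consecutive403Count >= MAX_CONSECUTIVE_403) p.active = false;
}

function recordOtherFailure(p: ProxyHealth): void {
  p.failureCount += 1; // timeouts etc. leave the 403 streak untouched
}

function markSuccess(p: ProxyHealth): void {
  p.consecutive403Count = 0;
}
```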
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Implementation Status
|
|
||||||
|
|
||||||
### COMPLETED - December 10, 2025
|
|
||||||
|
|
||||||
All code changes have been implemented per this specification:
|
|
||||||
|
|
||||||
#### 1. crawl-rotator.ts ✅
|
|
||||||
|
|
||||||
- [x] Added `consecutive403Count` to Proxy interface
|
|
||||||
- [x] Added `markBlocked()` method that increments `consecutive_403_count` and disables proxy at 3
|
|
||||||
- [x] Added `getProxyTimezone()` to return current proxy's timezone
|
|
||||||
- [x] `markSuccess()` now resets `consecutive_403_count` to 0
|
|
||||||
- [x] Replaced hardcoded UA list with `intoli/user-agents` library for realistic fingerprints
|
|
||||||
- [x] `BrowserFingerprint` interface includes full fingerprint data (UA, platform, screen size, viewport, sec-ch-ua headers)
|
|
||||||
|
|
||||||
#### 2. client.ts ✅
|
|
||||||
|
|
||||||
- [x] `startSession()` no longer takes state/timezone params
|
|
||||||
- [x] `startSession()` gets identity from proxy via `crawlRotator.getProxyLocation()`
|
|
||||||
- [x] Added `handle403Block()` that:
|
|
||||||
- Calls `crawlRotator.recordBlock()` (tracks consecutive 403s)
|
|
||||||
- Immediately rotates both proxy and fingerprint via `rotateBoth()`
|
|
||||||
- Returns false if no more proxies available
|
|
||||||
- [x] `executeGraphQL()` calls `handle403Block()` on 403 (not `rotateProxyOn403`)
|
|
||||||
- [x] `fetchPage()` uses same 403 handling
|
|
||||||
- [x] Fixed 500 ms backoff after rotation (no linearly increasing delay)
|
|
||||||
|
|
||||||
#### 3. Task Handlers ✅
|
|
||||||
|
|
||||||
- [x] `entry-point-discovery.ts`: `startSession()` called with no params
|
|
||||||
- [x] `product-refresh.ts`: `startSession()` called with no params
|
|
||||||
|
|
||||||
#### 4. Dependencies ✅
|
|
||||||
|
|
||||||
- [x] Added `user-agents` npm package for realistic UA generation
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Files Changed
|
|
||||||
|
|
||||||
| File | Changes |
|------|---------|
| `backend/src/services/crawl-rotator.ts` | Complete rewrite with `consecutive403Count`, `markBlocked()`, `intoli/user-agents` |
| `backend/src/platforms/dutchie/client.ts` | `startSession()` uses proxy location, `handle403Block()` for 403 handling |
| `backend/src/tasks/handlers/entry-point-discovery.ts` | `startSession()` no params |
| `backend/src/tasks/handlers/product-refresh.ts` | `startSession()` no params |
| `backend/package.json` | Added `user-agents` dependency |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Migration Required
|
|
||||||
|
|
||||||
The `proxies` table needs `consecutive_403_count` column if not already present:
|
|
||||||
|
|
||||||
```sql
ALTER TABLE proxies ADD COLUMN IF NOT EXISTS consecutive_403_count INTEGER DEFAULT 0;
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Key Behaviors Summary
|
|
||||||
|
|
||||||
| Behavior | Implementation |
|----------|----------------|
| Session identity | From proxy location (`getProxyLocation()`) |
| Language | Always `en-US,en;q=0.9` |
| 403 handling | `handle403Block()` → `recordBlock()` → `rotateBoth()` |
| Proxy disable | After 3 consecutive 403s (`consecutive403Count >= 3`) |
| Success reset | `markSuccess()` resets `consecutive403Count` to 0 |
| UA generation | `intoli/user-agents` library (daily updated, realistic fingerprints) |
| Fingerprint data | Full: UA, platform, screen size, viewport, sec-ch-ua headers |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## User-Agent Generation
|
|
||||||
|
|
||||||
### Data Source
|
|
||||||
|
|
||||||
The `intoli/user-agents` npm library provides daily-updated market share data collected from Intoli's residential proxy network (millions of real users). The package auto-releases new versions daily to npm.
|
|
||||||
|
|
||||||
### Device Category Distribution (hardcoded)
|
|
||||||
|
|
||||||
| Category | Share |
|----------|-------|
| Mobile | 62% |
| Desktop | 36% |
| Tablet | 2% |
|
|
||||||
|
|
||||||
### Browser Filter (whitelist only)
|
|
||||||
|
|
||||||
Only these browsers are allowed:
|
|
||||||
- Chrome (67%)
|
|
||||||
- Safari (20%)
|
|
||||||
- Edge (6%)
|
|
||||||
- Firefox (3%)
|
|
||||||
|
|
||||||
Samsung Internet, Opera, and other niche browsers are filtered out.
|
|
||||||
|
|
||||||
### Desktop OS Distribution (from library)
|
|
||||||
|
|
||||||
| OS | Share |
|----|-------|
| Windows | 72% |
| macOS | 17% |
| Linux | 4% |
|
|
||||||
|
|
||||||
### UA Lifecycle
|
|
||||||
|
|
||||||
1. **Session start** (new proxy IP obtained) → Roll device category (62/36/2) → Generate UA filtered to device + top 4 browsers → Store on session
2. **UA sticks** until IP rotates (403 block or manual rotation)
3. **IP rotation** triggers new UA generation
|
|
||||||
|
|
||||||
### Failure Handling
|
|
||||||
|
|
||||||
- If UA generation fails → Alert admin dashboard, **stop crawl immediately**
|
|
||||||
- No fallback to static UA list
|
|
||||||
- This forces investigation rather than silent degradation
|
|
||||||
|
|
||||||
### Session Logging
|
|
||||||
|
|
||||||
Each session logs:
|
|
||||||
- Device category (mobile/desktop/tablet)
|
|
||||||
- Full UA string
|
|
||||||
- Browser name (Chrome/Safari/Edge/Firefox)
|
|
||||||
- IP address (from proxy)
|
|
||||||
- Session start timestamp
|
|
||||||
|
|
||||||
Logs are rotated monthly.
|
|
||||||
|
|
||||||
### Implementation
|
|
||||||
|
|
||||||
Located in `backend/src/services/crawl-rotator.ts`:
|
|
||||||
|
|
||||||
```typescript
// Per workflow-12102025.md: Device category distribution
const DEVICE_WEIGHTS = { mobile: 62, desktop: 36, tablet: 2 };

// Per workflow-12102025.md: Browser whitelist
const ALLOWED_BROWSERS = ['Chrome', 'Safari', 'Edge', 'Firefox'];
```
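
One way the 62/36/2 roll could be implemented on top of these weights is a cumulative-threshold pick. This is a sketch only; `rollDeviceCategory` and its injectable `rand` parameter are hypothetical names, not the actual code in `crawl-rotator.ts`.

```typescript
type DeviceCategory = "mobile" | "desktop" | "tablet";

const DEVICE_WEIGHTS: Record<DeviceCategory, number> = { mobile: 62, desktop: 36, tablet: 2 };

// rand is injectable so the roll can be tested deterministically.
function rollDeviceCategory(rand: () => number = Math.random): DeviceCategory {
  const categories = Object.keys(DEVICE_WEIGHTS) as DeviceCategory[];
  const total = categories.reduce((sum, c) => sum + DEVICE_WEIGHTS[c], 0);
  let threshold = rand() * total;
  for (const category of categories) {
    if (threshold < DEVICE_WEIGHTS[category]) return category;
    threshold -= DEVICE_WEIGHTS[category];
  }
  // Guard against floating-point edge cases at the upper bound.
  return categories[categories.length - 1];
}
```

Because the weights are summed at call time, they do not need to add up to 100.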
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## HTTP Fingerprinting
|
|
||||||
|
|
||||||
### Goal
|
|
||||||
|
|
||||||
Make HTTP requests indistinguishable from real browser traffic. No repeatable footprint.
|
|
||||||
|
|
||||||
### Components
|
|
||||||
|
|
||||||
1. **Full Header Set** - All headers a real browser sends
|
|
||||||
2. **Header Ordering** - Browser-specific order (Chrome vs Firefox vs Safari)
|
|
||||||
3. **TLS Fingerprint** - Use `curl-impersonate` to match browser TLS signature
|
|
||||||
4. **Dynamic Referer** - Set per dispensary being crawled
|
|
||||||
5. **Natural Randomization** - Vary optional headers like real users
|
|
||||||
|
|
||||||
### Required Headers
|
|
||||||
|
|
||||||
| Header | Chrome | Firefox | Safari | Notes |
|--------|--------|---------|--------|-------|
| `User-Agent` | ✅ | ✅ | ✅ | From UA generation |
| `Accept` | ✅ | ✅ | ✅ | Content types |
| `Accept-Language` | ✅ | ✅ | ✅ | Always `en-US,en;q=0.9` |
| `Accept-Encoding` | ✅ | ✅ | ✅ | `gzip, deflate, br` |
| `Connection` | ✅ | ✅ | ✅ | `keep-alive` |
| `Origin` | ✅ | ✅ | ✅ | `https://dutchie.com` (POST only) |
| `Referer` | ✅ | ✅ | ✅ | Dynamic per dispensary |
| `sec-ch-ua` | ✅ | ❌ | ❌ | Chromium only |
| `sec-ch-ua-mobile` | ✅ | ❌ | ❌ | Chromium only |
| `sec-ch-ua-platform` | ✅ | ❌ | ❌ | Chromium only |
| `sec-fetch-dest` | ✅ | ✅ | ❌ | `empty` for XHR |
| `sec-fetch-mode` | ✅ | ✅ | ❌ | `cors` for XHR |
| `sec-fetch-site` | ✅ | ✅ | ❌ | `same-origin` |
| `Upgrade-Insecure-Requests` | ✅ | ✅ | ✅ | `1` (page loads only) |
| `DNT` | ~30% | ~30% | ~30% | Randomized per session |
|
|
||||||
|
|
||||||
### Header Ordering
|
|
||||||
|
|
||||||
Each browser sends headers in a specific order. Fingerprinting services detect mismatches.
|
|
||||||
|
|
||||||
**Chrome order (GraphQL request):**
|
|
||||||
1. Host
2. Connection
3. Content-Length (POST)
4. sec-ch-ua
5. DNT (if enabled)
6. sec-ch-ua-mobile
7. User-Agent
8. sec-ch-ua-platform
9. Content-Type (POST)
10. Accept
11. Origin (POST)
12. sec-fetch-site
13. sec-fetch-mode
14. sec-fetch-dest
15. Referer
16. Accept-Encoding
17. Accept-Language
|
|
||||||
|
|
||||||
**Firefox order (GraphQL request):**
|
|
||||||
1. Host
2. User-Agent
3. Accept
4. Accept-Language
5. Accept-Encoding
6. Content-Type (POST)
7. Content-Length (POST)
8. Origin (POST)
9. DNT (if enabled)
10. Connection
11. Referer
12. sec-fetch-dest
13. sec-fetch-mode
14. sec-fetch-site
|
|
||||||
|
|
||||||
**Safari order (GraphQL request):**
|
|
||||||
1. Host
2. Connection
3. Content-Length (POST)
4. Accept
5. User-Agent
6. Content-Type (POST)
7. Origin (POST)
8. Referer
9. Accept-Encoding
10. Accept-Language
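
The per-browser orderings above could be applied with a small helper like the one below. It is a sketch under assumptions: `orderHeaders` is a hypothetical name, only the Firefox list is spelled out, and whether the order survives on the wire depends on the HTTP client actually used.

```typescript
// Firefox order copied from the list above; other browsers would get their own array.
const FIREFOX_ORDER = [
  "Host", "User-Agent", "Accept", "Accept-Language", "Accept-Encoding",
  "Content-Type", "Content-Length", "Origin", "DNT", "Connection",
  "Referer", "sec-fetch-dest", "sec-fetch-mode", "sec-fetch-site",
];

// Returns [name, value] pairs in the given order. Matching is case-insensitive,
// and headers missing from the order list are appended at the end.
function orderHeaders(headers: Record<string, string>, order: string[]): [string, string][] {
  const remaining = new Map<string, [string, string]>();
  for (const name of Object.keys(headers)) {
    remaining.set(name.toLowerCase(), [name, headers[name]]);
  }
  const ordered: [string, string][] = [];
  for (const name of order) {
    const hit = remaining.get(name.toLowerCase());
    if (hit) {
      ordered.push([name, hit[1]]); // emit with the casing from the order list
      remaining.delete(name.toLowerCase());
    }
  }
  remaining.forEach((pair) => ordered.push(pair));
  return ordered;
}
```

Headers marked "(POST)" or "(if enabled)" are simply left out of the input object when they do not apply, so the same order array covers both cases.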
|
|
||||||
|
|
||||||
### TLS Fingerprinting
|
|
||||||
|
|
||||||
Use `curl-impersonate` instead of standard curl:
|
|
||||||
- `curl_chrome131` - Mimics Chrome 131 TLS handshake
|
|
||||||
- `curl_ff133` - Mimics Firefox 133 TLS handshake
|
|
||||||
- `curl_safari17` - Mimics Safari 17 TLS handshake
|
|
||||||
|
|
||||||
Match TLS binary to browser in UA.
|
|
||||||
|
|
||||||
### Dynamic Referer
|
|
||||||
|
|
||||||
Set Referer to the dispensary's actual page URL:
|
|
||||||
|
|
||||||
```
|
|
||||||
Crawling "harvest-of-tempe" → Referer: https://dutchie.com/dispensary/harvest-of-tempe
|
|
||||||
Crawling "zen-leaf-mesa" → Referer: https://dutchie.com/dispensary/zen-leaf-mesa
|
|
||||||
```
|
|
||||||
|
|
||||||
Derived from dispensary's `menu_url` field.
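
A possible derivation, assuming `menu_url` contains a `/dispensary/<slug>` path segment (the actual stored format is not shown in this document, and `refererFromMenuUrl` is a hypothetical helper name):

```typescript
// Assumption: menu_url looks like https://dutchie.com/dispensary/<slug>[/...]
function refererFromMenuUrl(menuUrl: string): string | null {
  const match = menuUrl.match(/\/dispensary\/([^/?#]+)/);
  return match ? `https://dutchie.com/dispensary/${match[1]}` : null;
}
```

Returning `null` for an unrecognized URL lets the caller decide how to proceed instead of sending a guessed Referer.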
|
|
||||||
|
|
||||||
### Natural Randomization
|
|
||||||
|
|
||||||
Per-session randomization (set once when session starts, consistent for session):
|
|
||||||
|
|
||||||
| Feature | Distribution | Implementation |
|---------|--------------|----------------|
| DNT header | 30% have it | `Math.random() < 0.30` |
| Accept quality values | Slight variation | `q=0.9` vs `q=0.8` |
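
A minimal sketch of the per-session roll, done once at session start and then reused for every request in that session. The `SessionQuirks` shape, the `rollSessionQuirks` name, and the 50/50 split between the two quality values are assumptions; the table only specifies the 30% DNT rate.

```typescript
interface SessionQuirks {
  sendDnt: boolean;
  acceptQ: "0.9" | "0.8";
}

// rand is injectable so the roll can be tested deterministically.
function rollSessionQuirks(rand: () => number = Math.random): SessionQuirks {
  const sendDnt = rand() < 0.30;
  const acceptQ = rand() < 0.5 ? "0.9" : "0.8"; // 50/50 split is an assumption
  return { sendDnt, acceptQ };
}
```

Storing the result on the session, rather than re-rolling per request, keeps the headers consistent for the lifetime of that session.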
|
|
||||||
|
|
||||||
### Implementation Files
|
|
||||||
|
|
||||||
| File | Purpose |
|------|---------|
| `src/services/crawl-rotator.ts` | `BrowserFingerprint` includes full header config |
| `src/platforms/dutchie/client.ts` | Build headers from fingerprint, use curl-impersonate |
| `src/services/http-fingerprint.ts` | Header ordering per browser (NEW) |
|
|
||||||