Compare commits

..

1 Commits

Author SHA1 Message Date
Kelly
daab0ae9b2 feat(api): Add payload query API and trusted origins management
Query API:
- GET /api/payloads/store/:id/query - Filter products with flexible params
  (brand, category, price_min/max, thc_min/max, search, sort, pagination)
- GET /api/payloads/store/:id/aggregate - Group by brand/category with metrics
  (count, avg_price, min_price, max_price, avg_thc, in_stock_count)
- Documentation at docs/QUERY_API.md

Trusted Origins Admin:
- GET/POST/PUT/DELETE /api/admin/trusted-origins - Manage auth bypass list
- Trusted IPs, domains, and regex patterns stored in DB
- 5-minute cache with invalidation on admin updates
- Fallback to hardcoded defaults if DB unavailable
- Migration 085 creates table with seed data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 23:28:05 -07:00
13 changed files with 1632 additions and 437 deletions

343
backend/docs/QUERY_API.md Normal file
View File

@@ -0,0 +1,343 @@
# CannaiQ Query API
Query raw crawl payload data with flexible filters, sorting, and aggregation.
## Base URL
```
https://cannaiq.co/api/payloads
```
## Authentication
Include your API key in the header:
```
X-API-Key: your-api-key
```
---
## Endpoints
### 1. Query Products
Filter and search products from a store's latest crawl data.
```
GET /api/payloads/store/{dispensaryId}/query
```
#### Query Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `brand` | string | Filter by brand name (partial match) |
| `category` | string | Filter by category (flower, vape, edible, etc.) |
| `subcategory` | string | Filter by subcategory |
| `strain_type` | string | Filter by strain (indica, sativa, hybrid, cbd) |
| `in_stock` | boolean | Filter by stock status (true/false) |
| `price_min` | number | Minimum price |
| `price_max` | number | Maximum price |
| `thc_min` | number | Minimum THC percentage |
| `thc_max` | number | Maximum THC percentage |
| `search` | string | Search product name (partial match) |
| `fields` | string | Comma-separated fields to return |
| `limit` | number | Max results (default 100, max 1000) |
| `offset` | number | Skip results for pagination |
| `sort` | string | Sort by: name, price, thc, brand |
| `order` | string | Sort order: asc, desc |
#### Available Fields
When using `fields` parameter, you can request:
- `id` - Product ID
- `name` - Product name
- `brand` - Brand name
- `category` - Product category
- `subcategory` - Product subcategory
- `strain_type` - Indica/Sativa/Hybrid/CBD
- `price` - Current price
- `price_med` - Medical price
- `price_rec` - Recreational price
- `thc` - THC percentage
- `cbd` - CBD percentage
- `weight` - Product weight/size
- `status` - Stock status
- `in_stock` - Boolean in-stock flag
- `image_url` - Product image
- `description` - Product description
#### Examples
**Get all flower products under $40:**
```
GET /api/payloads/store/112/query?category=flower&price_max=40
```
**Search for "Blue Dream" with high THC:**
```
GET /api/payloads/store/112/query?search=blue+dream&thc_min=20
```
**Get only name and price for Alien Labs products:**
```
GET /api/payloads/store/112/query?brand=Alien+Labs&fields=name,price,thc
```
**Get top 10 highest THC products:**
```
GET /api/payloads/store/112/query?sort=thc&order=desc&limit=10
```
**Paginate through in-stock products:**
```
GET /api/payloads/store/112/query?in_stock=true&limit=50&offset=0
GET /api/payloads/store/112/query?in_stock=true&limit=50&offset=50
```
#### Response
```json
{
"success": true,
"dispensaryId": 112,
"payloadId": 45,
"fetchedAt": "2025-12-11T10:30:00Z",
"query": {
"filters": {
"brand": "Alien Labs",
"category": null,
"price_max": null
},
"sort": "price",
"order": "asc",
"limit": 100,
"offset": 0
},
"pagination": {
"total": 15,
"returned": 15,
"limit": 100,
"offset": 0,
"has_more": false
},
"products": [
{
"id": "507f1f77bcf86cd799439011",
"name": "Alien Labs - Baklava 3.5g",
"brand": "Alien Labs",
"category": "flower",
"strain_type": "hybrid",
"price": 55,
"thc": "28.5",
"in_stock": true
}
]
}
```
---
### 2. Aggregate Data
Group products and calculate metrics.
```
GET /api/payloads/store/{dispensaryId}/aggregate
```
#### Query Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `group_by` | string | **Required.** Field to group by: brand, category, subcategory, strain_type |
| `metrics` | string | Comma-separated metrics (default: count) |
#### Available Metrics
- `count` - Number of products
- `avg_price` - Average price
- `min_price` - Lowest price
- `max_price` - Highest price
- `avg_thc` - Average THC percentage
- `in_stock_count` - Number of in-stock products
#### Examples
**Count products by brand:**
```
GET /api/payloads/store/112/aggregate?group_by=brand
```
**Get price stats by category:**
```
GET /api/payloads/store/112/aggregate?group_by=category&metrics=count,avg_price,min_price,max_price
```
**Get THC averages by strain type:**
```
GET /api/payloads/store/112/aggregate?group_by=strain_type&metrics=count,avg_thc
```
**Brand analysis with stock info:**
```
GET /api/payloads/store/112/aggregate?group_by=brand&metrics=count,avg_price,in_stock_count
```
#### Response
```json
{
"success": true,
"dispensaryId": 112,
"payloadId": 45,
"fetchedAt": "2025-12-11T10:30:00Z",
"groupBy": "brand",
"metrics": ["count", "avg_price"],
"totalProducts": 450,
"groupCount": 85,
"aggregations": [
{
"brand": "Alien Labs",
"count": 15,
"avg_price": 52.33
},
{
"brand": "Connected",
"count": 12,
"avg_price": 48.50
}
]
}
```
---
### 3. Compare Stores (Price Comparison)
Query the same data from multiple stores and compare in your app:
```javascript
// Get flower prices from Store A
const storeA = await fetch('/api/payloads/store/112/query?category=flower&fields=name,brand,price');
// Get flower prices from Store B
const storeB = await fetch('/api/payloads/store/115/query?category=flower&fields=name,brand,price');
// Compare in your app
const dataA = await storeA.json();
const dataB = await storeB.json();
// Find matching products and compare prices
```
---
### 4. Price History
For historical price data, use the snapshots endpoint:
```
GET /api/v1/products/{productId}/history?days=30
```
Or compare payloads over time:
```
GET /api/payloads/store/{dispensaryId}/diff?from={payloadId1}&to={payloadId2}
```
The diff endpoint shows:
- Products added
- Products removed
- Price changes
- Stock changes
---
### 5. List Stores
Get available dispensaries to query:
```
GET /api/stores
```
Returns all stores with their IDs, names, and locations.
---
## Use Cases
### Price Comparison App
```javascript
// 1. Get stores in Arizona
const stores = await fetch('/api/stores?state=AZ').then(r => r.json());
// 2. Query flower prices from each store
const prices = await Promise.all(
stores.map(store =>
fetch(`/api/payloads/store/${store.id}/query?category=flower&fields=name,brand,price`)
.then(r => r.json())
)
);
// 3. Build comparison matrix in your app
```
### Brand Analytics Dashboard
```javascript
// Get brand presence across stores
const brandData = await Promise.all(
storeIds.map(id =>
fetch(`/api/payloads/store/${id}/aggregate?group_by=brand&metrics=count,avg_price`)
.then(r => r.json())
)
);
// Aggregate brand presence across all stores
```
### Deal Finder
```javascript
// Find high-THC flower under $30
const deals = await fetch(
'/api/payloads/store/112/query?category=flower&price_max=30&thc_min=20&in_stock=true&sort=thc&order=desc'
).then(r => r.json());
```
### Inventory Tracker
```javascript
// Get products that went out of stock
const diff = await fetch('/api/payloads/store/112/diff').then(r => r.json());
const outOfStock = diff.details.stockChanges.filter(
p => p.newStatus !== 'Active'
);
```
---
## Rate Limits
- Default: 100 requests/minute per API key
- Contact support for higher limits
## Error Responses
```json
{
"success": false,
"error": "Error message here"
}
```
Common errors:
- `404` - Store or payload not found
- `400` - Missing required parameter
- `401` - Invalid or missing API key
- `429` - Rate limit exceeded

View File

@@ -1,77 +0,0 @@
apiVersion: v1
kind: Service
metadata:
name: scraper-worker
namespace: dispensary-scraper
labels:
app: scraper-worker
spec:
clusterIP: None # Headless service required for StatefulSet
selector:
app: scraper-worker
ports:
- port: 3010
name: http
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: scraper-worker
namespace: dispensary-scraper
spec:
serviceName: scraper-worker
replicas: 8
podManagementPolicy: Parallel # Start all pods at once
updateStrategy:
type: OnDelete # Pods only update when manually deleted - no automatic restarts
selector:
matchLabels:
app: scraper-worker
template:
metadata:
labels:
app: scraper-worker
spec:
terminationGracePeriodSeconds: 60
imagePullSecrets:
- name: regcred
containers:
- name: worker
image: code.cannabrands.app/creationshop/dispensary-scraper:latest
imagePullPolicy: Always
command: ["node"]
args: ["dist/tasks/task-worker.js"]
env:
- name: WORKER_MODE
value: "true"
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: MAX_CONCURRENT_TASKS
value: "50"
- name: API_BASE_URL
value: http://scraper
- name: NODE_OPTIONS
value: --max-old-space-size=1500
envFrom:
- configMapRef:
name: scraper-config
- secretRef:
name: scraper-secrets
resources:
requests:
cpu: 100m
memory: 1Gi
limits:
cpu: 500m
memory: 2Gi
livenessProbe:
exec:
command:
- /bin/sh
- -c
- pgrep -f 'task-worker' > /dev/null
initialDelaySeconds: 10
periodSeconds: 30
failureThreshold: 3

View File

@@ -1,168 +0,0 @@
-- Migration 085: Add IP and fingerprint columns for preflight reporting
-- These columns were missing from migration 084
-- ===================================================================
-- PART 1: Add IP address columns to worker_registry
-- ===================================================================
-- IP address detected during curl/axios preflight
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS curl_ip VARCHAR(45);
-- IP address detected during http/Puppeteer preflight
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS http_ip VARCHAR(45);
-- ===================================================================
-- PART 2: Add fingerprint data column
-- ===================================================================
-- Browser fingerprint data captured during Puppeteer preflight
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS fingerprint_data JSONB;
-- ===================================================================
-- PART 3: Add combined preflight status/timestamp for convenience
-- ===================================================================
-- Overall preflight status (computed from both transports)
-- Values: 'pending', 'passed', 'partial', 'failed'
-- - 'pending': neither transport tested
-- - 'passed': both transports passed (or http passed for browser-only)
-- - 'partial': at least one passed
-- - 'failed': no transport passed
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS preflight_status VARCHAR(20) DEFAULT 'pending';
-- Most recent preflight completion timestamp
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS preflight_at TIMESTAMPTZ;
-- ===================================================================
-- PART 4: Update function to set preflight status
-- ===================================================================
CREATE OR REPLACE FUNCTION update_worker_preflight(
p_worker_id VARCHAR(100),
p_transport VARCHAR(10), -- 'curl' or 'http'
p_status VARCHAR(20), -- 'passed', 'failed', 'skipped'
p_ip VARCHAR(45) DEFAULT NULL,
p_response_ms INTEGER DEFAULT NULL,
p_error TEXT DEFAULT NULL,
p_fingerprint JSONB DEFAULT NULL
) RETURNS VOID AS $$
DECLARE
v_curl_status VARCHAR(20);
v_http_status VARCHAR(20);
v_overall_status VARCHAR(20);
BEGIN
IF p_transport = 'curl' THEN
UPDATE worker_registry
SET
preflight_curl_status = p_status,
preflight_curl_at = NOW(),
preflight_curl_ms = p_response_ms,
preflight_curl_error = p_error,
curl_ip = p_ip,
updated_at = NOW()
WHERE worker_id = p_worker_id;
ELSIF p_transport = 'http' THEN
UPDATE worker_registry
SET
preflight_http_status = p_status,
preflight_http_at = NOW(),
preflight_http_ms = p_response_ms,
preflight_http_error = p_error,
http_ip = p_ip,
fingerprint_data = COALESCE(p_fingerprint, fingerprint_data),
updated_at = NOW()
WHERE worker_id = p_worker_id;
END IF;
-- Update overall preflight status
SELECT preflight_curl_status, preflight_http_status
INTO v_curl_status, v_http_status
FROM worker_registry
WHERE worker_id = p_worker_id;
-- Compute overall status
IF v_curl_status = 'passed' AND v_http_status = 'passed' THEN
v_overall_status := 'passed';
ELSIF v_curl_status = 'passed' OR v_http_status = 'passed' THEN
v_overall_status := 'partial';
ELSIF v_curl_status = 'failed' OR v_http_status = 'failed' THEN
v_overall_status := 'failed';
ELSE
v_overall_status := 'pending';
END IF;
UPDATE worker_registry
SET
preflight_status = v_overall_status,
preflight_at = NOW()
WHERE worker_id = p_worker_id;
END;
$$ LANGUAGE plpgsql;
-- ===================================================================
-- PART 5: Update v_active_workers view
-- ===================================================================
DROP VIEW IF EXISTS v_active_workers;
CREATE VIEW v_active_workers AS
SELECT
wr.id,
wr.worker_id,
wr.friendly_name,
wr.role,
wr.status,
wr.pod_name,
wr.hostname,
wr.started_at,
wr.last_heartbeat_at,
wr.last_task_at,
wr.tasks_completed,
wr.tasks_failed,
wr.current_task_id,
-- IP addresses from preflights
wr.curl_ip,
wr.http_ip,
-- Combined preflight status
wr.preflight_status,
wr.preflight_at,
-- Detailed preflight status per transport
wr.preflight_curl_status,
wr.preflight_http_status,
wr.preflight_curl_at,
wr.preflight_http_at,
wr.preflight_curl_error,
wr.preflight_http_error,
wr.preflight_curl_ms,
wr.preflight_http_ms,
-- Fingerprint data
wr.fingerprint_data,
-- Computed fields
EXTRACT(EPOCH FROM (NOW() - wr.last_heartbeat_at)) as seconds_since_heartbeat,
CASE
WHEN wr.status = 'offline' THEN 'offline'
WHEN wr.last_heartbeat_at < NOW() - INTERVAL '2 minutes' THEN 'stale'
WHEN wr.current_task_id IS NOT NULL THEN 'busy'
ELSE 'ready'
END as health_status,
-- Capability flags (can this worker handle curl/http tasks?)
(wr.preflight_curl_status = 'passed') as can_curl,
(wr.preflight_http_status = 'passed') as can_http
FROM worker_registry wr
WHERE wr.status != 'terminated'
ORDER BY wr.status = 'active' DESC, wr.last_heartbeat_at DESC;
-- ===================================================================
-- Comments
-- ===================================================================
COMMENT ON COLUMN worker_registry.curl_ip IS 'IP address detected during curl/axios preflight';
COMMENT ON COLUMN worker_registry.http_ip IS 'IP address detected during Puppeteer preflight';
COMMENT ON COLUMN worker_registry.fingerprint_data IS 'Browser fingerprint captured during Puppeteer preflight';
COMMENT ON COLUMN worker_registry.preflight_status IS 'Overall preflight status: pending, passed, partial, failed';
COMMENT ON COLUMN worker_registry.preflight_at IS 'Most recent preflight completion timestamp';

View File

@@ -0,0 +1,59 @@
-- Migration 085: Trusted Origins Management
-- Allows admin to manage trusted IPs and domains via UI instead of hardcoded values
-- Trusted origins table (IPs and domains that bypass API key auth)
CREATE TABLE IF NOT EXISTS trusted_origins (
id SERIAL PRIMARY KEY,
-- Origin type: 'ip', 'domain', 'pattern'
origin_type VARCHAR(20) NOT NULL CHECK (origin_type IN ('ip', 'domain', 'pattern')),
-- The actual value
-- For ip: '127.0.0.1', '::1', '192.168.1.0/24'
-- For domain: 'cannaiq.co', 'findadispo.com'
-- For pattern: '^https://.*\.cannabrands\.app$' (regex)
origin_value VARCHAR(255) NOT NULL,
-- Description for admin reference
description TEXT,
-- Active flag
active BOOLEAN DEFAULT true,
-- Audit
created_at TIMESTAMPTZ DEFAULT NOW(),
created_by INTEGER REFERENCES users(id),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(origin_type, origin_value)
);
-- Index for quick lookups
CREATE INDEX IF NOT EXISTS idx_trusted_origins_active ON trusted_origins(active) WHERE active = true;
CREATE INDEX IF NOT EXISTS idx_trusted_origins_type ON trusted_origins(origin_type, active);
-- Seed with current hardcoded values
INSERT INTO trusted_origins (origin_type, origin_value, description) VALUES
-- Trusted IPs (localhost)
('ip', '127.0.0.1', 'Localhost IPv4'),
('ip', '::1', 'Localhost IPv6'),
('ip', '::ffff:127.0.0.1', 'Localhost IPv4-mapped IPv6'),
-- Trusted domains
('domain', 'cannaiq.co', 'CannaiQ production'),
('domain', 'www.cannaiq.co', 'CannaiQ production (www)'),
('domain', 'findadispo.com', 'FindADispo production'),
('domain', 'www.findadispo.com', 'FindADispo production (www)'),
('domain', 'findagram.co', 'Findagram production'),
('domain', 'www.findagram.co', 'Findagram production (www)'),
('domain', 'localhost:3010', 'Local backend dev'),
('domain', 'localhost:8080', 'Local admin dev'),
('domain', 'localhost:5173', 'Local Vite dev'),
-- Pattern-based (regex)
('pattern', '^https://.*\.cannabrands\.app$', 'All cannabrands.app subdomains'),
('pattern', '^https://.*\.cannaiq\.co$', 'All cannaiq.co subdomains')
ON CONFLICT (origin_type, origin_value) DO NOTHING;
-- Add comment
COMMENT ON TABLE trusted_origins IS 'IPs and domains that bypass API key authentication. Managed via /admin.';

View File

@@ -35,6 +35,8 @@
"puppeteer-extra-plugin-stealth": "^2.11.2",
"sharp": "^0.32.0",
"socks-proxy-agent": "^8.0.2",
"swagger-jsdoc": "^6.2.8",
"swagger-ui-express": "^5.0.1",
"user-agents": "^1.1.669",
"uuid": "^9.0.1",
"zod": "^3.22.4"
@@ -47,11 +49,53 @@
"@types/node": "^20.10.5",
"@types/node-cron": "^3.0.11",
"@types/pg": "^8.15.6",
"@types/swagger-jsdoc": "^6.0.4",
"@types/swagger-ui-express": "^4.1.8",
"@types/uuid": "^9.0.7",
"tsx": "^4.7.0",
"typescript": "^5.3.3"
}
},
"node_modules/@apidevtools/json-schema-ref-parser": {
"version": "9.1.2",
"resolved": "https://registry.npmjs.org/@apidevtools/json-schema-ref-parser/-/json-schema-ref-parser-9.1.2.tgz",
"integrity": "sha512-r1w81DpR+KyRWd3f+rk6TNqMgedmAxZP5v5KWlXQWlgMUUtyEJch0DKEci1SorPMiSeM8XPl7MZ3miJ60JIpQg==",
"dependencies": {
"@jsdevtools/ono": "^7.1.3",
"@types/json-schema": "^7.0.6",
"call-me-maybe": "^1.0.1",
"js-yaml": "^4.1.0"
}
},
"node_modules/@apidevtools/openapi-schemas": {
"version": "2.1.0",
"resolved": "https://registry.npmjs.org/@apidevtools/openapi-schemas/-/openapi-schemas-2.1.0.tgz",
"integrity": "sha512-Zc1AlqrJlX3SlpupFGpiLi2EbteyP7fXmUOGup6/DnkRgjP9bgMM/ag+n91rsv0U1Gpz0H3VILA/o3bW7Ua6BQ==",
"engines": {
"node": ">=10"
}
},
"node_modules/@apidevtools/swagger-methods": {
"version": "3.0.2",
"resolved": "https://registry.npmjs.org/@apidevtools/swagger-methods/-/swagger-methods-3.0.2.tgz",
"integrity": "sha512-QAkD5kK2b1WfjDS/UQn/qQkbwF31uqRjPTrsCs5ZG9BQGAkjwvqGFjjPqAuzac/IYzpPtRzjCP1WrTuAIjMrXg=="
},
"node_modules/@apidevtools/swagger-parser": {
"version": "10.0.3",
"resolved": "https://registry.npmjs.org/@apidevtools/swagger-parser/-/swagger-parser-10.0.3.tgz",
"integrity": "sha512-sNiLY51vZOmSPFZA5TF35KZ2HbgYklQnTSDnkghamzLb3EkNtcQnrBQEj5AOCxHpTtXpqMCRM1CrmV2rG6nw4g==",
"dependencies": {
"@apidevtools/json-schema-ref-parser": "^9.0.6",
"@apidevtools/openapi-schemas": "^2.0.4",
"@apidevtools/swagger-methods": "^3.0.2",
"@jsdevtools/ono": "^7.1.3",
"call-me-maybe": "^1.0.1",
"z-schema": "^5.0.1"
},
"peerDependencies": {
"openapi-types": ">=7"
}
},
"node_modules/@babel/code-frame": {
"version": "7.27.1",
"resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.27.1.tgz",
@@ -494,6 +538,11 @@
"resolved": "https://registry.npmjs.org/@ioredis/commands/-/commands-1.4.0.tgz",
"integrity": "sha512-aFT2yemJJo+TZCmieA7qnYGQooOS7QfNmYrzGtsYd3g9j5iDP8AimYYAesf79ohjbLG12XxC4nG5DyEnC88AsQ=="
},
"node_modules/@jsdevtools/ono": {
"version": "7.1.3",
"resolved": "https://registry.npmjs.org/@jsdevtools/ono/-/ono-7.1.3.tgz",
"integrity": "sha512-4JQNk+3mVzK3xh2rqd6RB4J46qUR19azEHBneZyTZM+c456qOrbbM/5xcR8huNCCcbVt7+UmizG6GuUvPvKUYg=="
},
"node_modules/@jsep-plugin/assignment": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/@jsep-plugin/assignment/-/assignment-1.3.0.tgz",
@@ -761,6 +810,12 @@
"resolved": "https://registry.npmjs.org/ms/-/ms-2.1.2.tgz",
"integrity": "sha512-sGkPx+VjMtmA6MX27oA4FBFELFCZZ4S4XqeGOXCv68tT+jb3vk/RyaKWP0PTKyWtmLSM0b+adUTEvbs1PEaH2w=="
},
"node_modules/@scarf/scarf": {
"version": "1.4.0",
"resolved": "https://registry.npmjs.org/@scarf/scarf/-/scarf-1.4.0.tgz",
"integrity": "sha512-xxeapPiUXdZAE3che6f3xogoJPeZgig6omHEy1rIY5WVsB3H2BHNnZH+gHG6x91SCWyQCzWGsuL2Hh3ClO5/qQ==",
"hasInstallScript": true
},
"node_modules/@tootallnate/quickjs-emscripten": {
"version": "0.23.0",
"resolved": "https://registry.npmjs.org/@tootallnate/quickjs-emscripten/-/quickjs-emscripten-0.23.0.tgz",
@@ -855,6 +910,11 @@
"resolved": "https://registry.npmjs.org/@types/js-yaml/-/js-yaml-4.0.9.tgz",
"integrity": "sha512-k4MGaQl5TGo/iipqb2UDG2UwjXziSWkh0uysQelTlJpX1qGlpUZYm8PnO4DxG1qBomtJUdYJ6qR6xdIah10JLg=="
},
"node_modules/@types/json-schema": {
"version": "7.0.15",
"resolved": "https://registry.npmjs.org/@types/json-schema/-/json-schema-7.0.15.tgz",
"integrity": "sha512-5+fP8P8MFNC+AyZCDxrB2pkZFPGzqQWUzpSeuuVLvm8VMcorNYavBqoFcxK8bQz4Qsbn4oUEEem4wDLfcysGHA=="
},
"node_modules/@types/jsonwebtoken": {
"version": "9.0.10",
"resolved": "https://registry.npmjs.org/@types/jsonwebtoken/-/jsonwebtoken-9.0.10.tgz",
@@ -960,6 +1020,22 @@
"@types/node": "*"
}
},
"node_modules/@types/swagger-jsdoc": {
"version": "6.0.4",
"resolved": "https://registry.npmjs.org/@types/swagger-jsdoc/-/swagger-jsdoc-6.0.4.tgz",
"integrity": "sha512-W+Xw5epcOZrF/AooUM/PccNMSAFOKWZA5dasNyMujTwsBkU74njSJBpvCCJhHAJ95XRMzQrrW844Btu0uoetwQ==",
"dev": true
},
"node_modules/@types/swagger-ui-express": {
"version": "4.1.8",
"resolved": "https://registry.npmjs.org/@types/swagger-ui-express/-/swagger-ui-express-4.1.8.tgz",
"integrity": "sha512-AhZV8/EIreHFmBV5wAs0gzJUNq9JbbSXgJLQubCC0jtIo6prnI9MIRRxnU4MZX9RB9yXxF1V4R7jtLl/Wcj31g==",
"dev": true,
"dependencies": {
"@types/express": "*",
"@types/serve-static": "*"
}
},
"node_modules/@types/uuid": {
"version": "9.0.8",
"resolved": "https://registry.npmjs.org/@types/uuid/-/uuid-9.0.8.tgz",
@@ -1434,6 +1510,11 @@
"url": "https://github.com/sponsors/ljharb"
}
},
"node_modules/call-me-maybe": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/call-me-maybe/-/call-me-maybe-1.0.2.tgz",
"integrity": "sha512-HpX65o1Hnr9HH25ojC1YGs7HCQLq0GCOibSaWER0eNpgJ/Z1MZv2mTc7+xh6WOPxbRVcmgbv4hGU+uSQ/2xFZQ=="
},
"node_modules/callsites": {
"version": "3.1.0",
"resolved": "https://registry.npmjs.org/callsites/-/callsites-3.1.0.tgz",
@@ -1594,6 +1675,14 @@
"node": ">= 0.8"
}
},
"node_modules/commander": {
"version": "6.2.0",
"resolved": "https://registry.npmjs.org/commander/-/commander-6.2.0.tgz",
"integrity": "sha512-zP4jEKbe8SHzKJYQmq8Y9gYjtO/POJLgIdKgV7B9qNmABVFVc+ctqSX6iXh4mCpJfRBOabiZ2YKPg8ciDw6C+Q==",
"engines": {
"node": ">= 6"
}
},
"node_modules/concat-map": {
"version": "0.0.1",
"resolved": "https://registry.npmjs.org/concat-map/-/concat-map-0.0.1.tgz",
@@ -1863,6 +1952,17 @@
"resolved": "https://registry.npmjs.org/devtools-protocol/-/devtools-protocol-0.0.1232444.tgz",
"integrity": "sha512-pM27vqEfxSxRkTMnF+XCmxSEb6duO5R+t8A9DEEJgy4Wz2RVanje2mmj99B6A3zv2r/qGfYlOvYznUhuokizmg=="
},
"node_modules/doctrine": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/doctrine/-/doctrine-3.0.0.tgz",
"integrity": "sha512-yS+Q5i3hBf7GBkd4KG8a7eBNNWNGLTaEwwYWUijIYM7zrlYDM0BFXHjjPWlWZ1Rg7UaddZeIDmi9jF3HmqiQ2w==",
"dependencies": {
"esutils": "^2.0.2"
},
"engines": {
"node": ">=6.0.0"
}
},
"node_modules/dom-serializer": {
"version": "2.0.0",
"resolved": "https://registry.npmjs.org/dom-serializer/-/dom-serializer-2.0.0.tgz",
@@ -3258,6 +3358,12 @@
"resolved": "https://registry.npmjs.org/lodash.defaults/-/lodash.defaults-4.2.0.tgz",
"integrity": "sha512-qjxPLHd3r5DnsdGacqOMU6pb/avJzdh9tFX2ymgoZE27BmjXrNy/y4LoaiTeAb+O3gL8AfpJGtqfX/ae2leYYQ=="
},
"node_modules/lodash.get": {
"version": "4.4.2",
"resolved": "https://registry.npmjs.org/lodash.get/-/lodash.get-4.4.2.tgz",
"integrity": "sha512-z+Uw/vLuy6gQe8cfaFWD7p0wVv8fJl3mbzXh33RS+0oW2wvUqiRXiQ69gLWSLpgB5/6sU+r6BlQR0MBILadqTQ==",
"deprecated": "This package is deprecated. Use the optional chaining (?.) operator instead."
},
"node_modules/lodash.includes": {
"version": "4.3.0",
"resolved": "https://registry.npmjs.org/lodash.includes/-/lodash.includes-4.3.0.tgz",
@@ -3273,6 +3379,12 @@
"resolved": "https://registry.npmjs.org/lodash.isboolean/-/lodash.isboolean-3.0.3.tgz",
"integrity": "sha512-Bz5mupy2SVbPHURB98VAcw+aHh4vRV5IPNhILUCsOzRmsTmSQ17jIuqopAentWoehktxGd9e/hbIXq980/1QJg=="
},
"node_modules/lodash.isequal": {
"version": "4.5.0",
"resolved": "https://registry.npmjs.org/lodash.isequal/-/lodash.isequal-4.5.0.tgz",
"integrity": "sha512-pDo3lu8Jhfjqls6GkMgpahsF9kCyayhgykjyLMNFTKWrpVdAQtYyB4muAMWozBB4ig/dtWAmsMxLEI8wuz+DYQ==",
"deprecated": "This package is deprecated. Use require('node:util').isDeepStrictEqual instead."
},
"node_modules/lodash.isinteger": {
"version": "4.0.4",
"resolved": "https://registry.npmjs.org/lodash.isinteger/-/lodash.isinteger-4.0.4.tgz",
@@ -3293,6 +3405,11 @@
"resolved": "https://registry.npmjs.org/lodash.isstring/-/lodash.isstring-4.0.1.tgz",
"integrity": "sha512-0wJxfxH1wgO3GrbuP+dTTk7op+6L41QCXbGINEmD+ny/G/eCqGzxyCsh7159S+mgDDcoarnBw6PC1PS5+wUGgw=="
},
"node_modules/lodash.mergewith": {
"version": "4.6.2",
"resolved": "https://registry.npmjs.org/lodash.mergewith/-/lodash.mergewith-4.6.2.tgz",
"integrity": "sha512-GK3g5RPZWTRSeLSpgP8Xhra+pnjBC56q9FZYe1d5RN3TJ35dbkGy3YqBSMbyCrlbi+CM9Z3Jk5yTL7RCsqboyQ=="
},
"node_modules/lodash.once": {
"version": "4.1.1",
"resolved": "https://registry.npmjs.org/lodash.once/-/lodash.once-4.1.1.tgz",
@@ -3748,6 +3865,12 @@
"wrappy": "1"
}
},
"node_modules/openapi-types": {
"version": "12.1.3",
"resolved": "https://registry.npmjs.org/openapi-types/-/openapi-types-12.1.3.tgz",
"integrity": "sha512-N4YtSYJqghVu4iek2ZUvcN/0aqH1kRDuNqzcycDxhOUpg7GdvLa2F3DgS6yBNhInhv2r/6I0Flkn7CqL8+nIcw==",
"peer": true
},
"node_modules/openid-client": {
"version": "6.8.1",
"resolved": "https://registry.npmjs.org/openid-client/-/openid-client-6.8.1.tgz",
@@ -5188,6 +5311,78 @@
}
]
},
"node_modules/swagger-jsdoc": {
"version": "6.2.8",
"resolved": "https://registry.npmjs.org/swagger-jsdoc/-/swagger-jsdoc-6.2.8.tgz",
"integrity": "sha512-VPvil1+JRpmJ55CgAtn8DIcpBs0bL5L3q5bVQvF4tAW/k/9JYSj7dCpaYCAv5rufe0vcCbBRQXGvzpkWjvLklQ==",
"dependencies": {
"commander": "6.2.0",
"doctrine": "3.0.0",
"glob": "7.1.6",
"lodash.mergewith": "^4.6.2",
"swagger-parser": "^10.0.3",
"yaml": "2.0.0-1"
},
"bin": {
"swagger-jsdoc": "bin/swagger-jsdoc.js"
},
"engines": {
"node": ">=12.0.0"
}
},
"node_modules/swagger-jsdoc/node_modules/glob": {
"version": "7.1.6",
"resolved": "https://registry.npmjs.org/glob/-/glob-7.1.6.tgz",
"integrity": "sha512-LwaxwyZ72Lk7vZINtNNrywX0ZuLyStrdDtabefZKAY5ZGJhVtgdznluResxNmPitE0SAO+O26sWTHeKSI2wMBA==",
"deprecated": "Glob versions prior to v9 are no longer supported",
"dependencies": {
"fs.realpath": "^1.0.0",
"inflight": "^1.0.4",
"inherits": "2",
"minimatch": "^3.0.4",
"once": "^1.3.0",
"path-is-absolute": "^1.0.0"
},
"engines": {
"node": "*"
},
"funding": {
"url": "https://github.com/sponsors/isaacs"
}
},
"node_modules/swagger-parser": {
"version": "10.0.3",
"resolved": "https://registry.npmjs.org/swagger-parser/-/swagger-parser-10.0.3.tgz",
"integrity": "sha512-nF7oMeL4KypldrQhac8RyHerJeGPD1p2xDh900GPvc+Nk7nWP6jX2FcC7WmkinMoAmoO774+AFXcWsW8gMWEIg==",
"dependencies": {
"@apidevtools/swagger-parser": "10.0.3"
},
"engines": {
"node": ">=10"
}
},
"node_modules/swagger-ui-dist": {
"version": "5.31.0",
"resolved": "https://registry.npmjs.org/swagger-ui-dist/-/swagger-ui-dist-5.31.0.tgz",
"integrity": "sha512-zSUTIck02fSga6rc0RZP3b7J7wgHXwLea8ZjgLA3Vgnb8QeOl3Wou2/j5QkzSGeoz6HusP/coYuJl33aQxQZpg==",
"dependencies": {
"@scarf/scarf": "=1.4.0"
}
},
"node_modules/swagger-ui-express": {
"version": "5.0.1",
"resolved": "https://registry.npmjs.org/swagger-ui-express/-/swagger-ui-express-5.0.1.tgz",
"integrity": "sha512-SrNU3RiBGTLLmFU8GIJdOdanJTl4TOmT27tt3bWWHppqYmAZ6IDuEuBvMU6nZq0zLEe6b/1rACXCgLZqO6ZfrA==",
"dependencies": {
"swagger-ui-dist": ">=5.0.0"
},
"engines": {
"node": ">= v0.10.32"
},
"peerDependencies": {
"express": ">=4.0.0 || >=5.0.0-beta"
}
},
"node_modules/tar": {
"version": "6.2.1",
"resolved": "https://registry.npmjs.org/tar/-/tar-6.2.1.tgz",
@@ -5406,6 +5601,14 @@
"uuid": "dist/bin/uuid"
}
},
"node_modules/validator": {
"version": "13.15.23",
"resolved": "https://registry.npmjs.org/validator/-/validator-13.15.23.tgz",
"integrity": "sha512-4yoz1kEWqUjzi5zsPbAS/903QXSYp0UOtHsPpp7p9rHAw/W+dkInskAE386Fat3oKRROwO98d9ZB0G4cObgUyw==",
"engines": {
"node": ">= 0.10"
}
},
"node_modules/vary": {
"version": "1.1.2",
"resolved": "https://registry.npmjs.org/vary/-/vary-1.1.2.tgz",
@@ -5584,6 +5787,14 @@
"resolved": "https://registry.npmjs.org/yallist/-/yallist-4.0.0.tgz",
"integrity": "sha512-3wdGidZyq5PB084XLES5TpOSRA3wjXAlIWMhum2kRcv/41Sn2emQ0dycQW4uZXLejwKvg6EsvbdlVL+FYEct7A=="
},
"node_modules/yaml": {
"version": "2.0.0-1",
"resolved": "https://registry.npmjs.org/yaml/-/yaml-2.0.0-1.tgz",
"integrity": "sha512-W7h5dEhywMKenDJh2iX/LABkbFnBxasD27oyXWDS/feDsxiw0dD5ncXdYXgkvAsXIY2MpW/ZKkr9IU30DBdMNQ==",
"engines": {
"node": ">= 6"
}
},
"node_modules/yargs": {
"version": "17.7.2",
"resolved": "https://registry.npmjs.org/yargs/-/yargs-17.7.2.tgz",
@@ -5618,6 +5829,34 @@
"fd-slicer": "~1.1.0"
}
},
"node_modules/z-schema": {
"version": "5.0.5",
"resolved": "https://registry.npmjs.org/z-schema/-/z-schema-5.0.5.tgz",
"integrity": "sha512-D7eujBWkLa3p2sIpJA0d1pr7es+a7m0vFAnZLlCEKq/Ij2k0MLi9Br2UPxoxdYystm5K1yeBGzub0FlYUEWj2Q==",
"dependencies": {
"lodash.get": "^4.4.2",
"lodash.isequal": "^4.5.0",
"validator": "^13.7.0"
},
"bin": {
"z-schema": "bin/z-schema"
},
"engines": {
"node": ">=8.0.0"
},
"optionalDependencies": {
"commander": "^9.4.1"
}
},
"node_modules/z-schema/node_modules/commander": {
"version": "9.5.0",
"resolved": "https://registry.npmjs.org/commander/-/commander-9.5.0.tgz",
"integrity": "sha512-KRs7WVDKg86PWiuAqhDrAQnTXZKraVcCc6vFdL14qrZ/DcWwuRo7VoiYXalXO7S5GKpqYiVEwCbgFDfxNHKJBQ==",
"optional": true,
"engines": {
"node": "^12.20.0 || >=14"
}
},
"node_modules/zod": {
"version": "3.25.76",
"resolved": "https://registry.npmjs.org/zod/-/zod-3.25.76.tgz",

View File

@@ -49,6 +49,8 @@
"puppeteer-extra-plugin-stealth": "^2.11.2",
"sharp": "^0.32.0",
"socks-proxy-agent": "^8.0.2",
"swagger-jsdoc": "^6.2.8",
"swagger-ui-express": "^5.0.1",
"user-agents": "^1.1.669",
"uuid": "^9.0.1",
"zod": "^3.22.4"
@@ -61,6 +63,8 @@
"@types/node": "^20.10.5",
"@types/node-cron": "^3.0.11",
"@types/pg": "^8.15.6",
"@types/swagger-jsdoc": "^6.0.4",
"@types/swagger-ui-express": "^4.1.8",
"@types/uuid": "^9.0.7",
"tsx": "^4.7.0",
"typescript": "^5.3.3"

View File

@@ -7,6 +7,7 @@
*
* NO username/password auth in API. Use tokens only.
*
* Trusted origins are managed via /admin and stored in the trusted_origins table.
* Localhost bypass: curl from 127.0.0.1 gets automatic admin access.
*/
import { Request, Response, NextFunction } from 'express';
@@ -16,8 +17,8 @@ import { pool } from '../db/pool';
const JWT_SECRET = process.env.JWT_SECRET || 'change_this_in_production';
// Trusted origins that bypass auth for internal/same-origin requests
const TRUSTED_ORIGINS = [
// Fallback trusted origins (used if DB unavailable)
const FALLBACK_TRUSTED_ORIGINS = [
'https://cannaiq.co',
'https://www.cannaiq.co',
'https://findadispo.com',
@@ -29,31 +30,108 @@ const TRUSTED_ORIGINS = [
'http://localhost:5173',
];
// Pattern-based trusted origins (wildcards)
const TRUSTED_ORIGIN_PATTERNS = [
/^https:\/\/.*\.cannabrands\.app$/, // *.cannabrands.app
/^https:\/\/.*\.cannaiq\.co$/, // *.cannaiq.co
const FALLBACK_TRUSTED_PATTERNS = [
/^https:\/\/.*\.cannabrands\.app$/,
/^https:\/\/.*\.cannaiq\.co$/,
];
// Trusted IPs for internal pod-to-pod communication
const TRUSTED_IPS = [
const FALLBACK_TRUSTED_IPS = [
'127.0.0.1',
'::1',
'::ffff:127.0.0.1',
];
// Cache for DB-backed trusted origins
let trustedOriginsCache: {
ips: Set<string>;
domains: Set<string>;
patterns: RegExp[];
loadedAt: Date;
} | null = null;
/**
* Load trusted origins from DB with caching (5 min TTL)
*/
async function loadTrustedOrigins(): Promise<{
ips: Set<string>;
domains: Set<string>;
patterns: RegExp[];
}> {
// Return cached if fresh
if (trustedOriginsCache) {
const age = Date.now() - trustedOriginsCache.loadedAt.getTime();
if (age < 5 * 60 * 1000) {
return trustedOriginsCache;
}
}
try {
const result = await pool.query(`
SELECT origin_type, origin_value
FROM trusted_origins
WHERE active = true
`);
const ips = new Set<string>();
const domains = new Set<string>();
const patterns: RegExp[] = [];
for (const row of result.rows) {
switch (row.origin_type) {
case 'ip':
ips.add(row.origin_value);
break;
case 'domain':
// Store as full origin for comparison
if (!row.origin_value.startsWith('http')) {
domains.add(`https://${row.origin_value}`);
domains.add(`http://${row.origin_value}`);
} else {
domains.add(row.origin_value);
}
break;
case 'pattern':
try {
patterns.push(new RegExp(row.origin_value));
} catch {
console.warn(`[Auth] Invalid trusted origin pattern: ${row.origin_value}`);
}
break;
}
}
trustedOriginsCache = { ips, domains, patterns, loadedAt: new Date() };
return trustedOriginsCache;
} catch (error) {
// DB not available or table doesn't exist - use fallbacks
return {
ips: new Set(FALLBACK_TRUSTED_IPS),
domains: new Set(FALLBACK_TRUSTED_ORIGINS),
patterns: FALLBACK_TRUSTED_PATTERNS,
};
}
}
/**
* Clear trusted origins cache (called when admin updates origins)
*/
export function clearTrustedOriginsCache() {
trustedOriginsCache = null;
}
/**
* Check if request is from a trusted origin/IP
*/
function isTrustedRequest(req: Request): boolean {
async function isTrustedRequest(req: Request): Promise<boolean> {
const { ips, domains, patterns } = await loadTrustedOrigins();
// Check origin header
const origin = req.headers.origin;
if (origin) {
if (TRUSTED_ORIGINS.includes(origin)) {
if (domains.has(origin)) {
return true;
}
// Check pattern-based origins (wildcards like *.cannabrands.app)
for (const pattern of TRUSTED_ORIGIN_PATTERNS) {
for (const pattern of patterns) {
if (pattern.test(origin)) {
return true;
}
@@ -63,16 +141,15 @@ function isTrustedRequest(req: Request): boolean {
// Check referer header (for same-origin requests without CORS)
const referer = req.headers.referer;
if (referer) {
for (const trusted of TRUSTED_ORIGINS) {
for (const trusted of domains) {
if (referer.startsWith(trusted)) {
return true;
}
}
// Check pattern-based referers
try {
const refererUrl = new URL(referer);
const refererOrigin = refererUrl.origin;
for (const pattern of TRUSTED_ORIGIN_PATTERNS) {
for (const pattern of patterns) {
if (pattern.test(refererOrigin)) {
return true;
}
@@ -84,7 +161,7 @@ function isTrustedRequest(req: Request): boolean {
// Check IP for internal requests (pod-to-pod, localhost)
const clientIp = req.ip || req.socket.remoteAddress || '';
if (TRUSTED_IPS.includes(clientIp)) {
if (ips.has(clientIp)) {
return true;
}
@@ -200,7 +277,7 @@ export async function authMiddleware(req: AuthRequest, res: Response, next: Next
}
// No token provided - check trusted origins for API access (WordPress, etc.)
if (isTrustedRequest(req)) {
if (await isTrustedRequest(req)) {
req.user = {
id: 0,
email: 'internal@system',

View File

@@ -147,6 +147,8 @@ import workerRegistryRoutes from './routes/worker-registry';
// Per TASK_WORKFLOW_2024-12-10.md: Raw payload access API
import payloadsRoutes from './routes/payloads';
import k8sRoutes from './routes/k8s';
import trustedOriginsRoutes from './routes/trusted-origins';
// Mark requests from trusted domains (cannaiq.co, findagram.co, findadispo.com)
// These domains can access the API without authentication
@@ -200,6 +202,10 @@ app.use('/api/admin/orchestrator', orchestratorAdminRoutes);
app.use('/api/admin/debug', adminDebugRoutes);
console.log('[AdminDebug] Routes registered at /api/admin/debug');
// Admin routes - trusted origins management (IPs, domains that bypass auth)
app.use('/api/admin/trusted-origins', trustedOriginsRoutes);
console.log('[TrustedOrigins] Routes registered at /api/admin/trusted-origins');
// Admin routes - intelligence (brands, pricing analytics)
app.use('/api/admin/intelligence', intelligenceRoutes);
console.log('[Intelligence] Routes registered at /api/admin/intelligence');

View File

@@ -28,8 +28,55 @@ const router = Router();
const getDbPool = (): Pool => getPool() as unknown as Pool;
/**
* GET /api/payloads
* List payload metadata (paginated)
* @swagger
* /payloads:
* get:
* summary: List payload metadata
* description: Returns paginated list of raw crawl payload metadata. Does not include the actual payload data.
* tags: [Payloads]
* parameters:
* - in: query
* name: limit
* schema:
* type: integer
* default: 50
* maximum: 100
* description: Number of payloads to return
* - in: query
* name: offset
* schema:
* type: integer
* default: 0
* description: Number of payloads to skip
* - in: query
* name: dispensary_id
* schema:
* type: integer
* description: Filter by dispensary ID
* responses:
* 200:
* description: List of payload metadata
* content:
* application/json:
* schema:
* type: object
* properties:
* success:
* type: boolean
* example: true
* payloads:
* type: array
* items:
* $ref: '#/components/schemas/PayloadMetadata'
* pagination:
* type: object
* properties:
* limit:
* type: integer
* offset:
* type: integer
* 500:
* description: Server error
*/
router.get('/', async (req: Request, res: Response) => {
try {
@@ -56,8 +103,35 @@ router.get('/', async (req: Request, res: Response) => {
});
/**
* GET /api/payloads/:id
* Get payload metadata by ID
* @swagger
* /payloads/{id}:
* get:
* summary: Get payload metadata by ID
* description: Returns metadata for a specific payload including dispensary name, size, and timestamps.
* tags: [Payloads]
* parameters:
* - in: path
* name: id
* required: true
* schema:
* type: integer
* description: Payload ID
* responses:
* 200:
* description: Payload metadata
* content:
* application/json:
* schema:
* type: object
* properties:
* success:
* type: boolean
* payload:
* $ref: '#/components/schemas/PayloadMetadata'
* 404:
* description: Payload not found
* 500:
* description: Server error
*/
router.get('/:id', async (req: Request, res: Response) => {
try {
@@ -97,8 +171,43 @@ router.get('/:id', async (req: Request, res: Response) => {
});
/**
* GET /api/payloads/:id/data
* Get full payload JSON (decompressed from disk)
* @swagger
* /payloads/{id}/data:
* get:
* summary: Get full payload data
* description: Returns the complete raw crawl payload JSON, decompressed from disk. This includes all products from the crawl.
* tags: [Payloads]
* parameters:
* - in: path
* name: id
* required: true
* schema:
* type: integer
* description: Payload ID
* responses:
* 200:
* description: Full payload data
* content:
* application/json:
* schema:
* type: object
* properties:
* success:
* type: boolean
* metadata:
* $ref: '#/components/schemas/PayloadMetadata'
* data:
* type: object
* description: Raw GraphQL response with products array
* properties:
* products:
* type: array
* items:
* type: object
* 404:
* description: Payload not found
* 500:
* description: Server error
*/
router.get('/:id/data', async (req: Request, res: Response) => {
try {
@@ -123,8 +232,48 @@ router.get('/:id/data', async (req: Request, res: Response) => {
});
/**
* GET /api/payloads/store/:dispensaryId
* List payloads for a specific store
* @swagger
* /payloads/store/{dispensaryId}:
* get:
* summary: List payloads for a store
* description: Returns paginated list of payload metadata for a specific dispensary.
* tags: [Payloads]
* parameters:
* - in: path
* name: dispensaryId
* required: true
* schema:
* type: integer
* description: Dispensary ID
* - in: query
* name: limit
* schema:
* type: integer
* default: 20
* maximum: 100
* - in: query
* name: offset
* schema:
* type: integer
* default: 0
* responses:
* 200:
* description: List of payloads for store
* content:
* application/json:
* schema:
* type: object
* properties:
* success:
* type: boolean
* dispensaryId:
* type: integer
* payloads:
* type: array
* items:
* $ref: '#/components/schemas/PayloadMetadata'
* 500:
* description: Server error
*/
router.get('/store/:dispensaryId', async (req: Request, res: Response) => {
try {
@@ -152,8 +301,42 @@ router.get('/store/:dispensaryId', async (req: Request, res: Response) => {
});
/**
* GET /api/payloads/store/:dispensaryId/latest
* Get the latest payload for a store (with full data)
* @swagger
* /payloads/store/{dispensaryId}/latest:
* get:
* summary: Get latest payload for a store
* description: Returns the most recent raw crawl payload for a dispensary, including full product data.
* tags: [Payloads]
* parameters:
* - in: path
* name: dispensaryId
* required: true
* schema:
* type: integer
* description: Dispensary ID
* responses:
* 200:
* description: Latest payload with full data
* content:
* application/json:
* schema:
* type: object
* properties:
* success:
* type: boolean
* metadata:
* $ref: '#/components/schemas/PayloadMetadata'
* data:
* type: object
* properties:
* products:
* type: array
* items:
* type: object
* 404:
* description: No payloads found for dispensary
* 500:
* description: Server error
*/
router.get('/store/:dispensaryId/latest', async (req: Request, res: Response) => {
try {
@@ -181,12 +364,107 @@ router.get('/store/:dispensaryId/latest', async (req: Request, res: Response) =>
});
/**
* GET /api/payloads/store/:dispensaryId/diff
* Compare two payloads for a store
*
* Query params:
* - from: payload ID (older)
* - to: payload ID (newer) - optional, defaults to latest
* @swagger
* /payloads/store/{dispensaryId}/diff:
* get:
* summary: Compare two payloads
* description: |
* Compares two crawl payloads for a store and returns the differences.
* If no IDs are provided, compares the two most recent payloads.
* Returns added products, removed products, price changes, and stock changes.
* tags: [Payloads]
* parameters:
* - in: path
* name: dispensaryId
* required: true
* schema:
* type: integer
* description: Dispensary ID
* - in: query
* name: from
* schema:
* type: integer
* description: Older payload ID (optional)
* - in: query
* name: to
* schema:
* type: integer
* description: Newer payload ID (optional)
* responses:
* 200:
* description: Payload diff results
* content:
* application/json:
* schema:
* type: object
* properties:
* success:
* type: boolean
* from:
* type: object
* properties:
* id:
* type: integer
* fetchedAt:
* type: string
* format: date-time
* productCount:
* type: integer
* to:
* type: object
* properties:
* id:
* type: integer
* fetchedAt:
* type: string
* format: date-time
* productCount:
* type: integer
* diff:
* type: object
* properties:
* added:
* type: integer
* removed:
* type: integer
* priceChanges:
* type: integer
* stockChanges:
* type: integer
* details:
* type: object
* properties:
* added:
* type: array
* items:
* type: object
* removed:
* type: array
* items:
* type: object
* priceChanges:
* type: array
* items:
* type: object
* properties:
* id:
* type: string
* name:
* type: string
* oldPrice:
* type: number
* newPrice:
* type: number
* stockChanges:
* type: array
* items:
* type: object
* 400:
* description: Need at least 2 payloads to diff
* 404:
* description: One or both payloads not found
* 500:
* description: Server error
*/
router.get('/store/:dispensaryId/diff', async (req: Request, res: Response) => {
try {
@@ -331,4 +609,370 @@ router.get('/store/:dispensaryId/diff', async (req: Request, res: Response) => {
}
});
/**
* GET /api/payloads/store/:dispensaryId/query
* Query products from the latest payload with flexible filters
*
* Query params:
* - brand: Filter by brand name (partial match)
* - category: Filter by category (exact match)
* - subcategory: Filter by subcategory
* - strain_type: Filter by strain type (indica, sativa, hybrid, cbd)
* - in_stock: Filter by stock status (true/false)
* - price_min: Minimum price
* - price_max: Maximum price
* - thc_min: Minimum THC percentage
* - thc_max: Maximum THC percentage
* - search: Search product name (partial match)
* - fields: Comma-separated list of fields to return
* - limit: Max results (default 100, max 1000)
* - offset: Skip results for pagination
* - sort: Sort field (name, price, thc, brand)
* - order: Sort order (asc, desc)
*/
router.get('/store/:dispensaryId/query', async (req: Request, res: Response) => {
try {
const pool = getDbPool();
const dispensaryId = parseInt(req.params.dispensaryId);
// Get latest payload
const result = await getLatestPayload(pool, dispensaryId);
if (!result) {
return res.status(404).json({
success: false,
error: `No payloads found for dispensary ${dispensaryId}`,
});
}
let products = result.payload.products || [];
// Parse query params
const {
brand,
category,
subcategory,
strain_type,
in_stock,
price_min,
price_max,
thc_min,
thc_max,
search,
fields,
limit: limitStr,
offset: offsetStr,
sort,
order,
} = req.query;
// Apply filters
if (brand) {
const brandLower = (brand as string).toLowerCase();
products = products.filter((p: any) =>
p.brand?.name?.toLowerCase().includes(brandLower)
);
}
if (category) {
const catLower = (category as string).toLowerCase();
products = products.filter((p: any) =>
p.category?.toLowerCase() === catLower ||
p.Category?.toLowerCase() === catLower
);
}
if (subcategory) {
const subLower = (subcategory as string).toLowerCase();
products = products.filter((p: any) =>
p.subcategory?.toLowerCase() === subLower ||
p.subCategory?.toLowerCase() === subLower
);
}
if (strain_type) {
const strainLower = (strain_type as string).toLowerCase();
products = products.filter((p: any) =>
p.strainType?.toLowerCase() === strainLower ||
p.strain_type?.toLowerCase() === strainLower
);
}
if (in_stock !== undefined) {
const wantInStock = in_stock === 'true';
products = products.filter((p: any) => {
const status = p.Status || p.status;
const isInStock = status === 'Active' || status === 'In Stock' || status === 'in_stock';
return wantInStock ? isInStock : !isInStock;
});
}
if (price_min !== undefined) {
const min = parseFloat(price_min as string);
products = products.filter((p: any) => {
const price = p.Prices?.[0]?.price || p.price;
return price >= min;
});
}
if (price_max !== undefined) {
const max = parseFloat(price_max as string);
products = products.filter((p: any) => {
const price = p.Prices?.[0]?.price || p.price;
return price <= max;
});
}
if (thc_min !== undefined) {
const min = parseFloat(thc_min as string);
products = products.filter((p: any) => {
const thc = p.potencyThc?.formatted || p.thc || 0;
const thcNum = typeof thc === 'string' ? parseFloat(thc) : thc;
return thcNum >= min;
});
}
if (thc_max !== undefined) {
const max = parseFloat(thc_max as string);
products = products.filter((p: any) => {
const thc = p.potencyThc?.formatted || p.thc || 0;
const thcNum = typeof thc === 'string' ? parseFloat(thc) : thc;
return thcNum <= max;
});
}
if (search) {
const searchLower = (search as string).toLowerCase();
products = products.filter((p: any) =>
p.name?.toLowerCase().includes(searchLower)
);
}
// Sort
if (sort) {
const sortOrder = order === 'desc' ? -1 : 1;
products.sort((a: any, b: any) => {
let aVal: any, bVal: any;
switch (sort) {
case 'name':
aVal = a.name || '';
bVal = b.name || '';
break;
case 'price':
aVal = a.Prices?.[0]?.price || a.price || 0;
bVal = b.Prices?.[0]?.price || b.price || 0;
break;
case 'thc':
aVal = parseFloat(a.potencyThc?.formatted || a.thc || '0');
bVal = parseFloat(b.potencyThc?.formatted || b.thc || '0');
break;
case 'brand':
aVal = a.brand?.name || '';
bVal = b.brand?.name || '';
break;
default:
return 0;
}
if (aVal < bVal) return -1 * sortOrder;
if (aVal > bVal) return 1 * sortOrder;
return 0;
});
}
// Pagination
const totalCount = products.length;
const limit = Math.min(parseInt(limitStr as string) || 100, 1000);
const offset = parseInt(offsetStr as string) || 0;
products = products.slice(offset, offset + limit);
// Field selection - normalize product structure
const normalizedProducts = products.map((p: any) => {
const normalized: any = {
id: p._id || p.id,
name: p.name,
brand: p.brand?.name || p.brandName,
category: p.category || p.Category,
subcategory: p.subcategory || p.subCategory,
strain_type: p.strainType || p.strain_type,
price: p.Prices?.[0]?.price || p.price,
price_med: p.Prices?.[0]?.priceMed || p.priceMed,
price_rec: p.Prices?.[0]?.priceRec || p.priceRec,
thc: p.potencyThc?.formatted || p.thc,
cbd: p.potencyCbd?.formatted || p.cbd,
weight: p.Prices?.[0]?.weight || p.weight,
status: p.Status || p.status,
in_stock: (p.Status || p.status) === 'Active',
image_url: p.image || p.imageUrl || p.image_url,
description: p.description,
};
// If specific fields requested, filter
if (fields) {
const requestedFields = (fields as string).split(',').map(f => f.trim());
const filtered: any = {};
for (const field of requestedFields) {
if (normalized.hasOwnProperty(field)) {
filtered[field] = normalized[field];
}
}
return filtered;
}
return normalized;
});
res.json({
success: true,
dispensaryId,
payloadId: result.metadata.id,
fetchedAt: result.metadata.fetchedAt,
query: {
filters: {
brand: brand || null,
category: category || null,
subcategory: subcategory || null,
strain_type: strain_type || null,
in_stock: in_stock || null,
price_min: price_min || null,
price_max: price_max || null,
thc_min: thc_min || null,
thc_max: thc_max || null,
search: search || null,
},
sort: sort || null,
order: order || 'asc',
limit,
offset,
},
pagination: {
total: totalCount,
returned: normalizedProducts.length,
limit,
offset,
has_more: offset + limit < totalCount,
},
products: normalizedProducts,
});
} catch (error: any) {
console.error('[Payloads] Query error:', error.message);
res.status(500).json({ success: false, error: error.message });
}
});
/**
* GET /api/payloads/store/:dispensaryId/aggregate
* Aggregate data from the latest payload
*
* Query params:
* - group_by: Field to group by (brand, category, subcategory, strain_type)
* - metrics: Comma-separated metrics (count, avg_price, min_price, max_price, avg_thc)
*/
router.get('/store/:dispensaryId/aggregate', async (req: Request, res: Response) => {
try {
const pool = getDbPool();
const dispensaryId = parseInt(req.params.dispensaryId);
const result = await getLatestPayload(pool, dispensaryId);
if (!result) {
return res.status(404).json({
success: false,
error: `No payloads found for dispensary ${dispensaryId}`,
});
}
const products = result.payload.products || [];
const groupBy = req.query.group_by as string;
const metricsParam = req.query.metrics as string || 'count';
const metrics = metricsParam.split(',').map(m => m.trim());
if (!groupBy) {
return res.status(400).json({
success: false,
error: 'group_by parameter is required (brand, category, subcategory, strain_type)',
});
}
// Group products
const groups: Map<string, any[]> = new Map();
for (const p of products) {
let key: string;
switch (groupBy) {
case 'brand':
key = p.brand?.name || 'Unknown';
break;
case 'category':
key = p.category || p.Category || 'Unknown';
break;
case 'subcategory':
key = p.subcategory || p.subCategory || 'Unknown';
break;
case 'strain_type':
key = p.strainType || p.strain_type || 'Unknown';
break;
default:
key = 'Unknown';
}
if (!groups.has(key)) {
groups.set(key, []);
}
groups.get(key)!.push(p);
}
// Calculate metrics
const aggregations: any[] = [];
for (const [key, items] of groups) {
const agg: any = { [groupBy]: key };
for (const metric of metrics) {
switch (metric) {
case 'count':
agg.count = items.length;
break;
case 'avg_price':
const prices = items.map(p => p.Prices?.[0]?.price || p.price).filter(p => p != null);
agg.avg_price = prices.length > 0 ? prices.reduce((a, b) => a + b, 0) / prices.length : null;
break;
case 'min_price':
const minPrices = items.map(p => p.Prices?.[0]?.price || p.price).filter(p => p != null);
agg.min_price = minPrices.length > 0 ? Math.min(...minPrices) : null;
break;
case 'max_price':
const maxPrices = items.map(p => p.Prices?.[0]?.price || p.price).filter(p => p != null);
agg.max_price = maxPrices.length > 0 ? Math.max(...maxPrices) : null;
break;
case 'avg_thc':
const thcs = items.map(p => parseFloat(p.potencyThc?.formatted || p.thc || '0')).filter(t => t > 0);
agg.avg_thc = thcs.length > 0 ? thcs.reduce((a, b) => a + b, 0) / thcs.length : null;
break;
case 'in_stock_count':
agg.in_stock_count = items.filter(p => (p.Status || p.status) === 'Active').length;
break;
}
}
aggregations.push(agg);
}
// Sort by count descending
aggregations.sort((a, b) => (b.count || 0) - (a.count || 0));
res.json({
success: true,
dispensaryId,
payloadId: result.metadata.id,
fetchedAt: result.metadata.fetchedAt,
groupBy,
metrics,
totalProducts: products.length,
groupCount: aggregations.length,
aggregations,
});
} catch (error: any) {
console.error('[Payloads] Aggregate error:', error.message);
res.status(500).json({ success: false, error: error.message });
}
});
export default router;

View File

@@ -0,0 +1,224 @@
/**
* Trusted Origins Admin Routes
*
* Manage IPs and domains that bypass API key authentication.
* Available at /api/admin/trusted-origins
*/
import { Router, Response } from 'express';
import { pool } from '../db/pool';
import { AuthRequest, authMiddleware, requireRole, clearTrustedOriginsCache } from '../auth/middleware';
const router = Router();
// All routes require admin auth
router.use(authMiddleware);
router.use(requireRole('admin', 'superadmin'));
/**
* GET /api/admin/trusted-origins
* List all trusted origins
*/
router.get('/', async (req: AuthRequest, res: Response) => {
try {
const result = await pool.query(`
SELECT
id,
origin_type,
origin_value,
description,
active,
created_at,
updated_at
FROM trusted_origins
ORDER BY origin_type, origin_value
`);
res.json({
success: true,
origins: result.rows,
counts: {
total: result.rows.length,
active: result.rows.filter(r => r.active).length,
ips: result.rows.filter(r => r.origin_type === 'ip').length,
domains: result.rows.filter(r => r.origin_type === 'domain').length,
patterns: result.rows.filter(r => r.origin_type === 'pattern').length,
},
});
} catch (error: any) {
console.error('[TrustedOrigins] List error:', error.message);
res.status(500).json({ success: false, error: error.message });
}
});
/**
* POST /api/admin/trusted-origins
* Add a new trusted origin
*/
router.post('/', async (req: AuthRequest, res: Response) => {
try {
const { origin_type, origin_value, description } = req.body;
if (!origin_type || !origin_value) {
return res.status(400).json({
success: false,
error: 'origin_type and origin_value are required',
});
}
if (!['ip', 'domain', 'pattern'].includes(origin_type)) {
return res.status(400).json({
success: false,
error: 'origin_type must be: ip, domain, or pattern',
});
}
// Validate pattern if regex
if (origin_type === 'pattern') {
try {
new RegExp(origin_value);
} catch {
return res.status(400).json({
success: false,
error: 'Invalid regex pattern',
});
}
}
const result = await pool.query(`
INSERT INTO trusted_origins (origin_type, origin_value, description, created_by)
VALUES ($1, $2, $3, $4)
RETURNING id, origin_type, origin_value, description, active, created_at
`, [origin_type, origin_value, description || null, req.user?.id || null]);
// Invalidate cache
clearTrustedOriginsCache();
res.json({
success: true,
origin: result.rows[0],
});
} catch (error: any) {
if (error.code === '23505') {
return res.status(409).json({
success: false,
error: 'This origin already exists',
});
}
console.error('[TrustedOrigins] Add error:', error.message);
res.status(500).json({ success: false, error: error.message });
}
});
/**
* PUT /api/admin/trusted-origins/:id
* Update a trusted origin
*/
router.put('/:id', async (req: AuthRequest, res: Response) => {
try {
const id = parseInt(req.params.id);
const { origin_type, origin_value, description, active } = req.body;
// Validate pattern if regex
if (origin_type === 'pattern' && origin_value) {
try {
new RegExp(origin_value);
} catch {
return res.status(400).json({
success: false,
error: 'Invalid regex pattern',
});
}
}
const result = await pool.query(`
UPDATE trusted_origins
SET
origin_type = COALESCE($1, origin_type),
origin_value = COALESCE($2, origin_value),
description = COALESCE($3, description),
active = COALESCE($4, active),
updated_at = NOW()
WHERE id = $5
RETURNING id, origin_type, origin_value, description, active, updated_at
`, [origin_type, origin_value, description, active, id]);
if (result.rows.length === 0) {
return res.status(404).json({ success: false, error: 'Origin not found' });
}
// Invalidate cache
clearTrustedOriginsCache();
res.json({
success: true,
origin: result.rows[0],
});
} catch (error: any) {
console.error('[TrustedOrigins] Update error:', error.message);
res.status(500).json({ success: false, error: error.message });
}
});
/**
* DELETE /api/admin/trusted-origins/:id
* Delete a trusted origin
*/
router.delete('/:id', async (req: AuthRequest, res: Response) => {
try {
const id = parseInt(req.params.id);
const result = await pool.query(`
DELETE FROM trusted_origins WHERE id = $1 RETURNING id, origin_value
`, [id]);
if (result.rows.length === 0) {
return res.status(404).json({ success: false, error: 'Origin not found' });
}
// Invalidate cache
clearTrustedOriginsCache();
res.json({
success: true,
deleted: result.rows[0],
});
} catch (error: any) {
console.error('[TrustedOrigins] Delete error:', error.message);
res.status(500).json({ success: false, error: error.message });
}
});
/**
* POST /api/admin/trusted-origins/:id/toggle
* Toggle active status
*/
router.post('/:id/toggle', async (req: AuthRequest, res: Response) => {
try {
const id = parseInt(req.params.id);
const result = await pool.query(`
UPDATE trusted_origins
SET active = NOT active, updated_at = NOW()
WHERE id = $1
RETURNING id, origin_type, origin_value, active
`, [id]);
if (result.rows.length === 0) {
return res.status(404).json({ success: false, error: 'Origin not found' });
}
// Invalidate cache
clearTrustedOriginsCache();
res.json({
success: true,
origin: result.rows[0],
});
} catch (error: any) {
console.error('[TrustedOrigins] Toggle error:', error.message);
res.status(500).json({ success: false, error: error.message });
}
});
export default router;

View File

@@ -23,8 +23,6 @@
import { Router, Request, Response } from 'express';
import { pool } from '../db/pool';
import os from 'os';
import { runPuppeteerPreflightWithRetry } from '../services/puppeteer-preflight';
import { CrawlRotator } from '../services/crawl-rotator';
const router = Router();
@@ -866,58 +864,4 @@ router.get('/pods', async (_req: Request, res: Response) => {
}
});
// ============================================================
// PREFLIGHT SMOKE TEST
// ============================================================
/**
* POST /api/worker-registry/preflight-test
* Run an HTTP (Puppeteer) preflight test and return results
*
* This is a smoke test endpoint to verify the preflight system works.
* Returns IP, fingerprint data, bot detection results, and products fetched.
*/
router.post('/preflight-test', async (_req: Request, res: Response) => {
try {
console.log('[PreflightTest] Starting HTTP preflight smoke test...');
// Create a temporary CrawlRotator for the test
const crawlRotator = new CrawlRotator();
// Run the Puppeteer preflight (with 1 retry)
const startTime = Date.now();
const result = await runPuppeteerPreflightWithRetry(crawlRotator, 1);
const duration = Date.now() - startTime;
console.log(`[PreflightTest] Completed in ${duration}ms - passed: ${result.passed}`);
res.json({
success: true,
test: 'http_preflight',
duration_ms: duration,
result: {
passed: result.passed,
proxy_ip: result.proxyIp,
fingerprint: result.fingerprint,
bot_detection: result.botDetection,
products_returned: result.productsReturned,
browser_user_agent: result.browserUserAgent,
ip_verified: result.ipVerified,
proxy_available: result.proxyAvailable,
proxy_connected: result.proxyConnected,
antidetect_ready: result.antidetectReady,
response_time_ms: result.responseTimeMs,
error: result.error
}
});
} catch (error: any) {
console.error('[PreflightTest] Error:', error.message);
res.status(500).json({
success: false,
test: 'http_preflight',
error: error.message
});
}
});
export default router;

View File

@@ -26,34 +26,6 @@ const TEST_PLATFORM_ID = '6405ef617056e8014d79101b';
const FINGERPRINT_DEMO_URL = 'https://demo.fingerprint.com/';
const AMIUNIQUE_URL = 'https://amiunique.org/fingerprint';
// IP geolocation API for timezone lookup (free, no key required)
const IP_API_URL = 'http://ip-api.com/json';
/**
* Look up timezone from IP address using ip-api.com
* Returns IANA timezone (e.g., 'America/New_York') or null on failure
*/
async function getTimezoneFromIp(ip: string): Promise<{ timezone: string; city?: string; region?: string } | null> {
try {
const axios = require('axios');
const response = await axios.get(`${IP_API_URL}/${ip}?fields=status,timezone,city,regionName`, {
timeout: 5000,
});
if (response.data?.status === 'success' && response.data?.timezone) {
return {
timezone: response.data.timezone,
city: response.data.city,
region: response.data.regionName,
};
}
return null;
} catch (err: any) {
console.log(`[PuppeteerPreflight] IP geolocation lookup failed: ${err.message}`);
return null;
}
}
export interface PuppeteerPreflightResult extends PreflightResult {
method: 'http';
/** Number of products returned (proves API access) */
@@ -70,13 +42,6 @@ export interface PuppeteerPreflightResult extends PreflightResult {
expectedProxyIp?: string;
/** Whether IP verification passed (detected IP matches proxy) */
ipVerified?: boolean;
/** Detected timezone from IP geolocation */
detectedTimezone?: string;
/** Detected location from IP geolocation */
detectedLocation?: {
city?: string;
region?: string;
};
}
/**
@@ -171,52 +136,7 @@ export async function runPuppeteerPreflight(
};
// =========================================================================
// STEP 1a: Get IP address directly via simple API (more reliable than scraping)
// =========================================================================
console.log(`[PuppeteerPreflight] Getting proxy IP address...`);
try {
const ipApiResponse = await page.evaluate(async () => {
try {
const response = await fetch('https://api.ipify.org?format=json');
const data = await response.json();
return { ip: data.ip, error: null };
} catch (err: any) {
return { ip: null, error: err.message };
}
});
if (ipApiResponse.ip) {
result.proxyIp = ipApiResponse.ip;
result.proxyConnected = true;
console.log(`[PuppeteerPreflight] Detected proxy IP: ${ipApiResponse.ip}`);
// Look up timezone from IP
const geoData = await getTimezoneFromIp(ipApiResponse.ip);
if (geoData) {
result.detectedTimezone = geoData.timezone;
result.detectedLocation = { city: geoData.city, region: geoData.region };
console.log(`[PuppeteerPreflight] IP Geolocation: ${geoData.city}, ${geoData.region} (${geoData.timezone})`);
// Set browser timezone to match proxy location via CDP
try {
const client = await page.target().createCDPSession();
await client.send('Emulation.setTimezoneOverride', { timezoneId: geoData.timezone });
console.log(`[PuppeteerPreflight] Browser timezone set to: ${geoData.timezone}`);
} catch (tzErr: any) {
console.log(`[PuppeteerPreflight] Failed to set browser timezone: ${tzErr.message}`);
}
} else {
console.log(`[PuppeteerPreflight] WARNING: Could not determine timezone from IP - timezone mismatch possible`);
}
} else {
console.log(`[PuppeteerPreflight] IP lookup failed: ${ipApiResponse.error || 'unknown error'}`);
}
} catch (ipErr: any) {
console.log(`[PuppeteerPreflight] IP API error: ${ipErr.message}`);
}
// =========================================================================
// STEP 1b: Visit fingerprint.com demo to verify anti-detect
// STEP 1: Visit fingerprint.com demo to verify anti-detect and get IP
// =========================================================================
console.log(`[PuppeteerPreflight] Testing anti-detect at ${FINGERPRINT_DEMO_URL}...`);
@@ -279,8 +199,6 @@ export async function runPuppeteerPreflight(
// Don't fail - residential proxies often show different egress IPs
}
}
// Note: Timezone already set earlier via ipify.org IP lookup
}
if (fingerprintData.visitorId) {

View File

@@ -435,47 +435,29 @@ export class TaskWorker {
/**
* Report preflight status to worker_registry
* Function signature: update_worker_preflight(worker_id, transport, status, ip, response_ms, error, fingerprint)
*/
private async reportPreflightStatus(): Promise<void> {
try {
// Update worker_registry directly via SQL (more reliable than API)
// CURL preflight - includes IP address
await this.pool.query(`
SELECT update_worker_preflight($1, 'curl', $2, $3, $4, $5, $6)
SELECT update_worker_preflight($1, 'curl', $2, $3, $4)
`, [
this.workerId,
this.preflightCurlPassed ? 'passed' : 'failed',
this.preflightCurlResult?.proxyIp || null,
this.preflightCurlResult?.responseTimeMs || null,
this.preflightCurlResult?.error || null,
null, // No fingerprint for curl
]);
// HTTP preflight - includes IP, fingerprint, and timezone data
const httpFingerprint = this.preflightHttpResult ? {
...this.preflightHttpResult.fingerprint,
detectedTimezone: (this.preflightHttpResult as any).detectedTimezone,
detectedLocation: (this.preflightHttpResult as any).detectedLocation,
productsReturned: this.preflightHttpResult.productsReturned,
botDetection: (this.preflightHttpResult as any).botDetection,
} : null;
await this.pool.query(`
SELECT update_worker_preflight($1, 'http', $2, $3, $4, $5, $6)
SELECT update_worker_preflight($1, 'http', $2, $3, $4)
`, [
this.workerId,
this.preflightHttpPassed ? 'passed' : 'failed',
this.preflightHttpResult?.proxyIp || null,
this.preflightHttpResult?.responseTimeMs || null,
this.preflightHttpResult?.error || null,
httpFingerprint ? JSON.stringify(httpFingerprint) : null,
]);
console.log(`[TaskWorker] Preflight status reported to worker_registry`);
if (this.preflightHttpResult?.proxyIp) {
console.log(`[TaskWorker] HTTP IP: ${this.preflightHttpResult.proxyIp}, Timezone: ${(this.preflightHttpResult as any).detectedTimezone || 'unknown'}`);
}
} catch (err: any) {
// Non-fatal - worker can still function
console.warn(`[TaskWorker] Could not report preflight status: ${err.message}`);