Compare commits

13 Commits

Author SHA1 Message Date
Kelly
8a09691e91 feat(tasks): Add task pool start/stop toggle
- Add task-pool-state.ts for shared pause/resume state
- Add /api/tasks/pool/status, pause, resume endpoints
- Add Start/Stop Pool toggle button to TasksDashboard
- Spinner stops when pool is closed
- Fix is_active column name in store-discovery.ts
- Fix missing active column in task-service.ts claimTask
- Auto-refresh every 15 seconds

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 00:07:14 -07:00
kelly
01810c40a1 Merge pull request 'fix(worker): Wait for proxies instead of crashing' (#27) from fix/worker-proxy-wait into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/27
2025-12-11 06:43:29 +00:00
Kelly
1b46ab699d fix(national): Show all states count, not filtered "active" states
The "Active States" metric was arbitrary and confusing. Changed to
show total states count - all states in the system regardless of
whether they have data or not.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:25:50 -07:00
Kelly
ac1995f63f fix(pricing): Simplify category chart to prevent overflow
- Replace complex price range bars with simple horizontal bars
- Use overflow-hidden to prevent bars extending beyond container
- Calculate bar width as percentage of max avg price
- Limit to top 12 categories for cleaner display
- Fixed-width labels prevent layout shift

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:24:31 -07:00
Kelly
de93669652 fix(national): Count active states by product data, not crawl status
Active states should count states with actual product data, not just
states where crawling is enabled. A state can have historical data
even if crawling is currently disabled.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:23:20 -07:00
Kelly
dffc124920 fix(national): Fix active states count and remove StateBadge
- Change active_states to count states with crawl_enabled=true dispensaries
- Filter all national summary queries by crawl_enabled=true
- Remove unused StateBadge from National Dashboard header
- StateBadge was showing "Arizona" with no way to change it

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:22:19 -07:00
Kelly
932ceb0287 feat(intelligence): Add state filter to all Intelligence pages
- Add state filter to Intelligence Brands API and frontend
- Add state filter to Intelligence Pricing API and frontend
- Add state filter to Intelligence Stores API and frontend
- Fix null safety issues with toLocaleString() calls
- Update backend /stores endpoint to return skuCount, snapshotCount, chainName
- Add overall stats to pricing endpoint

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:19:54 -07:00
Kelly
824d48fd85 fix: Add curl to Docker, add active flag to worker_tasks
- Install curl in Docker container for Dutchie HTTP requests
- Add 'active' column to worker_tasks (default false) to prevent
  accidental task execution on startup
- Update task-service to only claim tasks where active=true

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:12:09 -07:00
Kelly
47fdab0382 fix: Filter orchestrator states by crawl_enabled
The states dropdown was showing count of ALL dispensaries instead of
just crawl-enabled ones. Now correctly filters to match the actual
stores that would be displayed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:09:04 -07:00
Kelly
ed7ddc6375 ci: Add database migration step to deploy pipeline
Migrations now run automatically before deployments:
1. Build new Docker image
2. Run migrations using the new image
3. Deploy to Kubernetes

Requires new secrets: db_host, db_port, db_name, db_user, db_pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:07:25 -07:00
Kelly
cf06f4a8c0 feat(worker): Listen for proxy_added notifications
- Workers now use PostgreSQL LISTEN/NOTIFY to wake up immediately when proxies are added
- Added trigger on proxies table to NOTIFY 'proxy_added' when active proxy inserted/updated
- Falls back to 30s polling if LISTEN fails

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 22:58:00 -07:00
kelly
61e915968f Merge pull request 'feat(tasks): Refactor task workflow with payload/refresh separation' (#26) from feat/task-workflow-refactor into master
2025-12-11 05:24:11 +00:00
kelly
a4338669a9 Merge pull request 'fix(auth): Prioritize JWT token over trusted origin bypass' (#24) from fix/auth-token-priority into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/24
2025-12-11 01:34:10 +00:00
20 changed files with 1203 additions and 200 deletions


@@ -163,7 +163,32 @@ steps:
       event: push
   # ===========================================
-  # STAGE 3: Deploy (after Docker builds)
+  # STAGE 3: Run Database Migrations (before deploy)
+  # ===========================================
+  migrate:
+    image: code.cannabrands.app/creationshop/dispensary-scraper:${CI_COMMIT_SHA:0:8}
+    environment:
+      CANNAIQ_DB_HOST:
+        from_secret: db_host
+      CANNAIQ_DB_PORT:
+        from_secret: db_port
+      CANNAIQ_DB_NAME:
+        from_secret: db_name
+      CANNAIQ_DB_USER:
+        from_secret: db_user
+      CANNAIQ_DB_PASS:
+        from_secret: db_pass
+    commands:
+      - cd /app
+      - node dist/db/migrate.js
+    depends_on:
+      - docker-backend
+    when:
+      branch: master
+      event: push
+  # ===========================================
+  # STAGE 4: Deploy (after migrations)
   # ===========================================
   deploy:
     image: bitnami/kubectl:latest
@@ -182,7 +207,7 @@ steps:
       - kubectl rollout status deployment/scraper -n dispensary-scraper --timeout=300s
       - kubectl rollout status deployment/cannaiq-frontend -n dispensary-scraper --timeout=120s
     depends_on:
-      - docker-backend
+      - migrate
       - docker-cannaiq
       - docker-findadispo
       - docker-findagram
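
The `migrate` step passes the database credentials to `node dist/db/migrate.js` through the `CANNAIQ_DB_*` environment variables. The script itself is not part of this diff, so the following is only a sketch of how such a script might validate those variables before connecting; `readDbConfig`, `DbConfig`, and `REQUIRED_VARS` are hypothetical names.

```typescript
// Hypothetical helper: not taken from dist/db/migrate.js, which this diff does not show.
interface DbConfig {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string;
}

// The five variables the pipeline step populates from secrets.
const REQUIRED_VARS = [
  'CANNAIQ_DB_HOST',
  'CANNAIQ_DB_PORT',
  'CANNAIQ_DB_NAME',
  'CANNAIQ_DB_USER',
  'CANNAIQ_DB_PASS',
] as const;

function readDbConfig(env: Record<string, string | undefined>): DbConfig {
  // Report every missing variable at once instead of failing on the first.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing database settings: ${missing.join(', ')}`);
  }

  const port = Number(env['CANNAIQ_DB_PORT']);
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`CANNAIQ_DB_PORT is not a valid port: ${env['CANNAIQ_DB_PORT']}`);
  }

  return {
    host: env['CANNAIQ_DB_HOST'] as string,
    port,
    database: env['CANNAIQ_DB_NAME'] as string,
    user: env['CANNAIQ_DB_USER'] as string,
    password: env['CANNAIQ_DB_PASS'] as string,
  };
}
```

Called as `readDbConfig(process.env)`, a check like this would presumably make the `migrate` step fail fast when one of the five secrets is missing, so the dependent `deploy` step never starts.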


@@ -25,8 +25,9 @@ ENV APP_GIT_SHA=${APP_GIT_SHA}
 ENV APP_BUILD_TIME=${APP_BUILD_TIME}
 ENV CONTAINER_IMAGE_TAG=${CONTAINER_IMAGE_TAG}
-# Install Chromium dependencies
+# Install Chromium dependencies and curl for HTTP requests
 RUN apt-get update && apt-get install -y \
+    curl \
     chromium \
     fonts-liberation \
     libnss3 \


@@ -0,0 +1,27 @@
-- Migration: 082_proxy_notification_trigger
-- Date: 2024-12-11
-- Description: Add PostgreSQL NOTIFY trigger to alert workers when proxies are added
-- Create function to notify workers when active proxy is added/activated
CREATE OR REPLACE FUNCTION notify_proxy_added()
RETURNS TRIGGER AS $$
BEGIN
-- Only notify if proxy is active
IF NEW.active = true THEN
PERFORM pg_notify('proxy_added', NEW.id::text);
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Drop existing trigger if any
DROP TRIGGER IF EXISTS proxy_added_trigger ON proxies;
-- Create trigger on insert and update of active column
CREATE TRIGGER proxy_added_trigger
AFTER INSERT OR UPDATE OF active ON proxies
FOR EACH ROW
EXECUTE FUNCTION notify_proxy_added();
COMMENT ON FUNCTION notify_proxy_added() IS
'Sends PostgreSQL NOTIFY to proxy_added channel when an active proxy is added or activated. Workers LISTEN on this channel to wake up immediately.';
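
On the worker side, the trigger's notifications arrive on a connection that has issued `LISTEN proxy_added`. The worker code is outside this hunk; a minimal sketch, assuming a client shaped like the subset of the node-postgres `Client` used here (`on('notification', ...)` and `query`), could look like this. `listenForProxyAdded` and the two interfaces are hypothetical names; the payload is the proxy id that the trigger sends via `NEW.id::text`.

```typescript
// Sketch only: the interfaces below model just the parts of a PostgreSQL
// client that are needed here; they are assumptions, not code from this repo.
interface NotificationMessage {
  channel: string;
  payload?: string;
}

interface ListenCapableClient {
  on(event: 'notification', handler: (msg: NotificationMessage) => void): unknown;
  query(sql: string): Promise<unknown>;
}

const PROXY_CHANNEL = 'proxy_added';

// Registers a handler for proxy notifications, then subscribes to the channel.
// Resolves to true if LISTEN succeeded and false if it failed.
function listenForProxyAdded(
  client: ListenCapableClient,
  onProxyAdded: (proxyId: string) => void,
): Promise<boolean> {
  client.on('notification', (msg) => {
    // Notifications on other channels share the same connection; ignore them.
    if (msg.channel === PROXY_CHANNEL) {
      onProxyAdded(msg.payload ?? '');
    }
  });

  return client.query(`LISTEN ${PROXY_CHANNEL}`).then(
    () => true,
    () => false,
  );
}
```

The promise resolving to `false` marks the point where a caller could switch to the 30s polling fallback mentioned in the commit message.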


@@ -1,6 +1,6 @@
 {
   "name": "dutchie-menus-backend",
-  "version": "1.5.1",
+  "version": "1.6.0",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
@@ -46,6 +46,97 @@
"resolved": "https://registry.npmjs.org/@ioredis/commands/-/commands-1.4.0.tgz",
"integrity": "sha512-aFT2yemJJo+TZCmieA7qnYGQooOS7QfNmYrzGtsYd3g9j5iDP8AimYYAesf79ohjbLG12XxC4nG5DyEnC88AsQ=="
},
"node_modules/@jsep-plugin/assignment": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/@jsep-plugin/assignment/-/assignment-1.3.0.tgz",
"integrity": "sha512-VVgV+CXrhbMI3aSusQyclHkenWSAm95WaiKrMxRFam3JSUiIaQjoMIw2sEs/OX4XifnqeQUN4DYbJjlA8EfktQ==",
"engines": {
"node": ">= 10.16.0"
},
"peerDependencies": {
"jsep": "^0.4.0||^1.0.0"
}
},
"node_modules/@jsep-plugin/regex": {
"version": "1.0.4",
"resolved": "https://registry.npmjs.org/@jsep-plugin/regex/-/regex-1.0.4.tgz",
"integrity": "sha512-q7qL4Mgjs1vByCaTnDFcBnV9HS7GVPJX5vyVoCgZHNSC9rjwIlmbXG5sUuorR5ndfHAIlJ8pVStxvjXHbNvtUg==",
"engines": {
"node": ">= 10.16.0"
},
"peerDependencies": {
"jsep": "^0.4.0||^1.0.0"
}
},
"node_modules/@kubernetes/client-node": {
"version": "1.4.0",
"resolved": "https://registry.npmjs.org/@kubernetes/client-node/-/client-node-1.4.0.tgz",
"integrity": "sha512-Zge3YvF7DJi264dU1b3wb/GmzR99JhUpqTvp+VGHfwZT+g7EOOYNScDJNZwXy9cszyIGPIs0VHr+kk8e95qqrA==",
"dependencies": {
"@types/js-yaml": "^4.0.1",
"@types/node": "^24.0.0",
"@types/node-fetch": "^2.6.13",
"@types/stream-buffers": "^3.0.3",
"form-data": "^4.0.0",
"hpagent": "^1.2.0",
"isomorphic-ws": "^5.0.0",
"js-yaml": "^4.1.0",
"jsonpath-plus": "^10.3.0",
"node-fetch": "^2.7.0",
"openid-client": "^6.1.3",
"rfc4648": "^1.3.0",
"socks-proxy-agent": "^8.0.4",
"stream-buffers": "^3.0.2",
"tar-fs": "^3.0.9",
"ws": "^8.18.2"
}
},
"node_modules/@kubernetes/client-node/node_modules/@types/node": {
"version": "24.10.3",
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.10.3.tgz",
"integrity": "sha512-gqkrWUsS8hcm0r44yn7/xZeV1ERva/nLgrLxFRUGb7aoNMIJfZJ3AC261zDQuOAKC7MiXai1WCpYc48jAHoShQ==",
"dependencies": {
"undici-types": "~7.16.0"
}
},
"node_modules/@kubernetes/client-node/node_modules/tar-fs": {
"version": "3.1.1",
"resolved": "https://registry.npmjs.org/tar-fs/-/tar-fs-3.1.1.tgz",
"integrity": "sha512-LZA0oaPOc2fVo82Txf3gw+AkEd38szODlptMYejQUhndHMLQ9M059uXR+AfS7DNo0NpINvSqDsvyaCrBVkptWg==",
"dependencies": {
"pump": "^3.0.0",
"tar-stream": "^3.1.5"
},
"optionalDependencies": {
"bare-fs": "^4.0.1",
"bare-path": "^3.0.0"
}
},
"node_modules/@kubernetes/client-node/node_modules/undici-types": {
"version": "7.16.0",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-7.16.0.tgz",
"integrity": "sha512-Zz+aZWSj8LE6zoxD+xrjh4VfkIG8Ya6LvYkZqtUQGJPZjYl53ypCaUwWqo7eI0x66KBGeRo+mlBEkMSeSZ38Nw=="
},
"node_modules/@kubernetes/client-node/node_modules/ws": {
"version": "8.18.3",
"resolved": "https://registry.npmjs.org/ws/-/ws-8.18.3.tgz",
"integrity": "sha512-PEIGCY5tSlUt50cqyMXfCzX+oOPqN0vuGqWzbcJ2xvnkzkq46oOpz7dQaTDBdfICb4N14+GARUDw2XV2N4tvzg==",
"engines": {
"node": ">=10.0.0"
},
"peerDependencies": {
"bufferutil": "^4.0.1",
"utf-8-validate": ">=5.0.2"
},
"peerDependenciesMeta": {
"bufferutil": {
"optional": true
},
"utf-8-validate": {
"optional": true
}
}
},
"node_modules/@mapbox/node-pre-gyp": {
"version": "1.0.11",
"resolved": "https://registry.npmjs.org/@mapbox/node-pre-gyp/-/node-pre-gyp-1.0.11.tgz",
@@ -251,6 +342,11 @@
"integrity": "sha512-r8Tayk8HJnX0FztbZN7oVqGccWgw98T/0neJphO91KkmOzug1KkofZURD4UaD5uH8AqcFLfdPErnBod0u71/qg==",
"dev": true
},
"node_modules/@types/js-yaml": {
"version": "4.0.9",
"resolved": "https://registry.npmjs.org/@types/js-yaml/-/js-yaml-4.0.9.tgz",
"integrity": "sha512-k4MGaQl5TGo/iipqb2UDG2UwjXziSWkh0uysQelTlJpX1qGlpUZYm8PnO4DxG1qBomtJUdYJ6qR6xdIah10JLg=="
},
"node_modules/@types/jsonwebtoken": {
"version": "9.0.10",
"resolved": "https://registry.npmjs.org/@types/jsonwebtoken/-/jsonwebtoken-9.0.10.tgz",
@@ -276,7 +372,6 @@
       "version": "20.19.25",
       "resolved": "https://registry.npmjs.org/@types/node/-/node-20.19.25.tgz",
       "integrity": "sha512-ZsJzA5thDQMSQO788d7IocwwQbI8B5OPzmqNvpf3NY/+MHDAS759Wo0gd2WQeXYt5AAAQjzcrTVC6SKCuYgoCQ==",
-      "devOptional": true,
       "dependencies": {
         "undici-types": "~6.21.0"
       }
@@ -287,6 +382,15 @@
"integrity": "sha512-0ikrnug3/IyneSHqCBeslAhlK2aBfYek1fGo4bP4QnZPmiqSGRK+Oy7ZMisLWkesffJvQ1cqAcBnJC+8+nxIAg==",
"dev": true
},
"node_modules/@types/node-fetch": {
"version": "2.6.13",
"resolved": "https://registry.npmjs.org/@types/node-fetch/-/node-fetch-2.6.13.tgz",
"integrity": "sha512-QGpRVpzSaUs30JBSGPjOg4Uveu384erbHBoT1zeONvyCfwQxIkUshLAOqN/k9EjGviPRmWTTe6aH2qySWKTVSw==",
"dependencies": {
"@types/node": "*",
"form-data": "^4.0.4"
}
},
"node_modules/@types/pg": {
"version": "8.15.6",
"resolved": "https://registry.npmjs.org/@types/pg/-/pg-8.15.6.tgz",
@@ -340,6 +444,14 @@
"@types/node": "*"
}
},
"node_modules/@types/stream-buffers": {
"version": "3.0.8",
"resolved": "https://registry.npmjs.org/@types/stream-buffers/-/stream-buffers-3.0.8.tgz",
"integrity": "sha512-J+7VaHKNvlNPJPEJXX/fKa9DZtR/xPMwuIbe+yNOwp1YB+ApUOBv2aUpEoBJEi8nJgbgs1x8e73ttg0r1rSUdw==",
"dependencies": {
"@types/node": "*"
}
},
"node_modules/@types/uuid": {
"version": "9.0.8",
"resolved": "https://registry.npmjs.org/@types/uuid/-/uuid-9.0.8.tgz",
@@ -520,6 +632,78 @@
}
}
},
"node_modules/bare-fs": {
"version": "4.5.2",
"resolved": "https://registry.npmjs.org/bare-fs/-/bare-fs-4.5.2.tgz",
"integrity": "sha512-veTnRzkb6aPHOvSKIOy60KzURfBdUflr5VReI+NSaPL6xf+XLdONQgZgpYvUuZLVQ8dCqxpBAudaOM1+KpAUxw==",
"optional": true,
"dependencies": {
"bare-events": "^2.5.4",
"bare-path": "^3.0.0",
"bare-stream": "^2.6.4",
"bare-url": "^2.2.2",
"fast-fifo": "^1.3.2"
},
"engines": {
"bare": ">=1.16.0"
},
"peerDependencies": {
"bare-buffer": "*"
},
"peerDependenciesMeta": {
"bare-buffer": {
"optional": true
}
}
},
"node_modules/bare-os": {
"version": "3.6.2",
"resolved": "https://registry.npmjs.org/bare-os/-/bare-os-3.6.2.tgz",
"integrity": "sha512-T+V1+1srU2qYNBmJCXZkUY5vQ0B4FSlL3QDROnKQYOqeiQR8UbjNHlPa+TIbM4cuidiN9GaTaOZgSEgsvPbh5A==",
"optional": true,
"engines": {
"bare": ">=1.14.0"
}
},
"node_modules/bare-path": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/bare-path/-/bare-path-3.0.0.tgz",
"integrity": "sha512-tyfW2cQcB5NN8Saijrhqn0Zh7AnFNsnczRcuWODH0eYAXBsJ5gVxAUuNr7tsHSC6IZ77cA0SitzT+s47kot8Mw==",
"optional": true,
"dependencies": {
"bare-os": "^3.0.1"
}
},
"node_modules/bare-stream": {
"version": "2.7.0",
"resolved": "https://registry.npmjs.org/bare-stream/-/bare-stream-2.7.0.tgz",
"integrity": "sha512-oyXQNicV1y8nc2aKffH+BUHFRXmx6VrPzlnaEvMhram0nPBrKcEdcyBg5r08D0i8VxngHFAiVyn1QKXpSG0B8A==",
"optional": true,
"dependencies": {
"streamx": "^2.21.0"
},
"peerDependencies": {
"bare-buffer": "*",
"bare-events": "*"
},
"peerDependenciesMeta": {
"bare-buffer": {
"optional": true
},
"bare-events": {
"optional": true
}
}
},
"node_modules/bare-url": {
"version": "2.3.2",
"resolved": "https://registry.npmjs.org/bare-url/-/bare-url-2.3.2.tgz",
"integrity": "sha512-ZMq4gd9ngV5aTMa5p9+UfY0b3skwhHELaDkhEHetMdX0LRkW9kzaym4oo/Eh+Ghm0CCDuMTsRIGM/ytUc1ZYmw==",
"optional": true,
"dependencies": {
"bare-path": "^3.0.0"
}
},
"node_modules/base64-js": {
"version": "1.5.1",
"resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz",
@@ -2019,6 +2203,14 @@
"node": ">=16.0.0"
}
},
"node_modules/hpagent": {
"version": "1.2.0",
"resolved": "https://registry.npmjs.org/hpagent/-/hpagent-1.2.0.tgz",
"integrity": "sha512-A91dYTeIB6NoXG+PxTQpCCDDnfHsW9kc06Lvpu1TEe9gnd6ZFeiBoRO9JvzEv6xK7EX97/dUE8g/vBMTqTS3CA==",
"engines": {
"node": ">=14"
}
},
"node_modules/htmlparser2": {
"version": "10.0.0",
"resolved": "https://registry.npmjs.org/htmlparser2/-/htmlparser2-10.0.0.tgz",
@@ -2382,6 +2574,22 @@
"node": ">=0.10.0"
}
},
"node_modules/isomorphic-ws": {
"version": "5.0.0",
"resolved": "https://registry.npmjs.org/isomorphic-ws/-/isomorphic-ws-5.0.0.tgz",
"integrity": "sha512-muId7Zzn9ywDsyXgTIafTry2sV3nySZeUDe6YedVd1Hvuuep5AsIlqK+XefWpYTyJG5e503F2xIuT2lcU6rCSw==",
"peerDependencies": {
"ws": "*"
}
},
"node_modules/jose": {
"version": "6.1.3",
"resolved": "https://registry.npmjs.org/jose/-/jose-6.1.3.tgz",
"integrity": "sha512-0TpaTfihd4QMNwrz/ob2Bp7X04yuxJkjRGi4aKmOqwhov54i6u79oCv7T+C7lo70MKH6BesI3vscD1yb/yzKXQ==",
"funding": {
"url": "https://github.com/sponsors/panva"
}
},
"node_modules/js-tokens": {
"version": "4.0.0",
"resolved": "https://registry.npmjs.org/js-tokens/-/js-tokens-4.0.0.tgz",
@@ -2398,6 +2606,14 @@
"js-yaml": "bin/js-yaml.js"
}
},
"node_modules/jsep": {
"version": "1.4.0",
"resolved": "https://registry.npmjs.org/jsep/-/jsep-1.4.0.tgz",
"integrity": "sha512-B7qPcEVE3NVkmSJbaYxvv4cHkVW7DQsZz13pUMrfS8z8Q/BuShN+gcTXrUlPiGqM2/t/EEaI030bpxMqY8gMlw==",
"engines": {
"node": ">= 10.16.0"
}
},
"node_modules/json-parse-even-better-errors": {
"version": "2.3.1",
"resolved": "https://registry.npmjs.org/json-parse-even-better-errors/-/json-parse-even-better-errors-2.3.1.tgz",
@@ -2419,6 +2635,23 @@
"graceful-fs": "^4.1.6"
}
},
"node_modules/jsonpath-plus": {
"version": "10.3.0",
"resolved": "https://registry.npmjs.org/jsonpath-plus/-/jsonpath-plus-10.3.0.tgz",
"integrity": "sha512-8TNmfeTCk2Le33A3vRRwtuworG/L5RrgMvdjhKZxvyShO+mBu2fP50OWUjRLNtvw344DdDarFh9buFAZs5ujeA==",
"dependencies": {
"@jsep-plugin/assignment": "^1.3.0",
"@jsep-plugin/regex": "^1.0.4",
"jsep": "^1.4.0"
},
"bin": {
"jsonpath": "bin/jsonpath-cli.js",
"jsonpath-plus": "bin/jsonpath-cli.js"
},
"engines": {
"node": ">=18.0.0"
}
},
"node_modules/jsonwebtoken": {
"version": "9.0.2",
"resolved": "https://registry.npmjs.org/jsonwebtoken/-/jsonwebtoken-9.0.2.tgz",
@@ -2493,6 +2726,11 @@
"resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
"integrity": "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg=="
},
"node_modules/lodash.clonedeep": {
"version": "4.5.0",
"resolved": "https://registry.npmjs.org/lodash.clonedeep/-/lodash.clonedeep-4.5.0.tgz",
"integrity": "sha512-H5ZhCF25riFd9uB5UCkVKo61m3S/xZk1x4wA6yp/L3RFP6Z/eHH1ymQcGLo7J3GMPfm0V/7m1tryHuGVxpqEBQ=="
},
"node_modules/lodash.defaults": {
"version": "4.2.0",
"resolved": "https://registry.npmjs.org/lodash.defaults/-/lodash.defaults-4.2.0.tgz",
@@ -2942,6 +3180,14 @@
"url": "https://github.com/fb55/nth-check?sponsor=1"
}
},
"node_modules/oauth4webapi": {
"version": "3.8.3",
"resolved": "https://registry.npmjs.org/oauth4webapi/-/oauth4webapi-3.8.3.tgz",
"integrity": "sha512-pQ5BsX3QRTgnt5HxgHwgunIRaDXBdkT23tf8dfzmtTIL2LTpdmxgbpbBm0VgFWAIDlezQvQCTgnVIUmHupXHxw==",
"funding": {
"url": "https://github.com/sponsors/panva"
}
},
"node_modules/object-assign": {
"version": "4.1.1",
"resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz",
@@ -2980,6 +3226,18 @@
"wrappy": "1"
}
},
"node_modules/openid-client": {
"version": "6.8.1",
"resolved": "https://registry.npmjs.org/openid-client/-/openid-client-6.8.1.tgz",
"integrity": "sha512-VoYT6enBo6Vj2j3Q5Ec0AezS+9YGzQo1f5Xc42lreMGlfP4ljiXPKVDvCADh+XHCV/bqPu/wWSiCVXbJKvrODw==",
"dependencies": {
"jose": "^6.1.0",
"oauth4webapi": "^3.8.2"
},
"funding": {
"url": "https://github.com/sponsors/panva"
}
},
"node_modules/pac-proxy-agent": {
"version": "7.2.0",
"resolved": "https://registry.npmjs.org/pac-proxy-agent/-/pac-proxy-agent-7.2.0.tgz",
@@ -3883,6 +4141,11 @@
"url": "https://github.com/privatenumber/resolve-pkg-maps?sponsor=1"
}
},
"node_modules/rfc4648": {
"version": "1.5.4",
"resolved": "https://registry.npmjs.org/rfc4648/-/rfc4648-1.5.4.tgz",
"integrity": "sha512-rRg/6Lb+IGfJqO05HZkN50UtY7K/JhxJag1kP23+zyMfrvoB0B7RWv06MbOzoc79RgCdNTiUaNsTT1AJZ7Z+cg=="
},
"node_modules/rimraf": {
"version": "3.0.2",
"resolved": "https://registry.npmjs.org/rimraf/-/rimraf-3.0.2.tgz",
@@ -4313,6 +4576,14 @@
"node": ">= 0.8"
}
},
"node_modules/stream-buffers": {
"version": "3.0.3",
"resolved": "https://registry.npmjs.org/stream-buffers/-/stream-buffers-3.0.3.tgz",
"integrity": "sha512-pqMqwQCso0PBJt2PQmDO0cFj0lyqmiwOMiMSkVtRokl7e+ZTRYgDHKnuZNbqjiJXgsg4nuqtD/zxuo9KqTp0Yw==",
"engines": {
"node": ">= 0.10.0"
}
},
"node_modules/streamx": {
"version": "2.23.0",
"resolved": "https://registry.npmjs.org/streamx/-/streamx-2.23.0.tgz",
@@ -4532,8 +4803,7 @@
     "node_modules/undici-types": {
       "version": "6.21.0",
       "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz",
-      "integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ==",
-      "devOptional": true
+      "integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ=="
     },
     "node_modules/universalify": {
       "version": "2.0.1",
@@ -4556,6 +4826,14 @@
"resolved": "https://registry.npmjs.org/urlpattern-polyfill/-/urlpattern-polyfill-10.0.0.tgz",
"integrity": "sha512-H/A06tKD7sS1O1X2SshBVeA5FLycRpjqiBeqGKmBwBDBy28EnRjORxTNe269KSSr5un5qyWi1iL61wLxpd+ZOg=="
},
"node_modules/user-agents": {
"version": "1.1.669",
"resolved": "https://registry.npmjs.org/user-agents/-/user-agents-1.1.669.tgz",
"integrity": "sha512-pbIzG+AOqCaIpySKJ4IAm1l0VyE4jMnK4y1thV8lm8PYxI+7X5uWcppOK7zY79TCKKTAnJH3/4gaVIZHsjrmJA==",
"dependencies": {
"lodash.clonedeep": "^4.5.0"
}
},
"node_modules/util": {
"version": "0.12.5",
"resolved": "https://registry.npmjs.org/util/-/util-0.12.5.tgz",


@@ -702,12 +702,10 @@ export class StateQueryService {
   async getNationalSummary(): Promise<NationalSummary> {
     const stateMetrics = await this.getAllStateMetrics();
+    // Get all states count and aggregate metrics
     const result = await this.pool.query(`
       SELECT
         COUNT(DISTINCT s.code) AS total_states,
-        COUNT(DISTINCT CASE WHEN EXISTS (
-          SELECT 1 FROM dispensaries d WHERE d.state = s.code AND d.menu_type IS NOT NULL
-        ) THEN s.code END) AS active_states,
         (SELECT COUNT(*) FROM dispensaries WHERE state IS NOT NULL) AS total_stores,
         (SELECT COUNT(*) FROM store_products sp
          JOIN dispensaries d ON sp.dispensary_id = d.id
@@ -725,7 +723,7 @@ export class StateQueryService {
     return {
       totalStates: parseInt(data.total_states),
-      activeStates: parseInt(data.active_states),
+      activeStates: parseInt(data.total_states), // Same as totalStates - all states shown
       totalStores: parseInt(data.total_stores),
       totalProducts: parseInt(data.total_products),
       totalBrands: parseInt(data.total_brands),


@@ -14,13 +14,25 @@ router.use(authMiddleware);
 /**
  * GET /api/admin/intelligence/brands
  * List all brands with state presence, store counts, and pricing
+ * Query params:
+ * - state: Filter by state (e.g., "AZ")
+ * - limit: Max results (default 500)
+ * - offset: Pagination offset
  */
 router.get('/brands', async (req: Request, res: Response) => {
   try {
-    const { limit = '500', offset = '0' } = req.query;
+    const { limit = '500', offset = '0', state } = req.query;
     const limitNum = Math.min(parseInt(limit as string, 10), 1000);
     const offsetNum = parseInt(offset as string, 10);
+    // Build WHERE clause based on state filter
+    let stateFilter = '';
+    const params: any[] = [limitNum, offsetNum];
+    if (state && state !== 'all') {
+      stateFilter = 'AND d.state = $3';
+      params.push(state);
+    }
     const { rows } = await pool.query(`
       SELECT
         sp.brand_name_raw as brand_name,
@@ -32,17 +44,26 @@ router.get('/brands', async (req: Request, res: Response) => {
       FROM store_products sp
       JOIN dispensaries d ON sp.dispensary_id = d.id
       WHERE sp.brand_name_raw IS NOT NULL AND sp.brand_name_raw != ''
+        ${stateFilter}
       GROUP BY sp.brand_name_raw
       ORDER BY store_count DESC, sku_count DESC
       LIMIT $1 OFFSET $2
-    `, [limitNum, offsetNum]);
+    `, params);
-    // Get total count
+    // Get total count with same state filter
+    const countParams: any[] = [];
+    let countStateFilter = '';
+    if (state && state !== 'all') {
+      countStateFilter = 'AND d.state = $1';
+      countParams.push(state);
+    }
     const { rows: countRows } = await pool.query(`
-      SELECT COUNT(DISTINCT brand_name_raw) as total
-      FROM store_products
-      WHERE brand_name_raw IS NOT NULL AND brand_name_raw != ''
-    `);
+      SELECT COUNT(DISTINCT sp.brand_name_raw) as total
+      FROM store_products sp
+      JOIN dispensaries d ON sp.dispensary_id = d.id
+      WHERE sp.brand_name_raw IS NOT NULL AND sp.brand_name_raw != ''
+        ${countStateFilter}
+    `, countParams);
     res.json({
       brands: rows.map((r: any) => ({
@@ -147,23 +168,58 @@ router.get('/brands/:brandName/penetration', async (req: Request, res: Response)
 /**
  * GET /api/admin/intelligence/pricing
  * Get pricing analytics by category
+ * Query params:
+ * - state: Filter by state (e.g., "AZ")
  */
 router.get('/pricing', async (req: Request, res: Response) => {
   try {
-    const { rows: categoryRows } = await pool.query(`
-      SELECT
-        sp.category_raw as category,
-        ROUND(AVG(sp.price_rec)::numeric, 2) as avg_price,
-        MIN(sp.price_rec) as min_price,
-        MAX(sp.price_rec) as max_price,
-        ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_rec)::numeric, 2) as median_price,
-        COUNT(*) as product_count
-      FROM store_products sp
-      WHERE sp.category_raw IS NOT NULL AND sp.price_rec > 0
-      GROUP BY sp.category_raw
-      ORDER BY product_count DESC
-    `);
+    const { state } = req.query;
+    // Build WHERE clause based on state filter
+    let stateFilter = '';
+    const categoryParams: any[] = [];
+    const stateQueryParams: any[] = [];
+    const overallParams: any[] = [];
+    if (state && state !== 'all') {
+      stateFilter = 'AND d.state = $1';
+      categoryParams.push(state);
+      overallParams.push(state);
+    }
+    // Category pricing with optional state filter
+    const categoryQuery = state && state !== 'all'
+      ? `
+        SELECT
+          sp.category_raw as category,
+          ROUND(AVG(sp.price_rec)::numeric, 2) as avg_price,
+          MIN(sp.price_rec) as min_price,
+          MAX(sp.price_rec) as max_price,
+          ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_rec)::numeric, 2) as median_price,
+          COUNT(*) as product_count
+        FROM store_products sp
+        JOIN dispensaries d ON sp.dispensary_id = d.id
+        WHERE sp.category_raw IS NOT NULL AND sp.price_rec > 0 ${stateFilter}
+        GROUP BY sp.category_raw
+        ORDER BY product_count DESC
+      `
+      : `
+        SELECT
+          sp.category_raw as category,
+          ROUND(AVG(sp.price_rec)::numeric, 2) as avg_price,
+          MIN(sp.price_rec) as min_price,
+          MAX(sp.price_rec) as max_price,
+          ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sp.price_rec)::numeric, 2) as median_price,
+          COUNT(*) as product_count
+        FROM store_products sp
+        WHERE sp.category_raw IS NOT NULL AND sp.price_rec > 0
+        GROUP BY sp.category_raw
+        ORDER BY product_count DESC
+      `;
+    const { rows: categoryRows } = await pool.query(categoryQuery, categoryParams);
+    // State pricing
     const { rows: stateRows } = await pool.query(`
       SELECT
         d.state,
@@ -178,6 +234,31 @@ router.get('/pricing', async (req: Request, res: Response) => {
       ORDER BY avg_price DESC
     `);
+    // Overall stats with optional state filter
+    const overallQuery = state && state !== 'all'
+      ? `
+        SELECT
+          ROUND(AVG(sp.price_rec)::numeric, 2) as avg_price,
+          MIN(sp.price_rec) as min_price,
+          MAX(sp.price_rec) as max_price,
+          COUNT(*) as total_products
+        FROM store_products sp
+        JOIN dispensaries d ON sp.dispensary_id = d.id
+        WHERE sp.price_rec > 0 ${stateFilter}
+      `
+      : `
+        SELECT
+          ROUND(AVG(sp.price_rec)::numeric, 2) as avg_price,
+          MIN(sp.price_rec) as min_price,
+          MAX(sp.price_rec) as max_price,
+          COUNT(*) as total_products
+        FROM store_products sp
+        WHERE sp.price_rec > 0
+      `;
+    const { rows: overallRows } = await pool.query(overallQuery, overallParams);
+    const overall = overallRows[0];
     res.json({
       byCategory: categoryRows.map((r: any) => ({
         category: r.category,
@@ -194,6 +275,12 @@ router.get('/pricing', async (req: Request, res: Response) => {
         maxPrice: r.max_price ? parseFloat(r.max_price) : null,
         productCount: parseInt(r.product_count, 10),
       })),
+      overall: {
+        avgPrice: overall?.avg_price ? parseFloat(overall.avg_price) : null,
+        minPrice: overall?.min_price ? parseFloat(overall.min_price) : null,
+        maxPrice: overall?.max_price ? parseFloat(overall.max_price) : null,
+        totalProducts: parseInt(overall?.total_products || '0', 10),
+      },
     });
   } catch (error: any) {
     console.error('[Intelligence] Error fetching pricing:', error.message);
@@ -204,9 +291,23 @@ router.get('/pricing', async (req: Request, res: Response) => {
 /**
  * GET /api/admin/intelligence/stores
  * Get store intelligence summary
+ * Query params:
+ * - state: Filter by state (e.g., "AZ")
+ * - limit: Max results (default 200)
  */
 router.get('/stores', async (req: Request, res: Response) => {
   try {
+    const { state, limit = '200' } = req.query;
+    const limitNum = Math.min(parseInt(limit as string, 10), 500);
+    // Build WHERE clause based on state filter
+    let stateFilter = '';
+    const params: any[] = [limitNum];
+    if (state && state !== 'all') {
+      stateFilter = 'AND d.state = $2';
+      params.push(state);
+    }
     const { rows: storeRows } = await pool.query(`
       SELECT
         d.id,
@@ -216,17 +317,22 @@ router.get('/stores', async (req: Request, res: Response) => {
         d.state,
         d.menu_type,
         d.crawl_enabled,
-        COUNT(DISTINCT sp.id) as product_count,
+        c.name as chain_name,
+        COUNT(DISTINCT sp.id) as sku_count,
         COUNT(DISTINCT sp.brand_name_raw) as brand_count,
         ROUND(AVG(sp.price_rec)::numeric, 2) as avg_price,
-        MAX(sp.updated_at) as last_product_update
+        MAX(sp.updated_at) as last_crawl,
+        (SELECT COUNT(*) FROM store_product_snapshots sps
+         WHERE sps.store_product_id IN (SELECT id FROM store_products WHERE dispensary_id = d.id)) as snapshot_count
       FROM dispensaries d
       LEFT JOIN store_products sp ON sp.dispensary_id = d.id
-      WHERE d.state IS NOT NULL
-      GROUP BY d.id, d.name, d.dba_name, d.city, d.state, d.menu_type, d.crawl_enabled
-      ORDER BY product_count DESC
-      LIMIT 200
-    `);
+      LEFT JOIN chains c ON d.chain_id = c.id
+      WHERE d.state IS NOT NULL AND d.crawl_enabled = true
+        ${stateFilter}
+      GROUP BY d.id, d.name, d.dba_name, d.city, d.state, d.menu_type, d.crawl_enabled, c.name
+      ORDER BY sku_count DESC
+      LIMIT $1
+    `, params);
     res.json({
       stores: storeRows.map((r: any) => ({
@@ -237,10 +343,13 @@ router.get('/stores', async (req: Request, res: Response) => {
         state: r.state,
         menuType: r.menu_type,
         crawlEnabled: r.crawl_enabled,
-        productCount: parseInt(r.product_count || '0', 10),
+        chainName: r.chain_name || null,
+        skuCount: parseInt(r.sku_count || '0', 10),
+        snapshotCount: parseInt(r.snapshot_count || '0', 10),
         brandCount: parseInt(r.brand_count || '0', 10),
         avgPrice: r.avg_price ? parseFloat(r.avg_price) : null,
-        lastProductUpdate: r.last_product_update,
+        lastCrawl: r.last_crawl,
+        crawlFrequencyHours: 4, // Default crawl frequency
       })),
       total: storeRows.length,
     });
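
The three Intelligence routes above repeat one pattern: append `AND d.state = $n` to the SQL and push the state onto the parameter array, where `n` has to match the position the state ends up in (`$3` for brands, `$1` for the brand count and pricing, `$2` for stores). A sketch of a helper that derives the index from the array itself is shown below; `appendStateFilter` and `StateFilter` are hypothetical names that do not appear in the diff, and unlike the routes this sketch also ignores non-string values.

```typescript
// Hypothetical helper, not part of the diff: the placeholder index is derived
// from the parameter array, so the SQL and the params cannot drift apart.
interface StateFilter {
  clause: string; // '' when no filter applies, otherwise 'AND d.state = $n'
  params: unknown[]; // the input params, plus the state when a filter applies
}

function appendStateFilter(state: unknown, params: readonly unknown[]): StateFilter {
  // Treat a missing value, an empty string, and the sentinel 'all' as "no filter".
  if (typeof state !== 'string' || state === '' || state === 'all') {
    return { clause: '', params: [...params] };
  }
  const next = [...params, state];
  // PostgreSQL placeholders are 1-based, so the new index equals the new length.
  return { clause: `AND d.state = $${next.length}`, params: next };
}
```

With `[limitNum, offsetNum]` as input the helper yields `$3`, and with an empty array it yields `$1`, matching the hand-written indices in the brands route and its count query.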


@@ -78,14 +78,14 @@ router.get('/metrics', async (_req: Request, res: Response) => {
 /**
  * GET /api/admin/orchestrator/states
- * Returns array of states with at least one known dispensary
+ * Returns array of states with at least one crawl-enabled dispensary
  */
 router.get('/states', async (_req: Request, res: Response) => {
   try {
     const { rows } = await pool.query(`
       SELECT DISTINCT state, COUNT(*) as store_count
       FROM dispensaries
-      WHERE state IS NOT NULL
+      WHERE state IS NOT NULL AND crawl_enabled = true
       GROUP BY state
       ORDER BY state
     `);


@@ -13,6 +13,12 @@ import {
TaskFilter,
} from '../tasks/task-service';
import { pool } from '../db/pool';
import {
isTaskPoolPaused,
pauseTaskPool,
resumeTaskPool,
getTaskPoolStatus,
} from '../tasks/task-pool-state';
const router = Router();
@@ -592,4 +598,42 @@ router.post('/migration/full-migrate', async (req: Request, res: Response) => {
}
});
/**
* GET /api/tasks/pool/status
* Check if task pool is paused
*/
router.get('/pool/status', async (_req: Request, res: Response) => {
const status = getTaskPoolStatus();
res.json({
success: true,
...status,
});
});
/**
* POST /api/tasks/pool/pause
* Pause the task pool - workers won't pick up new tasks
*/
router.post('/pool/pause', async (_req: Request, res: Response) => {
pauseTaskPool();
res.json({
success: true,
paused: true,
message: 'Task pool paused - workers will not pick up new tasks',
});
});
/**
* POST /api/tasks/pool/resume
* Resume the task pool - workers will pick up tasks again
*/
router.post('/pool/resume', async (_req: Request, res: Response) => {
resumeTaskPool();
res.json({
success: true,
paused: false,
message: 'Task pool resumed - workers will pick up new tasks',
});
});
export default router;
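
The commit message describes a Start/Stop Pool toggle on the TasksDashboard that drives these three endpoints. The dashboard code is not in this part of the diff; the sketch below only illustrates how a client might pick the next request from the `/pool/status` response. `nextPoolAction` and `PoolAction` are hypothetical names, and the button labels are assumed from the commit message.

```typescript
// Shape returned by GET /api/tasks/pool/status (minus the `success` flag).
interface PoolStatus {
  paused: boolean;
  message: string;
}

// Hypothetical description of what the toggle button should do next.
interface PoolAction {
  method: 'POST';
  path: string;
  label: string;
}

function nextPoolAction(status: PoolStatus): PoolAction {
  // A paused pool offers "resume"; an open pool offers "pause".
  return status.paused
    ? { method: 'POST', path: '/api/tasks/pool/resume', label: 'Start Pool' }
    : { method: 'POST', path: '/api/tasks/pool/pause', label: 'Stop Pool' };
}
```

Keeping the decision in a pure function like this makes it easy to re-evaluate after each status refresh, such as the 15-second auto-refresh the commit mentions.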


@@ -25,7 +25,7 @@ export async function handleStoreDiscovery(ctx: TaskContext): Promise<TaskResult
   try {
     // Get states to discover
     const statesResult = await pool.query(`
-      SELECT code FROM states WHERE active = true ORDER BY code
+      SELECT code FROM states WHERE is_active = true ORDER BY code
     `);
     const stateCodes = statesResult.rows.map(r => r.code);

View File

@@ -0,0 +1,35 @@
/**
* Task Pool State
*
* Shared state for task pool pause/resume functionality.
* This is kept separate to avoid circular dependencies between
* task-service.ts and routes/tasks.ts.
*
* State is in-memory and resets on server restart.
* By default, the pool is OPEN (not paused).
*/
let taskPoolPaused = false;
export function isTaskPoolPaused(): boolean {
return taskPoolPaused;
}
export function pauseTaskPool(): void {
taskPoolPaused = true;
console.log('[TaskPool] Task pool PAUSED - workers will not pick up new tasks');
}
export function resumeTaskPool(): void {
taskPoolPaused = false;
console.log('[TaskPool] Task pool RESUMED - workers can pick up tasks');
}
export function getTaskPoolStatus(): { paused: boolean; message: string } {
return {
paused: taskPoolPaused,
message: taskPoolPaused
? 'Task pool is paused - workers will not pick up new tasks'
: 'Task pool is open - workers are picking up tasks',
};
}


@@ -9,6 +9,7 @@
*/
import { pool } from '../db/pool';
import { isTaskPoolPaused } from './task-pool-state';
// Helper to check if a table exists
async function tableExists(tableName: string): Promise<boolean> {
@@ -149,8 +150,14 @@ class TaskService {
/**
* Claim a task atomically for a worker
* If role is null, claims ANY available task (role-agnostic worker)
* Returns null if task pool is paused.
*/
async claimTask(role: TaskRole | null, workerId: string): Promise<WorkerTask | null> {
// Check if task pool is paused - don't claim any tasks
if (isTaskPoolPaused()) {
return null;
}
if (role) {
// Role-specific claiming - use the SQL function
const result = await pool.query(


@@ -117,40 +117,79 @@ export class TaskWorker {
* Called once on worker startup before processing any tasks.
*
* IMPORTANT: Proxies are REQUIRED. Workers will wait until proxies are available.
* Workers listen for PostgreSQL NOTIFY 'proxy_added' to wake up immediately when proxies are added.
*/
private async initializeStealth(): Promise<void> {
const MAX_WAIT_MINUTES = 60;
const RETRY_INTERVAL_MS = 30000; // 30 seconds
const maxAttempts = (MAX_WAIT_MINUTES * 60 * 1000) / RETRY_INTERVAL_MS;
const POLL_INTERVAL_MS = 30000; // 30 seconds fallback polling
const maxAttempts = (MAX_WAIT_MINUTES * 60 * 1000) / POLL_INTERVAL_MS;
let attempts = 0;
let notifyClient: any = null;
while (attempts < maxAttempts) {
try {
// Load proxies from database
await this.crawlRotator.initialize();
const stats = this.crawlRotator.proxy.getStats();
if (stats.activeProxies > 0) {
console.log(`[TaskWorker] Loaded ${stats.activeProxies} proxies (${stats.avgSuccessRate.toFixed(1)}% avg success rate)`);
// Wire rotator to Dutchie client - proxies will be used for ALL requests
setCrawlRotator(this.crawlRotator);
console.log(`[TaskWorker] Stealth initialized: ${this.crawlRotator.userAgent.getCount()} fingerprints, proxy REQUIRED for all requests`);
return;
}
attempts++;
console.log(`[TaskWorker] No active proxies available (attempt ${attempts}). Waiting ${RETRY_INTERVAL_MS / 1000}s for proxies to be added...`);
await this.sleep(RETRY_INTERVAL_MS);
} catch (error: any) {
attempts++;
console.log(`[TaskWorker] Error loading proxies (attempt ${attempts}): ${error.message}. Retrying in ${RETRY_INTERVAL_MS / 1000}s...`);
await this.sleep(RETRY_INTERVAL_MS);
}
// Set up PostgreSQL LISTEN for proxy notifications
try {
notifyClient = await this.pool.connect();
await notifyClient.query('LISTEN proxy_added');
console.log(`[TaskWorker] Listening for proxy_added notifications...`);
} catch (err: any) {
console.log(`[TaskWorker] Could not set up LISTEN (will poll): ${err.message}`);
}
throw new Error(`No active proxies available after waiting ${MAX_WAIT_MINUTES} minutes. Add proxies to the database.`);
// Create a promise that resolves when notified
let notifyResolve: (() => void) | null = null;
if (notifyClient) {
notifyClient.on('notification', (msg: any) => {
if (msg.channel === 'proxy_added') {
console.log(`[TaskWorker] Received proxy_added notification!`);
if (notifyResolve) notifyResolve();
}
});
}
try {
while (attempts < maxAttempts) {
try {
// Load proxies from database
await this.crawlRotator.initialize();
const stats = this.crawlRotator.proxy.getStats();
if (stats.activeProxies > 0) {
console.log(`[TaskWorker] Loaded ${stats.activeProxies} proxies (${stats.avgSuccessRate.toFixed(1)}% avg success rate)`);
// Wire rotator to Dutchie client - proxies will be used for ALL requests
setCrawlRotator(this.crawlRotator);
console.log(`[TaskWorker] Stealth initialized: ${this.crawlRotator.userAgent.getCount()} fingerprints, proxy REQUIRED for all requests`);
return;
}
attempts++;
console.log(`[TaskWorker] No active proxies available (attempt ${attempts}). Waiting for proxies...`);
// Wait for either notification or timeout
await new Promise<void>((resolve) => {
notifyResolve = resolve;
setTimeout(resolve, POLL_INTERVAL_MS);
});
} catch (error: any) {
attempts++;
console.log(`[TaskWorker] Error loading proxies (attempt ${attempts}): ${error.message}. Retrying...`);
await this.sleep(POLL_INTERVAL_MS);
}
}
throw new Error(`No active proxies available after waiting ${MAX_WAIT_MINUTES} minutes. Add proxies to the database.`);
} finally {
// Clean up LISTEN connection
if (notifyClient) {
try {
await notifyClient.query('UNLISTEN proxy_added');
notifyClient.release();
} catch {
// Ignore cleanup errors
}
}
}
}
/**


@@ -7,8 +7,8 @@
<title>CannaIQ - Cannabis Menu Intelligence Platform</title>
<meta name="description" content="CannaIQ provides real-time cannabis dispensary menu data, product tracking, and analytics for dispensaries across Arizona." />
<meta name="keywords" content="cannabis, dispensary, menu, products, analytics, Arizona" />
<script type="module" crossorigin src="/assets/index-BML8-px1.js"></script>
<link rel="stylesheet" crossorigin href="/assets/index-B2gR-58G.css">
<script type="module" crossorigin src="/assets/index-BXmp5CSY.js"></script>
<link rel="stylesheet" crossorigin href="/assets/index-4959QN4j.css">
</head>
<body>
<div id="root"></div>


@@ -1518,10 +1518,11 @@ class ApiClient {
}
// Intelligence API
async getIntelligenceBrands(params?: { limit?: number; offset?: number }) {
async getIntelligenceBrands(params?: { limit?: number; offset?: number; state?: string }) {
const searchParams = new URLSearchParams();
if (params?.limit) searchParams.append('limit', params.limit.toString());
if (params?.offset) searchParams.append('offset', params.offset.toString());
if (params?.state) searchParams.append('state', params.state);
const queryString = searchParams.toString() ? `?${searchParams.toString()}` : '';
return this.request<{
brands: Array<{
@@ -1536,7 +1537,10 @@ class ApiClient {
}>(`/api/admin/intelligence/brands${queryString}`);
}
async getIntelligencePricing() {
async getIntelligencePricing(params?: { state?: string }) {
const searchParams = new URLSearchParams();
if (params?.state) searchParams.append('state', params.state);
const queryString = searchParams.toString() ? `?${searchParams.toString()}` : '';
return this.request<{
byCategory: Array<{
category: string;
@@ -1552,7 +1556,7 @@ class ApiClient {
maxPrice: number;
totalProducts: number;
};
}>('/api/admin/intelligence/pricing');
}>(`/api/admin/intelligence/pricing${queryString}`);
}
async getIntelligenceStoreActivity(params?: { state?: string; chainId?: number; limit?: number }) {
@@ -2884,6 +2888,27 @@ class ApiClient {
`/api/tasks/store/${dispensaryId}/active`
);
}
// Task Pool Control
async getTaskPoolStatus() {
return this.request<{ success: boolean; paused: boolean; message: string }>(
'/api/tasks/pool/status'
);
}
async pauseTaskPool() {
return this.request<{ success: boolean; paused: boolean; message: string }>(
'/api/tasks/pool/pause',
{ method: 'POST' }
);
}
async resumeTaskPool() {
return this.request<{ success: boolean; paused: boolean; message: string }>(
'/api/tasks/pool/resume',
{ method: 'POST' }
);
}
}
export const api = new ApiClient(API_URL);


@@ -3,6 +3,7 @@ import { useNavigate } from 'react-router-dom';
import { Layout } from '../components/Layout';
import { api } from '../lib/api';
import { trackProductClick } from '../lib/analytics';
import { useStateFilter } from '../hooks/useStateFilter';
import {
Building2,
MapPin,
@@ -12,6 +13,7 @@ import {
Search,
TrendingUp,
BarChart3,
ChevronDown,
} from 'lucide-react';
interface BrandData {
@@ -25,6 +27,8 @@ interface BrandData {
export function IntelligenceBrands() {
const navigate = useNavigate();
const { selectedState, setSelectedState, stateParam, stateLabel, isAllStates } = useStateFilter();
const [availableStates, setAvailableStates] = useState<string[]>([]);
const [brands, setBrands] = useState<BrandData[]>([]);
const [loading, setLoading] = useState(true);
const [searchTerm, setSearchTerm] = useState('');
@@ -32,12 +36,19 @@ export function IntelligenceBrands() {
useEffect(() => {
loadBrands();
}, [stateParam]);
useEffect(() => {
// Load available states
api.getOrchestratorStates().then(data => {
setAvailableStates(data.states?.map((s: any) => s.state) || []);
}).catch(console.error);
}, []);
const loadBrands = async () => {
try {
setLoading(true);
const data = await api.getIntelligenceBrands({ limit: 500 });
const data = await api.getIntelligenceBrands({ limit: 500, state: stateParam });
setBrands(data.brands || []);
} catch (error) {
console.error('Failed to load brands:', error);
@@ -169,10 +180,33 @@ export function IntelligenceBrands() {
{/* Top Brands Chart */}
<div className="bg-white rounded-lg border border-gray-200 p-4">
<h3 className="text-lg font-semibold text-gray-900 mb-4 flex items-center gap-2">
<BarChart3 className="w-5 h-5 text-blue-500" />
Top 10 Brands by Store Count
</h3>
<div className="flex items-center justify-between mb-4">
<h3 className="text-lg font-semibold text-gray-900 flex items-center gap-2">
<BarChart3 className="w-5 h-5 text-blue-500" />
Top 10 Brands by Store Count
</h3>
<div className="dropdown dropdown-end">
<button tabIndex={0} className="btn btn-sm btn-outline gap-2">
{stateLabel}
<ChevronDown className="w-4 h-4" />
</button>
<ul tabIndex={0} className="dropdown-content z-[1] menu p-2 shadow bg-base-100 rounded-box w-40 max-h-60 overflow-y-auto">
<li>
<a onClick={() => setSelectedState(null)} className={isAllStates ? 'active' : ''}>
All States
</a>
</li>
<li className="divider"></li>
{availableStates.map((state) => (
<li key={state}>
<a onClick={() => setSelectedState(state)} className={selectedState === state ? 'active' : ''}>
{state}
</a>
</li>
))}
</ul>
</div>
</div>
<div className="space-y-2">
{topBrands.map((brand, idx) => (
<div key={brand.brandName} className="flex items-center gap-3">


@@ -2,6 +2,7 @@ import { useEffect, useState } from 'react';
import { useNavigate } from 'react-router-dom';
import { Layout } from '../components/Layout';
import { api } from '../lib/api';
import { useStateFilter } from '../hooks/useStateFilter';
import {
DollarSign,
Building2,
@@ -11,6 +12,7 @@ import {
TrendingUp,
TrendingDown,
BarChart3,
ChevronDown,
} from 'lucide-react';
interface CategoryPricing {
@@ -31,18 +33,27 @@ interface OverallPricing {
export function IntelligencePricing() {
const navigate = useNavigate();
const { selectedState, setSelectedState, stateParam, stateLabel, isAllStates } = useStateFilter();
const [availableStates, setAvailableStates] = useState<string[]>([]);
const [categories, setCategories] = useState<CategoryPricing[]>([]);
const [overall, setOverall] = useState<OverallPricing | null>(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
loadPricing();
}, [stateParam]);
useEffect(() => {
// Load available states
api.getOrchestratorStates().then(data => {
setAvailableStates(data.states?.map((s: any) => s.state) || []);
}).catch(console.error);
}, []);
const loadPricing = async () => {
try {
setLoading(true);
const data = await api.getIntelligencePricing();
const data = await api.getIntelligencePricing({ state: stateParam });
setCategories(data.byCategory || []);
setOverall(data.overall || null);
} catch (error) {
@@ -84,6 +95,27 @@ export function IntelligencePricing() {
</p>
</div>
<div className="flex gap-2">
<div className="dropdown dropdown-end">
<button tabIndex={0} className="btn btn-sm btn-outline gap-2">
{stateLabel}
<ChevronDown className="w-4 h-4" />
</button>
<ul tabIndex={0} className="dropdown-content z-[1] menu p-2 shadow bg-base-100 rounded-box w-40 max-h-60 overflow-y-auto">
<li>
<a onClick={() => setSelectedState(null)} className={isAllStates ? 'active' : ''}>
All States
</a>
</li>
<li className="divider"></li>
{availableStates.map((state) => (
<li key={state}>
<a onClick={() => setSelectedState(state)} className={selectedState === state ? 'active' : ''}>
{state}
</a>
</li>
))}
</ul>
</div>
<button
onClick={() => navigate('/admin/intelligence/brands')}
className="btn btn-sm btn-outline gap-1"
@@ -150,7 +182,7 @@ export function IntelligencePricing() {
<div>
<p className="text-sm text-gray-500">Products Priced</p>
<p className="text-2xl font-bold">
{overall.totalProducts.toLocaleString()}
{(overall.totalProducts || 0).toLocaleString()}
</p>
</div>
</div>
@@ -164,43 +196,29 @@ export function IntelligencePricing() {
<BarChart3 className="w-5 h-5 text-green-500" />
Average Price by Category
</h3>
<div className="space-y-3">
{sortedCategories.map((cat) => (
<div key={cat.category} className="flex items-center gap-3">
<span className="text-sm font-medium w-32 truncate" title={cat.category}>
{cat.category || 'Unknown'}
</span>
<div className="flex-1 relative">
{/* Price range bar */}
<div className="bg-gray-100 rounded-full h-6 relative">
{/* Min-Max range */}
<div
className="absolute top-0 h-6 bg-blue-100 rounded-full"
style={{
left: `${(cat.minPrice / (overall?.maxPrice || 100)) * 100}%`,
width: `${((cat.maxPrice - cat.minPrice) / (overall?.maxPrice || 100)) * 100}%`,
}}
/>
{/* Average marker */}
<div
className="absolute top-0 h-6 w-1 bg-green-500 rounded"
style={{ left: `${(cat.avgPrice / (overall?.maxPrice || 100)) * 100}%` }}
/>
<div className="space-y-2">
{sortedCategories.slice(0, 12).map((cat) => {
const maxPrice = Math.max(...sortedCategories.map(c => c.avgPrice || 0), 1);
const barWidth = Math.min(((cat.avgPrice || 0) / maxPrice) * 100, 100);
return (
<div key={cat.category} className="flex items-center gap-3">
<span className="text-sm font-medium w-28 truncate shrink-0" title={cat.category}>
{cat.category || 'Unknown'}
</span>
<div className="flex-1 min-w-0">
<div className="bg-gray-100 rounded h-5 overflow-hidden">
<div
className="bg-gradient-to-r from-emerald-400 to-emerald-500 h-5 rounded transition-all"
style={{ width: `${barWidth}%` }}
/>
</div>
</div>
</div>
<div className="flex gap-4 text-xs w-48">
<span className="text-gray-500">
Min: <span className="text-blue-600 font-mono">{formatPrice(cat.minPrice)}</span>
</span>
<span className="text-gray-500">
Avg: <span className="text-green-600 font-mono font-bold">{formatPrice(cat.avgPrice)}</span>
</span>
<span className="text-gray-500">
Max: <span className="text-orange-600 font-mono">{formatPrice(cat.maxPrice)}</span>
<span className="text-sm font-mono font-semibold text-emerald-600 w-16 text-right shrink-0">
{formatPrice(cat.avgPrice)}
</span>
</div>
</div>
))}
);
})}
</div>
</div>
@@ -236,7 +254,7 @@ export function IntelligencePricing() {
<span className="font-medium">{cat.category || 'Unknown'}</span>
</td>
<td className="text-center">
<span className="font-mono">{cat.productCount.toLocaleString()}</span>
<span className="font-mono">{(cat.productCount || 0).toLocaleString()}</span>
</td>
<td className="text-right">
<span className="font-mono text-blue-600">{formatPrice(cat.minPrice)}</span>


@@ -47,10 +47,11 @@ export function IntelligenceStores() {
state: stateParam,
limit: 500,
});
setStores(data.stores || []);
const storeList = data.stores || [];
setStores(storeList);
// Extract unique states from response for dropdown counts
const uniqueStates = [...new Set(data.stores.map((s: StoreActivity) => s.state))].sort();
const uniqueStates = [...new Set(storeList.map((s: StoreActivity) => s.state))].filter(Boolean).sort() as string[];
setLocalStates(uniqueStates);
} catch (error) {
console.error('Failed to load stores:', error);
@@ -97,12 +98,12 @@ export function IntelligenceStores() {
);
}
// Calculate stats
const totalSKUs = stores.reduce((sum, s) => sum + s.skuCount, 0);
const totalSnapshots = stores.reduce((sum, s) => sum + s.snapshotCount, 0);
const avgFrequency = stores.filter(s => s.crawlFrequencyHours).length > 0
? stores.filter(s => s.crawlFrequencyHours).reduce((sum, s) => sum + (s.crawlFrequencyHours || 0), 0) /
stores.filter(s => s.crawlFrequencyHours).length
// Calculate stats with null safety
const totalSKUs = stores.reduce((sum, s) => sum + (s.skuCount || 0), 0);
const totalSnapshots = stores.reduce((sum, s) => sum + (s.snapshotCount || 0), 0);
const storesWithFrequency = stores.filter(s => s.crawlFrequencyHours != null);
const avgFrequency = storesWithFrequency.length > 0
? storesWithFrequency.reduce((sum, s) => sum + (s.crawlFrequencyHours || 0), 0) / storesWithFrequency.length
: 0;
return (
@@ -262,10 +263,10 @@ export function IntelligenceStores() {
)}
</td>
<td className="text-center">
<span className="font-mono">{store.skuCount.toLocaleString()}</span>
<span className="font-mono">{(store.skuCount || 0).toLocaleString()}</span>
</td>
<td className="text-center">
<span className="font-mono">{store.snapshotCount.toLocaleString()}</span>
<span className="font-mono">{(store.snapshotCount || 0).toLocaleString()}</span>
</td>
<td>
<span className={store.lastCrawl ? 'text-green-600' : 'text-gray-400'}>


@@ -8,7 +8,6 @@
import { useState, useEffect } from 'react';
import { useNavigate } from 'react-router-dom';
import { Layout } from '../components/Layout';
import { StateBadge } from '../components/StateSelector';
import { useStateStore } from '../store/stateStore';
import { api } from '../lib/api';
import {
@@ -286,7 +285,6 @@ export default function NationalDashboard() {
</p>
</div>
<div className="flex items-center gap-3">
<StateBadge />
<button
onClick={handleRefreshMetrics}
disabled={refreshing}
@@ -303,7 +301,7 @@ export default function NationalDashboard() {
<>
<div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-4 gap-4">
<MetricCard
title="Active States"
title="States"
value={summary.activeStates}
icon={Globe}
/>


@@ -14,8 +14,9 @@ import {
ChevronUp,
Gauge,
Users,
Calendar,
Zap,
Power,
Play,
Square,
} from 'lucide-react';
interface Task {
@@ -82,6 +83,27 @@ const STATUS_COLORS: Record<string, string> = {
stale: 'bg-gray-100 text-gray-800',
};
const getStatusIcon = (status: string, poolPaused: boolean): React.ReactNode => {
switch (status) {
case 'pending':
return <Clock className="w-4 h-4" />;
case 'claimed':
return <PlayCircle className="w-4 h-4" />;
case 'running':
// Don't spin when pool is paused
return <RefreshCw className={`w-4 h-4 ${!poolPaused ? 'animate-spin' : ''}`} />;
case 'completed':
return <CheckCircle2 className="w-4 h-4" />;
case 'failed':
return <XCircle className="w-4 h-4" />;
case 'stale':
return <AlertTriangle className="w-4 h-4" />;
default:
return null;
}
};
// Static version for summary cards (always shows animation)
const STATUS_ICONS: Record<string, React.ReactNode> = {
pending: <Clock className="w-4 h-4" />,
claimed: <PlayCircle className="w-4 h-4" />,
@@ -116,6 +138,8 @@ export default function TasksDashboard() {
const [capacity, setCapacity] = useState<CapacityMetric[]>([]);
const [loading, setLoading] = useState(true);
const [error, setError] = useState<string | null>(null);
const [poolPaused, setPoolPaused] = useState(false);
const [poolLoading, setPoolLoading] = useState(false);
// Filters
const [roleFilter, setRoleFilter] = useState<string>('');
@@ -123,13 +147,10 @@ export default function TasksDashboard() {
const [searchQuery, setSearchQuery] = useState('');
const [showCapacity, setShowCapacity] = useState(true);
// Actions
const [actionLoading, setActionLoading] = useState(false);
const [actionMessage, setActionMessage] = useState<string | null>(null);
const fetchData = async () => {
try {
const [tasksRes, countsRes, capacityRes] = await Promise.all([
const [tasksRes, countsRes, capacityRes, poolStatus] = await Promise.all([
api.getTasks({
role: roleFilter || undefined,
status: statusFilter || undefined,
@@ -137,11 +158,13 @@ export default function TasksDashboard() {
}),
api.getTaskCounts(),
api.getTaskCapacity(),
api.getTaskPoolStatus(),
]);
setTasks(tasksRes.tasks || []);
setCounts(countsRes);
setCapacity(capacityRes.metrics || []);
setPoolPaused(poolStatus.paused);
setError(null);
} catch (err: any) {
setError(err.message || 'Failed to load tasks');
@@ -150,40 +173,29 @@ export default function TasksDashboard() {
}
};
const togglePool = async () => {
setPoolLoading(true);
try {
if (poolPaused) {
await api.resumeTaskPool();
setPoolPaused(false);
} else {
await api.pauseTaskPool();
setPoolPaused(true);
}
} catch (err: any) {
setError(err.message || 'Failed to toggle pool');
} finally {
setPoolLoading(false);
}
};
useEffect(() => {
fetchData();
const interval = setInterval(fetchData, 10000); // Refresh every 10 seconds
const interval = setInterval(fetchData, 15000); // Auto-refresh every 15 seconds
return () => clearInterval(interval);
}, [roleFilter, statusFilter]);
const handleGenerateResync = async () => {
setActionLoading(true);
try {
const result = await api.generateResyncTasks();
setActionMessage(`Generated ${result.tasks_created} resync tasks`);
fetchData();
} catch (err: any) {
setActionMessage(`Error: ${err.message}`);
} finally {
setActionLoading(false);
setTimeout(() => setActionMessage(null), 5000);
}
};
const handleRecoverStale = async () => {
setActionLoading(true);
try {
const result = await api.recoverStaleTasks();
setActionMessage(`Recovered ${result.tasks_recovered} stale tasks`);
fetchData();
} catch (err: any) {
setActionMessage(`Error: ${err.message}`);
} finally {
setActionLoading(false);
setTimeout(() => setActionMessage(null), 5000);
}
};
const filteredTasks = tasks.filter((task) => {
if (searchQuery) {
const query = searchQuery.toLowerCase();
@@ -225,46 +237,33 @@ export default function TasksDashboard() {
</p>
</div>
<div className="flex gap-2">
<div className="flex items-center gap-4">
{/* Pool Toggle */}
<button
onClick={handleGenerateResync}
disabled={actionLoading}
className="flex items-center gap-2 px-4 py-2 bg-emerald-600 text-white rounded-lg hover:bg-emerald-700 disabled:opacity-50"
onClick={togglePool}
disabled={poolLoading}
className={`flex items-center gap-2 px-4 py-2 rounded-lg font-medium transition-colors ${
poolPaused
? 'bg-emerald-100 text-emerald-700 hover:bg-emerald-200'
: 'bg-red-100 text-red-700 hover:bg-red-200'
}`}
>
<Calendar className="w-4 h-4" />
Generate Resync
</button>
<button
onClick={handleRecoverStale}
disabled={actionLoading}
className="flex items-center gap-2 px-4 py-2 bg-gray-600 text-white rounded-lg hover:bg-gray-700 disabled:opacity-50"
>
<Zap className="w-4 h-4" />
Recover Stale
</button>
<button
onClick={fetchData}
className="flex items-center gap-2 px-4 py-2 bg-gray-100 text-gray-700 rounded-lg hover:bg-gray-200"
>
<RefreshCw className="w-4 h-4" />
Refresh
{poolPaused ? (
<>
<Play className={`w-5 h-5 ${poolLoading ? 'animate-pulse' : ''}`} />
Start Pool
</>
) : (
<>
<Square className={`w-5 h-5 ${poolLoading ? 'animate-pulse' : ''}`} />
Stop Pool
</>
)}
</button>
<span className="text-sm text-gray-400">Auto-refreshes every 15s</span>
</div>
</div>
{/* Action Message */}
{actionMessage && (
<div
className={`p-4 rounded-lg ${
actionMessage.startsWith('Error')
? 'bg-red-50 text-red-700'
: 'bg-green-50 text-green-700'
}`}
>
{actionMessage}
</div>
)}
{error && (
<div className="p-4 bg-red-50 text-red-700 rounded-lg">{error}</div>
)}
@@ -496,7 +495,7 @@ export default function TasksDashboard() {
STATUS_COLORS[task.status]
}`}
>
{STATUS_ICONS[task.status]}
{getStatusIcon(task.status, poolPaused)}
{task.status}
</span>
</td>

workflow-12102025.md (new file, 365 lines)

@@ -0,0 +1,365 @@
# Workflow Documentation - December 10, 2025
## Purpose
This document captures the intended behavior for the CannaIQ crawl system, specifically around proxy rotation, fingerprinting, and anti-detection.
---
## Stealth & Anti-Detection Requirements
### 1. Task Determines Work, Proxy Determines Identity
The task payload contains:
- `dispensary_id` - which store to crawl
- `role` - what type of work (product_resync, entry_point_discovery, etc.)
The **proxy** determines the session identity:
- Proxy location (city, state, timezone) → sets Accept-Language and timezone headers
- Language is always English (`en-US`)
**Flow:**
```
Task claimed
└─► Get proxy from rotation
└─► Proxy has location (city, state, timezone)
└─► Build headers using proxy's timezone
- Accept-Language: en-US,en;q=0.9
- Timezone-consistent behavior
```
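The flow above could be sketched roughly as follows. This is a minimal illustration, not the actual `client.ts` code: `ProxyLocation`, `SessionIdentity`, `buildSessionIdentity`, and the fallback timezone are all hypothetical names and values chosen for the example.

```typescript
// Hypothetical sketch: the proxy, not the task, decides the session identity.
interface ProxyLocation {
  city?: string;
  state?: string;
  timezone?: string; // IANA name, e.g. "America/Phoenix"
}

interface SessionIdentity {
  acceptLanguage: string;
  timezone: string;
}

// Language is fixed; only the timezone follows the proxy.
const ACCEPT_LANGUAGE = 'en-US,en;q=0.9';

// Assumption: fallback used when a proxy row has no timezone recorded.
const DEFAULT_TIMEZONE = 'America/Phoenix';

function buildSessionIdentity(proxy: ProxyLocation): SessionIdentity {
  return {
    acceptLanguage: ACCEPT_LANGUAGE,
    timezone: proxy.timezone ?? DEFAULT_TIMEZONE,
  };
}
```

Keeping the language constant and deriving only the timezone from the proxy means two sessions on the same proxy always present the same locale signals.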
### 2. On 403 Block - Immediate Backoff
When a 403 is received:
1. **Immediately** stop using current IP
2. Get a new proxy (new IP)
3. Get a new UA/fingerprint
4. Retry the request
**Per-proxy failure tracking:**
- Track UA rotation attempts per proxy
- After 3 UA/fingerprint rotations on the same proxy → disable that proxy
- This means: if we rotate UA 3 times and still get 403, the proxy is burned
### 3. Fingerprint Rotation Rules
Each request uses:
- Proxy (IP)
- User-Agent
- sec-ch-ua headers (Client Hints)
- Accept-Language (from proxy location)
On 403:
1. Record failure on current proxy
2. Rotate to new proxy
3. Pick new random fingerprint
4. If same proxy fails 3 times with different fingerprints → disable proxy
### 4. Proxy Table Schema
```sql
CREATE TABLE proxies (
id SERIAL PRIMARY KEY,
host VARCHAR(255) NOT NULL,
port INTEGER NOT NULL,
username VARCHAR(100),
password VARCHAR(100),
protocol VARCHAR(10) DEFAULT 'http',
active BOOLEAN DEFAULT true,
-- Location (determines session headers)
city VARCHAR(100),
state VARCHAR(50),
country VARCHAR(100),
country_code VARCHAR(10),
timezone VARCHAR(50),
-- Health tracking
failure_count INTEGER DEFAULT 0,
consecutive_403_count INTEGER DEFAULT 0, -- Track 403s specifically
last_used_at TIMESTAMPTZ,
last_failure_at TIMESTAMPTZ,
last_error TEXT,
-- Performance
response_time_ms INTEGER,
max_connections INTEGER DEFAULT 1
);
```
### 5. Failure Threshold
- **3 consecutive 403s** with different fingerprints → disable proxy
- Reset `consecutive_403_count` to 0 on successful request
- General `failure_count` tracks all errors (timeouts, connection errors, etc.)
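The threshold rules above can be modelled as three small state transitions. This is a sketch under assumptions, not the real `crawl-rotator.ts` code: the helper names `on403`, `onOtherFailure`, and `onSuccess` are hypothetical, and it assumes a 403 also increments the general failure counter.

```typescript
// Hypothetical in-memory model; field names mirror the proxies table columns.
interface ProxyHealth {
  active: boolean;
  failureCount: number;
  consecutive403Count: number;
}

const MAX_CONSECUTIVE_403 = 3;

// A 403 bumps both counters (assumption: 403s also count as general failures)
// and disables the proxy once the consecutive threshold is reached.
function on403(p: ProxyHealth): ProxyHealth {
  const consecutive403Count = p.consecutive403Count + 1;
  return {
    active: p.active && consecutive403Count < MAX_CONSECUTIVE_403,
    failureCount: p.failureCount + 1,
    consecutive403Count,
  };
}

// Timeouts, connection errors, etc. only touch the general counter.
// Assumption: they leave the 403 streak unchanged.
function onOtherFailure(p: ProxyHealth): ProxyHealth {
  return { ...p, failureCount: p.failureCount + 1 };
}

// Any successful request clears the 403 streak.
function onSuccess(p: ProxyHealth): ProxyHealth {
  return { ...p, consecutive403Count: 0 };
}
```

Writing the transitions as pure functions keeps the rule easy to test; persisting the result to the `proxies` table would be a separate step.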
---
## Implementation Status
### COMPLETED - December 10, 2025
All code changes have been implemented per this specification:
#### 1. crawl-rotator.ts ✅
- [x] Added `consecutive403Count` to Proxy interface
- [x] Added `markBlocked()` method that increments `consecutive_403_count` and disables proxy at 3
- [x] Added `getProxyTimezone()` to return current proxy's timezone
- [x] `markSuccess()` now resets `consecutive_403_count` to 0
- [x] Replaced hardcoded UA list with `intoli/user-agents` library for realistic fingerprints
- [x] `BrowserFingerprint` interface includes full fingerprint data (UA, platform, screen size, viewport, sec-ch-ua headers)
#### 2. client.ts ✅
- [x] `startSession()` no longer takes state/timezone params
- [x] `startSession()` gets identity from proxy via `crawlRotator.getProxyLocation()`
- [x] Added `handle403Block()` that:
- Calls `crawlRotator.recordBlock()` (tracks consecutive 403s)
- Immediately rotates both proxy and fingerprint via `rotateBoth()`
- Returns false if no more proxies available
- [x] `executeGraphQL()` calls `handle403Block()` on 403 (not `rotateProxyOn403`)
- [x] `fetchPage()` uses same 403 handling
- [x] 500ms backoff after rotation (not linear delay)
#### 3. Task Handlers ✅
- [x] `entry-point-discovery.ts`: `startSession()` called with no params
- [x] `product-refresh.ts`: `startSession()` called with no params
#### 4. Dependencies ✅
- [x] Added `user-agents` npm package for realistic UA generation
---
## Files Changed
| File | Changes |
|------|---------|
| `backend/src/services/crawl-rotator.ts` | Complete rewrite with `consecutive403Count`, `markBlocked()`, `intoli/user-agents` |
| `backend/src/platforms/dutchie/client.ts` | `startSession()` uses proxy location, `handle403Block()` for 403 handling |
| `backend/src/tasks/handlers/entry-point-discovery.ts` | `startSession()` no params |
| `backend/src/tasks/handlers/product-refresh.ts` | `startSession()` no params |
| `backend/package.json` | Added `user-agents` dependency |
---
## Migration Required
The `proxies` table needs `consecutive_403_count` column if not already present:
```sql
ALTER TABLE proxies ADD COLUMN IF NOT EXISTS consecutive_403_count INTEGER DEFAULT 0;
```
---
## Key Behaviors Summary
| Behavior | Implementation |
|----------|----------------|
| Session identity | From proxy location (`getProxyLocation()`) |
| Language | Always `en-US,en;q=0.9` |
| 403 handling | `handle403Block()` → `recordBlock()` → `rotateBoth()` |
| Proxy disable | After 3 consecutive 403s (`consecutive403Count >= 3`) |
| Success reset | `markSuccess()` resets `consecutive403Count` to 0 |
| UA generation | `intoli/user-agents` library (daily updated, realistic fingerprints) |
| Fingerprint data | Full: UA, platform, screen size, viewport, sec-ch-ua headers |
---
## User-Agent Generation
### Data Source
The `intoli/user-agents` npm library provides daily-updated market share data collected from Intoli's residential proxy network (millions of real users). The package auto-releases new versions daily to npm.
### Device Category Distribution (hardcoded)
| Category | Share |
|----------|-------|
| Mobile | 62% |
| Desktop | 36% |
| Tablet | 2% |
### Browser Filter (whitelist only)
Only these browsers are allowed:
- Chrome (67%)
- Safari (20%)
- Edge (6%)
- Firefox (3%)
Samsung Internet, Opera, and other niche browsers are filtered out.
### Desktop OS Distribution (from library)
| OS | Share |
|----|-------|
| Windows | 72% |
| macOS | 17% |
| Linux | 4% |
### UA Lifecycle
1. **Session start** (new proxy IP obtained) → Roll device category (62/36/2) → Generate UA filtered to device + top 4 browsers → Store on session
2. **UA sticks** until IP rotates (403 block or manual rotation)
3. **IP rotation** triggers new UA generation
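A minimal sketch of the "UA sticks until the IP rotates" rule follows. The class name `StickyUserAgent` and the injected `generate` callback are hypothetical; the callback stands in for whatever the real UA generator is.

```typescript
// Hypothetical sketch: a UA stays attached to the proxy IP it was rolled for.
class StickyUserAgent {
  private currentIp: string | null = null;
  private currentUa: string | null = null;
  private readonly generate: () => string;

  // `generate` is a stand-in for the real UA generator (an assumption here).
  constructor(generate: () => string) {
    this.generate = generate;
  }

  // Returns the stored UA while the IP is unchanged; rolls a new one otherwise.
  forIp(ip: string): string {
    if (this.currentUa === null || ip !== this.currentIp) {
      this.currentIp = ip;
      this.currentUa = this.generate();
    }
    return this.currentUa;
  }
}
```

Keying the UA on the IP means a rotation caused by a 403 and a manual rotation both trigger regeneration through the same code path.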
### Failure Handling
- If UA generation fails → Alert admin dashboard, **stop crawl immediately**
- No fallback to static UA list
- This forces investigation rather than silent degradation
### Session Logging
Each session logs:
- Device category (mobile/desktop/tablet)
- Full UA string
- Browser name (Chrome/Safari/Edge/Firefox)
- IP address (from proxy)
- Session start timestamp
Logs are rotated monthly.
### Implementation
Located in `backend/src/services/crawl-rotator.ts`:
```typescript
// Per workflow-12102025.md: Device category distribution
const DEVICE_WEIGHTS = { mobile: 62, desktop: 36, tablet: 2 };
// Per workflow-12102025.md: Browser whitelist
const ALLOWED_BROWSERS = ['Chrome', 'Safari', 'Edge', 'Firefox'];
```
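One way the device weights could be sampled is a cumulative weighted roll. This is a sketch, not the code in `crawl-rotator.ts`: `rollDeviceCategory`, `DEVICE_ORDER`, and the injectable `rand` parameter are hypothetical additions for illustration and testability.

```typescript
type DeviceCategory = 'mobile' | 'desktop' | 'tablet';

const DEVICE_WEIGHTS: Record<DeviceCategory, number> = { mobile: 62, desktop: 36, tablet: 2 };

// Fixed iteration order so the roll is deterministic for a given random value.
const DEVICE_ORDER: DeviceCategory[] = ['mobile', 'desktop', 'tablet'];

// `rand` must return a value in [0, 1); it defaults to Math.random.
function rollDeviceCategory(rand: () => number = Math.random): DeviceCategory {
  let total = 0;
  for (const category of DEVICE_ORDER) total += DEVICE_WEIGHTS[category];

  // Walk the categories, subtracting each weight until the roll falls inside one.
  let remaining = rand() * total;
  for (const category of DEVICE_ORDER) {
    const weight = DEVICE_WEIGHTS[category];
    if (remaining < weight) return category;
    remaining -= weight;
  }
  return 'tablet'; // guard against floating-point edge cases
}
```

Calling `rollDeviceCategory()` once per session start, and storing the result on the session, matches the lifecycle described earlier.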
---
## HTTP Fingerprinting
### Goal
Make HTTP requests indistinguishable from real browser traffic. No repeatable footprint.
### Components
1. **Full Header Set** - All headers a real browser sends
2. **Header Ordering** - Browser-specific order (Chrome vs Firefox vs Safari)
3. **TLS Fingerprint** - Use `curl-impersonate` to match browser TLS signature
4. **Dynamic Referer** - Set per dispensary being crawled
5. **Natural Randomization** - Vary optional headers like real users
### Required Headers
| Header | Chrome | Firefox | Safari | Notes |
|--------|--------|---------|--------|-------|
| `User-Agent` | ✅ | ✅ | ✅ | From UA generation |
| `Accept` | ✅ | ✅ | ✅ | Content types |
| `Accept-Language` | ✅ | ✅ | ✅ | Always `en-US,en;q=0.9` |
| `Accept-Encoding` | ✅ | ✅ | ✅ | `gzip, deflate, br` |
| `Connection` | ✅ | ✅ | ✅ | `keep-alive` |
| `Origin` | ✅ | ✅ | ✅ | `https://dutchie.com` (POST only) |
| `Referer` | ✅ | ✅ | ✅ | Dynamic per dispensary |
| `sec-ch-ua` | ✅ | ❌ | ❌ | Chromium only |
| `sec-ch-ua-mobile` | ✅ | ❌ | ❌ | Chromium only |
| `sec-ch-ua-platform` | ✅ | ❌ | ❌ | Chromium only |
| `sec-fetch-dest` | ✅ | ✅ | ❌ | `empty` for XHR |
| `sec-fetch-mode` | ✅ | ✅ | ❌ | `cors` for XHR |
| `sec-fetch-site` | ✅ | ✅ | ❌ | `same-origin` |
| `Upgrade-Insecure-Requests` | ✅ | ✅ | ✅ | `1` (page loads only) |
| `DNT` | ~30% | ~30% | ~30% | Randomized per session |
### Header Ordering
Each browser sends headers in a specific order, and fingerprinting services flag requests whose header order does not match the browser claimed in the `User-Agent`.
**Chrome order (GraphQL request):**
1. Host
2. Connection
3. Content-Length (POST)
4. sec-ch-ua
5. DNT (if enabled)
6. sec-ch-ua-mobile
7. User-Agent
8. sec-ch-ua-platform
9. Content-Type (POST)
10. Accept
11. Origin (POST)
12. sec-fetch-site
13. sec-fetch-mode
14. sec-fetch-dest
15. Referer
16. Accept-Encoding
17. Accept-Language
**Firefox order (GraphQL request):**
1. Host
2. User-Agent
3. Accept
4. Accept-Language
5. Accept-Encoding
6. Content-Type (POST)
7. Content-Length (POST)
8. Origin (POST)
9. DNT (if enabled)
10. Connection
11. Referer
12. sec-fetch-dest
13. sec-fetch-mode
14. sec-fetch-site
**Safari order (GraphQL request):**
1. Host
2. Connection
3. Content-Length (POST)
4. Accept
5. User-Agent
6. Content-Type (POST)
7. Origin (POST)
8. Referer
9. Accept-Encoding
10. Accept-Language
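
The three lists above can be encoded as data and applied when the request is assembled. This is a sketch under assumptions: `HEADER_ORDER` and `orderHeaders` are hypothetical names, not the contents of `http-fingerprint.ts`, and Edge is left out because no Edge order is listed here.

```typescript
// Hypothetical sketch: emit headers in the per-browser order listed above.
type Browser = 'Chrome' | 'Firefox' | 'Safari';

const HEADER_ORDER: Record<Browser, string[]> = {
  Chrome: [
    'Host', 'Connection', 'Content-Length', 'sec-ch-ua', 'DNT',
    'sec-ch-ua-mobile', 'User-Agent', 'sec-ch-ua-platform', 'Content-Type',
    'Accept', 'Origin', 'sec-fetch-site', 'sec-fetch-mode', 'sec-fetch-dest',
    'Referer', 'Accept-Encoding', 'Accept-Language',
  ],
  Firefox: [
    'Host', 'User-Agent', 'Accept', 'Accept-Language', 'Accept-Encoding',
    'Content-Type', 'Content-Length', 'Origin', 'DNT', 'Connection',
    'Referer', 'sec-fetch-dest', 'sec-fetch-mode', 'sec-fetch-site',
  ],
  Safari: [
    'Host', 'Connection', 'Content-Length', 'Accept', 'User-Agent',
    'Content-Type', 'Origin', 'Referer', 'Accept-Encoding', 'Accept-Language',
  ],
};

// Returns [name, value] pairs in browser order. Names are matched
// case-insensitively; headers missing from the input are skipped, and
// headers not in the browser's list are dropped rather than appended.
function orderHeaders(
  browser: Browser,
  headers: Record<string, string>,
): [string, string][] {
  const byLowerName = new Map<string, string>();
  for (const [name, value] of Object.entries(headers)) {
    byLowerName.set(name.toLowerCase(), value);
  }
  const ordered: [string, string][] = [];
  for (const name of HEADER_ORDER[browser]) {
    const value = byLowerName.get(name.toLowerCase());
    if (value !== undefined) ordered.push([name, value]);
  }
  return ordered;
}
```

Returning an array of pairs rather than an object keeps the order explicit for whatever builds the final request.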
### TLS Fingerprinting
Use `curl-impersonate` instead of standard curl:
- `curl_chrome131` - Mimics Chrome 131 TLS handshake
- `curl_ff133` - Mimics Firefox 133 TLS handshake
- `curl_safari17` - Mimics Safari 17 TLS handshake
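Match the `curl-impersonate` binary to the browser named in the session's UA, so the TLS handshake and the `User-Agent` header describe the same browser.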
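
A small lookup is enough to keep the binary and the UA in step. The sketch below uses the three binary names listed above; `binaryForBrowser` is a hypothetical helper, and whether those binaries are installed on the worker is an assumption. Edge has no binary listed here, so the sketch rejects it rather than guessing.

```typescript
// Hypothetical sketch: choose the curl-impersonate binary from the UA's browser.
const IMPERSONATE_BINARY: Record<string, string> = {
  Chrome: 'curl_chrome131',
  Firefox: 'curl_ff133',
  Safari: 'curl_safari17',
};

function binaryForBrowser(browser: string): string {
  const binary = IMPERSONATE_BINARY[browser];
  if (binary === undefined) {
    // Fail loudly instead of falling back to a mismatched TLS signature.
    throw new Error(`No curl-impersonate binary mapped for browser "${browser}"`);
  }
  return binary;
}
```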
### Dynamic Referer
Set Referer to the dispensary's actual page URL:
```
Crawling "harvest-of-tempe" → Referer: https://dutchie.com/dispensary/harvest-of-tempe
Crawling "zen-leaf-mesa" → Referer: https://dutchie.com/dispensary/zen-leaf-mesa
```
Derived from dispensary's `menu_url` field.
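As an illustration, the derivation could look like the following. It assumes `menu_url` contains a `/dispensary/<slug>` path segment, which this document does not confirm; `refererFromMenuUrl` is a hypothetical helper.

```typescript
// Hypothetical sketch: build the Referer from a dispensary's menu_url.
// Assumes the URL path contains /dispensary/<slug>.
function refererFromMenuUrl(menuUrl: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(menuUrl);
  } catch {
    return null; // not an absolute URL
  }
  const segments = parsed.pathname.split('/').filter(Boolean);
  const index = segments.indexOf('dispensary');
  const slug = index >= 0 ? segments[index + 1] : undefined;
  // Trailing path segments and query strings are discarded.
  return slug ? `https://dutchie.com/dispensary/${slug}` : null;
}
```

Returning `null` instead of a default Referer leaves the decision about unparseable URLs to the caller.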
### Natural Randomization
These values are randomized once when the session starts and then held constant for the rest of that session:
| Feature | Distribution | Implementation |
|---------|--------------|----------------|
| DNT header | 30% have it | `Math.random() < 0.30` |
| Accept quality values | Slight variation | `q=0.9` vs `q=0.8` |
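
A sketch of rolling the DNT trait once and reusing it for every request in the session. `SessionTraits`, `rollSessionTraits`, and `applyTraits` are hypothetical names. The quality-value variation is left out because the table does not say which header varies or in what proportion.

```typescript
// Hypothetical sketch: decide optional headers once per session.
interface SessionTraits {
  readonly sendDnt: boolean;
}

// `rand` is injectable so the roll can be tested deterministically.
function rollSessionTraits(rand: () => number = Math.random): SessionTraits {
  return { sendDnt: rand() < 0.30 };
}

// Applies the stored traits to a request's headers without re-rolling,
// so every request in the session looks the same.
function applyTraits(
  headers: Record<string, string>,
  traits: SessionTraits,
): Record<string, string> {
  return traits.sendDnt ? { ...headers, DNT: '1' } : { ...headers };
}
```

Rolling per request instead would make `DNT` flicker on and off within one session, which is itself a detectable pattern.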
### Implementation Files
| File | Purpose |
|------|---------|
| `src/services/crawl-rotator.ts` | `BrowserFingerprint` includes full header config |
| `src/platforms/dutchie/client.ts` | Build headers from fingerprint, use curl-impersonate |
| `src/services/http-fingerprint.ts` | Header ordering per browser (NEW) |