Compare commits

...

370 Commits

Author SHA1 Message Date
Kelly
754a46c56f chore: trigger CI rebuild
ci/woodpecker/push/woodpecker Pipeline was successful
2025-12-16 09:19:52 -07:00
Kelly
e450d2e99e fix(ci): use local registry mirror instead of mirror.gcr.io
ci/woodpecker/push/woodpecker Pipeline failed
Switch Kaniko registry-mirror from mirror.gcr.io to 10.100.9.70:5000
to pull base images from local registry instead of GCR.
2025-12-16 09:09:15 -07:00
Kelly
205a8b3159 chore: retry CI for visibility-events fix
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-16 08:56:59 -07:00
Kelly
8bd29d11bb fix: Use correct column names in visibility-events query
ci/woodpecker/push/woodpecker Pipeline failed
Changed name -> name_raw and brand -> brand_name_raw to match
store_products table schema.
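A minimal sketch of the corrected lookup, assuming a node-postgres pool. Only the table and the column names (name_raw, brand_name_raw) come from the commit; the surrounding query is illustrative:

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from DATABASE_URL

// Hypothetical query shape; the real one lives in visibility-events.ts.
async function findStoreProduct(dispensaryId: number, name: string, brand: string) {
  return pool.query(
    `SELECT id, name_raw, brand_name_raw
       FROM store_products
      WHERE dispensary_id = $1
        AND name_raw = $2          -- was: name
        AND brand_name_raw = $3    -- was: brand
      LIMIT 1`,
    [dispensaryId, name, brand]
  );
}
```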

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 02:21:01 -07:00
Kelly
4e7b3d2336 fix: Update DATABASE_URL to point to primary PostgreSQL server
Changed from 10.100.6.50 (secondary/replica in read-only mode) to
10.100.7.50 (primary) to fix read-only transaction errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 01:13:49 -07:00
Kelly
849123693a fix(ci): Use unquoted heredoc for kubeconfig token injection
ci/woodpecker/push/woodpecker Pipeline was successful
- Changed heredoc from 'KUBEEOF' (quoted) to KUBEEOF (unquoted)
- This allows shell variable expansion of $K8S_TOKEN directly
- Removed sed replacement step that was failing due to YAML escaping issues
2025-12-15 21:55:52 -07:00
Kelly
a1227f77b9 chore: retry CI
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 21:53:00 -07:00
Kelly
415e89a012 chore: retry CI with k8s_token secret
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 21:26:06 -07:00
Kelly
45844c6281 ci: Embed kubeconfig, use k8s_token secret for token only 2025-12-15 21:19:26 -07:00
Kelly
24c9586d81 ci: Skip base64 - use raw kubeconfig in secret
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 21:09:54 -07:00
Kelly
f8d61446d5 chore: retry CI with correct kubeconfig
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 20:57:19 -07:00
Kelly
0f859d1c75 chore: retry CI after kubeconfig fix
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 20:43:40 -07:00
Kelly
52dc669782 ci: Remove clone/volume config (requires admin trust)
ci/woodpecker/push/woodpecker Pipeline failed
Woodpecker doesn't allow custom clone or volumes without elevated trust.
Kaniko layer caching (--cache-repo) still works (registry-based).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 20:16:18 -07:00
Kelly
2e47996354 ci: Add shallow git clone (depth: 1)
Only fetch latest commit instead of full history.
Reduces checkout time and bandwidth.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 20:05:33 -07:00
Kelly
f25d4eaf27 ci: Add npm and Docker layer caching
- PR steps: shared npm-cache volume for faster npm ci
- Docker builds: --cache-repo to local registry for layer caching
- Kaniko will reuse npm install layer when package.json unchanged

The first build populates the cache; subsequent builds are much faster.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 20:04:15 -07:00
Kelly
61a6be888c ci: Consolidate back to 4 docker steps
ci/woodpecker/push/woodpecker Pipeline failed
- Remove separate build steps (didn't save time)
- Use original multi-stage Dockerfiles
- Delete unused Dockerfile.ci files
- 4 parallel docker builds + deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 20:01:11 -07:00
Kelly
09c2b3a0e1 ci: Use node:22 instead of node:22-alpine for builds
ci/woodpecker/push/woodpecker Pipeline failed
Alpine uses musl libc which breaks Rollup's native bindings.
Debian-based node:22 uses glibc and works correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 19:56:48 -07:00
Kelly
cec34198c7 ci: Add slim Dockerfile.ci files for faster CI builds
ci/woodpecker/push/woodpecker Pipeline failed
- Add Dockerfile.ci for backend, cannaiq, findadispo, findagram
- Frontend Dockerfiles just copy pre-built assets to nginx
- Backend Dockerfile copies pre-built dist/node_modules
- Reduces Docker build time by doing npm ci/build in CI step

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 19:43:08 -07:00
Kelly
3c10e07e45 feat(ci): Push built images to local registry for faster K8s pulls
ci/woodpecker/push/woodpecker Pipeline failed
- Build images push to 10.100.9.70:5000/cannaiq/*
- Deploy pulls from local registry (no external network)
- Removed git.spdy.io registry auth (not needed for local)
- Added --insecure-registry for HTTP local registry
2025-12-15 19:16:16 -07:00
Kelly
3582c2e9e2 fix(k8s): Use external Postgres/Redis/MinIO services
ci/woodpecker/push/woodpecker Pipeline failed
- Update secrets.yaml with correct MinIO credentials
- Add Redis connection details
- Remove postgres.yaml (use external 10.100.6.50)
- Remove redis.yaml (use external 10.100.9.50)
2025-12-15 19:03:05 -07:00
Kelly
c6874977ee docs: Add spdy.io infrastructure credentials
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 18:59:18 -07:00
Kelly
68430f5c22 fix(ci): Use mirror.gcr.io as registry mirror for Kaniko
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 18:52:02 -07:00
Kelly
ccefd325aa fix(ci): Use hardcoded Woodpecker workspace path for Kaniko
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 18:49:56 -07:00
Kelly
e119c5af53 chore: trigger CI
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 18:44:24 -07:00
Kelly
e61224aaed fix(ci): Use CI_WORKSPACE for Kaniko context paths
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 18:42:33 -07:00
Kelly
7cf1b7643f feat(ci): Use local registry 10.100.9.70:5000 for base images
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 18:28:20 -07:00
Kelly
74f813d68f feat(ci): Switch to Kaniko for Docker builds (no daemon, better DNS)
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 18:20:53 -07:00
Kelly
f38f1024de fix(docker): Use mirror.gcr.io in all Dockerfiles to avoid rate limits
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 18:19:18 -07:00
Kelly
358099c58a chore: trigger CI
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 18:12:41 -07:00
Kelly
7fdcfc4fc4 fix(ci): Use mirror.gcr.io to avoid Docker Hub rate limits
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 18:11:16 -07:00
Kelly
541b461283 fix(ci): Use public node:20 image for typecheck steps
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 18:08:46 -07:00
Kelly
8f25cf10ab chore: retry CI
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 17:06:42 -07:00
Kelly
79e434212f chore: retry CI
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 16:45:05 -07:00
Kelly
600172eff6 chore: retry CI
ci/woodpecker/push/woodpecker Pipeline is running
2025-12-15 15:51:40 -07:00
Kelly
4c12763fa1 chore: retry CI
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 13:18:53 -07:00
Kelly
2cb9a093f4 chore: retry CI
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 12:29:45 -07:00
Kelly
15ab40a820 chore: trigger CI build
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 12:14:14 -07:00
Kelly
2708fbe319 feat(brands): Add calculated tags with configurable thresholds
ci/woodpecker/push/woodpecker Pipeline failed
Tags assigned per store:
- must_win: High-revenue store with room to grow SKUs
- at_risk: High OOS% (losing shelf presence)
- top_performer: High sales + good inventory management
- growth: Above-average velocity
- low_inventory: Low days on hand

Configurable via query params:
- ?must_win_max_skus=5
- ?at_risk_oos_pct=30
- ?top_performer_max_oos=15
- ?low_inventory_days=7

Response includes tag_thresholds showing applied values.
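A sketch of how the per-store tag assignment might look, using the commit's tag names and default thresholds. The metric fields and exact comparisons (including the "high-revenue" cutoff) are assumptions:

```typescript
interface StoreMetrics {
  activeSkus: number;    // active_skus
  oosPct: number;        // oos_pct
  totalSalesEst: number; // total_sales_est
  avgDailyUnits: number; // avg_daily_units
  avgDaysOnHand: number; // avg_days_on_hand
}

// Defaults mirror the query params above.
const thresholds = {
  mustWinMaxSkus: 5,
  atRiskOosPct: 30,
  topPerformerMaxOos: 15,
  lowInventoryDays: 7,
};

function assignTags(m: StoreMetrics, avgVelocity: number, t = thresholds): string[] {
  const tags: string[] = [];
  const highRevenue = m.totalSalesEst > 10_000; // illustrative cutoff
  if (highRevenue && m.activeSkus <= t.mustWinMaxSkus) tags.push("must_win");
  if (m.oosPct >= t.atRiskOosPct) tags.push("at_risk");
  if (highRevenue && m.oosPct <= t.topPerformerMaxOos) tags.push("top_performer");
  if (m.avgDailyUnits > avgVelocity) tags.push("growth");
  if (m.avgDaysOnHand < t.lowInventoryDays) tags.push("low_inventory");
  return tags;
}
```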

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 12:06:44 -07:00
Kelly
231d49e3e8 feat(brands): Add margin estimation to stores/performance endpoint
ci/woodpecker/push/woodpecker Pipeline failed
- Add ?margin_pct query param (default 50% industry standard)
- Returns margin_pct and margin_est per store
- Includes margin_pct_assumed in response metadata
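The arithmetic implied is simple; a sketch, assuming the margin percentage is applied to the store's estimated sales:

```typescript
// margin_pct defaults to the 50% industry standard noted above.
function withMargin(totalSalesEst: number, marginPct = 50) {
  return { margin_pct: marginPct, margin_est: totalSalesEst * (marginPct / 100) };
}

// withMargin(20_000)     -> { margin_pct: 50, margin_est: 10000 }
// withMargin(20_000, 40) -> { margin_pct: 40, margin_est: 8000 }
```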

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 12:02:36 -07:00
Kelly
17defa046c feat(api): Add /api/brands/:brand/stores/performance endpoint
ci/woodpecker/push/woodpecker Pipeline failed
Add comprehensive per-store performance endpoint for Cannabrands integration.
Returns all metrics in one call for easy merging with internal order data.

Response includes per store:
- active_skus, oos_skus, total_skus, oos_pct
- avg_daily_units (velocity from inventory deltas)
- avg_days_on_hand (stock / daily velocity)
- total_sales_est (units × price × days)
- lost_opportunity (OOS days × velocity × price)
- categories breakdown (JSON object)
- avg_price, total_stock

Query params: ?days=28&state=AZ&limit=100&offset=0

Matches Hoodie Analytics columns for Order Management view.
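A hedged usage example: the path and query params come from the commit, while the host and auth header are assumptions:

```typescript
const BASE = "https://cannaiq.example";        // hypothetical host
const token = process.env.CANNAIQ_TOKEN ?? ""; // hypothetical auth

async function fetchStorePerformance(brand: string) {
  const qs = new URLSearchParams({ days: "28", state: "AZ", limit: "100", offset: "0" });
  const res = await fetch(
    `${BASE}/api/brands/${encodeURIComponent(brand)}/stores/performance?${qs}`,
    { headers: { Authorization: `Bearer ${token}` } }
  );
  if (!res.ok) throw new Error(`performance fetch failed: ${res.status}`);
  return res.json(); // per-store metrics listed above
}
```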

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 11:57:38 -07:00
Kelly
d76a5fb3c5 feat(api): Add brand analytics API endpoints
ci/woodpecker/push/woodpecker Pipeline failed
Add comprehensive brand-level analytics endpoints at /api/brands:

Brand Discovery:
- GET /api/brands - List all brands with summary metrics
- GET /api/brands/search - Search brands by name
- GET /api/brands/top - Top brands by distribution

Brand Overview:
- GET /api/brands/:brand - Full brand intelligence dashboard
- GET /api/brands/:brand/analytics - Alias for overview

Sales & Velocity:
- GET /api/brands/:brand/sales - Sales data (4wk, daily avg)
- GET /api/brands/:brand/velocity - Units/day by SKU
- GET /api/brands/:brand/trends - Weekly sales trends

Inventory & Stock:
- GET /api/brands/:brand/inventory - Current stock levels
- GET /api/brands/:brand/oos - Out-of-stock products
- GET /api/brands/:brand/low-stock - Products below threshold

Pricing:
- GET /api/brands/:brand/pricing - Current prices
- GET /api/brands/:brand/price-history - Price changes over time

Distribution:
- GET /api/brands/:brand/distribution - Store count, market coverage
- GET /api/brands/:brand/stores - Stores carrying brand
- GET /api/brands/:brand/gaps - Whitespace opportunities

Events & Alerts:
- GET /api/brands/:brand/events - Visibility events
- POST /api/brands/:brand/events/:id/ack - Acknowledge alert

Products:
- GET /api/brands/:brand/products - All SKUs with metrics
- GET /api/brands/:brand/products/:sku - Single product deep dive

All endpoints support ?state=XX, ?days=N, and ?category=X filters.
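A small client helper makes the shared filter convention concrete. The paths are from the commit; the wrapper itself is illustrative:

```typescript
async function brandApi<T>(
  brand: string,
  path = "",
  filters: Record<string, string> = {}
): Promise<T> {
  const qs = new URLSearchParams(filters).toString();
  const res = await fetch(
    `/api/brands/${encodeURIComponent(brand)}${path}${qs ? `?${qs}` : ""}`
  );
  if (!res.ok) throw new Error(`brands API ${res.status}`);
  return res.json() as Promise<T>;
}

// e.g. 28-day velocity for a brand's AZ stores:
// await brandApi("Some Brand", "/velocity", { state: "AZ", days: "28" });
```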

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 11:06:23 -07:00
Kelly
f19fc59583 chore: retry CI
ci/woodpecker/push/woodpecker Pipeline is running
2025-12-15 09:59:11 -07:00
Kelly
4c183c87a9 chore: retry CI after registry fix
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 09:35:05 -07:00
Kelly
ffa05f89c4 chore: trigger CI on develop
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 09:06:43 -07:00
Kelly
9aa885211e ci: Allow builds from develop branch
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 09:00:42 -07:00
Kelly
b24b2fbc89 chore: trigger CI
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 08:55:00 -07:00
Kelly
f7371de3d1 fix: Use public Docker Hub images instead of private registry
ci/woodpecker/push/woodpecker Pipeline failed
Switch FROM directives to use public Docker Hub images:
- node:20-slim (instead of git.spdy.io/creationshop/node:20-slim)
- nginx:alpine (instead of git.spdy.io/creationshop/nginx:alpine)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 08:23:35 -07:00
Kelly
98970acf13 fix: Update Dockerfiles to use git.spdy.io registry
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 08:18:53 -07:00
Kelly
f0e636ac70 chore: trigger CI
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-15 08:16:02 -07:00
Kelly
138ade07e1 chore: test CI 2025-12-15 08:11:33 -07:00
Kelly
728168e799 chore: trigger CI 2025-12-15 08:10:06 -07:00
Kelly
42c6bb7424 chore: trigger CI 2025-12-15 08:06:31 -07:00
Kelly
b32e847270 chore: test CI agent 2025-12-15 08:03:39 -07:00
Kelly
287627195c chore: trigger CI 2025-12-15 08:01:56 -07:00
Kelly
bfb965fa44 chore: trigger CI 2025-12-15 07:58:10 -07:00
Kelly
7bbc77a854 chore: trigger CI 2025-12-15 07:57:40 -07:00
Kelly
39ba522643 chore: trigger CI 2025-12-15 07:57:08 -07:00
Kelly
6ea4cd0d05 chore: trigger CI 2025-12-15 07:56:50 -07:00
Kelly
520cba9d31 chore: trigger CI 2025-12-15 07:56:44 -07:00
Kelly
331b6273ac chore: trigger build 2025-12-15 07:53:42 -07:00
Kelly
d4a18cc3ce chore: test CI agent 2025-12-15 07:51:47 -07:00
Kelly
977803d862 chore: trigger CI build 2025-12-15 07:48:39 -07:00
Kelly
48c640aae5 chore: trigger CI 2025-12-15 07:44:38 -07:00
Kelly
918a1c6b26 chore: trigger CI after Woodpecker activation 2025-12-15 07:26:25 -07:00
Kelly
c7541ec2eb chore: Rename all references from dispensary-scraper to cannaiq 2025-12-15 07:19:33 -07:00
Kelly
8676762d6b chore: trigger CI build 2025-12-15 06:51:58 -07:00
Kelly
3f393ef77f fix: Correct repo name in auto-merge URL 2025-12-15 06:42:16 -07:00
Kelly
a8360c7260 feat: Migrate to spdy.io infrastructure
- Namespace: dispensary-scraper → cannaiq
- Registry: code.cannabrands.app → git.spdy.io
- Database: External PostgreSQL at 10.100.6.50
- MinIO: Internal at 10.100.9.80:9000
- CI: ci.spdy.io

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 06:40:48 -07:00
Kelly
0979c9c37a Revert "feat(scheduler): Support sub-hour interval_minutes in task_schedules"
This reverts commit b607fd7f44.
2025-12-14 18:50:25 -07:00
Kelly
b607fd7f44 feat(scheduler): Support sub-hour interval_minutes in task_schedules
- Add interval_minutes column to TaskSchedule interface
- Prefer interval_minutes over interval_hours when calculating next_run_at
- Add jitter (0-20% of interval) for sub-hour schedules to prevent detection
- Update getSchedules() to include interval_minutes and dispensary_name
- Update updateSchedule() to allow setting interval_minutes
- Add migration 121 for interval_minutes column

Part of Real-Time Inventory Tracking feature.
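A sketch of the next-run calculation described above: prefer interval_minutes over interval_hours, and add 0-20% jitter for sub-hour cadences. Field names follow the commit; the function itself is illustrative:

```typescript
interface TaskSchedule {
  interval_hours: number | null;
  interval_minutes: number | null;
}

function nextRunAt(s: TaskSchedule, now = new Date()): Date {
  const minutes = s.interval_minutes ?? (s.interval_hours ?? 1) * 60;
  // jitter only for sub-hour schedules: 0-20% of the interval
  const jitter = minutes < 60 ? Math.random() * 0.2 * minutes : 0;
  return new Date(now.getTime() + (minutes + jitter) * 60_000);
}
```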

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 18:22:55 -07:00
Kelly
bf988529eb fix(ci): switch from buildx to regular docker plugin
BuildKit container driver has sysctl permission issues in LXC.
Using plugins/docker instead of woodpeckerci/plugin-docker-buildx.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 17:05:06 -07:00
Kelly
04153a2efa chore: retry CI after docker update 2025-12-14 17:01:05 -07:00
Kelly
a1a6876064 chore: retry CI after docker restart 2025-12-14 16:55:33 -07:00
Kelly
83466a03c3 chore: retry CI build 2025-12-14 16:40:52 -07:00
Kelly
35d6a17740 feat: Add daily baseline payload logic (12:01 AM - 3:00 AM window)
- Replace saveRawPayload with saveDailyBaseline in all handlers
- Full payloads only saved once per day per store during window
- Inventory snapshots still saved every crawl (lightweight tracking)
- Add last_baseline_at column to dispensaries table
- Show baseline status in Per-Store Schedules dashboard
- Display baseline window info (12:01 AM - 3:00 AM) in UI

Reduces storage ~95% for high-frequency stores while maintaining
full audit capability via daily baselines.
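A sketch of the baseline gate: save a full payload only once per store per day, and only inside the 12:01 AM - 3:00 AM window. The column name follows the commit; the check itself is illustrative:

```typescript
function shouldSaveBaseline(lastBaselineAt: Date | null, now = new Date()): boolean {
  const inWindow =
    (now.getHours() === 0 && now.getMinutes() >= 1) || // 12:01-12:59 AM
    now.getHours() === 1 || now.getHours() === 2;      // 1:00-2:59 AM
  const alreadyToday =
    lastBaselineAt !== null &&
    lastBaselineAt.toDateString() === now.toDateString();
  return inWindow && !alreadyToday; // else save only a lightweight snapshot
}
```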

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 16:24:41 -07:00
Kelly
294d3db7a2 fix: Remove NOW() from partial indexes (not immutable)
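Postgres rejects NOW() in a partial-index predicate because predicates must use IMMUTABLE functions, and NOW() is only STABLE. A before/after sketch with illustrative index names (SQL embedded as strings):

```typescript
// Fails: functions in an index predicate must be marked IMMUTABLE.
const broken = `
  CREATE INDEX idx_snapshots_recent
    ON inventory_snapshots (captured_at)
    WHERE captured_at > NOW() - INTERVAL '30 days'`;

// Works: plain index; apply the time filter in queries instead.
const fixed = `
  CREATE INDEX idx_snapshots_captured_at
    ON inventory_snapshots (captured_at)`;
```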
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 15:58:05 -07:00
Kelly
bbbd21ba94 chore: Ignore test scripts and .claude directory 2025-12-14 15:57:27 -07:00
Kelly
3496be3064 feat(treez): Fetch all products with match_all query (+19% more)
- Update buildProductQuery() to use match_all by default
- Captures hidden, below-threshold, and out-of-stock products
- Add extractPrimaryImage() and extractImages() to normalizer
- Add product_refresh_treez handler for platform-specific refresh
- Add product_refresh_treez to TaskRole type

Best Dispensary: 228 → 271 products (+43)
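A sketch of what the default-query change amounts to. The real builder lives in the Treez client; the surrounding Elasticsearch DSL shape is an assumption:

```typescript
// match_all returns every document, so hidden, below-threshold, and
// out-of-stock products are included instead of being filtered out.
function buildProductQuery(filter?: object) {
  return { query: filter ?? { match_all: {} } };
}
```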

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 15:56:06 -07:00
Kelly
af859a85f9 feat: Add Real-Time Inventory Tracking infrastructure
Implements per-store high-frequency crawl scheduling and inventory
snapshot tracking for sales velocity estimation (Hoodie Analytics parity).

Database migrations:
- 117: Per-store crawl_interval_minutes and next_crawl_at columns
- 118: inventory_snapshots table (30-day retention)
- 119: product_visibility_events table for OOS/brand alerts (90-day)

Backend changes:
- inventory-snapshots.ts: Shared utility normalizing Dutchie/Jane/Treez
- visibility-events.ts: Detects OOS, price changes, brand drops
- task-scheduler.ts: checkHighFrequencyStores() runs every 60s
- Handler updates: 2-line additions to save snapshots/events

API endpoints:
- GET /api/tasks/schedules/high-frequency
- PUT /api/tasks/schedules/high-frequency/:id
- DELETE /api/tasks/schedules/high-frequency/:id

Frontend:
- TasksDashboard: Per-Store Schedules section with stats

Features:
- Per-store intervals (15/30/60 min configurable)
- Jitter (0-20%) to avoid detection patterns
- Cross-platform support (Dutchie, Jane, Treez)
- No crawler core changes - scheduling/post-crawl only
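A sketch of the velocity estimate that these snapshots enable (the real math lives in the analytics queries; the shape and names are illustrative):

```typescript
interface Snapshot { capturedAt: Date; units: number; } // illustrative shape

// Negative inventory deltas between successive crawls are treated as sales;
// restocks (increases) are ignored. Snapshots assumed sorted ascending.
function avgDailyUnits(snapshots: Snapshot[]): number {
  if (snapshots.length < 2) return 0;
  let sold = 0;
  for (let i = 1; i < snapshots.length; i++) {
    const delta = snapshots[i - 1].units - snapshots[i].units;
    if (delta > 0) sold += delta;
  }
  const days =
    (snapshots[snapshots.length - 1].capturedAt.getTime() -
      snapshots[0].capturedAt.getTime()) / 86_400_000;
  return days > 0 ? sold / days : 0;
}
```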

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 15:53:04 -07:00
Kelly
d3f5e4ef4b feat(nav): Add Payloads menu item to admin sidebar
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 14:26:37 -07:00
Kelly
abef265ae9 feat(workers): Add platform badge (D/J/T) to active tasks display
- Add PlatformBadge component showing D=Dutchie, J=Jane, T=Treez
- Include platform field in worker-registry API response
- Fix null running_seconds displaying as "nulls"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 12:21:23 -07:00
Kelly
b28a91fca5 fix: Task completion result and null duration display bugs
1. task-worker.ts: Pass full result object to completeTask instead of
   non-existent result.data property (was causing {} to be stored)

2. WorkersDashboard.tsx: Handle null running_seconds in formatSecondsToTime
   (was displaying "nulls" due to JS type coercion)
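The "nulls" artifact comes from template-literal coercion: `${null}` followed by "s" renders as the string "nulls". A sketch of the guarded formatter; the function name is from the commit, the body is illustrative:

```typescript
function formatSecondsToTime(seconds: number | null): string {
  if (seconds == null) return "-"; // was: `${null}s` -> "nulls"
  const m = Math.floor(seconds / 60);
  const s = Math.round(seconds % 60);
  return m > 0 ? `${m}m ${s}s` : `${s}s`;
}
```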

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 12:02:05 -07:00
Kelly
60b221e7fb feat: Add payloads dashboard, disable snapshots, fix scheduler
Frontend:
- Add PayloadsDashboard page with search, filter, view, and diff
- Update TasksDashboard default sort: pending → claimed → completed
- Add payload API methods to api.ts

Backend:
- Disable snapshot creation in product-refresh handler
- Remove product_refresh from schedule role options
- Disable compression in payload-storage (plain JSON for debugging)
- Fix task-scheduler: map 'embedded' menu_type to 'dutchie' platform
- Fix task-scheduler: use schedule.interval_hours as skipRecentHours

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 11:54:25 -07:00
Kelly
15cb657f13 fix(docker): Revert to libasound2 for Debian bookworm
- libasound2t64 is for Debian trixie (13), not bookworm (12)
- Keep build tools fix for native modules

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 11:07:03 -07:00
Kelly
f15920e508 fix(docker): Add build tools for native modules and fix Debian package name
- Add python3 and build-essential to builder stage for bcrypt/sharp compilation
- Change libasound2 to libasound2t64 for Debian bookworm compatibility
- Copy pre-built node_modules from builder instead of re-running npm install
- Prune dev dependencies in builder for smaller production image

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 11:01:22 -07:00
Kelly
9518ca48a5 feat(tasks): Task tracking, IP-per-store, and schedule edit fixes
- Add task completion verification with DB and output layers
- Add reconciliation loop to sync worker memory with DB state
- Implement IP-per-store-per-platform conflict detection
- Add task ID hash to MinIO payload filenames for traceability
- Fix schedule edit modal with dispensary info in API responses
- Add task ID display after dispensary name in worker dashboard
- Add migrations for proxy_ip and source tracking columns

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 10:49:21 -07:00
Kelly
3e9667571f fix(ui): Restore round worker slot circles with hover tooltips 2025-12-14 09:58:24 -07:00
Kelly
8f6efd377b fix(ui): Remove 'test' from fingerprint tooltip 2025-12-14 03:40:17 -07:00
Kelly
83e9718d78 fix(ui): Worker slot preflight checklist and fingerprint hover
- Fix fingerprint tooltip to use actual API field names (browserName, deviceCategory, detectedTimezone)
- Show real preflight steps: HTTP Preflight, Geo Session, Pool Ready
- Checkmarks appear as each step completes, spinners while in progress

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 03:38:46 -07:00
Kelly
f5cb17e1d4 feat(dutchie): Full payload with specials and all product statuses
- Set includeEnterpriseSpecials: true to get BOGO/sale deal names
- Set Status: 'All' to capture both Active and Inactive (sold out) products
- Make schedules query backward-compatible for missing pool_id column

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 03:35:25 -07:00
Kelly
f48a503e82 fix(tasks): Filter out disabled dispensaries in createStaggeredTasks
Tasks were being created for dispensaries with crawl_enabled=false
(duplicates, deprecated stores). Added EXISTS check to filter only
crawl_enabled=true stores before creating tasks.

This prevents errors like:
"Dispensary 207 not found or not crawl_enabled"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 03:07:03 -07:00
Kelly
e7b392141a feat(ui): Task pool toggle, sortable columns, worker slot visualization
Tasks Dashboard:
- Add clickable Pool Open/Paused toggle button in header
- Add sortable columns (ID, Role, Store, Status, Worker, Duration, Created)
- Show menu_type and pool badges under Store column
- Add Pool column to Schedules table
- Filter stores by platform in Create Task modal

Workers Dashboard:
- Redesign pod visualization to show 3 worker slots per pod
- Each slot shows preflight checklist (Overload? Terminating? Pool Query?)
- Once qualified, shows City/State, Proxy IP, Antidetect status
- Hover shows full fingerprint data (browser, platform, bot detection)

Backend:
- Add menu_type to listTasks query
- Add pool_id/pool_name to schedules query with task_pools JOIN
- Migration 114: Add pool_id column to task_schedules table

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 03:00:19 -07:00
Kelly
15a5a4239e fix(tasks): Make pool JOIN defensive when table doesn't exist
Auto-migrate fails early, so task_pools may not exist yet.
Check table existence before including pool columns/joins.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 02:29:07 -07:00
Kelly
20d7534b93 fix(ci): prefix docker tags with sha- to prevent scientific notation parsing
Git SHAs like 1861e183 or 698995e4 get parsed as scientific notation
by JSON parsers, breaking Docker tag creation.
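The failure mode is easy to reproduce: an all-digit short SHA with a single "e" is valid number syntax, so lenient parsing coerces it:

```typescript
// "698995e4" is 698995 x 10^4 to a JSON parser, not an 8-char SHA.
JSON.parse("698995e4"); // => 6989950000

// Prefixing forces a string in any parser.
const dockerTag = (sha: string) => `sha-${sha}`; // "sha-698995e4"
```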

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 02:10:17 -07:00
Kelly
698995e46f chore: bump task worker version comment
Force new git SHA to avoid CI scientific notation bug.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 02:02:30 -07:00
Kelly
1861e18396 feat(workers): Implement geo-based task pools
Workers now follow the correct flow:
1. Check what pools have pending tasks
2. Claim a pool (e.g., Phoenix AZ)
3. Get Evomi proxy for that geo
4. Run preflight with geo proxy
5. Pull tasks from pool (up to 6 stores)
6. Execute tasks
7. Release pool when exhausted (6 stores visited)

Task pools group dispensaries by metro area (100mi radius):
- Phoenix AZ, Tucson AZ
- Los Angeles CA, San Francisco CA, San Diego CA, Sacramento CA
- Denver CO, Chicago IL, Boston MA, Detroit MI
- Las Vegas NV, Reno NV, Newark NJ, New York NY
- Oklahoma City OK, Tulsa OK, Portland OR, Seattle WA

Benefits:
- Workers know geo BEFORE getting proxy (no more "No geo assigned")
- IP diversity within metro area (Phoenix worker can use Tempe IP)
- Simpler worker logic - just match pool geo
- Pre-organized tasks, not grouped at claim time

New files:
- migrations/113_task_pools.sql - schema, seed data, functions
- src/services/task-pool.ts - TypeScript service

Env vars:
- USE_TASK_POOLS=true (new system)
- USE_IDENTITY_POOL=false (disabled)

🤖 Generated with [Claude Code](https://claude.ai/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-14 01:41:52 -07:00
Kelly
eedc027ff6 fix(workers): Report geo to worker_registry when identity claimed
Workers were showing "No geo assigned" on dashboard because geo info
was set internally but never reported to worker_registry after
identity pool claim.

Now updates current_state and current_city columns when identity
is claimed, so dashboard shows correct geo assignment.

Also documents CI/CD batching rule to minimize build time.

🤖 Generated with [Claude Code](https://claude.ai/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-14 01:14:31 -07:00
Kelly
ec5fcd9bc4 fix(proxy): Use rotating IPs instead of sticky sessions
Removes session parameter from Evomi proxy URL so each request
gets a different IP. Prevents all workers from sharing same IP.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 00:48:04 -07:00
Kelly
58150dafa6 docs: Add CI/CD workflow rule - commit and wait
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 00:46:43 -07:00
Kelly
06adab7225 fix(preflight): Add state fallback when IP lookup fails
- Try ip-api.com first, then ipapi.co as fallback
- If both fail, use state coords from targetState param
- Prevents workers from getting stuck in preflight loop

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 00:24:31 -07:00
Kelly
38d7678a2e feat(antidetect): Use actual proxy IP location for browser fingerprint
- Replace hardcoded state coords with IP geolocation lookup via ip-api.com
- Browser timezone and geolocation now match actual proxy IP location
- City-level proxy targeting already in place via Evomi _city- parameter
- Add browser-factory.ts shared utility for antidetect setup

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 23:49:25 -07:00
Kelly
aac1181f3d perf(analytics): Fix 7.5s national summary endpoint
- Use denormalized d.product_count instead of JOIN to store_products
- Remove expensive per-product aggregations (avg_price, brand counts, stock)
- Query now runs in <100ms instead of 7.5s

The massive JOIN between dispensaries and store_products was causing
the slow load. State metrics now use pre-computed product_count column.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 23:31:10 -07:00
Kelly
4eaf7e50d7 feat(ui): Add dropdown for Add User/Origin button
- Single dropdown button shows both options
- Selecting an option switches to that tab and opens modal
- Cleaner UX than separate buttons per tab

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 23:06:42 -07:00
Kelly
4cb4e1c502 feat(workers): Session pool system - claim tasks first, then get IP
New worker flow (enabled via USE_SESSION_POOL=true):
1. Worker claims up to 6 tasks for same geo (atomically marked claimed)
2. Gets Evomi proxy for that geo
3. Checks IP availability (not in use, not in 8hr cooldown)
4. Locks IP exclusively to this worker
5. Runs preflight with locked IP
6. Executes tasks (3 concurrent)
7. After 6 tasks, retires session (8hr IP cooldown)
8. Repeats with new IP

Key files:
- migrations/112_worker_session_pool.sql: Session table + atomic claiming
- services/worker-session.ts: Session lifecycle management
- tasks/task-worker.ts: sessionPoolMainLoop() with new flow
- services/crawl-rotator.ts: setFixedProxy() for session locking

Failed tasks return to pending for retry by another worker.
No two workers can share same IP simultaneously.
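Atomic claiming is the load-bearing step. One common Postgres pattern for it is FOR UPDATE SKIP LOCKED; the actual function ships in migrations/112_worker_session_pool.sql, and the column names here are assumptions:

```typescript
import { Pool } from "pg";

async function claimTasks(pool: Pool, workerId: string, stateCode: string, limit = 6) {
  const { rows } = await pool.query(
    `UPDATE worker_tasks
        SET status = 'claimed', worker_id = $1
      WHERE id IN (
        SELECT id FROM worker_tasks
         WHERE status = 'pending' AND state_code = $2
         ORDER BY created_at
         LIMIT $3
         FOR UPDATE SKIP LOCKED)  -- rows locked by another worker are skipped
      RETURNING id`,
    [workerId, stateCode, limit]
  );
  return rows.map((r: { id: number }) => r.id);
}
```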

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 22:54:45 -07:00
Kelly
f0bb454ca2 fix(workers): Require geo for worker qualification status
- Workers without geo now show orange "NO GEO" badge instead of gold qualified
- Orange ring + X badge on avatar when preflight OK but no geo
- Gold ring + checkmark only when fully qualified (preflight + geo)
- Add VenetianMask icon for antidetect status indicator
- Lock K8s replica count at exactly 8 pods in CLAUDE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 21:58:47 -07:00
Kelly
b8bdc48c1e fix: Remove non-existent progress columns from worker tasks query 2025-12-13 21:24:55 -07:00
Kelly
8173fd2845 fix: Rename duplicate formatDuration function 2025-12-13 21:03:15 -07:00
Kelly
3921e66933 perf: Use denormalized product_count in pipeline and favorites routes
- pipeline.ts: Replace correlated subquery with d.product_count
- consumer-favorites.ts: Replace correlated subquery with d.product_count

Correlated subqueries were causing N+1 query patterns. Using the
denormalized column is O(1) lookup per row.
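The shape of the change, with SQL as strings and illustrative column lists (the real queries in pipeline.ts and consumer-favorites.ts differ):

```typescript
// Before: correlated subquery, re-executed for every dispensary row.
const before = `
  SELECT d.id,
         (SELECT COUNT(*) FROM store_products sp
           WHERE sp.dispensary_id = d.id) AS product_count
    FROM dispensaries d`;

// After: denormalized column, one plain column read per row.
const after = `
  SELECT d.id, d.product_count
    FROM dispensaries d`;
```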

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 20:45:14 -07:00
Kelly
ad79605961 perf(dashboard): Fix slow activity endpoint with denormalized column + cache
- Use dispensaries.product_count instead of correlated subquery
- Add 1 minute in-memory cache for /dashboard/activity
- Reduces query time from ~30s to <100ms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 20:43:12 -07:00
Kelly
6439de5cd4 feat(ui): Nested task slots in worker dashboard
Backend:
- Add active_tasks array to GET /worker-registry/workers response
- Include task details: role, dispensary, running_seconds, progress

Frontend:
- Show nested task list under each worker with duration
- Display empty slots when worker has capacity
- Update pod visualization to show 3 task slot nodes
- Active slots pulse blue, empty slots gray
- Hover for task details (dispensary, duration, progress)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 20:40:15 -07:00
Kelly
b51ba17d32 fix(ui): Fallback to fingerprint region for worker geo display
geoState was only using current_state column which is often null.
Now falls back to fingerprint.detectedLocation.region like geoCity does.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 20:34:50 -07:00
Kelly
2d631dfad0 feat(ui): Auto-detect trusted origin type from URL/pattern
- Remove Type dropdown from trusted origins form
- Auto-detect domain, IP, or regex from input
- Convert *.domain.com wildcards to proper regex
- Simplify form to just Name, URL/Pattern, Description

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 20:23:03 -07:00
Kelly
072388ffb2 fix(identity): Use unique session IDs for proxy rotation + add task pool gate
- Fix buildEvomiProxyUrl to use passed session ID from identity pool
  instead of truncating to worker+region (causing same IP for all workers)
- Add task pool gate feature with database-backed state
- Add /tasks/pool/toggle endpoint and UI toggle button
- Fix isTaskPoolPaused() missing await in claimTask

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 20:17:52 -07:00
Kelly
b456fe5097 feat: Add trusted origins management UI at /users
- Create trusted_origins table for DB-backed origin management
- Add API routes for CRUD operations on trusted origins
- Add tabbed interface on /users page with Users and Trusted Origins tabs
- Seeds default trusted origins (cannaiq.co, findadispo.com, findagram.co, etc.)
- Fix TypeScript error in WorkersDashboard fingerprint type

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 19:54:26 -07:00
Kelly
eb5b2a876e feat(pwa): Add update prompt for new versions
- Change registerType from 'autoUpdate' to 'prompt'
- Add UpdatePrompt component that shows when new version available
- Users see banner with "Update" or "Later" buttons
- Service worker checks for updates every hour

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-13 19:52:51 -07:00
Kelly
ad09aadcc9 feat(cannaiq): Add PWA support with vite-plugin-pwa
- Add vite-plugin-pwa for service worker and manifest generation
- Configure workbox for asset caching and API runtime caching
- Add sharp for icon generation from SVG
- Create generate-icons.js script to create 192x192 and 512x512 PNGs
- Update build script to auto-generate icons before build

App is now installable as a PWA with offline support.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-13 19:49:15 -07:00
Kelly
a020e31a46 feat(treez): CDP interception client for Elasticsearch API capture
Rewrites Treez platform client to use CDP (Chrome DevTools Protocol)
interception instead of DOM scraping. Key changes:

- Uses Puppeteer Stealth plugin to bypass headless detection
- Intercepts Elasticsearch API responses via CDP Network.responseReceived
- Captures full product data including inventory levels (availableUnits)
- Adds comprehensive TypeScript types for all Treez data structures
- Updates queries.ts with automatic session management
- Fixes product-discovery-treez handler for new API shape

Tested with Best Dispensary: 142 products across 10 categories captured
with inventory data, pricing, and lab results.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-13 19:25:49 -07:00
Kelly
83f629fec4 feat: Add identity pool for diverse IP/fingerprint rotation
- Add worker_identities table and metro_areas for city groupings
- Create IdentityPoolService for claiming/releasing identities
- Each identity used for 3-5 tasks, then 2-3 hour cooldown
- Integrate with task-worker via USE_IDENTITY_POOL feature flag
- Update puppeteer-preflight to accept custom proxy URLs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 18:46:58 -07:00
Kelly
d810592bf2 fix(ui): Show city from fingerprint data in worker dashboard
City was captured in preflight fingerprint JSON but not displayed.
Now falls back to fingerprint.detectedLocation.city if current_city is null.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 18:08:41 -07:00
Kelly
d02c347ef6 fix(proxy): Use correct Evomi host rp.evomi.com
Was using rpc.evomi.com which doesn't exist.
Residential proxies use rp.evomi.com:1000

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 17:35:43 -07:00
Kelly
d779a08bbf fix(preflight): Use Evomi proxy API before falling back to DB pool
The preflight was only checking DB proxy pool which was empty.
Now checks Evomi API first (when configured), then falls back to DB.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 17:14:23 -07:00
Kelly
1490c60d2a fix(ui): Update TasksDashboard badges for consistency
- Platform badge now shows green (emerald) for dutchie even when null/undefined
- State badge shows "ALL" (uppercase) with indigo color when no state specified
- Remove "(HTTP transport)" from store discovery description

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 17:10:57 -07:00
Kelly
ba15802a77 perf(puppeteer): Block analytics/tracking domains to save proxy bandwidth
Block requests to non-essential domains:
- googletagmanager.com, google-analytics.com (analytics)
- launchdarkly.com (feature flags)
- assets2.dutchie.com (CDN assets - we only need GraphQL)
- sentry.io (error tracking)
- segment.io/segment.com, amplitude.com, mixpanel.com (analytics)
- hotjar.com, fullstory.com (session recording)

Applied to both product-discovery-dutchie.ts and puppeteer-preflight.ts
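A sketch of the domain blocklist as it might sit inside the existing request-interception handler. The domains come from the commit; the helper is illustrative:

```typescript
const BLOCKED_HOSTS = [
  "googletagmanager.com", "google-analytics.com", "launchdarkly.com",
  "assets2.dutchie.com", "sentry.io", "segment.io", "segment.com",
  "amplitude.com", "mixpanel.com", "hotjar.com", "fullstory.com",
];

// Puppeteer request URLs are absolute, so new URL() is safe here.
function isBlocked(requestUrl: string): boolean {
  const host = new URL(requestUrl).hostname;
  return BLOCKED_HOSTS.some(b => host === b || host.endsWith(`.${b}`));
}
```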

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 16:50:28 -07:00
Kelly
d8a22fba53 docs: Add Evomi residential proxy API documentation
- Document priority order (Evomi API first, DB fallback)
- List environment variables and defaults
- Show K8s secret location
- Explain proxy URL format with geo targeting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 16:47:58 -07:00
Kelly
cf99ef9e09 fix(worker): Use Evomi API first, DB proxies as fallback
- Check Evomi API availability before waiting for DB proxies
- If EVOMI_USER/EVOMI_PASS configured, proceed immediately
- Only fall back to DB proxy polling if Evomi not configured
- Added clear comments explaining proxy initialization order

This fixes workers getting stuck waiting for DB proxies when
Evomi API is available for on-demand geo-targeted proxies.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 16:45:52 -07:00
Kelly
3d0ea21007 style(dashboard): Remove 'scraper-' prefix from worker IDs
Display 'worker-...' instead of 'scraper-worker-...' in admin view

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 16:41:33 -07:00
Kelly
023cfc127f fix(preflight): Apply stored fingerprint to task browser
- Add WorkerFingerprint interface with timezone, city, state, ip, locale
- Store fingerprint in TaskWorker after preflight passes
- Pass fingerprint through TaskContext to handlers
- Apply timezone via CDP and locale via Accept-Language header
- Ensures browser fingerprint matches proxy IP location

This fixes the anti-detect weakness where a timezone/locale mismatch
with the proxy IP was getting the browser blocked by Cloudflare.
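A sketch using Puppeteer's page-level wrappers (the handlers may drive CDP directly); the interface fields come from the commit:

```typescript
import type { Page } from "puppeteer";

interface WorkerFingerprint {
  timezone: string; // e.g. "America/Phoenix"
  city: string;
  state: string;
  ip: string;
  locale: string;   // e.g. "en-US,en;q=0.9"
}

async function applyFingerprint(page: Page, fp: WorkerFingerprint) {
  await page.emulateTimezone(fp.timezone);                         // CDP timezone override
  await page.setExtraHTTPHeaders({ "Accept-Language": fp.locale }); // locale header
}
```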

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 16:40:52 -07:00
Kelly
5ea92e25af fix(dashboard): Restore preflight display with gold shield + geo
- Show gold shield icon with city/state for qualified workers
- Restore IP address, fingerprint, and antidetect status rows
- Keep geo session fields in worker interface

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 16:31:06 -07:00
Kelly
3b8171d94e feat(admin): Add platform badge and selector to schedules
- Add Platform column to schedules table with colored badges
  - Dutchie: emerald, Jane: pink, Treez: amber
- Add platform dropdown to schedule edit modal
- Add platform selector buttons to create task modal
- Store discovery tasks now pass state in payload

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 16:09:53 -07:00
Kelly
d7da0b938d feat(jane): Direct Algolia product fetch and multi-platform product-refresh
- Add fetchProductsByStoreIdDirect() for reliable Algolia product fetching
- Update product-discovery-jane to use direct Algolia instead of network interception
- Fix product-refresh handler to support both Dutchie and Jane payloads
  - Handle both `products` (Dutchie) and `hits` (Jane) formats
  - Use platform-appropriate raw_json structure for normalizers
  - Fix consecutive_misses tracking to use correct provider
  - Extract product IDs correctly (Dutchie _id vs Jane product_id)
- Add store discovery deduplication (prefer REC over MED at same location)
- Add storeTypes field to DiscoveredStore interface
- Add scripts: run-jane-store-discovery.ts, run-jane-product-discovery.ts
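A sketch of the dual-payload branch: Dutchie payloads carry `products` with `_id`, Jane (Algolia) payloads carry `hits` with `product_id`. The field names are from the commit; the types are illustrative:

```typescript
type DutchiePayload = { products: Array<{ _id: string }> };
type JanePayload = { hits: Array<{ product_id: number }> };

function extractProductIds(payload: DutchiePayload | JanePayload): string[] {
  if ("products" in payload) return payload.products.map(p => p._id);
  return payload.hits.map(h => String(h.product_id));
}
```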

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 16:05:50 -07:00
Kelly
88e590d026 feat: Worker geo sessions for state-based task assignment
Workers are now geo-locked to a specific state for their session:
- Session = 60 minutes OR 7 store visits (whichever comes first)
- Workers ONLY claim tasks matching their assigned state
- State assignment prioritizes: most pending tasks, fewest workers

Changes:
- Migration 108: geo session columns, claim_task with geo filter,
  assign_worker_geo(), check_worker_geo_session(), worker_state_capacity view
- task-worker.ts: ensureGeoSession() method before task claiming
- worker-registry.ts: /state-capacity and /geo-sessions API endpoints
- WorkersDashboard: Show qualified icon + geo state in Preflight column

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 16:00:09 -07:00
Kelly
c215d11a84 feat: Platform isolation, Evomi geo-targeting, proxy management
Platform isolation:
- Rename handlers to {task}-{platform}.ts convention
- Deprecate -curl variants (now _deprecated-*)
- Platform-based routing in task-worker.ts
- Add Jane platform handlers and client

Evomi geo-targeting:
- Add dynamic proxy URL builder with state/city targeting
- Session stickiness per worker per state (30 min)
- Fall back to static proxy table when API is unavailable
- Add proxy tracking columns to worker_tasks

Proxy management:
- New /proxies admin page for visibility
- Track proxy_ip, proxy_geo, proxy_source per task
- Show active sessions and task history

Validation filtering:
- Filter by validated stores (platform_dispensary_id + menu_url)
- Mark incomplete stores as deprecated
- Update all dashboard/stats queries

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 15:16:48 -07:00
Kelly
59e0e45f8f feat(discovery): Add self-healing and rename schedule
- Rename 'store_discovery_dutchie' to 'Store Discovery' (platform badge via platform field)
- Add self-healing: scan for stores missing payloads and queue product_discovery
- Catches stores added before chaining was implemented
- Limits to 50 stores per run to avoid overwhelming the system

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 14:14:21 -07:00
Kelly
e9a688fbb3 feat(api): Add stage field to dispensary PUT endpoint
Allows updating dispensary stage via API for better data management.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 13:37:47 -07:00
Kelly
8b3ae40089 feat: Remove Run Now, add source tracking, optimize dashboard
- Remove /run-now endpoint (use task priority instead)
- Add source tracking to worker_tasks (source, source_schedule_id, source_metadata)
- Parallelize dashboard API calls (Promise.all)
- Add 1-5 min caching to /markets/dashboard and /national/summary
- Add performance indexes for dashboard queries

Migrations:
- 104: Task source tracking columns
- 105: Dashboard performance indexes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 13:23:35 -07:00
Kelly
a8fec97bcb feat: Support per-dispensary schedules (not just per-state)
- Add dispensary_id column to task_schedules table
- Update scheduler to handle single-dispensary schedules
- Update run-now endpoint to handle single-dispensary schedules
- Update frontend modal to pass dispensary_id when 1 store selected
- Fix existing "Deeply Rooted Hourly" schedule with dispensary_id=112

Now when you select ONE store and check "Make recurring", it creates
a schedule that runs for that specific store every interval.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 12:03:08 -07:00
Kelly
c969c7385b fix: Handle product_refresh and payload_fetch in run-now endpoint
The run-now endpoint only fanned out to stores for product_discovery
schedules, not product_refresh or payload_fetch. This caused single
tasks to be created without dispensary_id, which then failed.

Now all crawl roles (product_discovery, product_refresh, payload_fetch)
with state_code properly fan out to individual store tasks.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 03:49:10 -07:00
Kelly
5084cb1a85 fix: Block images/fonts/media in Puppeteer to save bandwidth
Add request interception to all Puppeteer handlers to block unnecessary
resources (images, fonts, media, stylesheets). We only need HTML/JS for
the session cookie, then the GraphQL JSON response.

This was causing 2.4GB of bandwidth from assets2.dutchie.com - every
page visit downloaded all product thumbnails, logos, etc.

Files updated:
- product-discovery-http.ts
- entry-point-discovery.ts
- store-discovery-http.ts
- store-discovery-state.ts
- puppeteer-preflight.ts

Note: Product images from payload are still downloaded once to MinIO
via image-storage.ts - this only blocks browser-rendered page images.
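A minimal sketch of the interception described above, assuming Puppeteer; the real handlers add this per page before navigation:

```typescript
import puppeteer from "puppeteer";

const BLOCKED = new Set(["image", "font", "media", "stylesheet"]);

async function openWithBlocking(url: string) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setRequestInterception(true);
  page.on("request", req => {
    // keep HTML/JS (for the session cookie) and the GraphQL JSON;
    // drop heavy assets that would burn proxy bandwidth
    if (BLOCKED.has(req.resourceType())) req.abort();
    else req.continue();
  });
  await page.goto(url, { waitUntil: "networkidle2" });
  return { browser, page };
}
```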

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 03:28:12 -07:00
Kelly
ec6843dfd6 feat: Add working hours for natural traffic patterns
Workers check their timezone (from preflight IP geolocation) and current
hour's weight probability to determine availability. This creates natural
traffic patterns - more workers active during peak hours, fewer during
off-peak. Tasks queue up at night and drain during the day.

Migrations:
- 099: working_hours table with hourly weights by profile
- 100: Add timezone column to worker_registry
- 101: Store timezone from preflight IP geolocation
- 102: check_working_hours() function with probability roll
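The probability roll is the crux. A sketch of the equivalent check in TypeScript (the shipped version is the check_working_hours() SQL function; the weight storage here is illustrative):

```typescript
// weightsByHour: 24 values in [0,1], e.g. low overnight, high midday.
function isAvailable(weightsByHour: number[], timezone: string, now = new Date()): boolean {
  const hour = Number(
    new Intl.DateTimeFormat("en-US", { hour: "numeric", hourCycle: "h23", timeZone: timezone })
      .format(now)
  );
  return Math.random() < (weightsByHour[hour] ?? 0); // probability roll
}
```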

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 03:24:42 -07:00
Kelly
268429b86c feat: Use MinIO for permanent product image storage
- Rewrite image-storage.ts to use MinIO instead of ephemeral local filesystem
- Images downloaded ONCE from Dutchie CDN, stored permanently in MinIO
- Check MinIO before downloading (skipIfExists) to avoid re-downloads
- Convert images to webp before storage
- Storage path: images/products/<state>/<store>/<brand>/<product>/image-<hash>.webp
- Public URL: https://cdn.cannabrands.app/cannaiq/images/...

This fixes the 2.4GB bandwidth issue from repeatedly downloading images
that were lost when K8s pods restarted.
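The storage path above maps to a deterministic object key; a sketch (the hash derivation is an assumption):

```typescript
// e.g. images/products/AZ/best-dispensary/some-brand/gummies-10mg/image-ab12cd34.webp
function imageKey(state: string, store: string, brand: string,
                  product: string, hash: string): string {
  return `images/products/${state}/${store}/${brand}/${product}/image-${hash}.webp`;
}
```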

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 03:24:22 -07:00
Kelly
5c08135007 feat(plugin): Add Elementor dynamic tags and product loop widget v1.7.0
WordPress Plugin:
- Add dynamic tags for all product payload fields (name, brand, price, THC, effects, etc.)
- Add Product Loop widget with filtering, sorting, and layout options
- Register CannaIQ widget category in Elementor
- Update build script to auto-upload to MinIO CDN
- Remove legacy dutchie references
- Bump version to 1.7.0

Backend:
- Redirect /downloads/* to CDN instead of serving from local filesystem

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 03:10:01 -07:00
Kelly
9f0d68d4c9 Revert "feat: Store full Dutchie payload in latest_raw_payload"
This reverts commit e11400566e.
2025-12-13 02:33:30 -07:00
Kelly
e11400566e feat: Store full Dutchie payload in latest_raw_payload
Now stores the complete raw product JSON from Dutchie on every
product refresh. This enables querying any Dutchie field
(terpenes, effects, description, etc.) without schema changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 02:31:00 -07:00
Kelly
987ed062d5 feat(tasks): Add "Make recurring" toggle to Create Task modal
- Add checkbox to convert one-time task into recurring schedule
- When enabled, shows schedule name, interval, and state filter options
- Schedule runs immediately after creation so tasks appear right away
- Update button text to "Create Schedule & Run" when recurring
- Remove separate "New Schedule" button from schedules section
- Update empty state text to guide users to new flow

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 02:27:39 -07:00
Kelly
e50f54e621 revert: Remove brand aliasing migrations
Dutchie already provides unique brand UUIDs (provider_brand_id).
No need for separate brand normalization/aliasing logic.
Use provider_brand_id for brand grouping instead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 02:16:05 -07:00
Kelly
983cd71fc2 feat: Performance optimizations and preflight improvements
- Add missing /api/analytics/national/summary endpoint
- Optimize dashboard activity queries (subquery vs JOIN+GROUP BY)
- Add PreflightSummary component to Workers page with gold qualified badge
- Add preflight retry logic - workers retry every 30s until qualified
- Run stale task cleanup on ALL workers (not just worker-0)
- Add preflight fields to worker-registry API (ip, fingerprint, is_qualified)

Database indexes added:
- idx_store_products_created_at (for recent products)
- idx_dispensaries_last_crawl_at (for recent scrapes)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 02:06:33 -07:00
Kelly
7849ee0256 feat: Add POST /api/tasks/fix-null-methods endpoint
Updates null method tasks to 'http' for proper worker qualification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:33:09 -07:00
Kelly
432842f442 fix: Ensure all crawl tasks use method='http' transport
- product_discovery → product_refresh now sets method: 'http'
- product_refresh → entry_point_discovery now sets method: 'http'
- All crawl tasks now require HTTP preflight to claim

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:31:22 -07:00
Kelly
94ebbb2497 fix: State dropdown and locked platform in schedule modal
- State Code → State dropdown with available states
- Platform field locked to 'dutchie' (read-only)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:26:58 -07:00
Kelly
e826a4dd3e fix: Add consecutive_failures column to migration
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:20:59 -07:00
Kelly
b7e96359ef feat: Auto-retry failed tasks with exponential backoff
- Hard failures now auto-retry up to 3 times
- Exponential backoff: 5, 10, 20 minutes
- Only permanently fails after max retries exceeded
- Soft failures still requeue immediately
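The schedule is a textbook doubling backoff; a sketch using the commit's numbers:

```typescript
const MAX_RETRIES = 3;

// attempt 0, 1, 2 -> 5, 10, 20 minutes; null means permanently failed.
function retryDelayMinutes(attempt: number): number | null {
  if (attempt >= MAX_RETRIES) return null;
  return 5 * 2 ** attempt;
}
```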

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:19:48 -07:00
Kelly
b1c1955082 feat: Add POST /api/tasks/retry-failed endpoint
Resets failed tasks back to pending for retry.
Options: role (filter), max_age_hours (default 24), limit (default 100)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:18:07 -07:00
Kelly
95c23fcdff fix: Add consecutive_successes column to dispensaries table
Required for stage checkpoint tracking in task handlers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:15:59 -07:00
Kelly
7067db68fc feat: Add server-side brand search to Intelligence page
- Backend: Add 'search' param to /api/admin/intelligence/brands
- Frontend: Debounced search triggers server-side query
- Now searches ALL brands, not just top 500

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:14:02 -07:00
Kelly
271faf0f00 perf: Optimize dashboard queries for faster load times
- Use pg_stat for approximate product count (instant vs full scan)
- LIMIT on DISTINCT queries for brand/category counts
- Single combined query (reduces round trips)
- Add index on store_product_snapshots.captured_at
- Add index on worker_tasks.worker_id and created_at
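The "approximate product count" trick reads the planner's statistics instead of scanning the table; a sketch (instant, but only as fresh as the last ANALYZE/autovacuum):

```typescript
const approxProductCount = `
  SELECT reltuples::bigint AS approx_rows
    FROM pg_class
   WHERE relname = 'store_products'`;
```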

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 01:09:02 -07:00
Kelly
291a8279bd fix(entry-point-discovery): Self-healing duplicate detection
When resolving platform_dispensary_id, check whether it already exists on
another dispensary. If so, mark the current dispensary as a duplicate instead
of failing with a unique constraint violation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 00:46:10 -07:00
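A sketch of the self-healing check described above; the status value and duplicate_of column are hypothetical, and only platform_dispensary_id and the unique constraint are confirmed by the commit:

```ts
import { Pool } from "pg";

const pool = new Pool();

async function resolvePlatformId(dispensaryId: number, platformId: string) {
  // Does another dispensary already own this platform ID?
  const existing = await pool.query(
    `SELECT id FROM dispensaries
      WHERE platform_dispensary_id = $1 AND id <> $2`,
    [platformId, dispensaryId]
  );
  if (existing.rows.length > 0) {
    // Mark ours as a duplicate instead of hitting the unique constraint.
    await pool.query(
      `UPDATE dispensaries
          SET status = 'duplicate', duplicate_of = $1
        WHERE id = $2`,
      [existing.rows[0].id, dispensaryId]
    );
    return;
  }
  await pool.query(
    `UPDATE dispensaries SET platform_dispensary_id = $1 WHERE id = $2`,
    [platformId, dispensaryId]
  );
}
```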
Kelly
b69d03c02f feat: Add stage checkpoints to task handlers and fix worker name display
Stage checkpoints (observational, non-blocking):
- product_refresh: success → 'production', failure tracking → 'failing' after 3
- product_discovery: success → 'hydrating', failure tracking
- entry_point_discovery: success → 'promoted', failure tracking

Worker name fix:
- Join worker_registry in tasks query to get friendly_name directly
- Update TasksDashboard to use worker_name from joined query
- Fall back to a registry lookup, then to the pod ID suffix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 00:43:00 -07:00
Kelly
54f59c6082 fix(analytics): Fix market-summary store count and add search indexes
- market-summary now counts from store_products table (not product_variants)
- Added trigram indexes for fast ILIKE product searches

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 00:35:17 -07:00
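A migration-style sketch of the trigram indexes mentioned above; column names follow the *_raw convention from nearby commits, and the index names are illustrative:

```ts
// pg_trgm GIN indexes let ILIKE '%term%' use an index instead of a seq scan.
export async function up(db: { query: (sql: string) => Promise<unknown> }) {
  await db.query(`CREATE EXTENSION IF NOT EXISTS pg_trgm`);
  await db.query(
    `CREATE INDEX IF NOT EXISTS idx_store_products_name_trgm
       ON store_products USING gin (name_raw gin_trgm_ops)`
  );
  await db.query(
    `CREATE INDEX IF NOT EXISTS idx_store_products_brand_trgm
       ON store_products USING gin (brand_name_raw gin_trgm_ops)`
  );
}
```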
Kelly
c16c3083b1 fix(cannaiq): Fix TasksDashboard worker API call
- Add getWorkerRegistry() method to API client
- Change TasksDashboard to use getWorkerRegistry() instead of non-existent getWorkers()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 00:30:01 -07:00
Kelly
656b00332e fix: Show worker friendly names and calculate duration in Tasks page
- Fetch workers list to get friendly names
- Create workerNameMap lookup for friendly names
- Show friendly name (e.g., "Yuki") instead of pod suffix
- Calculate duration from started_at/completed_at when duration_sec is null
- Show elapsed time with "..." suffix for running tasks
- Also search by worker friendly name

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-12-13 00:21:02 -07:00
Kelly
843f6ded75 fix: Remove deploy status, autorefresh, and refresh button from Orchestrator
- Remove DeployStatus component import and usage
- Remove autoRefresh checkbox and refreshing state
- Remove refresh button
- Simplify header layout

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-12-13 00:19:03 -07:00
Kelly
0175a6817e fix: Fix chain badges and table styling in IntelligenceStores
- Add whitespace-nowrap to badges to prevent text wrapping
- Add border to chain badges for consistency
- Update table styling to match Workers/Tasks pages:
  - Uppercase headers with proper tracking
  - Consistent padding and text sizes
  - Right-align numeric columns (SKUs, Snapshots)
  - Purple Dashboard button matching Workers page style
- Increase table max-height for better visibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-12-13 00:17:58 -07:00
Kelly
24dd301d84 fix: State stores endpoint returns only Dutchie stores with products
- Filter by menu_type = 'dutchie'
- Use INNER JOIN + HAVING to only return stores with products
- Stores without product discovery are excluded

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 00:15:23 -07:00
Kelly
1d6211db19 perf: Add store_intelligence_cache for fast /intelligence/stores
- Remove costly correlated subquery (snapshot_count) from /stores endpoint
- Add migration 092 for store_intelligence_cache table
- Update analytics_refresh to populate cache with pre-computed metrics
- Add /intelligence/stores/cached endpoint using cache table

Performance: O(n*m) → O(1) for snapshot counts, ~10x faster response

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-12-13 00:13:41 -07:00
Kelly
e62f927218 feat: Auto-retry failed proxies after cooldown period
- Add last_failed_at column to track failure time
- Failed proxies auto-retry after 4 hours (configurable)
- Proxies permanently failed after 10 failures
- Add /retry-stats and /reenable-failed API endpoints
- markProxySuccess() re-enables recovered proxies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 00:08:44 -07:00
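The cooldown logic above reduces to one eligibility query; a sketch in which failure_count is a hypothetical column name (last_failed_at is confirmed by the commit):

```ts
import { Pool } from "pg";

const pool = new Pool();
const COOLDOWN_HOURS = 4; // per the commit; configurable
const MAX_FAILURES = 10;  // permanent-failure threshold

async function reenableCooledProxies(): Promise<number> {
  const { rowCount } = await pool.query(
    `UPDATE proxies
        SET active = true
      WHERE active = false
        AND failure_count < $1
        AND last_failed_at < NOW() - ($2::int * INTERVAL '1 hour')`,
    [MAX_FAILURES, COOLDOWN_HOURS]
  );
  return rowCount ?? 0;
}
```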
Kelly
675f42841e feat: Import 500 Evomi residential proxies
- Update unique constraint to include username/password for session-based proxies
- All proxies imported as inactive (run Test All to verify and activate)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 00:05:21 -07:00
Kelly
472dbdf418 fix: Support http://host:port:user:pass proxy format in bulk import
Add regex pattern to parseProxyLine() for non-standard colon-separated
format used by some proxy providers (e.g., Evomi residential proxies).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 23:58:24 -07:00
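A self-contained sketch of the colon-separated parse described above (the real parseProxyLine lives in import-proxies.ts; this name and shape are illustrative):

```ts
interface ParsedProxy {
  protocol: string;
  host: string;
  port: number;
  username: string;
  password: string;
}

// http://host:port:user:pass → structured fields. The final capture is
// greedy so passwords containing ':' still parse.
function parseColonProxy(line: string): ParsedProxy | null {
  const m = line.trim().match(/^(https?):\/\/([^:\/]+):(\d+):([^:]+):(.+)$/);
  if (!m) return null;
  const [, protocol, host, port, username, password] = m;
  return { protocol, host, port: Number(port), username, password };
}

// parseColonProxy("http://1.2.3.4:8080:user:s3cret")
// → { protocol: "http", host: "1.2.3.4", port: 8080, username: "user", password: "s3cret" }
```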
Kelly
5fcc03aff4 style: Use lowercase state codes in API URLs
/api/state/az/summary instead of /api/state/AZ/summary
Backend already handles case conversion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 23:54:14 -07:00
Kelly
2d489e068b fix: Correct mv_state_metrics to use brand_name_raw
- Changed unique_brands from COUNT(brand_id) to COUNT(brand_name_raw)
- brand_id is often NULL, brand_name_raw has actual data
- AZ now correctly shows 462 brands (was 144)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 23:50:53 -07:00
Kelly
470097eb19 fix: Intelligence stores endpoint and UI consistency
- Fix stores endpoint to only show stores with actual products (INNER JOIN + HAVING)
- Update badge colors to match Workers/Tasks dashboard style
- Use emerald/amber/red/gray color scheme consistently
- Chain badge now uses purple (bg-purple-100)
- Add migration 092 to fix Trulieve store URLs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 23:37:28 -07:00
Kelly
5af86edf83 feat: Update last_payload_at and last_store_discovery_at timestamps
- payload-storage.ts: Update dispensaries.last_payload_at when saving payload
- promotion.ts: Update dispensaries.last_store_discovery_at on INSERT/UPDATE

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-12-12 23:30:57 -07:00
Kelly
55b26e9153 feat: Auto-healing entry_point_discovery with browser-first transport
- Rewrote entry_point_discovery with auto-healing scheme:
  1. Check dutchie_discovery_locations for existing platform_location_id
  2. Browser-based GraphQL with 5x network retries
  3. Mark as needs_investigation on hard failure
- Browser (Puppeteer) is now the DEFAULT transport; curl is used only when explicitly requested
- Added migration 091 for tracking columns:
  - last_store_discovery_at: When store_discovery updated record
  - last_payload_at: When last product payload was saved
- Updated CODEBASE_MAP.md with transport rules documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-12-12 22:55:21 -07:00
Kelly
97bfdb9618 fix: Show worker friendly names in Live Activity panel 2025-12-12 22:37:40 -07:00
Kelly
6f49c5e84a fix: Use database task counts for Completed/Failed stats on Workers page 2025-12-12 22:36:37 -07:00
Kelly
a6f09ee6e3 fix: Calculate stale task count from heartbeat age 2025-12-12 22:15:16 -07:00
Kelly
c62f8cbf06 feat: Parallelized store discovery, modification tracking, and task deduplication
Store Discovery Parallelization:
- Add store_discovery_state handler for per-state parallel discovery
- Add POST /api/tasks/batch/store-discovery endpoint
- 8 workers can now process states in parallel (~30-45 min vs 3+ hours)

Modification Tracking (Migration 090):
- Add last_modified_at, last_modified_by_task, last_modified_task_id to dispensaries
- Add same columns to store_products
- Update all handlers to set tracking info on modifications

Stale Task Recovery:
- Add periodic stale cleanup every 10 minutes (worker-0 only)
- Prevents orphaned tasks from blocking queue after worker crashes

Task Deduplication:
- createStaggeredTasks now skips if pending/active task exists for same role
- Skips if same role completed within last 4 hours
- API responses include skipped count

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-12-12 22:15:04 -07:00
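The deduplication rule above is a single existence check; a sketch with assumed column names (the pending/active statuses and completed_at appear in nearby commits):

```ts
import { Pool } from "pg";

const pool = new Pool();

// True when createStaggeredTasks should skip creating a task for this role.
async function shouldSkipTask(role: string, dispensaryId: number): Promise<boolean> {
  const { rows } = await pool.query(
    `SELECT 1
       FROM worker_tasks
      WHERE role = $1
        AND dispensary_id = $2
        AND (status IN ('pending', 'active')
             OR (status = 'completed'
                 AND completed_at > NOW() - INTERVAL '4 hours'))
      LIMIT 1`,
    [role, dispensaryId]
  );
  return rows.length > 0;
}
```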
kelly
e4e8438d8b Merge pull request 'feat: Worker improvements and Run Now duplicate prevention' (#64) from feat/minio-payload-storage into master 2025-12-13 03:35:48 +00:00
Kelly
822d2b0609 feat: Idempotent entry_point_discovery with bulk endpoint
- Track id_resolution_status, attempts, and errors in handler
- Add POST /api/tasks/batch/entry-point-discovery endpoint
- Skip already-resolved stores, retry failed with force flag
2025-12-12 20:27:36 -07:00
Kelly
dfd36dacf8 fix: Show next run time correctly for schedules 2025-12-12 20:21:15 -07:00
Kelly
4ea7139ed5 feat: Add step reporting to all task handlers
Added updateStep() calls to:
- payload-fetch-curl: loading → preflight → fetching → saving
- product-refresh: loading → normalizing → upserting
- store-discovery-http: starting → preflight → navigating → fetching

This enables real-time visibility of worker progress in the dashboard.
2025-12-12 20:14:00 -07:00
Kelly
63023a4061 feat: Worker improvements and Run Now duplicate prevention
- Fix Run Now to prevent duplicate task creation
- Add loading state to Run Now button in UI
- Return early when no stores need refresh
- Worker dashboard improvements
- Browser pooling architecture updates
- K8s worker config updates (8 replicas, 3 concurrent tasks)
2025-12-12 20:11:31 -07:00
kelly
13a80e893e Merge pull request 'feat: Add MinIO/S3 support for payload storage' (#63) from feat/minio-payload-storage into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/63
2025-12-12 19:00:29 +00:00
Kelly
c98c409f59 feat: Add MinIO/S3 support for payload storage
- Update payload-storage.ts to use MinIO when configured
- Payloads stored at: cannaiq/payloads/{year}/{month}/{day}/store_{id}_{ts}.json.gz
- Falls back to the local filesystem when MINIO_* env vars are not set
- Enables shared storage across all worker pods
- Fixes ephemeral storage issue where payloads were lost on pod restart

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 11:30:57 -07:00
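A sketch of the storage path and fallback described above, using the minio client; the env var names follow the MINIO_* convention from the commit, but the exact names, bucket layout, and fallback directory are assumptions:

```ts
import { Client } from "minio";
import { gzipSync } from "node:zlib";
import { mkdir, writeFile } from "node:fs/promises";

const minio = process.env.MINIO_ENDPOINT
  ? new Client({
      endPoint: process.env.MINIO_ENDPOINT,
      accessKey: process.env.MINIO_ACCESS_KEY ?? "",
      secretKey: process.env.MINIO_SECRET_KEY ?? "",
    })
  : null;

// payloads/{year}/{month}/{day}/store_{id}_{ts}.json.gz under bucket "cannaiq".
async function savePayload(storeId: number, payload: unknown): Promise<string> {
  const now = new Date();
  const key = [
    "payloads",
    now.getUTCFullYear(),
    String(now.getUTCMonth() + 1).padStart(2, "0"),
    String(now.getUTCDate()).padStart(2, "0"),
    `store_${storeId}_${now.getTime()}.json.gz`,
  ].join("/");
  const body = gzipSync(JSON.stringify(payload));

  if (minio) {
    await minio.putObject("cannaiq", key, body);
    return `minio://cannaiq/${key}`;
  }
  // Fallback: local filesystem (ephemeral on pod restart, per the commit).
  const path = `/data/${key}`;
  await mkdir(path.slice(0, path.lastIndexOf("/")), { recursive: true });
  await writeFile(path, body);
  return path;
}
```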
kelly
6c8993f7bd Merge pull request 'fix(workers): Increase max concurrent tasks to 15' (#62) from feat/proxy-reload-and-bulk-import into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/62
2025-12-12 18:19:04 +00:00
Kelly
92f88fdcd6 fix(workers): Increase max concurrent tasks to 15 and add K8s permission rule
- Change MAX_CONCURRENT_TASKS default from 3 to 15
- Add CLAUDE.md rule requiring explicit permission before kubectl commands

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 10:54:33 -07:00
kelly
fd4a9b1434 Merge pull request 'feat(scheduler): Immutable schedules and HTTP-only pipeline' (#61) from feat/proxy-reload-and-bulk-import into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/61
2025-12-12 16:37:16 +00:00
Kelly
832ef1cf83 feat(scheduler): Immutable schedules and HTTP-only pipeline
## Changes
- **Migration 089**: Add is_immutable and method columns to task_schedules
  - Per-state product_discovery schedules (4h default)
  - Store discovery weekly (168h)
  - All schedules use HTTP transport (Puppeteer/browser)
- **Task Scheduler**: HTTP-only product discovery with per-state scheduling
  - Each state has its own immutable schedule
  - Schedules can be edited (interval/priority) but not deleted
- **TasksDashboard UI**: Full immutability support
  - Lock icon for immutable schedules
  - State and Method columns in schedules table
  - Disabled delete for immutable, restricted edit fields
- **Store Discovery HTTP**: Auto-queue product_discovery for new stores
- **Migration 088**: Discovery payloads storage schema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 09:24:08 -07:00
kelly
b05eaceaf0 Merge pull request 'feat(tasks): Dual transport handlers and self-healing product_refresh' (#60) from feat/proxy-reload-and-bulk-import into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/60
2025-12-12 10:33:13 +00:00
kelly
909470d3dc Merge pull request 'fix(proxy): Convert non-standard proxy URL format and simplify preflight' (#59) from feat/proxy-reload-and-bulk-import into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/59
2025-12-12 10:03:14 +00:00
Kelly
9a24b4896c feat(tasks): Dual transport handlers and self-healing product_refresh
- Rename product-discovery.ts to product-discovery-curl.ts (axios-based)
- Rename payload-fetch.ts to payload-fetch-curl.ts
- Add product-discovery-http.ts (Puppeteer browser-based handler)
- Add method field to CreateTaskParams for transport selection
- Update task-service to insert method column on task creation
- Update task-worker with getHandlerForTask() for dual transport routing
- product_refresh now queues upstream tasks when no payload exists:
  - Has platform_dispensary_id → queues product_discovery (http)
  - No platform_dispensary_id → queues entry_point_discovery

This enables HTTP workers to pick up browser-based tasks while curl
workers handle axios-based tasks, and prevents product_refresh from
failing repeatedly when no crawl has been performed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 03:02:56 -07:00
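The dual-transport routing above amounts to a two-level lookup; a sketch with stub handlers (handler names from the commit, registry shape assumed):

```ts
type Task = { id: number; role: string; method?: "curl" | "http" };
type TaskHandler = (task: Task) => Promise<void>;

// Stubs standing in for product-discovery-curl.ts / product-discovery-http.ts.
const productDiscoveryCurl: TaskHandler = async () => { /* axios-based */ };
const productDiscoveryHttp: TaskHandler = async () => { /* Puppeteer-based */ };

const handlers: Record<string, Partial<Record<"curl" | "http", TaskHandler>>> = {
  product_discovery: { curl: productDiscoveryCurl, http: productDiscoveryHttp },
};

function getHandlerForTask(task: Task): TaskHandler {
  const method = task.method ?? "curl"; // a later commit makes browser the default
  const handler = handlers[task.role]?.[method];
  if (!handler) throw new Error(`No ${method} handler for role ${task.role}`);
  return handler;
}
```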
Kelly
dd8fce6e35 fix(proxy): Convert non-standard proxy URL format and simplify preflight
- CrawlRotator.getProxyUrl() now converts non-standard format (http://host:port:user:pass) to standard format (http://user:pass@host:port)
- Simplify puppeteer preflight to only use ipify.org for IP verification (much lighter than fingerprint.com)
- Remove heavy anti-detect site tests from preflight; they aren't needed, as we trust the stealth plugin
- Fixes 503 errors when using session-based residential proxies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 02:13:51 -07:00
kelly
65b96d9cb9 Merge pull request 'feat(workers): Add proxy reload, staggered tasks, and bulk proxy import' (#58) from feat/proxy-reload-and-bulk-import into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/58
2025-12-12 09:11:23 +00:00
Kelly
f82eed4dc3 feat(workers): Add proxy reload, staggered tasks, and bulk proxy import
- Periodic proxy reload: Workers now reload proxies every 60s to pick up changes
- Staggered task scheduling: New API endpoints for creating tasks with delays
- Bulk proxy import: Script supports multiple URL formats including host:port:user:pass
- Proxy URL column: Migration 086 adds proxy_url for non-standard formats

Key changes:
- crawl-rotator.ts: Added reloadIfStale(), isStale(), setReloadInterval()
- task-worker.ts: Calls reloadIfStale() in main loop
- task-service.ts: Added createStaggeredTasks() and createAZStoreTasks()
- tasks.ts: Added POST /batch/staggered and /batch/az-stores endpoints
- import-proxies.ts: New script for bulk proxy import
- CLAUDE.md: Documented staggered task workflow

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 01:53:15 -07:00
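A sketch of the staleness-based reload named above (method names from the commit, internals assumed):

```ts
class CrawlRotator {
  private lastLoadedAt = 0;
  private reloadIntervalMs = 60_000; // workers reload proxies every 60s

  setReloadInterval(ms: number): void {
    this.reloadIntervalMs = ms;
  }

  isStale(): boolean {
    return Date.now() - this.lastLoadedAt > this.reloadIntervalMs;
  }

  // Called from the task-worker main loop; cheap no-op until stale.
  async reloadIfStale(): Promise<void> {
    if (!this.isStale()) return;
    await this.loadProxies();
    this.lastLoadedAt = Date.now();
  }

  private async loadProxies(): Promise<void> {
    /* SELECT ... FROM proxies WHERE active = true */
  }
}
```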
kelly
d997ec51a2 Merge pull request 'feat(tasks): Consolidate schedule management into task_schedules' (#57) from feat/task-schedules-consolidation into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/57
2025-12-12 08:31:29 +00:00
Kelly
6490df9faf feat(tasks): Consolidate schedule management into task_schedules
- Add schedule CRUD endpoints to /api/tasks/schedules
- Add Schedules section to TasksDashboard with edit/delete/bulk actions
- Deprecate job_schedules table (entries disabled in DB)
- Mark CrawlSchedulePage as deprecated (removed from menu)
- Add deprecation comments to legacy schedule methods in api.ts
- Add migration comments to workers.ts explaining consolidation

Key changes:
- Schedule management now at /admin/tasks instead of /admin/schedule
- task_schedules uses interval_hours (simpler than base_interval_minutes + jitter)
- All schedule routes placed before /:id to avoid Express route conflicts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 01:15:21 -07:00
kelly
d86190912f Merge pull request 'feat(api): Add payload query API and trusted origins management' (#51) from feat/query-api-and-trusted-origins into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/51
2025-12-12 07:49:54 +00:00
kelly
a077f81c65 Merge pull request 'fix(preflight): Phase 2 - Correct parameter order and add IP/fingerprint reporting' (#56) from feat/preflight-phase2-reporting into master 2025-12-12 07:35:02 +00:00
Kelly
6bcadd9e71 fix(preflight): Correct parameter order and add IP/fingerprint reporting
- Fix update_worker_preflight call to use correct parameter order:
  (worker_id, transport, status, ip, response_ms, error, fingerprint)
- Add proxyIp to both curl and http preflight reports
- Add fingerprint JSONB with timezone, location, and bot detection data
- Log HTTP IP and timezone after preflight completes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 00:32:45 -07:00
kelly
a77bf8611a Merge pull request 'feat(workers): Preflight phase 1 - Schema, StatefulSet, and timezone matching' (#55) from feat/preflight-phase1-schema into master 2025-12-12 07:30:53 +00:00
Kelly
33feca3138 fix(antidetect): Match browser timezone to proxy IP location
- Add IP geolocation lookup via ip-api.com to get timezone from proxy IP
- Use ipify.org API for reliable proxy IP detection (replaces unreliable fingerprint.com scraping)
- Set browser timezone via CDP Emulation.setTimezoneOverride to match proxy location
- Add detectedTimezone and detectedLocation to preflight result
- Add /api/worker-registry/preflight-test endpoint for smoke testing

Fixes timezone mismatch where browser showed America/Phoenix while proxy was in America/New_York

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 00:25:39 -07:00
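A condensed sketch of the flow above: look up the proxy's egress IP, geolocate it, then override the browser timezone over CDP (ip-api.com's JSON response carries a timezone field; error handling trimmed):

```ts
import type { Page } from "puppeteer";

async function matchTimezoneToProxy(page: Page): Promise<string> {
  // 1. Egress IP as seen through the proxied browser.
  await page.goto("https://api.ipify.org?format=json");
  const { ip } = JSON.parse(await page.evaluate(() => document.body.innerText));

  // 2. Geolocate the IP to a timezone (free endpoint, rate-limited).
  const geo = await fetch(`http://ip-api.com/json/${ip}`).then((r) => r.json());

  // 3. Make JS Date/Intl agree with the proxy's location.
  const cdp = await page.target().createCDPSession();
  await cdp.send("Emulation.setTimezoneOverride", { timezoneId: geo.timezone });
  return geo.timezone;
}
```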
kelly
7d85a97b63 Merge pull request 'feat: Preflight schema and StatefulSet' (#54) from feat/preflight-phase1-schema into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/54
2025-12-12 07:14:40 +00:00
Kelly
ce081effd4 feat(workers): Add preflight schema and StatefulSet
- Migration 085: Add curl_ip, http_ip, fingerprint_data, preflight_status,
  preflight_at columns to worker_registry
- StatefulSet manifest for 8 persistent workers with OnDelete update strategy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 23:45:04 -07:00
Kelly
daab0ae9b2 feat(api): Add payload query API and trusted origins management
Query API:
- GET /api/payloads/store/:id/query - Filter products with flexible params
  (brand, category, price_min/max, thc_min/max, search, sort, pagination)
- GET /api/payloads/store/:id/aggregate - Group by brand/category with metrics
  (count, avg_price, min_price, max_price, avg_thc, in_stock_count)
- Documentation at docs/QUERY_API.md

Trusted Origins Admin:
- GET/POST/PUT/DELETE /api/admin/trusted-origins - Manage auth bypass list
- Trusted IPs, domains, and regex patterns stored in DB
- 5-minute cache with invalidation on admin updates
- Fallback to hardcoded defaults if DB unavailable
- Migration 085 creates table with seed data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 23:28:05 -07:00
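An example call against the query endpoint above; the base URL, token handling, and filter values are illustrative, while the parameter names come from the commit:

```ts
async function searchStorePayload(storeId: number) {
  const url = new URL(`https://cannaiq.example/api/payloads/store/${storeId}/query`);
  url.searchParams.set("brand", "ExampleBrand");
  url.searchParams.set("price_max", "30");
  url.searchParams.set("sort", "price");
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${process.env.API_TOKEN ?? ""}` },
  });
  if (!res.ok) throw new Error(`Query failed: ${res.status}`);
  return res.json();
}
```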
kelly
2ed088b4d8 Merge pull request 'feat(api): Add preflight columns to worker registry API response' (#50) from feat/preflight-api-fields into master 2025-12-12 06:22:06 +00:00
Kelly
d3c49fa246 feat(api): Add preflight columns to worker registry API response
Exposes curl_ip, http_ip, preflight_status, preflight_at, and fingerprint_data
in the /api/worker-registry/workers response.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 23:12:55 -07:00
kelly
52cb5014fd Merge pull request 'feat: Dual-transport preflight system for worker fingerprinting' (#49) from fix/ci-and-preflight-enforcement into master 2025-12-12 06:08:13 +00:00
Kelly
50654be910 fix: Restore hydration and product_refresh for store updates
- Moved hydration module back from _deprecated (needed for product_refresh)
- Restored product_refresh handler for processing stored payloads
- Restored geolocation service for findadispo/findagram
- Stubbed system routes that depend on deprecated SyncOrchestrator
- Removed crawler-sandbox route (deprecated)
- Fixed all TypeScript compilation errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 23:03:39 -07:00
Kelly
cdab71a1ee feat(workers): Add dual-transport preflight system
Workers now run both curl and http (Puppeteer) preflights on startup:
- curl-preflight.ts: Tests axios + proxy via httpbin.org
- puppeteer-preflight.ts: Tests browser + StealthPlugin via fingerprint.com
  (with amiunique.org fallback)
- Migration 084: Adds preflight columns to worker_registry and method
  column to worker_tasks
- Workers report preflight status, IP, fingerprint, and response time
- Tasks can require specific transport method (curl/http)
- Dashboard shows Transport column with preflight status badges

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 22:47:52 -07:00
Kelly
a35976b9e9 chore: Clean up deprecated code and docs
- Move deprecated directories to src/_deprecated/:
  - hydration/ (old pipeline approach)
  - scraper-v2/ (old Puppeteer scraper)
  - canonical-hydration/ (merged into tasks)
  - Unused services: availability, crawler-logger, geolocation, etc.
  - Unused utils: age-gate-playwright, HomepageValidator, stealthBrowser

- Archive outdated docs to docs/_archive/:
  - ANALYTICS_RUNBOOK.md
  - ANALYTICS_V2_EXAMPLES.md
  - BRAND_INTELLIGENCE_API.md
  - CRAWL_PIPELINE.md
  - TASK_WORKFLOW_2024-12-10.md
  - WORKER_TASK_ARCHITECTURE.md
  - ORGANIC_SCRAPING_GUIDE.md

- Add docs/CODEBASE_MAP.md as single source of truth
- Add warning files to deprecated/archived directories
- Slim down CLAUDE.md to essential rules only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 22:17:40 -07:00
kelly
c68210c485 Merge pull request 'fix(ci): Remove buildx cache and add preflight enforcement' (#48) from fix/ci-and-preflight-enforcement into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/48
2025-12-12 04:53:20 +00:00
Kelly
f2864bd2ad fix(ci): Remove buildx cache and add preflight enforcement
- Remove cache_from/cache_to from CI (plugin bug that splits comma-separated values)
- Add preflight() method to CrawlRotator - tests proxy + anti-detect
- Add pre-task preflight check - workers MUST pass before executing
- Add releaseTask() to release tasks back to pending on preflight fail
- Rename proxy_test task to whoami for clarity

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 21:37:22 -07:00
kelly
eca9e85242 Merge pull request 'fix(ci): Fix buildx cache syntax and add proxy_test task' (#47) from feat/ui-polish-and-ci-caching into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/47
2025-12-12 04:30:21 +00:00
Kelly
3f958fbff3 fix(ci): Fix buildx cache_from syntax for array format
Plugin was splitting comma-separated values incorrectly.
Use array format with quoted strings instead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 21:10:26 -07:00
Kelly
c84ef0396b feat(tasks): Add proxy_test task handler and discovery run tracking
- Add proxy_test task handler that fetches IP via proxy to verify connectivity
- Add discovery_runs migration (083) for tracking store discovery progress
- Register proxy_test in task service and worker

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 21:07:58 -07:00
kelly
e1c67dcee5 Merge pull request 'feat: UI polish and CI caching improvements' (#46) from feat/ui-polish-and-ci-caching into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/46
2025-12-12 03:47:46 +00:00
kelly
34c8a8cc67 Merge pull request 'feat(cannaiq): Add clickable logo, favicon, and remove state selector' (#45) from feat/cannaiq-ui-polish into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/45
2025-12-12 03:36:04 +00:00
Kelly
6cd1f55119 fix(workers): Preserve fantasy names on pod restart
- Re-registration no longer overwrites pod_name with K8s name
- New workers get fantasy name (Aethelgard, Xylos, etc.) as pod_name
- Document worker naming convention in CLAUDE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 20:35:25 -07:00
Kelly
e918234928 feat(ci): Add npm cache volume for faster typechecks
- Create PVC for shared npm cache across CI jobs
- Configure Woodpecker agent to allow npm-cache volume mount
- Update typecheck steps to use shared cache directory
- First run populates cache, subsequent runs are ~3-4x faster

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 19:27:32 -07:00
Kelly
888a608485 feat(cannaiq): Add clickable logo, favicon, and remove state selector
- Make CannaIQ logo clickable to return to dashboard (sidebar + mobile header)
- Add custom favicon matching the logo design
- Remove state selector dropdown from sidebar navigation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 19:24:21 -07:00
kelly
b5c3b05246 Merge pull request 'fix(workers): Fix false memory backoff and add backing-off color coding' (#44) from fix/worker-memory-backoff into master 2025-12-12 02:13:51 +00:00
Kelly
fdce5e0302 fix(workers): Fix false memory backoff and add backing-off color coding
- Fix memory calculation to use max-old-space-size (1500MB) instead of
  V8's dynamic heapTotal. This prevents false 95%+ readings when idle.
- Add yellow color for backing-off workers in pod visualization
- Update legend and tooltips with backing-off status
- Remove pool toggle from TasksDashboard (moved to Workers page)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 19:11:42 -07:00
Kelly
4679b245de perf(ci): Enable Docker layer caching for faster builds
Add cache_from and cache_to settings to all docker-buildx steps.
Uses registry-based caching to avoid rebuilding npm install layer
when package.json hasn't changed.

Expected improvement: 14min backend build → ~3-4min on cache hit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 18:43:43 -07:00
kelly
a837070f54 Merge pull request 'refactor(admin): Consolidate JobQueue into TasksDashboard + CI worker resilience' (#43) from fix/ci-worker-resilience into master 2025-12-12 01:21:28 +00:00
Kelly
5a929e9803 refactor(admin): Consolidate JobQueue into TasksDashboard
- Move Create Task modal from JobQueue to TasksDashboard
- Add pagination to TasksDashboard (25 tasks per page)
- Add delete action for failed/completed/pending tasks
- Remove JobQueue page and route
- Rename nav item from "Task Queue" to "Tasks"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 17:41:21 -07:00
kelly
52b0fad410 Merge pull request 'ci: Add worker resilience check to deploy step' (#42) from fix/ci-worker-resilience into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/42
2025-12-12 00:09:48 +00:00
Kelly
9944031eea ci: Add worker resilience check to deploy step
If workers are scaled to 0, CI will now automatically scale them to 5
before updating the image. This prevents workers from being stuck at 0
if manually scaled down for maintenance.

The check only scales up if replicas=0, so it won't interfere with
normal deployments or HPA scaling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 16:40:53 -07:00
kelly
2babaa7136 Merge pull request 'ci: Remove explicit migration step from deploy' (#41) from fix/ci-remove-migration-step into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/41
2025-12-11 23:11:48 +00:00
Kelly
90567511dd ci: Remove explicit migration step from deploy
Auto-migrate runs at server startup and handles migration errors gracefully.
The explicit kubectl exec migration step was failing because a trigger
already existed (the schema_migrations table was out of sync).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 15:56:47 -07:00
kelly
beb16ad0cb Merge pull request 'ci: Run migrations via kubectl exec instead of separate step' (#40) from fix/ci-migrate-via-kubectl into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/40
2025-12-11 22:38:21 +00:00
Kelly
fc7fc5ea85 ci: Run migrations via kubectl exec instead of separate step
Removes the migrate step that required db_* secrets (which CI can't
access since postgres is cluster-internal). Instead, run migrations
via kubectl exec on the deployed scraper pod, which already has DB
access via its env vars.

Deploy order:
1. Deploy scraper image
2. Wait for rollout
3. Run migrations via kubectl exec
4. Deploy remaining services

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 15:16:12 -07:00
kelly
ab8956b14b Merge pull request 'fix: Revert CI event array syntax to single value' (#39) from fix/ci-event-syntax into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/39
2025-12-11 22:10:03 +00:00
Kelly
1d9c90641f fix: Revert event array syntax to single value
The [push, manual] array syntax broke CI config parsing.
Reverting to event: push which is known to work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 14:48:34 -07:00
kelly
6126b907f2 Merge pull request 'ci: Support manual pipeline events for deploy' (#38) from ci/support-manual-pipelines into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/38
2025-12-11 21:12:26 +00:00
Kelly
cc93d2d483 ci: Support manual pipeline events for deploy
Allow deploy steps to run on both push and manual events.
This enables triggering deploys via `woodpecker-cli pipeline create`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 13:54:04 -07:00
kelly
7642c17ec0 Merge pull request 'ci: Fix pipeline config path' (#37) from fix/ci-config-path into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/37
2025-12-11 20:01:30 +00:00
Kelly
cb60dcf352 ci: Add root-level woodpecker config for CI detection 2025-12-11 12:37:35 -07:00
kelly
5ffe05d519 Merge pull request 'feat: Concurrent task processing with resource-based backoff' (#36) from feat/worker-scaling into master 2025-12-11 18:49:01 +00:00
Kelly
8e2f07c941 feat(workers): Concurrent task processing with resource-based backoff
Workers can now process multiple tasks concurrently (default: 3 max).
Self-regulate based on resource usage - back off at 85% memory or 90% CPU.

Backend changes:
- TaskWorker handles concurrent tasks using async Maps
- Resource monitoring (memory %, CPU %) with backoff logic
- Heartbeat reports active_task_count, max_concurrent_tasks, resource stats
- Decommission support via worker_commands table

Frontend changes:
- Workers Dashboard shows tasks per worker (N/M format)
- Resource badges with color-coded thresholds
- Pod visualization with clickable selection
- Decommission controls per worker

New env vars:
- MAX_CONCURRENT_TASKS (default: 3)
- MEMORY_BACKOFF_THRESHOLD (default: 0.85)
- CPU_BACKOFF_THRESHOLD (default: 0.90)
- BACKOFF_DURATION_MS (default: 10000)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 11:47:24 -07:00
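The claim gate implied above fits in a few lines; a sketch using the listed env vars and the fixed 1500MB heap limit from the later memory-backoff fix:

```ts
const MAX_CONCURRENT_TASKS = Number(process.env.MAX_CONCURRENT_TASKS ?? 3);
const MEMORY_BACKOFF_THRESHOLD = Number(process.env.MEMORY_BACKOFF_THRESHOLD ?? 0.85);
const HEAP_LIMIT_BYTES = 1500 * 1024 * 1024; // --max-old-space-size=1500

// True when the worker may claim another task; otherwise it backs off for
// BACKOFF_DURATION_MS. The CPU check (CPU_BACKOFF_THRESHOLD) is omitted here.
function canClaimAnotherTask(activeTaskCount: number): boolean {
  if (activeTaskCount >= MAX_CONCURRENT_TASKS) return false;
  const memPct = process.memoryUsage().heapUsed / HEAP_LIMIT_BYTES;
  return memPct < MEMORY_BACKOFF_THRESHOLD;
}
```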
kelly
0b6e615075 Merge pull request 'fix: Use React Router Link to prevent scroll reset' (#35) from feat/worker-scaling into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/35
2025-12-11 17:14:25 +00:00
Kelly
be251c6fb3 fix: Use React Router Link for nav to prevent scroll reset
Changed sidebar NavLink from <a> to <Link> for client-side navigation.
This prevents full page reload and scroll position reset.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 09:59:34 -07:00
kelly
efb1e89e33 Merge pull request 'fix: Show only git SHA in header' (#34) from feat/worker-scaling into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/34
2025-12-11 16:56:04 +00:00
Kelly
529c447413 refactor: Move worker scaling to Workers page with password confirmation
- Worker scaling controls now on /workers page only (removed from /tasks)
- Password confirmation required before scaling
- Show only git SHA in header (removed version number)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 09:41:58 -07:00
Kelly
1eaf95c06b fix: Show only git SHA in header, remove version number
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 09:35:41 -07:00
kelly
138ed17d8b Merge pull request 'feat: Worker scaling from admin UI with password confirmation' (#33) from feat/worker-scaling into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/33
2025-12-11 16:29:04 +00:00
Kelly
a880c41d89 feat: Add password confirmation for worker scaling + RBAC
- Add /api/auth/verify-password endpoint for re-authentication
- Add PasswordConfirmModal component for sensitive actions
- Worker scaling (+/-) now requires password confirmation
- Add RBAC (ServiceAccount, Role, RoleBinding) for scraper pod
- Scraper pod can now read/scale worker deployment via k8s API

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 09:16:27 -07:00
Kelly
2a9ae61dce fix(ui): Make Tasks Dashboard header sticky
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 09:04:21 -07:00
kelly
1f21911fa1 Merge pull request 'feat(admin): Worker scaling controls via k8s API' (#32) from feat/worker-scaling into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/32
2025-12-11 16:01:25 +00:00
Kelly
6f0a58f5d2 fix(k8s): Correct API call signatures for k8s client v1.4
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 08:47:27 -07:00
Kelly
8206dce821 feat(admin): Worker scaling controls via k8s API
- Add /api/k8s/workers endpoint to get deployment status
- Add /api/k8s/workers/scale endpoint to scale replicas (0-50)
- Add worker scaling UI to Tasks Dashboard (+/- 5 workers)
- Shows ready/desired replica count
- Uses in-cluster config in k8s, kubeconfig locally

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 08:24:32 -07:00
kelly
ced1afaa8a Merge pull request 'fix(ci): CI config fix + 25 workers + pool starts paused' (#31) from fix/ci-filename into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/31
2025-12-11 08:47:26 +00:00
Kelly
d6c602c567 fix(ui): Remove Cleanup Stale button from Workers page
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 01:29:25 -07:00
Kelly
a252a7fefd feat(tasks): 25 workers, pool starts paused by default
- Increase worker replicas from 5 to 25
- Task pool now starts PAUSED on deploy; an admin must click Start Pool
- Prevents workers from grabbing tasks before the system is ready

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 01:19:02 -07:00
Kelly
83b06c21cc fix(tasks): Stop spinner in status cards when pool is paused
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 01:14:25 -07:00
kelly
f5214da54c Merge pull request 'fix(ci): Fix Woodpecker config - remove invalid top-level when' (#30) from fix/ci-filename into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/30
2025-12-11 08:07:16 +00:00
Kelly
e3d4dd0127 fix(ci): Fix Woodpecker config - remove invalid top-level when
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 00:45:08 -07:00
kelly
d0ee0d72f5 Merge pull request 'feat(tasks): Add task pool start/stop toggle' (#29) from feat/task-pool-toggle into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/29
2025-12-11 07:21:02 +00:00
kelly
521f0550cd Merge pull request 'feat: Admin UI cleanup and dispensary schedule page' (#28) from fix/worker-proxy-wait into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/28
2025-12-11 07:08:03 +00:00
Kelly
8a09691e91 feat(tasks): Add task pool start/stop toggle
- Add task-pool-state.ts for shared pause/resume state
- Add /api/tasks/pool/status, pause, resume endpoints
- Add Start/Stop Pool toggle button to TasksDashboard
- Spinner stops when pool is closed
- Fix is_active column name in store-discovery.ts
- Fix missing active column in task-service.ts claimTask
- Auto-refresh every 15 seconds

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 00:07:14 -07:00
Kelly
459ad7d9c9 fix(tasks): Fix missing column errors in task queries
- Change 'active' to 'is_active' in states table query (store-discovery.ts)
- Remove non-existent 'active' column check from worker_tasks query (task-service.ts)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:54:28 -07:00
Kelly
d102d27731 feat(admin): Dispensary schedule page and UI cleanup
- Add DispensarySchedule page showing crawl history and upcoming schedule
- Add /dispensaries/:state/:city/:slug/schedule route
- Add API endpoint for store crawl history
- Update View Schedule link to use dispensary-specific route
- Remove colored badges from DispensaryDetail product table (plain text)
- Make Details button ghost style in product table
- Add "Sort by States" option to IntelligenceBrands
- Remove status filter dropdown from Dispensaries page

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:50:47 -07:00
kelly
01810c40a1 Merge pull request 'fix(worker): Wait for proxies instead of crashing' (#27) from fix/worker-proxy-wait into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/27
2025-12-11 06:43:29 +00:00
Kelly
b7d33e1cbf fix(admin): Clean up store detail and intelligence pages
- Remove Update dropdown from DispensaryDetail page
- Remove Crawl Now button from StoreDetailPage
- Change "Last Crawl" to "Last Updated" on both detail pages
- Tone down emerald colors on StoreDetailPage (use gray borders/tabs)
- Simplify THC/CBD/Stock badges to plain text
- Remove duplicate state dropdown from IntelligenceStores filters
- Make store rows clickable in IntelligenceStores

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:42:50 -07:00
Kelly
5b34b5a78c fix(admin): Consistent navigation across Intelligence pages
- Add state selector dropdown to all three Intelligence pages (Brands, Stores, Pricing)
- Use consistent emerald-styled page navigation badges with current page highlighted
- Remove Refresh buttons from all Intelligence pages
- Update chart styling to use emerald gradient bars (matching Pricing page)
- Load all available states from orchestrator API instead of extracting from local data
- Fix z-index and styling on state dropdown for better visibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:32:57 -07:00
Kelly
c091d2316b fix(dashboard): Remove refresh button and HealthPanel
- Removed refresh button and refreshing state from Dashboard
- Removed HealthPanel component (deploy status auto-refresh)
- Simplified header layout

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:28:18 -07:00
Kelly
e8862b8a8b fix(national): Remove Refresh Metrics button
Removed unused refresh button and related state/handlers from
National Dashboard.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:27:01 -07:00
Kelly
1b46ab699d fix(national): Show all states count, not filtered "active" states
The "Active States" metric was arbitrary and confusing. Changed to
show total states count - all states in the system regardless of
whether they have data or not.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:25:50 -07:00
Kelly
ac1995f63f fix(pricing): Simplify category chart to prevent overflow
- Replace complex price range bars with simple horizontal bars
- Use overflow-hidden to prevent bars extending beyond container
- Calculate bar width as percentage of max avg price
- Limit to top 12 categories for cleaner display
- Fixed-width labels prevent layout shift

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:24:31 -07:00
Kelly
de93669652 fix(national): Count active states by product data, not crawl status
Active states should count states with actual product data, not just
states where crawling is enabled. A state can have historical data
even if crawling is currently disabled.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:23:20 -07:00
Kelly
dffc124920 fix(national): Fix active states count and remove StateBadge
- Change active_states to count states with crawl_enabled=true dispensaries
- Filter all national summary queries by crawl_enabled=true
- Remove unused StateBadge from National Dashboard header
- StateBadge was showing "Arizona" with no way to change it

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:22:19 -07:00
Kelly
932ceb0287 feat(intelligence): Add state filter to all Intelligence pages
- Add state filter to Intelligence Brands API and frontend
- Add state filter to Intelligence Pricing API and frontend
- Add state filter to Intelligence Stores API and frontend
- Fix null safety issues with toLocaleString() calls
- Update backend /stores endpoint to return skuCount, snapshotCount, chainName
- Add overall stats to pricing endpoint

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:19:54 -07:00
Kelly
824d48fd85 fix: Add curl to Docker, add active flag to worker_tasks
- Install curl in Docker container for Dutchie HTTP requests
- Add 'active' column to worker_tasks (default false) to prevent
  accidental task execution on startup
- Update task-service to only claim tasks where active=true

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:12:09 -07:00
Kelly
47fdab0382 fix: Filter orchestrator states by crawl_enabled
The states dropdown was showing count of ALL dispensaries instead of
just crawl-enabled ones. Now correctly filters to match the actual
stores that would be displayed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:09:04 -07:00
Kelly
ed7ddc6375 ci: Add database migration step to deploy pipeline
Migrations now run automatically before deployments:
1. Build new Docker image
2. Run migrations using the new image
3. Deploy to Kubernetes

Requires new secrets: db_host, db_port, db_name, db_user, db_pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 23:07:25 -07:00
Kelly
cf06f4a8c0 feat(worker): Listen for proxy_added notifications
- Workers now use PostgreSQL LISTEN/NOTIFY to wake up immediately when proxies are added
- Added trigger on proxies table to NOTIFY 'proxy_added' when active proxy inserted/updated
- Falls back to 30s polling if LISTEN fails

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 22:58:00 -07:00
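A sketch of the LISTEN side with the 30s polling fallback described above (the trigger on the proxies table performs the NOTIFY; connection details assumed from PG* env vars):

```ts
import { Client } from "pg";

async function waitForProxies(hasActiveProxies: () => Promise<boolean>): Promise<void> {
  const client = new Client();
  await client.connect();
  await client.query("LISTEN proxy_added");
  await new Promise<void>((resolve) => {
    const done = () => {
      clearInterval(poll);
      void client.end();
      resolve();
    };
    client.on("notification", done); // woken immediately by the trigger
    const poll = setInterval(async () => { // fallback if LISTEN fails
      if (await hasActiveProxies()) done();
    }, 30_000);
  });
}
```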
Kelly
a2fa21f65c fix(worker): Wait for proxies instead of crashing on startup
- Task worker now waits up to 60 minutes for active proxies
- Retries every 30 seconds with clear logging
- Updated K8s scraper-worker.yaml with Deployment definition
- Deployment uses task-worker.js entrypoint with correct liveness probe

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 22:55:04 -07:00
kelly
61e915968f Merge pull request 'feat(tasks): Refactor task workflow with payload/refresh separation' (#26) from feat/task-workflow-refactor into master 2025-12-11 05:24:11 +00:00
Kelly
4949b22457 feat(tasks): Refactor task workflow with payload/refresh separation
Major changes:
- Split crawl into payload_fetch (API → disk) and product_refresh (disk → DB)
- Add task chaining: store_discovery → product_discovery → payload_fetch → product_refresh
- Add payload storage utilities for gzipped JSON on filesystem
- Add /api/payloads endpoints for payload access and diffing
- Add DB-driven TaskScheduler with schedule persistence
- Track newDispensaryIds through discovery promotion for chaining
- Add stealth improvements: HTTP fingerprinting, proxy rotation enhancements
- Add Workers dashboard K8s scaling controls

New files:
- src/tasks/handlers/payload-fetch.ts - Fetches from API, saves to disk
- src/services/task-scheduler.ts - DB-driven schedule management
- src/utils/payload-storage.ts - Payload save/load utilities
- src/routes/payloads.ts - Payload API endpoints
- src/services/http-fingerprint.ts - Browser fingerprint generation
- docs/TASK_WORKFLOW_2024-12-10.md - Complete workflow documentation

Migrations:
- 078: Proxy consecutive 403 tracking
- 079: task_schedules table
- 080: raw_crawl_payloads table
- 081: payload column and last_fetch_at

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 22:15:35 -07:00
Kelly
1fb0eb94c2 security: Add authMiddleware to analytics-v2 routes
- Add authMiddleware to analytics-v2.ts to require authentication
- Add permanent rule #6 to CLAUDE.md: "ALL API ROUTES REQUIRE AUTHENTICATION"
- Add forbidden action #19: "Creating API routes without authMiddleware"
- Document authentication flow and trusted origins

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 19:01:44 -07:00
Kelly
9aefb554bc fix: Correct Analytics V2 SQL queries for schema alignment
- Fix JOIN path: store_products -> dispensaries -> states (was incorrectly joining sp.state_id which doesn't exist)
- Fix column names to use *_raw suffixes (category_raw, brand_name_raw, name_raw)
- Fix row mappings to read correct column names from query results
- Add ::timestamp casts for interval arithmetic in StoreAnalyticsService

All Analytics V2 endpoints now work correctly:
- /state/legal-breakdown
- /state/recreational
- /category/all
- /category/rec-vs-med
- /state/:code/summary
- /store/:id/summary

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 18:52:57 -07:00
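A sketch of the corrected JOIN path; only the *_raw column names are confirmed by the commit, and the foreign-key column names are assumptions:

```ts
// store_products → dispensaries → states (sp.state_id does not exist).
const stateProductsSql = `
  SELECT s.code AS state,
         sp.brand_name_raw,
         sp.category_raw,
         sp.name_raw
    FROM store_products sp
    JOIN dispensaries d ON d.id = sp.dispensary_id
    JOIN states s       ON s.id = d.state_id
   WHERE s.code = $1
`;
```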
kelly
a4338669a9 Merge pull request 'fix(auth): Prioritize JWT token over trusted origin bypass' (#24) from fix/auth-token-priority into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/24
2025-12-11 01:34:10 +00:00
Kelly
1fa9ea496c fix(auth): Prioritize JWT token over trusted origin bypass
When a user logs in and has a Bearer token, use their actual identity
instead of falling back to internal@system. This ensures logged-in
users see their real email in the admin UI.

Order of auth:
1. If Bearer token provided → use JWT/API token (real user identity)
2. If no token → check trusted origins (for API access like WordPress)
3. Otherwise → 401 unauthorized

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 18:21:50 -07:00
kelly
31756a2233 Merge pull request 'chore: Add WordPress plugin v1.6.0 download files' (#23) from chore/wordpress-plugin-downloads into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/23
2025-12-11 00:40:53 +00:00
Kelly
166583621b chore: Add WordPress plugin v1.6.0 download files
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 17:23:25 -07:00
kelly
ca952c4674 Merge pull request 'fix(ci): Use YAML map format for docker-buildx build_args' (#21) from fix/ci-build-args-format into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/21
2025-12-10 23:54:33 +00:00
kelly
4054778b6c Merge pull request 'feat: Add wildcard support for trusted domains' (#20) from fix/trusted-origins-wildcards into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/20
2025-12-10 23:54:11 +00:00
Kelly
56a5f00015 fix(ci): Use YAML map format for docker-buildx build_args
The woodpeckerci/plugin-docker-buildx plugin expects build_args as a
YAML map (key: value), not a list. This caused build args not to be
passed to the Docker build, resulting in an unknown git SHA and build
info in the deployed application.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 16:42:05 -07:00
Kelly
a96d50c481 docs(wordpress): Add deprecation comments for legacy shortcode/migration code
Clarifies that crawlsy_* and dutchie_* shortcodes are deprecated aliases
for backward compatibility only. New implementations should use cannaiq_*.

Also documents the token migration logic that preserves old API tokens.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 16:24:56 -07:00
kelly
4806212f46 Merge pull request 'fix(ci): Use YAML list format for docker-buildx build_args' (#18) from fix/ci-build-args into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/18
2025-12-10 22:29:41 +00:00
kelly
2486f3c6b2 Merge pull request 'feat(analytics): Add Brand Intelligence API endpoint' (#19) from feat/brand-intelligence-api into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/19
2025-12-10 22:29:26 +00:00
Kelly
f25bebf6ee feat: Add wildcard support for trusted domains
Add *.cannaiq.co and *.cannabrands.app to trusted domains list.
Updated isTrustedDomain() to recognize *.domain.com as a wildcard
that matches the base domain and any subdomain.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 15:29:23 -07:00
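A sketch of the wildcard rule described above, where *.domain.com matches the base domain and any subdomain:

```ts
function isTrustedDomain(host: string, trusted: string[]): boolean {
  return trusted.some((pattern) => {
    if (pattern.startsWith("*.")) {
      const base = pattern.slice(2);
      return host === base || host.endsWith(`.${base}`);
    }
    return host === pattern;
  });
}

// isTrustedDomain("app.cannaiq.co",  ["*.cannaiq.co"]) → true
// isTrustedDomain("cannaiq.co",      ["*.cannaiq.co"]) → true
// isTrustedDomain("evil-cannaiq.co", ["*.cannaiq.co"]) → false
```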
Kelly
22dad6d0fc feat: Add wildcard trusted origins for cannaiq.co and cannabrands.app
Add *.cannaiq.co and *.cannabrands.app patterns to both:
- auth/middleware.ts (admin routes)
- public-api.ts (consumer /api/v1/* routes)

This allows any subdomain of these domains to access the API without
requiring an API key.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 15:25:04 -07:00
Kelly
03eab66d35 chore: Bump backend version to 1.6.0
Harmonize backend version with WordPress plugin version so admin UI displays correct version.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 15:06:42 -07:00
Kelly
97b1ab23d8 fix(ci): Use YAML list format for docker-buildx build_args
The woodpecker docker-buildx plugin expects build_args as a YAML list,
not a comma-separated string. The previous format resulted in all args
being passed as a single malformed arg with a "*=" prefix.

This fix ensures APP_GIT_SHA, APP_BUILD_TIME, etc. are properly passed
to the Dockerfile so the /api/version endpoint returns correct values.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 14:56:18 -07:00
Kelly
9fff0ba430 feat(analytics): Add Brand Intelligence API endpoint
New endpoint: GET /api/analytics/v2/brand/:name/intelligence

Returns comprehensive brand analytics payload including:
- Performance snapshot (active SKUs, revenue, stores, market share)
- Alerts (lost stores, delisted SKUs, competitor takeovers)
- SKU performance (velocity, status, stock levels)
- Retail footprint (penetration by region, whitespace opportunities)
- Competitive landscape (price positioning, head-to-head comparisons)
- Inventory health (days of stock, risk levels, overstock alerts)
- Promotion effectiveness (baseline vs promo velocity, lift, ROI)

Supports time windows (7d/30d/90d), state filtering, and category filtering.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 14:53:35 -07:00
kelly
7d3e91b2e6 Merge pull request 'feat(wordpress): Add new Elementor widgets and dynamic selectors v1.6.0' (#17) from feat/wordpress-widgets into master 2025-12-10 20:41:44 +00:00
Kelly
74957a9ec5 feat(wordpress): Add new Elementor widgets and dynamic selectors v1.6.0
New Widgets:
- Brand Grid: Display brands in a grid with product counts
- Category List: Show categories in grid/list/pills layouts
- Specials Grid: Display products on sale with discount badges

Enhanced Product Grid Widget:
- Dynamic category dropdown (fetches from API)
- Dynamic brand dropdown (fetches from API)
- "On Special Only" toggle filter

New Plugin Methods:
- fetch_categories() - Get categories from API
- fetch_brands() - Get brands from API
- fetch_specials() - Get products on sale
- get_category_options() - Cached options for Elementor
- get_brand_options() - Cached options for Elementor

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 13:41:17 -07:00
kelly
2d035c46cf Merge pull request 'fix: Findagram brands page crash and PWA icon errors' (#16) from fix/findagram-brands-crash into master 2025-12-10 20:11:40 +00:00
Kelly
53445fe72a fix: Findagram brands page crash and PWA icon errors
- Fix mapBrandForUI to use correct 'brand' field from API response
- Add null check in Brands.jsx filter to prevent crash on undefined names
- Fix BrandPenetrationService sps.brand_name -> sps.brand_name_raw
- Remove missing logo192.png and logo512.png from PWA manifest

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 13:06:23 -07:00
kelly
37cc8956c5 Merge pull request 'fix: Join states through dispensaries in BrandPenetrationService' (#15) from feat/ci-auto-merge into master 2025-12-10 19:36:06 +00:00
Kelly
197c82f921 fix: Join states through dispensaries in BrandPenetrationService
The store_products table doesn't have a state_id column, so queries must
join through dispensaries to get state info. Also fixed column references
to use brand_name_raw and category_raw.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 12:18:10 -07:00
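The corrected join shape, sketched as illustrative SQL (the `dispensary_id` foreign key and `state_id` column names are assumptions consistent with the commit text):

```typescript
// Illustrative query string; exact columns in BrandPenetrationService may differ.
const brandPenetrationSql = `
  SELECT d.state_id,
         sp.brand_name_raw,
         COUNT(DISTINCT sp.dispensary_id) AS store_count
  FROM store_products sp
  JOIN dispensaries d ON d.id = sp.dispensary_id  -- state comes via dispensaries
  GROUP BY d.state_id, sp.brand_name_raw
`;
```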
kelly
2c52493a9c Merge pull request 'fix(docker): Use npm install instead of npm ci for reliability' (#14) from feat/ci-auto-merge into master 2025-12-10 18:44:21 +00:00
Kelly
2ee2ba6b8c fix(docker): Use npm install instead of npm ci for reliability
npm ci can fail when package-lock.json has minor mismatches with
package.json. npm install is more forgiving and appropriate for
Docker builds where determinism is less critical than reliability.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-10 11:28:29 -07:00
kelly
bafcf1694a Merge pull request 'feat(analytics): Brand promotional history + specials fix + API key editing' (#13) from feat/ci-auto-merge into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/13
2025-12-10 18:12:59 +00:00
Kelly
95792aab15 feat(analytics): Brand promotional history + specials fix + API key editing
- Add brand promotional history endpoint (GET /api/analytics/v2/brand/:name/promotions)
  - Tracks when products go on special, duration, discounts, quantity sold estimates
  - Aggregates by category with frequency metrics (weekly/monthly)
- Add quantity changes endpoint (GET /api/analytics/v2/store/:id/quantity-changes)
  - Filter by direction (increase/decrease/all) for sales vs restock estimation
- Fix canonical-upsert to populate stock_quantity and total_quantity_available
- Add API key edit functionality in admin UI
  - Edit allowed domains and IPs
  - Display domains in list view

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-10 10:59:03 -07:00
kelly
38ae2c3a3e Merge pull request 'feat/ci-auto-merge' (#12) from feat/ci-auto-merge into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/12
2025-12-10 17:26:21 +00:00
Kelly
249d3c1b7f fix: Build args format for version info + schema-tolerant routes
CI/CD:
- Fix build_args format in woodpecker CI (comma-separated, not YAML list)
- This fixes "unknown" SHA/version showing on remote deployments

Backend schema-tolerant fixes (graceful fallbacks when tables missing):
- users.ts: Check which columns exist before querying
- worker-registry.ts: Return empty result if table doesn't exist
- task-service.ts: Add tableExists() helper, handle missing tables/views
- proxies.ts: Return totalProxies in test-all response

Frontend fixes:
- Proxies: Use total from response for accurate progress display
- SEO PagesTab: Dim Generate button when no AI provider active

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 09:53:21 -07:00
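A minimal sketch of the `tableExists()` helper described above, assuming the pg pool from `src/db/pool.ts`:

```typescript
import { Pool } from 'pg';

// Returns true when the table is present in the public schema.
async function tableExists(pool: Pool, table: string): Promise<boolean> {
  const { rows } = await pool.query(
    `SELECT 1 FROM information_schema.tables
      WHERE table_schema = 'public' AND table_name = $1`,
    [table]
  );
  return rows.length > 0;
}

// Routes can then degrade gracefully instead of throwing:
// if (!(await tableExists(pool, 'worker_tasks'))) return { tasks: [], counts: {} };
```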
Kelly
9647f94f89 fix: Copy migrations folder to Docker image + fix SQL FILTER syntax
- Dockerfile: Add COPY migrations ./migrations so auto-migrate works on remote
- intelligence.ts: Fix FILTER clause placement in aggregate functions
  - FILTER must be inside AVG(), not wrapping ROUND()
  - Remove redundant FILTER on MIN (already filtered by WHERE)
  - Remove unsupported FILTER on PERCENTILE_CONT

These fixes resolve:
- "Failed to get task counts" (worker_tasks table missing)
- "FILTER specified but round is not an aggregate function" errors
- /national page "column m.state does not exist" (mv_state_metrics missing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 09:38:05 -07:00
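The FILTER placement rule, illustrated with hypothetical column names: PostgreSQL attaches FILTER to an aggregate call, so it cannot follow ROUND(), which is a plain function.

```typescript
// Fails with: "FILTER specified but round is not an aggregate function"
const broken = `SELECT ROUND(AVG(price_cents), 2) FILTER (WHERE in_stock)
                FROM store_products`;

// Works: FILTER sits inside the aggregate; ROUND() wraps the filtered result
const fixed = `SELECT ROUND(AVG(price_cents) FILTER (WHERE in_stock), 2)
               FROM store_products`;
```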
Kelly
afc288d2cf feat(ci): Auto-merge PRs after all type checks pass
Uses Gitea API to merge PR automatically when all typecheck jobs succeed.
Requires gitea_token secret in Woodpecker.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 09:27:26 -07:00
kelly
df01ce6aad Merge pull request 'feat: Auto-migrations on startup, worker exit location, proxy improvements' (#11) from feat/auto-migrations into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/11
2025-12-10 16:07:17 +00:00
Kelly
aea93bc96b fix(ci): Revert volume caching - may have broken CI trigger 2025-12-10 08:53:10 -07:00
Kelly
4e84f30f8b feat: Auto-retry tasks, 403 proxy rotation, task deletion
- Fix 403 handler to rotate BOTH proxy and fingerprint (was only fingerprint)
- Add auto-retry logic to task service (retry up to max_retries before failing)
- Add error tooltip on task status badge showing retry count and error message
- Add DELETE /api/tasks/:id endpoint (only for non-running tasks)
- Add delete button to JobQueue task table

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 08:41:14 -07:00
Kelly
b20a0a4fa5 fix: Add generic delete method to ApiClient + CI speedups
- Add delete<T>() method to ApiClient for WorkersDashboard cleanup
- Add npm cache volume for faster npm ci
- Add TypeScript incremental builds with tsBuildInfoFile cache
- Should significantly speed up repeated CI runs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 08:27:11 -07:00
Kelly
6eb1babc86 feat: Auto-migrations on startup, worker exit location, proxy improvements
- Add auto-migration system that runs SQL files from migrations/ on server startup
- Track applied migrations in schema_migrations table
- Show proxy exit location in Workers dashboard
- Add "Cleanup Stale" button to remove old workers
- Add remove button for individual workers
- Include proxy location (city, state, country) in worker heartbeats
- Update Proxy interface with location fields
- Re-enable bulk proxy import without ON CONFLICT

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 08:05:24 -07:00
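A condensed sketch of the startup auto-migration flow described above (file layout and table columns per the commit; error handling omitted):

```typescript
import { readdirSync, readFileSync } from 'fs';
import { join } from 'path';
import { Pool } from 'pg';

async function autoMigrate(pool: Pool, dir = 'migrations'): Promise<void> {
  // Track applied migrations by filename so re-runs are no-ops.
  await pool.query(`CREATE TABLE IF NOT EXISTS schema_migrations (
    filename TEXT PRIMARY KEY,
    applied_at TIMESTAMPTZ NOT NULL DEFAULT now()
  )`);
  const { rows } = await pool.query('SELECT filename FROM schema_migrations');
  const applied = new Set(rows.map((r: { filename: string }) => r.filename));
  const files = readdirSync(dir).filter((f) => f.endsWith('.sql')).sort();
  for (const file of files) {
    if (applied.has(file)) continue; // already ran on this database
    await pool.query(readFileSync(join(dir, file), 'utf8'));
    await pool.query('INSERT INTO schema_migrations (filename) VALUES ($1)', [file]);
  }
}
```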
kelly
9a9c2f76a2 Merge pull request 'feat: Stealth worker system with mandatory proxy rotation' (#10) from feat/stealth-worker-system into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/10
2025-12-10 08:13:42 +00:00
Kelly
56cc171287 feat: Stealth worker system with mandatory proxy rotation
## Worker System
- Role-agnostic workers that can handle any task type
- Pod-based architecture with StatefulSet (5-15 pods, 5 workers each)
- Custom pod names (Aethelgard, Xylos, Kryll, etc.)
- Worker registry with friendly names and resource monitoring
- Hub-and-spoke visualization on JobQueue page

## Stealth & Anti-Detection (REQUIRED)
- Proxies are MANDATORY - workers fail to start without active proxies
- CrawlRotator initializes on worker startup
- Loads proxies from `proxies` table
- Auto-rotates proxy + fingerprint on 403 errors
- 12 browser fingerprints (Chrome, Firefox, Safari, Edge)
- Locale/timezone matching for geographic consistency

## Task System
- Renamed product_resync → product_refresh
- Task chaining: store_discovery → entry_point → product_discovery
- Priority-based claiming with FOR UPDATE SKIP LOCKED
- Heartbeat and stale task recovery

## UI Updates
- JobQueue: Pod visualization, resource monitoring on hover
- WorkersDashboard: Simplified worker list
- Removed unused filters from task list

## Other
- IP2Location service for visitor analytics
- Findagram consumer features scaffolding
- Documentation updates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 00:44:59 -07:00
Kelly
0295637ed6 fix: Public API column mappings and OOS detection
- Fix store_products column references (name_raw, brand_name_raw, category_raw)
- Fix v_product_snapshots column references (crawled_at, *_cents pricing)
- Fix dispensaries column references (zipcode, logo_image, remove hours/amenities)
- Add services and license_type to dispensary API response
- Add consecutive_misses OOS tracking to product-resync handler
- Add migration 075 for consecutive_misses column
- Add CRAWL_PIPELINE.md documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 20:44:53 -07:00
Kelly
9c6dd37316 fix(ci): Use YAML list format for docker-buildx build_args
The woodpecker docker-buildx plugin requires build_args as a YAML list,
not a comma-separated string. This fixes the build version/hash not being
passed to the Docker image.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 18:03:50 -07:00
kelly
524d13209a Merge pull request 'fix: Remove legacy imports from task handlers' (#9) from fix/task-handler-typescript-errors into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/9
2025-12-10 00:42:39 +00:00
Kelly
9199db3927 fix: Remove legacy imports from task handlers
- Remove non-existent DutchieClient import from product-resync and entry-point-discovery
- Remove non-existent DiscoveryCrawler import from store-discovery
- Use scrapeStore from scraper-v2 for product resync
- Use discoverState from discovery module for store discovery
- Fix Pool type by using getPool() instead of pool wrapper
- Update FullDiscoveryResult property access to use correct field names

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 17:25:19 -07:00
Kelly
a0652c7c73 fix(types): Fix TypeScript errors in TasksDashboard, Layout, and Users
- Fix TaskCounts type in api.ts to match TasksDashboard interface
- Make VersionInfo.version optional in Layout.tsx
- Fix boolean type in Users.tsx disabled prop

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 17:02:40 -07:00
Kelly
89c262ee20 feat(tasks): Add unified task-based worker architecture
Replace fragmented job systems (job_schedules, dispensary_crawl_jobs, SyncOrchestrator)
with a single unified task queue:

- Add worker_tasks table with atomic task claiming via SELECT FOR UPDATE SKIP LOCKED
- Add TaskService for CRUD, claiming, and capacity metrics
- Add TaskWorker with role-based handlers (resync, discovery, analytics)
- Add /api/tasks endpoints for management and migration from legacy systems
- Add TasksDashboard UI and integrate task counts into main dashboard
- Add comprehensive documentation

Task roles: store_discovery, entry_point_discovery, product_discovery, product_resync, analytics_refresh

Run workers with: WORKER_ROLE=product_resync npx tsx src/tasks/task-worker.ts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 16:27:03 -07:00
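The atomic claim mentioned above is the classic single-statement queue pattern; a sketch with assumed column names:

```typescript
import { Pool } from 'pg';

// Claim the highest-priority pending task without blocking other workers:
// SKIP LOCKED lets concurrent claimers pass over rows already being claimed.
async function claimNextTask(pool: Pool, role: string, workerId: string) {
  const { rows } = await pool.query(
    `UPDATE worker_tasks
        SET status = 'running', worker_id = $2, claimed_at = now()
      WHERE id = (
        SELECT id FROM worker_tasks
         WHERE status = 'pending' AND role = $1
         ORDER BY priority DESC, created_at
         FOR UPDATE SKIP LOCKED
         LIMIT 1
      )
      RETURNING *`,
    [role, workerId]
  );
  return rows[0] ?? null; // null: nothing claimable right now
}
```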
Kelly
7f9cf559cf fix(k8s): Update worker deployment to use v2 hydration worker
The old dutchie-az/services/worker.js no longer exists. Workers now use
the hydration pipeline at dist/scripts/run-hydration.js with --loop mode.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 15:01:18 -07:00
Kelly
bbe039c868 feat(api): Add job queue management endpoints and fix SQL type errors
- Add GET /api/job-queue/available - list dispensaries available for crawling
- Add GET /api/job-queue/history - get recent job history with results
- Add POST /api/job-queue/enqueue-batch - queue multiple dispensaries at once
- Add POST /api/job-queue/enqueue-state - queue all crawl-enabled dispensaries for a state
- Add POST /api/job-queue/clear-pending - clear pending jobs with optional filters
- Fix SQL parameter type errors by adding explicit casts ($2::text, $3::integer)
- Fix route ordering to prevent /:id from matching /available and /history

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 14:10:55 -07:00
Kelly
4e5c09a2a5 chore(dashboard): Remove DeployStatus block
Version info already shown in Layout sidebar header.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 14:10:22 -07:00
Kelly
7f65598332 feat(admin): Show version info at top of sidebar
- Add package.json version to /api/version endpoint
- Move version display from footer to top (next to logo)
- Show format: v1.5.1 (abc1234) - 12/9/2024

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 13:58:36 -07:00
Kelly
75315ed91e fix(ci): Use comma-separated build_args for docker-buildx plugin
The docker-buildx plugin expects build_args as a comma-separated string,
not a YAML list. This should fix the build_sha/build_time being null.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 13:56:37 -07:00
Kelly
7fe7d17b43 fix(consumer): Use relative API URLs for findadispo/findagram
The consumer frontends were hardcoded to use cannaiq.co as the API
URL, but each domain has its own /api path in the ingress that routes
to the shared backend. Using relative URLs allows each site to make
API calls to its own domain.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 13:38:10 -07:00
Kelly
7e517b5801 ci: Use self-hosted base images to avoid Docker Hub rate limits
Cached node:20, node:20-slim, and nginx:alpine to code.cannabrands.app.
No more Docker Hub dependency for builds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 13:07:21 -07:00
Kelly
38ba9021d1 ci: Retry build (Docker Hub rate limit) 2025-12-09 12:58:36 -07:00
Kelly
ddebad48d3 ci: Remove auto-migrations from deploy step
Database was restored from backup - no migrations needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 12:52:04 -07:00
Kelly
1cebf2e296 fix(health): Add build_sha and build_time to health endpoint
Reads APP_GIT_SHA and APP_BUILD_TIME env vars set during Docker build.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 12:22:52 -07:00
Kelly
1d6e67d837 feat(api): Add store metrics endpoints with localhost bypass
New public API v1 endpoints for third-party integrations:
- GET /api/v1/stores/:id/metrics - Store performance metrics
- GET /api/v1/stores/:id/product-metrics - Product-level price changes
- GET /api/v1/stores/:id/competitor-snapshot - Competitive intelligence

Also adds localhost IP bypass for local development testing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 12:14:13 -07:00
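The localhost bypass is presumably a small predicate ahead of the API-key check; an illustrative version (the real middleware may normalize IPs differently):

```typescript
function isLocalhost(ip: string | undefined): boolean {
  return ip === '127.0.0.1' || ip === '::1' || ip === '::ffff:127.0.0.1';
}

// e.g. in the public-api middleware:
// if (isLocalhost(req.ip)) return next(); // skip API-key enforcement locally
```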
Kelly
cfb4b6e4ce fix(cannaiq): Fix TypeScript error in DeployStatus component
Properly destructure api.get response which returns { data: T }

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 12:08:29 -07:00
Kelly
f418c403d6 feat(auth): Add *.cannabrands.app to trusted origins whitelist
Adds pattern-based origin matching to support wildcard subdomains.
All *.cannabrands.app origins now bypass API key authentication.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 12:06:14 -07:00
Kelly
be4221af46 ci: Retrigger build 2025-12-09 11:54:16 -07:00
Kelly
ca07606b05 feat(k8s): Add Redis deployment for production
- Add k8s/redis.yaml with Redis 7 Alpine deployment
- Add REDIS_HOST and REDIS_PORT to configmap
- Redis configured with 200MB max memory and LRU eviction
- 1GB persistent volume for data persistence

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 11:40:11 -07:00
Kelly
baf1bf2eb7 fix(health): Require Redis in production, optional in local
Redis health check now returns error status when not configured in
production/staging environments, but remains optional in local dev.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 11:38:49 -07:00
Kelly
4ef3a8d72b fix(build): Fix TypeScript errors breaking CI build
- Add missing 'original' property to LocalImageSizes in brand logo download
- Remove test scripts with type errors (test-image-download.ts, test-stealth-with-db.ts)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 11:36:28 -07:00
Kelly
09dd756eff feat(admin): Add deploy status panel to dashboard
Shows running version vs latest git commit, pipeline status with steps,
and how many commits behind if not on latest. Uses Woodpecker and Gitea
APIs to fetch CI/CD information. Auto-refreshes every 30 seconds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 11:26:41 -07:00
Kelly
ec8ef6210c ci: Run migrations inside K8s cluster after deploy
DB is internal to the cluster, so migrations must run via kubectl exec
into the scraper pod after deployment completes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 11:16:21 -07:00
Kelly
a9b7a4d7a9 ci: Add proper SQL migration runner with tracking
- Creates run-migrations.ts that reads migrations/*.sql files
- Tracks applied migrations in schema_migrations table by filename
- Handles existing version-based schema by adding filename column
- CI now runs migrations before deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 11:12:50 -07:00
Kelly
5119d5ccf9 ci: Add migration step before deploy
Migrations now run automatically after Docker builds but before K8s deploy.
Requires DATABASE_URL secret to be configured in Woodpecker.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 11:09:49 -07:00
Kelly
91efd1d03d feat(images): Add local image storage with on-demand resizing
- Store product images locally with hierarchy: /images/products/<state>/<store>/<brand>/<product>/
- Add /img/* proxy endpoint for on-demand resizing via Sharp
- Implement per-product image checking to skip existing downloads
- Fix pathToUrl() to correctly generate /images/... URLs
- Add frontend getImageUrl() helper with preset sizes (thumb, medium, large)
- Update all product pages to use optimized image URLs
- Add stealth session support for Dutchie GraphQL crawls
- Include test scripts for crawl and image verification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 11:04:50 -07:00
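A hedged sketch of the `/img/*` resize proxy: Sharp reads the stored original and resizes to a preset width on request. The preset names and the `size` query parameter are assumptions; the storage path follows the hierarchy in the commit message.

```typescript
import express from 'express';
import sharp from 'sharp';
import { join, normalize } from 'path';

const app = express();
const WIDTHS: Record<string, number> = { thumb: 150, medium: 400, large: 800 };

app.get('/img/*', async (req, res) => {
  // e.g. /img/products/az/store/brand/product/1.jpg?size=thumb
  // Strip any "../" so requests cannot escape the images directory.
  const rel = normalize((req.params as Record<string, string>)[0]).replace(/^(\.\.[/\\])+/, '');
  const width = WIDTHS[String(req.query.size ?? 'medium')] ?? 400;
  try {
    const buf = await sharp(join('./public/images', rel))
      .resize({ width, withoutEnlargement: true })
      .jpeg({ quality: 80 })
      .toBuffer();
    res.type('image/jpeg').send(buf);
  } catch {
    res.status(404).end(); // missing original or unreadable image
  }
});
```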
Kelly
aa776226b0 fix(consumer): Wire findagram/findadispo to public API
- Update Dockerfiles to use cannaiq.co as API base URL
- Change findagram API client from /api/az to /api/v1 endpoints
- Add trusted origin bypass in public-api middleware for consumer sites
- Consumer sites (findagram.co, findadispo.com) can now access /api/v1
  endpoints without API key authentication

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 11:04:50 -07:00
kelly
e9435150e9 Merge pull request 'feature/wp-plugin-versioning-and-fixes' (#7) from feature/wp-plugin-versioning-and-fixes into master 2025-12-09 17:15:33 +00:00
Kelly
d399b966e6 ci: trigger build 2025-12-09 10:03:29 -07:00
Kelly
f5f0e25384 ci: trigger build 2025-12-09 10:03:06 -07:00
Kelly
04de33e5f7 fix(ci): Use correct container name 'worker' for scraper-worker deployment
Verified via kubectl: container name in cluster is 'worker', not 'scraper-worker'.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 09:43:39 -07:00
Kelly
37dfea25e1 feat: WordPress plugin versioning + heatmap fix + dynamic latest download
- Add VERSION file (1.5.4) for tracking WP plugin version
- Update plugin headers to 1.5.4 (cannaiq-menus.php, crawlsy-menus.php)
- Add dynamic /downloads/cannaiq-menus-latest.zip route that auto-redirects
  to highest version (no manual symlinks needed)
- Update frontend download links to use -latest.zip
- Fix StateHeatmap.tsx to parse API values as numbers (fixes string concat bug)
- Document versioning rules in CLAUDE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 09:43:39 -07:00
Kelly
e2166bc25f fix(cannaiq): Parse heatmap values as numbers in frontend
Ensures values from the API are parsed as numbers before using them
in calculations. Fixes string concatenation bug in stats summary.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 09:43:39 -07:00
kelly
b5e8f039bf Merge pull request 'fix(backend): Parse bigint values in heatmap API response' (#6) from feature/seo-template-library-and-enhancements into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/6
2025-12-09 16:26:19 +00:00
Kelly
346e6d1cd8 perf(ci): Parallelize builds, typechecks on PRs only
- PRs: 4 parallel typechecks (~5 mins)
- Master: 4 parallel Docker builds + deploy (~10-15 mins)
- Total time reduced from ~2 hours to ~15-20 mins

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 09:08:12 -07:00
Kelly
be434d25e3 fix(backend): Round heatmap values to 2 decimal places
Prevents long decimal numbers like 37.805740635007325 from displaying
in the UI. Now shows clean values like 37.81.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 08:50:53 -07:00
Kelly
ecc201e9d4 fix(backend): Parse bigint values in heatmap API response
PostgreSQL returns bigint columns as strings. The heatmap API was
returning these raw strings, causing string concatenation instead
of numeric addition in the frontend when summing values.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 08:45:05 -07:00
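Context for the fix: node-postgres returns int8 (bigint) columns as strings to avoid precision loss, so `'12' + '34'` concatenates. Either parse per row, as this commit does, or register a global type parser:

```typescript
import { types } from 'pg';

// What the driver hands back for a bigint column:
const row = { state: 'AZ', product_count: '1542' };
const parsed = { ...row, product_count: Number(row.product_count) };

// Or once, globally: OID 20 is int8/bigint.
types.setTypeParser(20, (value) => parseInt(value, 10));
```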
kelly
67bfdf47a5 Merge pull request 'fix: Add missing type field and pass build args to CI' (#5) from feature/seo-template-library-and-enhancements into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/5
2025-12-09 15:41:57 +00:00
Kelly
3fa22a6ba1 fix: Add missing type field and pass build args to CI
- Add outOfStockProducts to StateMetrics interface
- Add onSpecialProducts to getStateSummary return
- Pass APP_GIT_SHA and other build args to docker build

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 07:44:38 -07:00
kelly
9f898f68db Merge pull request 'feat: SEO template library, discovery pipeline, and orchestrator enhancements' (#4) from feature/seo-template-library-and-enhancements into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/4
2025-12-09 08:13:11 +00:00
Kelly
f78b05360a fix(cannaiq): Fix TypeScript build errors in ApiClient and pages
- Add put() method to ApiClient class
- Update get() method to accept optional params config
- Fix formatDuration to accept undefined type in JobQueue
- Fix DiscoveryLocations API parameter (state -> stateCode)
- Fix stats display path in DiscoveryLocations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 00:44:35 -07:00
Kelly
2f483b3084 feat: SEO template library, discovery pipeline, and orchestrator enhancements
## SEO Template Library
- Add complete template library with 7 page types (state, city, category, brand, product, search, regeneration)
- Add Template Library tab in SEO Orchestrator with accordion-based editors
- Add template preview, validation, and variable injection engine
- Add API endpoints: /api/seo/templates, preview, validate, generate, regenerate

## Discovery Pipeline
- Add promotion.ts for discovery location validation and promotion
- Add discover-all-states.ts script for multi-state discovery
- Add promotion log migration (067)
- Enhance discovery routes and types

## Orchestrator & Admin
- Add crawl_enabled filter to stores page
- Add API permissions page
- Add job queue management
- Add price analytics routes
- Add markets and intelligence routes
- Enhance dashboard and worker monitoring

## Infrastructure
- Add migrations for worker definitions, SEO settings, field alignment
- Add canonical pipeline for scraper v2
- Update hydration and sync orchestrator
- Enhance multi-state query service

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 00:05:34 -07:00
Kelly
9711d594db feat(orchestrator): Add crawl_enabled filter to stores page
- Backend: Filter stores by crawl_enabled (default: enabled only)
- API: Support crawl_enabled param in getOrchestratorStores
- UI: Add Enabled/Disabled/All filter toggle buttons
- UI: Show crawl status icon in stores table

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-08 14:18:28 -07:00
Kelly
39aebfcb82 fix: Static file paths and crawl_enabled API filters
- Fix static file paths for local development (./public/* instead of /app/public/*)
- Add crawl_enabled and dutchie_verified filters to /api/stores and /api/dispensaries
- Default API to return only enabled stores (crawl_enabled=true)
- Add ?crawl_enabled=false to show disabled, ?crawl_enabled=all to show all

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-08 14:07:17 -07:00
Kelly
5415cac2f3 feat(seo): Add SEO tables to migration and ingress config
- Add seo_pages and seo_page_contents tables to migrate.ts for
  automatic creation on deployment
- Update Home.tsx with minor formatting
- Add ingress configuration updates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-08 12:58:38 -07:00
kelly
70d2364a6f Merge pull request 'feat: Rename WordPress plugin to CannaIQ Menus v1.5.3' (#3) from feature/cannaiq-menus-plugin-rename into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/3
2025-12-08 18:47:15 +00:00
Kelly
b1ab45f662 fix: Fix TypeScript errors for CI
- Add export {} to cli.ts and discover-and-import-store.ts to treat as modules
- Remove scrapeStore reference in crawler-jobs.ts (legacy scraper removed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-08 10:45:47 -07:00
Kelly
20300edbb8 fix: Remove platform_id_source from harmonization INSERT
Production DB doesn't have this column - removing to allow creates.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-08 10:23:19 -07:00
Kelly
b7cfec0770 feat: AZ dispensary harmonization with Dutchie source of truth
Major changes:
- Add harmonize-az-dispensaries.ts script to sync dispensaries with Dutchie API
- Add migration 057 for crawl_enabled and dutchie_verified fields
- Remove legacy dutchie-az module (replaced by platforms/dutchie)
- Clean up deprecated crawlers, scrapers, and orchestrator code
- Update location-discovery to not fallback to slug when ID is missing
- Add crawl-rotator service for proxy rotation
- Add types/index.ts for shared type definitions
- Add woodpecker-agent k8s manifest

Harmonization script:
- Queries ConsumerDispensaries API for all 32 AZ cities
- Matches dispensaries by platform_dispensary_id (not slug)
- Updates existing records with full Dutchie data
- Creates new records for unmatched Dutchie dispensaries
- Disables dispensaries not found in Dutchie

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-08 10:19:49 -07:00
Kelly
948a732dd5 feat: Rename WordPress plugin to CannaIQ Menus v1.5.3
- Rename plugin from Crawlsy Menus to CannaIQ Menus
- Update version to 1.5.3
- Update text domain to cannaiq-menus
- Rename all CSS classes from crawlsy-* to cannaiq-*
- Update shortcodes to [cannaiq_products] and [cannaiq_product]
- Add backward compatibility for legacy shortcodes
- Update download links on Home and LandingPage
- Fix health panel Redis timeout issue
- Add clear error message when backend not running

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-08 00:24:43 -07:00
Kelly
bf4ceaf09e fix: Make all migrations idempotent
- 008: Add IF NOT EXISTS to ALTER TABLE ADD COLUMN
- 011: Add IF NOT EXISTS to CREATE TABLE and INDEX
- 012: Add IF NOT EXISTS, DROP TRIGGER IF EXISTS
- 013: Add ON CONFLICT (azdhs_id) DO NOTHING
- 014: Add IF NOT EXISTS to ALTER TABLE ADD COLUMN

All migrations can now be safely re-run without errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 23:48:35 -07:00
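What "idempotent" looks like concretely in these migrations, as illustrative SQL (table names are hypothetical except where the commit names them, e.g. the `azdhs_id` conflict target):

```typescript
const idempotentPatterns = `
  -- 008/014 style: guard column additions
  ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS crawl_enabled BOOLEAN DEFAULT true;

  -- 011 style: guard index creation
  CREATE INDEX IF NOT EXISTS idx_dispensaries_crawl ON dispensaries (crawl_enabled);

  -- 012 style: drop-then-create triggers
  DROP TRIGGER IF EXISTS trg_example ON dispensaries;

  -- 013 style: tolerate re-inserted seed rows (azdhs_facilities is hypothetical)
  INSERT INTO azdhs_facilities (azdhs_id, name)
  VALUES ('F-001', 'Example Facility')
  ON CONFLICT (azdhs_id) DO NOTHING;
`;
```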
kelly
fda688b11a Merge pull request 'feat: Responsive UI, SEO pages, AI content generation' (#2) from feature/workers-dashboard into master 2025-12-08 06:43:36 +00:00
kelly
414b97b3c0 Merge pull request 'feature/workers-dashboard' (#1) from feature/workers-dashboard into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/1
2025-12-08 02:54:26 +00:00
519 changed files with 91528 additions and 40623 deletions

.gitignore (7 lines changed)

@@ -51,3 +51,10 @@ coverage/
 *.tmp
 *.temp
 llm-scraper/
+
+# Claude Code
+.claude/
+
+# Test/debug scripts
+backend/scripts/test-*.ts
+backend/scripts/run-*.ts

.woodpecker.yml (new file, 195 lines)

@@ -0,0 +1,195 @@
steps:
  # ===========================================
  # PR VALIDATION: Parallel type checks (PRs only)
  # ===========================================
  typecheck-backend:
    image: node:22
    commands:
      - cd backend
      - npm ci --prefer-offline
      - npx tsc --noEmit
    depends_on: []
    when:
      event: pull_request

  typecheck-cannaiq:
    image: node:22
    commands:
      - cd cannaiq
      - npm ci --prefer-offline
      - npx tsc --noEmit
    depends_on: []
    when:
      event: pull_request

  typecheck-findadispo:
    image: node:22
    commands:
      - cd findadispo/frontend
      - npm ci --prefer-offline
      - npx tsc --noEmit 2>/dev/null || true
    depends_on: []
    when:
      event: pull_request

  typecheck-findagram:
    image: node:22
    commands:
      - cd findagram/frontend
      - npm ci --prefer-offline
      - npx tsc --noEmit 2>/dev/null || true
    depends_on: []
    when:
      event: pull_request

  # ===========================================
  # AUTO-MERGE: Merge PR after all checks pass
  # ===========================================
  auto-merge:
    image: alpine:latest
    environment:
      GITEA_TOKEN:
        from_secret: gitea_token
    commands:
      - apk add --no-cache curl
      - |
        echo "Merging PR #${CI_COMMIT_PULL_REQUEST}..."
        curl -s -X POST \
          -H "Authorization: token $GITEA_TOKEN" \
          -H "Content-Type: application/json" \
          -d '{"Do":"merge"}' \
          "https://git.spdy.io/api/v1/repos/Creationshop/cannaiq/pulls/${CI_COMMIT_PULL_REQUEST}/merge"
    depends_on:
      - typecheck-backend
      - typecheck-cannaiq
      - typecheck-findadispo
      - typecheck-findagram
    when:
      event: pull_request

  # ===========================================
  # DOCKER: Multi-stage builds with layer caching
  # ===========================================
  docker-backend:
    image: gcr.io/kaniko-project/executor:debug
    commands:
      - /kaniko/executor
        --context=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/backend
        --dockerfile=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/backend/Dockerfile
        --destination=10.100.9.70:5000/cannaiq/backend:latest
        --destination=10.100.9.70:5000/cannaiq/backend:sha-${CI_COMMIT_SHA:0:8}
        --build-arg=APP_BUILD_VERSION=sha-${CI_COMMIT_SHA:0:8}
        --build-arg=APP_GIT_SHA=${CI_COMMIT_SHA}
        --build-arg=APP_BUILD_TIME=${CI_PIPELINE_CREATED}
        --registry-mirror=10.100.9.70:5000
        --insecure-registry=10.100.9.70:5000
        --cache=true
        --cache-repo=10.100.9.70:5000/cannaiq/cache-backend
        --cache-ttl=168h
    depends_on: []
    when:
      branch: [master, develop]
      event: push

  docker-cannaiq:
    image: gcr.io/kaniko-project/executor:debug
    commands:
      - /kaniko/executor
        --context=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/cannaiq
        --dockerfile=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/cannaiq/Dockerfile
        --destination=10.100.9.70:5000/cannaiq/frontend:latest
        --destination=10.100.9.70:5000/cannaiq/frontend:sha-${CI_COMMIT_SHA:0:8}
        --registry-mirror=10.100.9.70:5000
        --insecure-registry=10.100.9.70:5000
        --cache=true
        --cache-repo=10.100.9.70:5000/cannaiq/cache-cannaiq
        --cache-ttl=168h
    depends_on: []
    when:
      branch: [master, develop]
      event: push

  docker-findadispo:
    image: gcr.io/kaniko-project/executor:debug
    commands:
      - /kaniko/executor
        --context=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/findadispo/frontend
        --dockerfile=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/findadispo/frontend/Dockerfile
        --destination=10.100.9.70:5000/cannaiq/findadispo:latest
        --destination=10.100.9.70:5000/cannaiq/findadispo:sha-${CI_COMMIT_SHA:0:8}
        --registry-mirror=10.100.9.70:5000
        --insecure-registry=10.100.9.70:5000
        --cache=true
        --cache-repo=10.100.9.70:5000/cannaiq/cache-findadispo
        --cache-ttl=168h
    depends_on: []
    when:
      branch: [master, develop]
      event: push

  docker-findagram:
    image: gcr.io/kaniko-project/executor:debug
    commands:
      - /kaniko/executor
        --context=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/findagram/frontend
        --dockerfile=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/findagram/frontend/Dockerfile
        --destination=10.100.9.70:5000/cannaiq/findagram:latest
        --destination=10.100.9.70:5000/cannaiq/findagram:sha-${CI_COMMIT_SHA:0:8}
        --registry-mirror=10.100.9.70:5000
        --insecure-registry=10.100.9.70:5000
        --cache=true
        --cache-repo=10.100.9.70:5000/cannaiq/cache-findagram
        --cache-ttl=168h
    depends_on: []
    when:
      branch: [master, develop]
      event: push

  # ===========================================
  # DEPLOY: Pull from local registry
  # ===========================================
  deploy:
    image: bitnami/kubectl:latest
    environment:
      K8S_TOKEN:
        from_secret: k8s_token
    commands:
      - mkdir -p ~/.kube
      - |
        cat > ~/.kube/config << KUBEEOF
        apiVersion: v1
        kind: Config
        clusters:
        - cluster:
            certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJkakNDQVIyZ0F3SUJBZ0lCQURBS0JnZ3Foa2pPUFFRREFqQWpNU0V3SHdZRFZRUUREQmhyTTNNdGMyVnkKZG1WeUxXTmhRREUzTmpVM05UUTNPRE13SGhjTk1qVXhNakUwTWpNeU5qSXpXaGNOTXpVeE1qRXlNak15TmpJegpXakFqTVNFd0h3WURWUVFEREJock0zTXRjMlZ5ZG1WeUxXTmhRREUzTmpVM05UUTNPRE13V1RBVEJnY3Foa2pPClBRSUJCZ2dxaGtqT1BRTUJCd05DQUFRWDRNdFJRTW5lWVJVV0s2cjZ3VEV2WjAxNnV4T3NUR3JJZ013TXVnNGwKajQ1bHZ6ZkM1WE1NY1pESnUxZ0t1dVJhVGxlb0xVOVJnSERIUUI4TUwzNTJvMEl3UURBT0JnTlZIUThCQWY4RQpCQU1DQXFRd0R3WURWUjBUQVFIL0JBVXdBd0VCL3pBZEJnTlZIUTRFRmdRVXIzNDZpNE42TFhzaEZsREhvSlU0CjJ1RjZseGN3Q2dZSUtvWkl6ajBFQXdJRFJ3QXdSQUlnVUtqdWRFQWJyS1JDVHROVXZTc1Rmb3FEaHFSeDM5MkYKTFFSVWlKK0hCVElDSUJqOFIxbG1zSnFSRkRHMEpwMGN4OG5ZZnFCaElRQzh6WWdRdTdBZmR4L3IKLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
            server: https://10.100.6.10:6443
          name: spdy-k3s
        contexts:
        - context:
            cluster: spdy-k3s
            namespace: cannaiq
            user: cannaiq-admin
          name: cannaiq
        current-context: cannaiq
        users:
        - name: cannaiq-admin
          user:
            token: $K8S_TOKEN
        KUBEEOF
      - chmod 600 ~/.kube/config
      - kubectl set image deployment/scraper scraper=10.100.9.70:5000/cannaiq/backend:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
      - kubectl rollout status deployment/scraper -n cannaiq --timeout=300s
      - REPLICAS=$(kubectl get deployment scraper-worker -n cannaiq -o jsonpath='{.spec.replicas}'); if [ "$REPLICAS" = "0" ]; then kubectl scale deployment/scraper-worker --replicas=5 -n cannaiq; fi
      - kubectl set image deployment/scraper-worker worker=10.100.9.70:5000/cannaiq/backend:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
      - kubectl set image deployment/cannaiq-frontend cannaiq-frontend=10.100.9.70:5000/cannaiq/frontend:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
      - kubectl set image deployment/findadispo-frontend findadispo-frontend=10.100.9.70:5000/cannaiq/findadispo:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
      - kubectl set image deployment/findagram-frontend findagram-frontend=10.100.9.70:5000/cannaiq/findagram:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
      - kubectl rollout status deployment/cannaiq-frontend -n cannaiq --timeout=120s
    depends_on:
      - docker-backend
      - docker-cannaiq
      - docker-findadispo
      - docker-findagram
    when:
      branch: [master, develop]
      event: push


@@ -1,140 +0,0 @@
when:
  - event: [push, pull_request]

steps:
  # Build checks
  typecheck-backend:
    image: node:20
    commands:
      - cd backend
      - npm ci
      - npx tsc --noEmit || true

  build-cannaiq:
    image: node:20
    commands:
      - cd cannaiq
      - npm ci
      - npx tsc --noEmit
      - npm run build

  build-findadispo:
    image: node:20
    commands:
      - cd findadispo/frontend
      - npm ci
      - npm run build

  build-findagram:
    image: node:20
    commands:
      - cd findagram/frontend
      - npm ci
      - npm run build

  # Docker builds - only on master
  docker-backend:
    image: woodpeckerci/plugin-docker-buildx
    settings:
      registry: code.cannabrands.app
      repo: code.cannabrands.app/creationshop/dispensary-scraper
      tags:
        - latest
        - ${CI_COMMIT_SHA:0:8}
      dockerfile: backend/Dockerfile
      context: backend
      username:
        from_secret: registry_username
      password:
        from_secret: registry_password
      platforms: linux/amd64
      provenance: false
    when:
      branch: master
      event: push

  docker-cannaiq:
    image: woodpeckerci/plugin-docker-buildx
    settings:
      registry: code.cannabrands.app
      repo: code.cannabrands.app/creationshop/cannaiq-frontend
      tags:
        - latest
        - ${CI_COMMIT_SHA:0:8}
      dockerfile: cannaiq/Dockerfile
      context: cannaiq
      username:
        from_secret: registry_username
      password:
        from_secret: registry_password
      platforms: linux/amd64
      provenance: false
    when:
      branch: master
      event: push

  docker-findadispo:
    image: woodpeckerci/plugin-docker-buildx
    settings:
      registry: code.cannabrands.app
      repo: code.cannabrands.app/creationshop/findadispo-frontend
      tags:
        - latest
        - ${CI_COMMIT_SHA:0:8}
      dockerfile: findadispo/frontend/Dockerfile
      context: findadispo/frontend
      username:
        from_secret: registry_username
      password:
        from_secret: registry_password
      platforms: linux/amd64
      provenance: false
    when:
      branch: master
      event: push

  docker-findagram:
    image: woodpeckerci/plugin-docker-buildx
    settings:
      registry: code.cannabrands.app
      repo: code.cannabrands.app/creationshop/findagram-frontend
      tags:
        - latest
        - ${CI_COMMIT_SHA:0:8}
      dockerfile: findagram/frontend/Dockerfile
      context: findagram/frontend
      username:
        from_secret: registry_username
      password:
        from_secret: registry_password
      platforms: linux/amd64
      provenance: false
    when:
      branch: master
      event: push

  # Deploy to Kubernetes
  deploy:
    image: bitnami/kubectl:latest
    environment:
      KUBECONFIG_CONTENT:
        from_secret: kubeconfig_data
    commands:
      - echo "Deploying to Kubernetes..."
      - mkdir -p ~/.kube
      - echo "$KUBECONFIG_CONTENT" | tr -d '[:space:]' | base64 -d > ~/.kube/config
      - chmod 600 ~/.kube/config
      - kubectl set image deployment/scraper scraper=code.cannabrands.app/creationshop/dispensary-scraper:${CI_COMMIT_SHA:0:8} -n dispensary-scraper
      - kubectl set image deployment/scraper-worker scraper-worker=code.cannabrands.app/creationshop/dispensary-scraper:${CI_COMMIT_SHA:0:8} -n dispensary-scraper
      - kubectl set image deployment/cannaiq-frontend cannaiq-frontend=code.cannabrands.app/creationshop/cannaiq-frontend:${CI_COMMIT_SHA:0:8} -n dispensary-scraper
      - kubectl set image deployment/findadispo-frontend findadispo-frontend=code.cannabrands.app/creationshop/findadispo-frontend:${CI_COMMIT_SHA:0:8} -n dispensary-scraper
      - kubectl set image deployment/findagram-frontend findagram-frontend=code.cannabrands.app/creationshop/findagram-frontend:${CI_COMMIT_SHA:0:8} -n dispensary-scraper
      - kubectl rollout status deployment/scraper -n dispensary-scraper --timeout=300s
      - kubectl rollout status deployment/scraper-worker -n dispensary-scraper --timeout=300s
      - kubectl rollout status deployment/cannaiq-frontend -n dispensary-scraper --timeout=120s
      - kubectl rollout status deployment/findadispo-frontend -n dispensary-scraper --timeout=120s
      - kubectl rollout status deployment/findagram-frontend -n dispensary-scraper --timeout=120s
      - echo "All deployments complete!"
    when:
      branch: master
      event: push

CLAUDE.md (1283 lines changed)

File diff suppressed because it is too large

@@ -1,30 +1,52 @@
+# CannaiQ Backend Environment Configuration
+# Copy this file to .env and fill in the values
+
+# Server
 PORT=3010
 NODE_ENV=development
 
 # =============================================================================
-# CannaiQ Database (dutchie_menus) - PRIMARY DATABASE
+# CANNAIQ DATABASE (dutchie_menus) - PRIMARY DATABASE
 # =============================================================================
-# This is where all schema migrations run and where canonical tables live.
-# All CANNAIQ_DB_* variables are REQUIRED - connection will fail if missing.
+# This is where ALL schema migrations run and where canonical tables live.
+# All CANNAIQ_DB_* variables are REQUIRED - no defaults.
+# The application will fail to start if any are missing.
 CANNAIQ_DB_HOST=localhost
 CANNAIQ_DB_PORT=54320
-CANNAIQ_DB_NAME=dutchie_menus
+CANNAIQ_DB_NAME=dutchie_menus # MUST be dutchie_menus - NOT dutchie_legacy
 CANNAIQ_DB_USER=dutchie
 CANNAIQ_DB_PASS=dutchie_local_pass
+
+# Alternative: Use a full connection URL instead of individual vars
+# If set, this takes priority over individual vars above
+# CANNAIQ_DB_URL=postgresql://user:pass@host:port/dutchie_menus
 
 # =============================================================================
-# Legacy Database (dutchie_legacy) - READ-ONLY SOURCE
+# LEGACY DATABASE (dutchie_legacy) - READ-ONLY FOR ETL
 # =============================================================================
 # Used ONLY by ETL scripts to read historical data.
 # NEVER run migrations against this database.
+# These are only needed when running 042_legacy_import.ts
 LEGACY_DB_HOST=localhost
 LEGACY_DB_PORT=54320
-LEGACY_DB_NAME=dutchie_legacy
+LEGACY_DB_NAME=dutchie_legacy # READ-ONLY - never migrated
 LEGACY_DB_USER=dutchie
-LEGACY_DB_PASS=dutchie_local_pass
+LEGACY_DB_PASS=
+
+# Alternative: Use a full connection URL instead of individual vars
+# LEGACY_DB_URL=postgresql://user:pass@host:port/dutchie_legacy
 
-# Local image storage (no MinIO per CLAUDE.md)
+# =============================================================================
+# LOCAL STORAGE
+# =============================================================================
+# Local image storage path (no MinIO)
 LOCAL_IMAGES_PATH=./public/images
 
-# JWT
+# =============================================================================
+# AUTHENTICATION
+# =============================================================================
 JWT_SECRET=your-secret-key-change-in-production
+ANTHROPIC_API_KEY=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
+OPENAI_API_KEY=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

backend/.gitignore (new file, 3 lines)

@@ -0,0 +1,3 @@
# IP2Location database (downloaded separately)
data/ip2location/

backend/Dockerfile

@@ -1,17 +1,33 @@
 # Build stage
-# Image: code.cannabrands.app/creationshop/dispensary-scraper
+# Image: git.spdy.io/creationshop/dispensary-scraper
-FROM node:20-slim AS builder
+FROM node:22-slim AS builder
+
+# Install build tools for native modules (bcrypt, sharp)
+RUN apt-get update && apt-get install -y \
+    python3 \
+    build-essential \
+    --no-install-recommends \
+    && rm -rf /var/lib/apt/lists/*
 
 WORKDIR /app
 COPY package*.json ./
-RUN npm ci
+
+# Install dependencies with retry and fallback registry
+RUN npm config set fetch-retries 3 && \
+    npm config set fetch-retry-mintimeout 20000 && \
+    npm config set fetch-retry-maxtimeout 120000 && \
+    npm install || \
+    (npm config set registry https://registry.npmmirror.com && npm install)
 
 COPY . .
 RUN npm run build
+
+# Prune dev dependencies for smaller production image
+RUN npm prune --production
 
 # Production stage
-FROM node:20-slim
+FROM node:22-slim
 
 # Build arguments for version info
 ARG APP_BUILD_VERSION=dev
@@ -25,8 +41,9 @@ ENV APP_GIT_SHA=${APP_GIT_SHA}
 ENV APP_BUILD_TIME=${APP_BUILD_TIME}
 ENV CONTAINER_IMAGE_TAG=${CONTAINER_IMAGE_TAG}
 
-# Install Chromium dependencies
+# Install Chromium dependencies and curl for HTTP requests
 RUN apt-get update && apt-get install -y \
+    curl \
     chromium \
     fonts-liberation \
     libnss3 \
@@ -43,10 +60,12 @@ ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium
 WORKDIR /app
 COPY package*.json ./
-RUN npm ci --omit=dev
+COPY --from=builder /app/node_modules ./node_modules
 COPY --from=builder /app/dist ./dist
+
+# Copy migrations for auto-migrate on startup
+COPY migrations ./migrations
 
 # Create local images directory for when MinIO is not configured
 RUN mkdir -p /app/public/images/products

backend/docs/CODEBASE_MAP.md (new file, 268 lines)

@@ -0,0 +1,268 @@
# CannaiQ Backend Codebase Map
**Last Updated:** 2025-12-12
**Purpose:** Help Claude and developers understand which code is current vs deprecated
---
## Quick Reference: What to Use
### For Crawling/Scraping
| Task | Use This | NOT This |
|------|----------|----------|
| Fetch products | `src/tasks/handlers/payload-fetch.ts` | `src/hydration/*` |
| Process products | `src/tasks/handlers/product-refresh.ts` | `src/scraper-v2/*` |
| GraphQL client | `src/platforms/dutchie/client.ts` | `src/dutchie-az/services/graphql-client.ts` |
| Worker system | `src/tasks/task-worker.ts` | `src/dutchie-az/services/worker.ts` |
### For Database
| Task | Use This | NOT This |
|------|----------|----------|
| Get DB pool | `src/db/pool.ts` | `src/dutchie-az/db/connection.ts` |
| Run migrations | `src/db/migrate.ts` (CLI only) | Never import at runtime |
| Query products | `store_products` table | `products`, `dutchie_products` |
| Query stores | `dispensaries` table | `stores` table |
### For Discovery
| Task | Use This |
|------|----------|
| Discover stores | `src/discovery/*.ts` |
| Run discovery | `npx tsx src/scripts/run-discovery.ts` |
---
## Directory Status
### ACTIVE DIRECTORIES (Use These)
```
src/
├── auth/ # JWT/session auth, middleware
├── db/ # Database pool, migrations
├── discovery/ # Dutchie store discovery pipeline
├── middleware/ # Express middleware
├── multi-state/ # Multi-state query support
├── platforms/ # Platform-specific clients (Dutchie, Jane, etc)
│ └── dutchie/ # THE Dutchie client - use this one
├── routes/ # Express API routes
├── services/ # Core services (logger, scheduler, etc)
├── tasks/ # Task system (workers, handlers, scheduler)
│ └── handlers/ # Task handlers (payload_fetch, product_refresh, etc)
├── types/ # TypeScript types
└── utils/ # Utilities (storage, image processing)
```
### DEPRECATED DIRECTORIES (DO NOT USE)
```
src/
├── hydration/ # DEPRECATED - Old pipeline approach
├── scraper-v2/ # DEPRECATED - Old scraper engine
├── canonical-hydration/# DEPRECATED - Merged into tasks/handlers
├── dutchie-az/ # PARTIAL - Some parts deprecated, some active
│ ├── db/ # DEPRECATED - Use src/db/pool.ts
│ └── services/ # PARTIAL - worker.ts still runs, graphql-client.ts deprecated
├── portals/ # FUTURE - Not yet implemented
├── seo/ # PARTIAL - Settings work, templates WIP
└── system/ # DEPRECATED - Old orchestration system
```
### DEPRECATED FILES (DO NOT USE)
```
src/dutchie-az/db/connection.ts # Use src/db/pool.ts instead
src/dutchie-az/services/graphql-client.ts # Use src/platforms/dutchie/client.ts
src/hydration/*.ts # Entire directory deprecated
src/scraper-v2/*.ts # Entire directory deprecated
```
---
## Key Files Reference
### Entry Points
| File | Purpose | Status |
|------|---------|--------|
| `src/index.ts` | Main Express server | ACTIVE |
| `src/dutchie-az/services/worker.ts` | Worker process entry | ACTIVE |
| `src/tasks/task-worker.ts` | Task worker (new system) | ACTIVE |
### Dutchie Integration
| File | Purpose | Status |
|------|---------|--------|
| `src/platforms/dutchie/client.ts` | GraphQL client, hashes, curl | **PRIMARY** |
| `src/platforms/dutchie/queries.ts` | High-level query functions | ACTIVE |
| `src/platforms/dutchie/index.ts` | Re-exports | ACTIVE |
### Task Handlers
| File | Purpose | Status |
|------|---------|--------|
| `src/tasks/handlers/payload-fetch.ts` | Fetch products from Dutchie | **PRIMARY** |
| `src/tasks/handlers/product-refresh.ts` | Process payload into DB | **PRIMARY** |
| `src/tasks/handlers/entry-point-discovery.ts` | Resolve platform IDs (auto-healing) | **PRIMARY** |
| `src/tasks/handlers/menu-detection.ts` | Detect menu type | ACTIVE |
| `src/tasks/handlers/id-resolution.ts` | Resolve platform IDs (legacy) | LEGACY |
| `src/tasks/handlers/image-download.ts` | Download product images | ACTIVE |
---
## Transport Rules (CRITICAL)
**Browser-based (Puppeteer) is the DEFAULT transport. curl is ONLY allowed when explicitly specified.**
### Transport Selection
| `task.method` | Transport Used | Notes |
|---------------|----------------|-------|
| `null` | Browser (Puppeteer) | DEFAULT - use this for most tasks |
| `'http'` | Browser (Puppeteer) | Explicit browser request |
| `'curl'` | curl-impersonate | ONLY when explicitly needed |
### Why Browser-First?
1. **Anti-detection**: Puppeteer with StealthPlugin evades bot detection
2. **Session cookies**: Browser maintains session state automatically
3. **Fingerprinting**: Real browser fingerprint (TLS, headers, etc.)
4. **Age gates**: Browser can click through age verification
### Entry Point Discovery Auto-Healing
The `entry_point_discovery` handler uses a healing strategy:
```
1. FIRST: Check dutchie_discovery_locations for existing platform_location_id
   - By linked dutchie_discovery_id
   - By slug match in discovery data
   → If found, NO network call needed

2. SECOND: Browser-based GraphQL (Puppeteer)
   - 5x retries for network/proxy failures
   - On HTTP 403: rotate proxy and retry
   - On HTTP 404 after 2 attempts: mark as 'removed'

3. HARD FAILURE: After exhausting options → 'needs_investigation'
```
### DO NOT Use curl Unless:
- Task explicitly has `method = 'curl'`
- You're testing curl-impersonate binaries
- The API explicitly requires curl fingerprinting
### Files
| File | Transport | Purpose |
|------|-----------|---------|
| `src/services/puppeteer-preflight.ts` | Browser | Preflight check |
| `src/services/curl-preflight.ts` | curl | Preflight check |
| `src/tasks/handlers/entry-point-discovery.ts` | Browser | Platform ID resolution |
| `src/tasks/handlers/payload-fetch.ts` | Both | Product fetching |
### Database
| File | Purpose | Status |
|------|---------|--------|
| `src/db/pool.ts` | Canonical DB pool | **PRIMARY** |
| `src/db/migrate.ts` | Migration runner (CLI only) | CLI ONLY |
| `src/db/auto-migrate.ts` | Auto-run migrations on startup | ACTIVE |
### Configuration
| File | Purpose | Status |
|------|---------|--------|
| `.env` | Environment variables | ACTIVE |
| `package.json` | Dependencies | ACTIVE |
| `tsconfig.json` | TypeScript config | ACTIVE |
---
## GraphQL Hashes (CRITICAL)
The correct hashes are in `src/platforms/dutchie/client.ts`:
```typescript
export const GRAPHQL_HASHES = {
  FilteredProducts: 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0',
  GetAddressBasedDispensaryData: '13461f73abf7268770dfd05fe7e10c523084b2bb916a929c08efe3d87531977b',
  ConsumerDispensaries: '0a5bfa6ca1d64ae47bcccb7c8077c87147cbc4e6982c17ceec97a2a4948b311b',
  GetAllCitiesByState: 'ae547a0466ace5a48f91e55bf6699eacd87e3a42841560f0c0eabed5a0a920e6',
};
```
**ALWAYS** use `Status: 'Active'` for FilteredProducts (not `null` or `'All'`).
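A hedged sketch of using one of these hashes with the standard Apollo persisted-query envelope; the exact variables Dutchie expects beyond `Status` are not documented in this map and are assumptions:

```typescript
import { GRAPHQL_HASHES } from '../platforms/dutchie/client';

const requestBody = {
  operationName: 'FilteredProducts',
  variables: { dispensaryId: '<platform-id>', Status: 'Active' }, // never null / 'All'
  extensions: {
    persistedQuery: { version: 1, sha256Hash: GRAPHQL_HASHES.FilteredProducts },
  },
};
```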
---
## Scripts Reference
### Useful Scripts (in `src/scripts/`)
| Script | Purpose |
|--------|---------|
| `run-discovery.ts` | Run Dutchie discovery |
| `crawl-single-store.ts` | Test crawl a single store |
| `test-dutchie-graphql.ts` | Test GraphQL queries |
### One-Off Scripts (probably don't need)
| Script | Purpose |
|--------|---------|
| `harmonize-az-dispensaries.ts` | One-time data cleanup |
| `bootstrap-stores-for-dispensaries.ts` | One-time migration |
| `backfill-*.ts` | Historical backfill scripts |
---
## API Routes
### Active Routes (in `src/routes/`)
| Route File | Mount Point | Purpose |
|------------|-------------|---------|
| `auth.ts` | `/api/auth` | Login/logout/session |
| `stores.ts` | `/api/stores` | Store CRUD |
| `dashboard.ts` | `/api/dashboard` | Dashboard stats |
| `workers.ts` | `/api/workers` | Worker monitoring |
| `pipeline.ts` | `/api/pipeline` | Crawl triggers |
| `discovery.ts` | `/api/discovery` | Discovery management |
| `analytics.ts` | `/api/analytics` | Analytics queries |
| `wordpress.ts` | `/api/v1/wordpress` | WordPress plugin API |
---
## Documentation Files
### Current Docs (in `backend/docs/`)
| Doc | Purpose | Currency |
|-----|---------|----------|
| `TASK_WORKFLOW_2024-12-10.md` | Task system architecture | CURRENT |
| `WORKER_TASK_ARCHITECTURE.md` | Worker/task design | CURRENT |
| `CRAWL_PIPELINE.md` | Crawl pipeline overview | CURRENT |
| `ORGANIC_SCRAPING_GUIDE.md` | Browser-based scraping | CURRENT |
| `CODEBASE_MAP.md` | This file | CURRENT |
| `ANALYTICS_V2_EXAMPLES.md` | Analytics API examples | CURRENT |
| `BRAND_INTELLIGENCE_API.md` | Brand API docs | CURRENT |
### Root Docs
| Doc | Purpose | Currency |
|-----|---------|----------|
| `CLAUDE.md` | Claude instructions | **PRIMARY** |
| `README.md` | Project overview | NEEDS UPDATE |
---
## Common Mistakes to Avoid
1. **Don't use `src/hydration/`** - It's an old approach that was superseded by the task system
2. **Don't use `src/dutchie-az/db/connection.ts`** - Use `src/db/pool.ts` instead
3. **Don't import `src/db/migrate.ts` at runtime** - It will crash. Only use for CLI migrations.
4. **Don't query `stores` table** - It's empty. Use `dispensaries`.
5. **Don't query `products` table** - It's empty. Use `store_products`.
6. **Don't use wrong GraphQL hash** - Always get hash from `GRAPHQL_HASHES` in client.ts
7. **Don't use `Status: null`** - It returns 0 products. Use `Status: 'Active'`.
---
## When in Doubt
1. Check if the file is imported in `src/index.ts` - if not, it may be deprecated
2. Check the last modified date - older files may be stale
3. Look for `DEPRECATED` comments in the code
4. Ask: "Is there a newer version of this in `src/tasks/` or `src/platforms/`?"
5. Read the relevant doc in `docs/` before modifying code

backend/docs/QUERY_API.md Normal file

@@ -0,0 +1,343 @@
# CannaiQ Query API
Query raw crawl payload data with flexible filters, sorting, and aggregation.
## Base URL
```
https://cannaiq.co/api/payloads
```
## Authentication
Include your API key in the header:
```
X-API-Key: your-api-key
```
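For example, with Node 18+ `fetch` (assuming your key is in a `CANNAIQ_API_KEY` environment variable):
```javascript
// Query products for store 112, authenticating via the X-API-Key header
const res = await fetch('https://cannaiq.co/api/payloads/store/112/query?limit=10', {
  headers: { 'X-API-Key': process.env.CANNAIQ_API_KEY },
});
if (!res.ok) throw new Error(`Request failed: ${res.status}`);
const data = await res.json();
```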
---
## Endpoints
### 1. Query Products
Filter and search products from a store's latest crawl data.
```
GET /api/payloads/store/{dispensaryId}/query
```
#### Query Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `brand` | string | Filter by brand name (partial match) |
| `category` | string | Filter by category (flower, vape, edible, etc.) |
| `subcategory` | string | Filter by subcategory |
| `strain_type` | string | Filter by strain (indica, sativa, hybrid, cbd) |
| `in_stock` | boolean | Filter by stock status (true/false) |
| `price_min` | number | Minimum price |
| `price_max` | number | Maximum price |
| `thc_min` | number | Minimum THC percentage |
| `thc_max` | number | Maximum THC percentage |
| `search` | string | Search product name (partial match) |
| `fields` | string | Comma-separated fields to return |
| `limit` | number | Max results (default 100, max 1000) |
| `offset` | number | Skip results for pagination |
| `sort` | string | Sort by: name, price, thc, brand |
| `order` | string | Sort order: asc, desc |
#### Available Fields
When using the `fields` parameter, you can request:
- `id` - Product ID
- `name` - Product name
- `brand` - Brand name
- `category` - Product category
- `subcategory` - Product subcategory
- `strain_type` - Indica/Sativa/Hybrid/CBD
- `price` - Current price
- `price_med` - Medical price
- `price_rec` - Recreational price
- `thc` - THC percentage
- `cbd` - CBD percentage
- `weight` - Product weight/size
- `status` - Stock status
- `in_stock` - Boolean in-stock flag
- `image_url` - Product image
- `description` - Product description
#### Examples
**Get all flower products under $40:**
```
GET /api/payloads/store/112/query?category=flower&price_max=40
```
**Search for "Blue Dream" with high THC:**
```
GET /api/payloads/store/112/query?search=blue+dream&thc_min=20
```
**Get only name and price for Alien Labs products:**
```
GET /api/payloads/store/112/query?brand=Alien+Labs&fields=name,price,thc
```
**Get top 10 highest THC products:**
```
GET /api/payloads/store/112/query?sort=thc&order=desc&limit=10
```
**Paginate through in-stock products:**
```
GET /api/payloads/store/112/query?in_stock=true&limit=50&offset=0
GET /api/payloads/store/112/query?in_stock=true&limit=50&offset=50
```
#### Response
```json
{
"success": true,
"dispensaryId": 112,
"payloadId": 45,
"fetchedAt": "2025-12-11T10:30:00Z",
"query": {
"filters": {
"brand": "Alien Labs",
"category": null,
"price_max": null
},
"sort": "price",
"order": "asc",
"limit": 100,
"offset": 0
},
"pagination": {
"total": 15,
"returned": 15,
"limit": 100,
"offset": 0,
"has_more": false
},
"products": [
{
"id": "507f1f77bcf86cd799439011",
"name": "Alien Labs - Baklava 3.5g",
"brand": "Alien Labs",
"category": "flower",
"strain_type": "hybrid",
"price": 55,
"thc": "28.5",
"in_stock": true
}
]
}
```
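The `pagination.has_more` flag makes it easy to walk every page. A minimal sketch (the `fetchAllProducts` helper is hypothetical):
```javascript
// Fetch all pages of a query by advancing `offset` until has_more is false
async function fetchAllProducts(dispensaryId, query, apiKey) {
  const limit = 100;
  let offset = 0;
  const products = [];
  while (true) {
    const url = `/api/payloads/store/${dispensaryId}/query?${query}&limit=${limit}&offset=${offset}`;
    const page = await fetch(url, { headers: { 'X-API-Key': apiKey } }).then(r => r.json());
    products.push(...page.products);
    if (!page.pagination.has_more) break;
    offset += limit;
  }
  return products;
}
```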
---
### 2. Aggregate Data
Group products and calculate metrics.
```
GET /api/payloads/store/{dispensaryId}/aggregate
```
#### Query Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `group_by` | string | **Required.** Field to group by: brand, category, subcategory, strain_type |
| `metrics` | string | Comma-separated metrics (default: count) |
#### Available Metrics
- `count` - Number of products
- `avg_price` - Average price
- `min_price` - Lowest price
- `max_price` - Highest price
- `avg_thc` - Average THC percentage
- `in_stock_count` - Number of in-stock products
#### Examples
**Count products by brand:**
```
GET /api/payloads/store/112/aggregate?group_by=brand
```
**Get price stats by category:**
```
GET /api/payloads/store/112/aggregate?group_by=category&metrics=count,avg_price,min_price,max_price
```
**Get THC averages by strain type:**
```
GET /api/payloads/store/112/aggregate?group_by=strain_type&metrics=count,avg_thc
```
**Brand analysis with stock info:**
```
GET /api/payloads/store/112/aggregate?group_by=brand&metrics=count,avg_price,in_stock_count
```
#### Response
```json
{
"success": true,
"dispensaryId": 112,
"payloadId": 45,
"fetchedAt": "2025-12-11T10:30:00Z",
"groupBy": "brand",
"metrics": ["count", "avg_price"],
"totalProducts": 450,
"groupCount": 85,
"aggregations": [
{
"brand": "Alien Labs",
"count": 15,
"avg_price": 52.33
},
{
"brand": "Connected",
"count": 12,
"avg_price": 48.50
}
]
}
```
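Consuming the response is a matter of sorting `aggregations`; for example, top five brands by SKU count (a sketch):
```javascript
// Rank brands by product count from the aggregate endpoint
const res = await fetch('/api/payloads/store/112/aggregate?group_by=brand&metrics=count,avg_price');
const { aggregations } = await res.json();
const topBrands = [...aggregations].sort((a, b) => b.count - a.count).slice(0, 5);
```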
---
### 3. Compare Stores (Price Comparison)
Query the same data from multiple stores and compare in your app:
```javascript
// Get flower prices from Store A
const storeA = await fetch('/api/payloads/store/112/query?category=flower&fields=name,brand,price');
// Get flower prices from Store B
const storeB = await fetch('/api/payloads/store/115/query?category=flower&fields=name,brand,price');
const dataA = await storeA.json();
const dataB = await storeB.json();

// Find matching products (by name) and compare prices
const pricesB = new Map(dataB.products.map(p => [p.name, p.price]));
const comparison = dataA.products
  .filter(p => pricesB.has(p.name))
  .map(p => ({ name: p.name, store_112: p.price, store_115: pricesB.get(p.name) }));
```
---
### 4. Price History
For historical price data, use the snapshots endpoint:
```
GET /api/v1/products/{productId}/history?days=30
```
Or compare payloads over time:
```
GET /api/payloads/store/{dispensaryId}/diff?from={payloadId1}&to={payloadId2}
```
The diff endpoint shows:
- Products added
- Products removed
- Price changes
- Stock changes
---
### 5. List Stores
Get available dispensaries to query:
```
GET /api/stores
```
Returns all stores with their IDs, names, and locations.
---
## Use Cases
### Price Comparison App
```javascript
// 1. Get stores in Arizona
const stores = await fetch('/api/stores?state=AZ').then(r => r.json());
// 2. Query flower prices from each store
const prices = await Promise.all(
stores.map(store =>
fetch(`/api/payloads/store/${store.id}/query?category=flower&fields=name,brand,price`)
.then(r => r.json())
)
);
// 3. Build comparison matrix in your app
```
### Brand Analytics Dashboard
```javascript
// Get brand presence across stores
const brandData = await Promise.all(
storeIds.map(id =>
fetch(`/api/payloads/store/${id}/aggregate?group_by=brand&metrics=count,avg_price`)
.then(r => r.json())
)
);
// Aggregate brand presence across all stores
```
### Deal Finder
```javascript
// Find high-THC flower under $30
const deals = await fetch(
'/api/payloads/store/112/query?category=flower&price_max=30&thc_min=20&in_stock=true&sort=thc&order=desc'
).then(r => r.json());
```
### Inventory Tracker
```javascript
// Get products that went out of stock
const diff = await fetch('/api/payloads/store/112/diff').then(r => r.json());
const outOfStock = diff.details.stockChanges.filter(
p => p.newStatus !== 'Active'
);
```
---
## Rate Limits
- Default: 100 requests/minute per API key
- Contact support for higher limits
## Error Responses
```json
{
"success": false,
"error": "Error message here"
}
```
Common errors:
- `404` - Store or payload not found
- `400` - Missing required parameter
- `401` - Invalid or missing API key
- `429` - Rate limit exceeded

backend/docs/BRAND_INTELLIGENCE_API.md Normal file

@@ -0,0 +1,394 @@
# Brand Intelligence API
## Endpoint
```
GET /api/analytics/v2/brand/:name/intelligence
```
## Query Parameters
| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `window` | `7d\|30d\|90d` | `30d` | Time window for trend calculations |
| `state` | string | - | Filter by state code (e.g., `AZ`) |
| `category` | string | - | Filter by category (e.g., `Flower`) |
## Response Payload Schema
```typescript
interface BrandIntelligenceResult {
brand_name: string;
window: '7d' | '30d' | '90d';
generated_at: string; // ISO timestamp when data was computed
performance_snapshot: PerformanceSnapshot;
alerts: Alerts;
sku_performance: SkuPerformance[];
retail_footprint: RetailFootprint;
competitive_landscape: CompetitiveLandscape;
inventory_health: InventoryHealth;
promo_performance: PromoPerformance;
}
```
---
## Section 1: Performance Snapshot
Summary cards with key brand metrics.
```typescript
interface PerformanceSnapshot {
active_skus: number; // Total products in catalog
total_revenue_30d: number | null; // Estimated from qty × price
total_stores: number; // Active retail partners
new_stores_30d: number; // New distribution in window
market_share: number | null; // % of category SKUs
avg_wholesale_price: number | null;
price_position: 'premium' | 'value' | 'competitive';
}
```
**UI Label Mapping:**
| Field | User-Facing Label | Helper Text |
|-------|-------------------|-------------|
| `active_skus` | Active Products | X total in catalog |
| `total_revenue_30d` | Monthly Revenue | Estimated from sales |
| `total_stores` | Retail Distribution | Active retail partners |
| `new_stores_30d` | New Opportunities | X new in last 30 days |
| `market_share` | Category Position | % of category |
| `avg_wholesale_price` | Avg Wholesale | Per unit |
| `price_position` | Pricing Tier | Premium/Value/Market Rate |
---
## Section 2: Alerts
Issues requiring attention.
```typescript
interface Alerts {
lost_stores_30d_count: number;
lost_skus_30d_count: number;
competitor_takeover_count: number;
avg_oos_duration_days: number | null;
avg_reorder_lag_days: number | null;
items: AlertItem[];
}
interface AlertItem {
type: 'lost_store' | 'delisted_sku' | 'shelf_loss' | 'extended_oos';
severity: 'critical' | 'warning';
store_name?: string;
product_name?: string;
competitor_brand?: string;
days_since?: number;
state_code?: string;
}
```
**UI Label Mapping:**
| Field | User-Facing Label |
|-------|-------------------|
| `lost_stores_30d_count` | Accounts at Risk |
| `lost_skus_30d_count` | Delisted SKUs |
| `competitor_takeover_count` | Shelf Losses |
| `avg_oos_duration_days` | Avg Stockout Length |
| `avg_reorder_lag_days` | Avg Restock Time |
| `severity: critical` | Urgent |
| `severity: warning` | Watch |
---
## Section 3: SKU Performance (Product Velocity)
How fast each SKU sells.
```typescript
interface SkuPerformance {
store_product_id: number;
product_name: string;
category: string | null;
daily_velocity: number; // Units/day estimate
velocity_status: 'hot' | 'steady' | 'slow' | 'stale';
retail_price: number | null;
on_sale: boolean;
stores_carrying: number;
stock_status: 'in_stock' | 'low_stock' | 'out_of_stock';
}
```
**UI Label Mapping:**
| Field | User-Facing Label |
|-------|-------------------|
| `daily_velocity` | Daily Rate |
| `velocity_status` | Momentum |
| `velocity_status: hot` | Hot |
| `velocity_status: steady` | Steady |
| `velocity_status: slow` | Slow |
| `velocity_status: stale` | Stale |
| `retail_price` | Retail Price |
| `on_sale` | Promo (badge) |
**Velocity Thresholds:**
- `hot`: >= 5 units/day
- `steady`: >= 1 unit/day
- `slow`: >= 0.1 units/day
- `stale`: < 0.1 units/day
---
## Section 4: Retail Footprint
Store placement and coverage.
```typescript
interface RetailFootprint {
total_stores: number;
in_stock_count: number;
out_of_stock_count: number;
penetration_by_region: RegionPenetration[];
whitespace_stores: WhitespaceStore[];
}
interface RegionPenetration {
state_code: string;
store_count: number;
percent_reached: number; // % of state's dispensaries
in_stock: number;
out_of_stock: number;
}
interface WhitespaceStore {
store_id: number;
store_name: string;
state_code: string;
city: string | null;
category_fit: number; // How many competing brands they carry
competitor_brands: string[];
}
```
**UI Label Mapping:**
| Field | User-Facing Label |
|-------|-------------------|
| `penetration_by_region` | Market Coverage by Region |
| `percent_reached` | X% reached |
| `in_stock` | X stocked |
| `out_of_stock` | X out |
| `whitespace_stores` | Expansion Opportunities |
| `category_fit` | X fit |
---
## Section 5: Competitive Landscape
Market positioning vs competitors.
```typescript
interface CompetitiveLandscape {
brand_price_position: 'premium' | 'value' | 'competitive';
market_share_trend: MarketSharePoint[];
competitors: Competitor[];
head_to_head_skus: HeadToHead[];
}
interface MarketSharePoint {
date: string;
share_percent: number;
}
interface Competitor {
brand_name: string;
store_overlap_percent: number;
price_position: 'premium' | 'value' | 'competitive';
avg_price: number | null;
sku_count: number;
}
interface HeadToHead {
product_name: string;
brand_price: number;
competitor_brand: string;
competitor_price: number;
price_diff_percent: number;
}
```
**UI Label Mapping:**
| Field | User-Facing Label |
|-------|-------------------|
| `price_position: premium` | Premium Tier |
| `price_position: value` | Value Leader |
| `price_position: competitive` | Market Rate |
| `market_share_trend` | Share of Shelf Trend |
| `head_to_head_skus` | Price Comparison |
| `store_overlap_percent` | X% store overlap |
---
## Section 6: Inventory Health
Stock projections and risk levels.
```typescript
interface InventoryHealth {
critical_count: number; // <7 days stock
warning_count: number; // 7-14 days stock
healthy_count: number; // 14-90 days stock
overstocked_count: number; // >90 days stock
skus: InventorySku[];
overstock_alert: OverstockItem[];
}
interface InventorySku {
store_product_id: number;
product_name: string;
store_name: string;
days_of_stock: number | null;
risk_level: 'critical' | 'elevated' | 'moderate' | 'healthy';
current_quantity: number | null;
daily_sell_rate: number | null;
}
interface OverstockItem {
product_name: string;
store_name: string;
excess_units: number;
days_of_stock: number;
}
```
**UI Label Mapping:**
| Field | User-Facing Label |
|-------|-------------------|
| `risk_level: critical` | Reorder Now |
| `risk_level: elevated` | Low Stock |
| `risk_level: moderate` | Monitor |
| `risk_level: healthy` | Healthy |
| `critical_count` | Urgent (<7 days) |
| `warning_count` | Low (7-14 days) |
| `overstocked_count` | Excess (>90 days) |
| `days_of_stock` | X days remaining |
| `overstock_alert` | Overstock Alert |
| `excess_units` | X excess units |
---
## Section 7: Promotion Effectiveness
How promotions impact sales.
```typescript
interface PromoPerformance {
avg_baseline_velocity: number | null;
avg_promo_velocity: number | null;
avg_velocity_lift: number | null; // % increase during promo
avg_efficiency_score: number | null; // ROI proxy
promotions: Promotion[];
}
interface Promotion {
product_name: string;
store_name: string;
status: 'active' | 'scheduled' | 'ended';
start_date: string;
end_date: string | null;
regular_price: number;
promo_price: number;
discount_percent: number;
baseline_velocity: number | null;
promo_velocity: number | null;
velocity_lift: number | null;
efficiency_score: number | null;
}
```
**UI Label Mapping:**
| Field | User-Facing Label |
|-------|-------------------|
| `avg_baseline_velocity` | Normal Rate |
| `avg_promo_velocity` | During Promos |
| `avg_velocity_lift` | Avg Sales Lift |
| `avg_efficiency_score` | ROI Score |
| `velocity_lift` | Sales Lift |
| `efficiency_score` | ROI Score |
| `status: active` | Live |
| `status: scheduled` | Scheduled |
| `status: ended` | Ended |
---
## Example Queries
### Get full payload
```javascript
const response = await fetch('/api/analytics/v2/brand/Wyld/intelligence?window=30d');
const data = await response.json();
```
### Extract summary cards (flattened)
```javascript
const { performance_snapshot: ps, alerts } = data;
const summaryCards = {
activeProducts: ps.active_skus,
monthlyRevenue: ps.total_revenue_30d,
retailDistribution: ps.total_stores,
newOpportunities: ps.new_stores_30d,
categoryPosition: ps.market_share,
avgWholesale: ps.avg_wholesale_price,
pricingTier: ps.price_position,
accountsAtRisk: alerts.lost_stores_30d_count,
delistedSkus: alerts.lost_skus_30d_count,
shelfLosses: alerts.competitor_takeover_count,
};
```
### Get top 10 fastest selling SKUs
```javascript
const topSkus = data.sku_performance
.filter(sku => sku.velocity_status === 'hot' || sku.velocity_status === 'steady')
.sort((a, b) => b.daily_velocity - a.daily_velocity)
.slice(0, 10);
```
### Get critical inventory alerts only
```javascript
const criticalInventory = data.inventory_health.skus
.filter(sku => sku.risk_level === 'critical');
```
### Get states with <50% penetration
```javascript
const underPenetrated = data.retail_footprint.penetration_by_region
.filter(region => region.percent_reached < 50)
.sort((a, b) => a.percent_reached - b.percent_reached);
```
### Get active promotions with positive lift
```javascript
const effectivePromos = data.promo_performance.promotions
.filter(p => p.status === 'active' && p.velocity_lift > 0)
.sort((a, b) => b.velocity_lift - a.velocity_lift);
```
### Build chart data for market share trend
```javascript
const chartData = data.competitive_landscape.market_share_trend.map(point => ({
x: new Date(point.date),
y: point.share_percent,
}));
```
---
## Notes for Frontend Implementation
1. **All fields are snake_case** - transform to camelCase if needed
2. **Null values are possible** - handle gracefully in UI
3. **Arrays may be empty** - show appropriate empty states
4. **Timestamps are ISO format** - parse with `new Date()`
5. **Percentages are already computed** - no need to multiply by 100
6. **The `window` parameter affects trend calculations** - 7d/30d/90d

backend/docs/CRAWL_PIPELINE.md Normal file

@@ -0,0 +1,539 @@
# Crawl Pipeline Documentation
## Overview
The crawl pipeline fetches product data from Dutchie dispensary menus and stores it in the canonical database. This document covers the complete flow from task scheduling to data storage.
---
## Pipeline Stages
```
┌──────────────────────┐
│   store_discovery    │  Find new dispensaries
└──────────┬───────────┘
           ▼
┌──────────────────────┐
│ entry_point_discovery│  Resolve slug → platform_dispensary_id
└──────────┬───────────┘
           ▼
┌──────────────────────┐
│  product_discovery   │  Initial product crawl
└──────────┬───────────┘
           ▼
┌──────────────────────┐
│   product_resync     │  Recurring crawl (every 4 hours)
└──────────────────────┘
```
---
## Stage Details
### 1. Store Discovery
**Purpose:** Find new dispensaries to crawl
**Handler:** `src/tasks/handlers/store-discovery.ts`
**Flow:**
1. Query Dutchie `ConsumerDispensaries` GraphQL for cities/states
2. Extract dispensary info (name, address, menu_url)
3. Insert into `dutchie_discovery_locations`
4. Queue `entry_point_discovery` for each new location
---
### 2. Entry Point Discovery
**Purpose:** Resolve menu URL slug to platform_dispensary_id (MongoDB ObjectId)
**Handler:** `src/tasks/handlers/entry-point-discovery.ts`
**Flow:**
1. Load dispensary from database
2. Extract slug from `menu_url`:
- `/embedded-menu/<slug>` or `/dispensary/<slug>`
3. Start stealth session (fingerprint + proxy)
4. Query `resolveDispensaryIdWithDetails(slug)` via GraphQL
5. Update dispensary with `platform_dispensary_id`
6. Queue `product_discovery` task
**Example:**
```
menu_url: https://dutchie.com/embedded-menu/deeply-rooted
slug: deeply-rooted
platform_dispensary_id: 6405ef617056e8014d79101b
```
---
### 3. Product Discovery
**Purpose:** Initial crawl of a new dispensary
**Handler:** `src/tasks/handlers/product-discovery.ts`
Same as product_resync but for first-time crawls.
---
### 4. Product Resync
**Purpose:** Recurring crawl to capture price/stock changes
**Handler:** `src/tasks/handlers/product-resync.ts`
**Flow:**
#### Step 1: Load Dispensary Info
```sql
SELECT id, name, platform_dispensary_id, menu_url, state
FROM dispensaries
WHERE id = $1 AND crawl_enabled = true
```
#### Step 2: Start Stealth Session
- Generate random browser fingerprint
- Set locale/timezone matching state
- Optional proxy rotation
#### Step 3: Fetch Products via GraphQL
**Endpoint:** `https://dutchie.com/api-3/graphql`
**Variables:**
```javascript
{
includeEnterpriseSpecials: false,
productsFilter: {
dispensaryId: "<platform_dispensary_id>",
pricingType: "rec",
Status: "All",
types: [],
useCache: false,
isDefaultSort: true,
sortBy: "popularSortIdx",
sortDirection: 1,
bypassOnlineThresholds: true,
isKioskMenu: false,
removeProductsBelowOptionThresholds: false
},
page: 0,
perPage: 100
}
```
**Key Notes:**
- `Status: "All"` returns all products (Active returns same count)
- `Status: null` returns 0 products (broken)
- `pricingType: "rec"` returns BOTH rec and med prices
- Paginate until `products.length < perPage` or `allProducts.length >= totalCount` (see the sketch below)
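A condensed sketch of that stopping rule (`fetchPage` is a hypothetical helper returning one `filteredProducts` page):
```javascript
// Accumulate pages until a short page or the reported total is reached
const perPage = 100;
const allProducts = [];
let pageNum = 0;
let totalCount = Infinity;
while (true) {
  const data = await fetchPage(pageNum); // hypothetical: one GraphQL page
  allProducts.push(...data.products);
  if (pageNum === 0) totalCount = data.queryInfo?.totalCount ?? 0;
  if (data.products.length < perPage || allProducts.length >= totalCount) break;
  pageNum++;
}
```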
#### Step 4: Normalize Data
Transform raw Dutchie payload to canonical format via `DutchieNormalizer`.
#### Step 5: Upsert Products
Insert/update `store_products` table with normalized data.
#### Step 6: Create Snapshots
Insert point-in-time record to `store_product_snapshots`.
#### Step 7: Track Missing Products (OOS Detection)
```sql
-- Reset consecutive_misses for products IN the feed
UPDATE store_products
SET consecutive_misses = 0, last_seen_at = NOW()
WHERE dispensary_id = $1
AND provider = 'dutchie'
AND provider_product_id = ANY($2)
-- Increment for products NOT in feed
UPDATE store_products
SET consecutive_misses = consecutive_misses + 1
WHERE dispensary_id = $1
AND provider = 'dutchie'
AND provider_product_id NOT IN (...)
AND consecutive_misses < 3
-- Mark OOS at 3 consecutive misses
UPDATE store_products
SET stock_status = 'oos', is_in_stock = false
WHERE dispensary_id = $1
AND consecutive_misses >= 3
AND stock_status != 'oos'
```
#### Step 8: Download Images
For new products, download and store images locally.
#### Step 9: Update Dispensary
```sql
UPDATE dispensaries SET last_crawl_at = NOW() WHERE id = $1
```
---
## GraphQL Payload Structure
### Product Fields (from filteredProducts.products[])
| Field | Type | Description |
|-------|------|-------------|
| `_id` / `id` | string | MongoDB ObjectId (24 hex chars) |
| `Name` | string | Product display name |
| `brandName` | string | Brand name |
| `brand.name` | string | Brand name (nested) |
| `brand.description` | string | Brand description |
| `type` | string | Category (Flower, Edible, Concentrate, etc.) |
| `subcategory` | string | Subcategory |
| `strainType` | string | Hybrid, Indica, Sativa, N/A |
| `Status` | string | Always "Active" in feed |
| `Image` | string | Primary image URL |
| `images[]` | array | All product images |
### Pricing Fields
| Field | Type | Description |
|-------|------|-------------|
| `Prices[]` | number[] | Rec prices per option |
| `recPrices[]` | number[] | Rec prices |
| `medicalPrices[]` | number[] | Medical prices |
| `recSpecialPrices[]` | number[] | Rec sale prices |
| `medicalSpecialPrices[]` | number[] | Medical sale prices |
| `Options[]` | string[] | Size options ("1/8oz", "1g", etc.) |
| `rawOptions[]` | string[] | Raw weight options ("3.5g") |
### Inventory Fields (POSMetaData.children[])
| Field | Type | Description |
|-------|------|-------------|
| `quantity` | number | Total inventory count |
| `quantityAvailable` | number | Available for online orders |
| `kioskQuantityAvailable` | number | Available for kiosk orders |
| `option` | string | Which size option this is for |
### Potency Fields
| Field | Type | Description |
|-------|------|-------------|
| `THCContent.range[]` | number[] | THC percentage |
| `CBDContent.range[]` | number[] | CBD percentage |
| `cannabinoidsV2[]` | array | Detailed cannabinoid breakdown |
### Specials (specialData.bogoSpecials[])
| Field | Type | Description |
|-------|------|-------------|
| `specialName` | string | Deal name |
| `specialType` | string | "bogo", "sale", etc. |
| `itemsForAPrice.value` | string | Bundle price |
| `bogoRewards[].totalQuantity.quantity` | number | Required quantity |
---
## OOS Detection Logic
Products disappear from the Dutchie feed when they go out of stock. We track this via `consecutive_misses`:
| Scenario | Action |
|----------|--------|
| Product in feed | `consecutive_misses = 0` |
| Product missing 1st time | `consecutive_misses = 1` |
| Product missing 2nd time | `consecutive_misses = 2` |
| Product missing 3rd time | `consecutive_misses = 3`, mark `stock_status = 'oos'` |
| Product returns to feed | `consecutive_misses = 0`, update stock_status |
**Why 3 misses?**
- Protects against false positives from crawl failures
- Single bad crawl doesn't trigger mass OOS alerts
- Balances detection speed vs accuracy
---
## Database Tables
### store_products
Current state of each product:
- `provider_product_id` - Dutchie's MongoDB ObjectId
- `name_raw`, `brand_name_raw` - Raw values from feed
- `price_rec`, `price_med` - Current prices
- `is_in_stock`, `stock_status` - Availability
- `consecutive_misses` - OOS detection counter
- `last_seen_at` - Last time product was in feed
### store_product_snapshots
Point-in-time records for historical analysis:
- One row per product per crawl
- Captures price, stock, potency at that moment
- Used for price history, analytics
### dispensaries
Store metadata:
- `platform_dispensary_id` - MongoDB ObjectId for GraphQL
- `menu_url` - Source URL
- `last_crawl_at` - Last successful crawl
- `crawl_enabled` - Whether to crawl
---
## Worker Roles
Workers pull tasks from the `worker_tasks` queue based on their assigned role.
| Role | Name | Description | Handler |
|------|------|-------------|---------|
| `product_resync` | Product Resync | Re-crawl dispensary products for price/stock changes | `handleProductResync` |
| `product_discovery` | Product Discovery | Initial product discovery for new dispensaries | `handleProductDiscovery` |
| `store_discovery` | Store Discovery | Discover new dispensary locations | `handleStoreDiscovery` |
| `entry_point_discovery` | Entry Point Discovery | Resolve platform IDs from menu URLs | `handleEntryPointDiscovery` |
| `analytics_refresh` | Analytics Refresh | Refresh materialized views and analytics | `handleAnalyticsRefresh` |
**API Endpoint:** `GET /api/worker-registry/roles`
---
## Scheduling
Crawls are scheduled via `worker_tasks` table:
| Role | Frequency | Description |
|------|-----------|-------------|
| `product_resync` | Every 4 hours | Regular product refresh |
| `product_discovery` | On-demand | First crawl for new stores |
| `entry_point_discovery` | On-demand | New store setup |
| `store_discovery` | Daily | Find new stores |
| `analytics_refresh` | Daily | Refresh analytics materialized views |
---
## Priority & On-Demand Tasks
Tasks are claimed by workers in order of **priority DESC, created_at ASC**.
### Priority Levels
| Priority | Use Case | Example |
|----------|----------|---------|
| 0 | Scheduled/batch tasks | Daily product_resync generation |
| 10 | On-demand/chained tasks | entry_point → product_discovery |
| Higher | Urgent/manual triggers | Admin-triggered immediate crawl |
### Task Chaining
When a task completes, the system automatically creates follow-up tasks:
```
store_discovery (completed)
└─► entry_point_discovery (priority: 10) for each new store
entry_point_discovery (completed, success)
└─► product_discovery (priority: 10) for that store
product_discovery (completed)
└─► [no chain] Store enters regular resync schedule
```
### On-Demand Task Creation
Use the task service to create high-priority tasks:
```typescript
// Create immediate product resync for a store
await taskService.createTask({
role: 'product_resync',
dispensary_id: 123,
platform: 'dutchie',
priority: 20, // Higher than batch tasks
});
// Convenience methods with default high priority (10)
await taskService.createEntryPointTask(dispensaryId, 'dutchie');
await taskService.createProductDiscoveryTask(dispensaryId, 'dutchie');
await taskService.createStoreDiscoveryTask('dutchie', 'AZ');
```
### Claim Function
The `claim_task()` SQL function atomically claims tasks:
- Respects priority ordering (higher = first)
- Uses `FOR UPDATE SKIP LOCKED` for concurrency
- Prevents multiple active tasks per store
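The shape of such a claim, as a sketch (assumes a pg `Pool` named `pool` and a `workerId` string; the real `claim_task()` additionally honors `scheduled_for` and the per-store exclusivity rule):
```javascript
// Atomically claim the highest-priority pending task
const { rows } = await pool.query(
  `UPDATE worker_tasks
      SET status = 'claimed', worker_id = $1, claimed_at = NOW()
    WHERE id = (
      SELECT id FROM worker_tasks
       WHERE status = 'pending'
       ORDER BY priority DESC, created_at ASC
       FOR UPDATE SKIP LOCKED
       LIMIT 1
    )
    RETURNING *`,
  [workerId]
);
const task = rows[0]; // undefined when there is no pending work
```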
---
## Image Storage
Images are downloaded from Dutchie's AWS S3 and stored locally with on-demand resizing.
### Storage Path
```
/storage/images/products/<state>/<store>/<brand>/<product_id>/image-<hash>.webp
/storage/images/brands/<brand>/logo-<hash>.webp
```
**Example:**
```
/storage/images/products/az/az-deeply-rooted/bud-bros/6913e3cd444eac3935e928b9/image-ae38b1f9.webp
```
### Image Proxy API
Served via `/img/*` with on-demand resizing using **sharp**:
```
GET /img/products/az/az-deeply-rooted/bud-bros/6913e3cd444eac3935e928b9/image-ae38b1f9.webp?w=200
```
| Param | Description |
|-------|-------------|
| `w` | Width in pixels (max 4000) |
| `h` | Height in pixels (max 4000) |
| `q` | Quality 1-100 (default 80) |
| `fit` | cover, contain, fill, inside, outside |
| `blur` | Blur sigma (0.3-1000) |
| `gray` | Grayscale (1 = enabled) |
| `format` | webp, jpeg, png, avif (default webp) |
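A small helper for assembling proxy URLs from these params (a sketch; the path is the stored `local_image_path`):
```javascript
// Build an /img/* URL with optional resize/transform params
function imgUrl(path, { w, h, q, fit, format } = {}) {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries({ w, h, q, fit, format })) {
    if (value !== undefined) params.set(key, String(value));
  }
  const qs = params.toString();
  return `/img/${path}${qs ? `?${qs}` : ''}`;
}

// e.g. a 200px-wide thumbnail
imgUrl('products/az/az-deeply-rooted/bud-bros/6913e3cd444eac3935e928b9/image-ae38b1f9.webp', { w: 200 });
```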
### Key Files
| File | Purpose |
|------|---------|
| `src/utils/image-storage.ts` | Download & save images to local filesystem |
| `src/routes/image-proxy.ts` | On-demand resize/transform at `/img/*` |
### Download Rules
| Scenario | Image Action |
|----------|--------------|
| **New product (first crawl)** | Download if `primaryImageUrl` exists |
| **Existing product (refresh)** | Download only if `local_image_path` is NULL (backfill) |
| **Product already has local image** | Skip download entirely |
**Logic:**
- Images are downloaded **once** and never re-downloaded on subsequent crawls
- `skipIfExists: true` - filesystem check prevents re-download even if queued
- First crawl: all products get images
- Refresh crawl: only new products or products missing local images
### Storage Rules
- **NO MinIO** - local filesystem only (`STORAGE_DRIVER=local`)
- Store full resolution, resize on-demand via `/img` proxy
- Convert to webp for consistency using **sharp**
- Preserve original Dutchie URL as fallback in `image_url` column
- Local path stored in `local_image_path` column
---
## Stealth & Anti-Detection
**PROXIES ARE REQUIRED** - Workers will fail to start if no active proxies are available in the database. All HTTP requests to Dutchie go through a proxy.
Workers automatically initialize anti-detection systems on startup.
### Components
| Component | Purpose | Source |
|-----------|---------|--------|
| **CrawlRotator** | Coordinates proxy + UA rotation | `src/services/crawl-rotator.ts` |
| **ProxyRotator** | Round-robin proxy selection, health tracking | `src/services/crawl-rotator.ts` |
| **UserAgentRotator** | Cycles through realistic browser fingerprints | `src/services/crawl-rotator.ts` |
| **Dutchie Client** | Curl-based HTTP with auto-retry on 403 | `src/platforms/dutchie/client.ts` |
### Initialization Flow
```
Worker Start
├─► initializeStealth()
│ │
│ ├─► CrawlRotator.initialize()
│ │ └─► Load proxies from `proxies` table
│ │
│ └─► setCrawlRotator(rotator)
│ └─► Wire to Dutchie client
└─► Process tasks...
```
### Stealth Session (per task)
Each crawl task starts a stealth session:
```typescript
// In product-refresh.ts, entry-point-discovery.ts
const session = startSession(dispensary.state || 'AZ', 'America/Phoenix');
```
This creates a new identity with:
- **Random fingerprint:** Chrome/Firefox/Safari/Edge on Win/Mac/Linux
- **Accept-Language:** Matches timezone (e.g., `America/Phoenix` → `en-US,en;q=0.9`)
- **sec-ch-ua headers:** Proper Client Hints for the browser profile
### On 403 Block
When Dutchie returns 403, the client automatically:
1. Records failure on current proxy (increments `failure_count`)
2. If proxy has 5+ failures, deactivates it
3. Rotates to next healthy proxy
4. Rotates fingerprint
5. Retries the request
### Proxy Table Schema
```sql
CREATE TABLE proxies (
id SERIAL PRIMARY KEY,
host VARCHAR(255) NOT NULL,
port INTEGER NOT NULL,
username VARCHAR(100),
password VARCHAR(100),
protocol VARCHAR(10) DEFAULT 'http', -- http, https, socks5
is_active BOOLEAN DEFAULT true,
last_used_at TIMESTAMPTZ,
failure_count INTEGER DEFAULT 0,
success_count INTEGER DEFAULT 0,
avg_response_time_ms INTEGER,
last_failure_at TIMESTAMPTZ,
last_error TEXT
);
```
### Configuration
Proxies are mandatory. There is no environment variable to disable them. Workers will refuse to start without active proxies in the database.
### User-Agent Generation
See `workflow-12102025.md` for full specification.
**Summary:**
- Uses `intoli/user-agents` library (daily-updated market share data)
- Device distribution: Mobile 62%, Desktop 36%, Tablet 2%
- Browser whitelist: Chrome, Safari, Edge, Firefox only
- UA sticks until IP rotates (403 or manual rotation)
- Failure = alert admin + stop crawl (no fallback)
Each fingerprint includes proper `sec-ch-ua`, `sec-ch-ua-platform`, and `sec-ch-ua-mobile` headers.
---
## Error Handling
- **GraphQL errors:** Logged, task marked failed, retried later
- **Normalization errors:** Logged as warnings, continue with valid products
- **Image download errors:** Non-fatal, logged, continue
- **Database errors:** Task fails, will be retried
- **403 blocks:** Auto-rotate proxy + fingerprint, retry (up to 3 retries)
---
## Files
| File | Purpose |
|------|---------|
| `src/tasks/handlers/product-resync.ts` | Main crawl handler |
| `src/tasks/handlers/entry-point-discovery.ts` | Slug → ID resolution |
| `src/platforms/dutchie/index.ts` | GraphQL client, session management |
| `src/hydration/normalizers/dutchie.ts` | Payload normalization |
| `src/hydration/canonical-upsert.ts` | Database upsert logic |
| `src/utils/image-storage.ts` | Image download and local storage |
| `src/routes/image-proxy.ts` | On-demand image resizing |
| `migrations/075_consecutive_misses.sql` | OOS tracking column |

backend/docs/ORGANIC_SCRAPING_GUIDE.md Normal file

@@ -0,0 +1,297 @@
# Organic Browser-Based Scraping Guide
**Last Updated:** 2025-12-12
**Status:** Production-ready proof of concept
---
## Overview
This document describes the "organic" browser-based approach to scraping Dutchie dispensary menus. Unlike direct curl/axios requests, this method uses a real browser session to make API calls, making requests appear natural and reducing detection risk.
---
## Why Organic Scraping?
| Approach | Detection Risk | Speed | Complexity |
|----------|---------------|-------|------------|
| Direct curl | Higher | Fast | Low |
| curl-impersonate | Medium | Fast | Medium |
| **Browser-based (organic)** | **Lowest** | Slower | Higher |
Direct curl requests can be fingerprinted via:
- TLS fingerprint (cipher suites, extensions)
- Header order and values
- Missing cookies/session data
- Request patterns
Browser-based requests inherit:
- Real Chrome TLS fingerprint
- Session cookies from page visit
- Natural header order
- JavaScript execution environment
---
## Implementation
### Dependencies
```bash
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
```
### Core Script: `test-intercept.js`
Located at: `backend/test-intercept.js`
```javascript
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const fs = require('fs');
puppeteer.use(StealthPlugin());
async function capturePayload(config) {
const { dispensaryId, platformId, cName, outputPath } = config;
const browser = await puppeteer.launch({
headless: 'new',
args: ['--no-sandbox', '--disable-setuid-sandbox']
});
const page = await browser.newPage();
// STEP 1: Establish session by visiting the menu
const embedUrl = `https://dutchie.com/embedded-menu/${cName}?menuType=rec`;
await page.goto(embedUrl, { waitUntil: 'networkidle2', timeout: 60000 });
// STEP 2: Fetch ALL products using GraphQL from browser context
const result = await page.evaluate(async (platformId) => {
const allProducts = [];
let pageNum = 0;
const perPage = 100;
let totalCount = 0;
const sessionId = 'browser-session-' + Date.now();
while (pageNum < 30) {
const variables = {
includeEnterpriseSpecials: false,
productsFilter: {
dispensaryId: platformId,
pricingType: 'rec',
Status: 'Active', // CRITICAL: Must be 'Active', not null
types: [],
useCache: true,
isDefaultSort: true,
sortBy: 'popularSortIdx',
sortDirection: 1,
bypassOnlineThresholds: true,
isKioskMenu: false,
removeProductsBelowOptionThresholds: false,
},
page: pageNum,
perPage: perPage,
};
const extensions = {
persistedQuery: {
version: 1,
sha256Hash: 'ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0'
}
};
const qs = new URLSearchParams({
operationName: 'FilteredProducts',
variables: JSON.stringify(variables),
extensions: JSON.stringify(extensions)
});
const response = await fetch(`https://dutchie.com/api-3/graphql?${qs}`, {
method: 'GET',
headers: {
'Accept': 'application/json',
'content-type': 'application/json',
'x-dutchie-session': sessionId,
'apollographql-client-name': 'Marketplace (production)',
},
credentials: 'include'
});
const json = await response.json();
const data = json?.data?.filteredProducts;
if (!data?.products) break;
allProducts.push(...data.products);
if (pageNum === 0) totalCount = data.queryInfo?.totalCount || 0;
if (allProducts.length >= totalCount) break;
pageNum++;
await new Promise(r => setTimeout(r, 200)); // Polite delay
}
return { products: allProducts, totalCount };
}, platformId);
await browser.close();
// STEP 3: Save payload
const payload = {
dispensaryId,
platformId,
cName,
fetchedAt: new Date().toISOString(),
productCount: result.products.length,
products: result.products,
};
fs.writeFileSync(outputPath, JSON.stringify(payload, null, 2));
return payload;
}
```
---
## Critical Parameters
### GraphQL Hash (FilteredProducts)
```
ee29c060826dc41c527e470e9ae502c9b2c169720faa0a9f5d25e1b9a530a4a0
```
**WARNING:** Using the wrong hash returns HTTP 400.
### Status Parameter
| Value | Result |
|-------|--------|
| `'Active'` | Returns in-stock products (1019 in test) |
| `null` | Returns 0 products |
| `'All'` | Returns HTTP 400 |
**ALWAYS use `Status: 'Active'`**
### Required Headers
```javascript
{
'Accept': 'application/json',
'content-type': 'application/json',
'x-dutchie-session': 'unique-session-id',
'apollographql-client-name': 'Marketplace (production)',
}
```
### Endpoint
```
https://dutchie.com/api-3/graphql
```
---
## Performance Benchmarks
Test store: AZ-Deeply-Rooted (1019 products)
| Metric | Value |
|--------|-------|
| Total products | 1019 |
| Time | 18.5 seconds |
| Payload size | 11.8 MB |
| Pages fetched | 11 (100 per page) |
| Success rate | 100% |
---
## Payload Format
The output matches the existing `payload-fetch.ts` handler format:
```json
{
"dispensaryId": 123,
"platformId": "6405ef617056e8014d79101b",
"cName": "AZ-Deeply-Rooted",
"fetchedAt": "2025-12-12T05:05:19.837Z",
"productCount": 1019,
"products": [
{
"id": "6927508db4851262f629a869",
"Name": "Product Name",
"brand": { "name": "Brand Name", ... },
"type": "Flower",
"THC": "25%",
"Prices": [...],
"Options": [...],
...
}
]
}
```
---
## Integration Points
### As a Task Handler
The organic approach can be integrated as an alternative to curl-based fetching:
```typescript
// In src/tasks/handlers/organic-payload-fetch.ts
export async function handleOrganicPayloadFetch(ctx: TaskContext): Promise<TaskResult> {
// Use puppeteer-based capture
// Save to same payload storage
// Queue product_refresh task
}
```
### Worker Configuration
Add to job_schedules:
```sql
INSERT INTO job_schedules (name, role, cron_expression)
VALUES ('organic_product_crawl', 'organic_payload_fetch', '0 */6 * * *');
```
---
## Troubleshooting
### HTTP 400 Bad Request
- Check hash is correct: `ee29c060...`
- Verify Status is `'Active'` (string, not null)
### 0 Products Returned
- Status was likely `null` or `'All'` - use `'Active'`
- Check platformId is valid MongoDB ObjectId
### Session Not Established
- Increase timeout on initial page.goto()
- Check cName is valid (matches embedded-menu URL)
### Detection/Blocking
- StealthPlugin should handle most cases
- Add random delays between pages
- Use headless: 'new' (not true/false)
---
## Files Reference
| File | Purpose |
|------|---------|
| `backend/test-intercept.js` | Proof of concept script |
| `backend/src/platforms/dutchie/client.ts` | GraphQL hashes, curl implementation |
| `backend/src/tasks/handlers/payload-fetch.ts` | Current curl-based handler |
| `backend/src/utils/payload-storage.ts` | Payload save/load utilities |
---
## See Also
- `DUTCHIE_CRAWL_WORKFLOW.md` - Full crawl pipeline documentation
- `TASK_WORKFLOW_2024-12-10.md` - Task system architecture
- `CLAUDE.md` - Project rules and constraints


@@ -0,0 +1,25 @@
# ARCHIVED DOCUMENTATION
**WARNING: These docs may be outdated or inaccurate.**
The code has evolved significantly. These docs are kept for historical reference only.
## What to Use Instead
**The single source of truth is:**
- `CLAUDE.md` (root) - Essential rules and quick reference
- `docs/CODEBASE_MAP.md` - Current file/directory reference
## Why Archive?
These docs were written during development iterations and may reference:
- Old file paths that no longer exist
- Deprecated approaches (hydration, scraper-v2)
- APIs that have changed
- Database schemas that evolved
## If You Need Details
1. First check CODEBASE_MAP.md for current file locations
2. Then read the actual source code
3. Only use archive docs as a last resort for historical context

backend/docs/TASK_WORKFLOW_2024-12-10.md Normal file

@@ -0,0 +1,584 @@
# Task Workflow Documentation
**Date: 2024-12-10**
This document describes the complete task/job processing architecture after the 2024-12-10 rewrite.
---
## Complete Architecture
```
┌─────────────────────────────────────────────────────────────────────────────────┐
│ KUBERNETES CLUSTER │
├─────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ API SERVER POD (scraper) │ │
│ │ │ │
│ │ ┌──────────────────┐ ┌────────────────────────────────────────┐ │ │
│ │ │ Express API │ │ TaskScheduler │ │ │
│ │ │ │ │ (src/services/task-scheduler.ts) │ │ │
│ │ │ /api/job-queue │ │ │ │ │
│ │ │ /api/tasks │ │ • Polls every 60s │ │ │
│ │ │ /api/schedules │ │ • Checks task_schedules table │ │ │
│ │ └────────┬─────────┘ │ • SELECT FOR UPDATE SKIP LOCKED │ │ │
│ │ │ │ • Generates tasks when due │ │ │
│ │ │ └──────────────────┬─────────────────────┘ │ │
│ │ │ │ │ │
│ └────────────┼──────────────────────────────────┼──────────────────────────┘ │
│ │ │ │
│ │ ┌────────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ POSTGRESQL DATABASE │ │
│ │ │ │
│ │ ┌─────────────────────┐ ┌─────────────────────┐ │ │
│ │ │ task_schedules │ │ worker_tasks │ │ │
│ │ │ │ │ │ │ │
│ │ │ • product_refresh │───────►│ • pending tasks │ │ │
│ │ │ • store_discovery │ create │ • claimed tasks │ │ │
│ │ │ • analytics_refresh │ tasks │ • running tasks │ │ │
│ │ │ │ │ • completed tasks │ │ │
│ │ │ next_run_at │ │ │ │ │
│ │ │ last_run_at │ │ role, dispensary_id │ │ │
│ │ │ interval_hours │ │ priority, status │ │ │
│ │ └─────────────────────┘ └──────────┬──────────┘ │ │
│ │ │ │ │
│ └─────────────────────────────────────────────┼────────────────────────────┘ │
│ │ │
│ ┌──────────────────────┘ │
│ │ Workers poll for tasks │
│ │ (SELECT FOR UPDATE SKIP LOCKED) │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ WORKER PODS (StatefulSet: scraper-worker) │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Worker 0 │ │ Worker 1 │ │ Worker 2 │ │ Worker N │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ task-worker │ │ task-worker │ │ task-worker │ │ task-worker │ │ │
│ │ │ .ts │ │ .ts │ │ .ts │ │ .ts │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────────────┘
```
---
## Startup Sequence
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ API SERVER STARTUP │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Express app initializes │
│ │ │
│ ▼ │
│ 2. runAutoMigrations() │
│ • Runs pending migrations (including 079_task_schedules.sql) │
│ │ │
│ ▼ │
│ 3. initializeMinio() / initializeImageStorage() │
│ │ │
│ ▼ │
│ 4. cleanupOrphanedJobs() │
│ │ │
│ ▼ │
│ 5. taskScheduler.start() ◄─── NEW (per TASK_WORKFLOW_2024-12-10.md) │
│ │ │
│ ├── Recover stale tasks (workers that died) │
│ ├── Ensure default schedules exist in task_schedules │
│ ├── Check and run any due schedules immediately │
│ └── Start 60-second poll interval │
│ │ │
│ ▼ │
│ 6. app.listen(PORT) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ WORKER POD STARTUP │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. K8s starts pod from StatefulSet │
│ │ │
│ ▼ │
│ 2. TaskWorker.constructor() │
│ • Create DB pool │
│ • Create CrawlRotator │
│ │ │
│ ▼ │
│ 3. initializeStealth() │
│ • Load proxies from DB (REQUIRED - fails if none) │
│ • Wire rotator to Dutchie client │
│ │ │
│ ▼ │
│ 4. register() with API │
│ • Optional - continues if fails │
│ │ │
│ ▼ │
│ 5. startRegistryHeartbeat() every 30s │
│ │ │
│ ▼ │
│ 6. processNextTask() loop │
│ │ │
│ ├── Poll for pending task (FOR UPDATE SKIP LOCKED) │
│ ├── Claim task atomically │
│ ├── Execute handler (product_refresh, store_discovery, etc.) │
│ ├── Mark complete/failed │
│ ├── Chain next task if applicable │
│ └── Loop │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## Schedule Flow
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ SCHEDULER POLL (every 60 seconds) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ BEGIN TRANSACTION │
│ │ │
│ ▼ │
│ SELECT * FROM task_schedules │
│ WHERE enabled = true AND next_run_at <= NOW() │
│ FOR UPDATE SKIP LOCKED ◄─── Prevents duplicate execution across replicas │
│ │ │
│ ▼ │
│ For each due schedule: │
│ │ │
│ ├── product_refresh_all │
│ │ └─► Query dispensaries needing crawl │
│ │ └─► Create product_refresh tasks in worker_tasks │
│ │ │
│ ├── store_discovery_dutchie │
│ │ └─► Create single store_discovery task │
│ │ │
│ └── analytics_refresh │
│ └─► Create single analytics_refresh task │
│ │ │
│ ▼ │
│ UPDATE task_schedules SET │
│ last_run_at = NOW(), │
│ next_run_at = NOW() + interval_hours │
│ │ │
│ ▼ │
│ COMMIT │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## Task Lifecycle
```
┌──────────┐
│ SCHEDULE │
│   DUE    │
└────┬─────┘
     │
     ▼
┌──────────────┐    claim    ┌──────────────┐    start    ┌──────────────┐
│   PENDING    │────────────►│   CLAIMED    │────────────►│   RUNNING    │
└──────────────┘             └──────────────┘             └──────┬───────┘
        ▲                                                        │
        │                                   ┌────────────────────┼─────────────┐
        │ retry                             │                    │             │
        │ (if retries < max)                ▼                    ▼             ▼
        │                             ┌──────────┐        ┌──────────┐  ┌──────────┐
        └─────────────────────────────│  FAILED  │        │ COMPLETED│  │  STALE   │
                                      └──────────┘        └──────────┘  └────┬─────┘
                                                                             │ recover_stale_tasks()
                                                                             ▼
                                                                        ┌──────────┐
                                                                        │ PENDING  │
                                                                        └──────────┘
```
---
## Database Tables
### task_schedules (NEW - migration 079)
Stores schedule definitions. Survives restarts.
```sql
CREATE TABLE task_schedules (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL UNIQUE,
role VARCHAR(50) NOT NULL, -- product_refresh, store_discovery, etc.
enabled BOOLEAN DEFAULT TRUE,
interval_hours INTEGER NOT NULL, -- How often to run
priority INTEGER DEFAULT 0, -- Task priority when created
state_code VARCHAR(2), -- Optional filter
last_run_at TIMESTAMPTZ, -- When it last ran
next_run_at TIMESTAMPTZ, -- When it's due next
last_task_count INTEGER, -- Tasks created last run
last_error TEXT -- Error message if failed
);
```
### worker_tasks (migration 074)
The task queue. Workers pull from here.
```sql
CREATE TABLE worker_tasks (
id SERIAL PRIMARY KEY,
role task_role NOT NULL, -- What type of work
dispensary_id INTEGER, -- Which store (if applicable)
platform VARCHAR(50), -- Which platform
status task_status DEFAULT 'pending',
priority INTEGER DEFAULT 0, -- Higher = process first
scheduled_for TIMESTAMP, -- Don't process before this time
worker_id VARCHAR(100), -- Which worker claimed it
claimed_at TIMESTAMP,
started_at TIMESTAMP,
completed_at TIMESTAMP,
last_heartbeat_at TIMESTAMP, -- For stale detection
result JSONB,
error_message TEXT,
retry_count INTEGER DEFAULT 0,
max_retries INTEGER DEFAULT 3
);
```
---
## Default Schedules
| Name | Role | Interval | Priority | Description |
|------|------|----------|----------|-------------|
| `payload_fetch_all` | payload_fetch | 4 hours | 0 | Fetch payloads from Dutchie API (chains to product_refresh) |
| `store_discovery_dutchie` | store_discovery | 24 hours | 5 | Find new Dutchie stores |
| `analytics_refresh` | analytics_refresh | 6 hours | 0 | Refresh MVs |
---
## Task Roles
| Role | Description | Creates Tasks For |
|------|-------------|-------------------|
| `payload_fetch` | **NEW** - Fetch from Dutchie API, save to disk | Each dispensary needing crawl |
| `product_refresh` | **CHANGED** - Read local payload, normalize, upsert to DB | Chained from payload_fetch |
| `store_discovery` | Find new dispensaries, returns newStoreIds[] | Single task per platform |
| `entry_point_discovery` | **DEPRECATED** - Resolve platform IDs | No longer used |
| `product_discovery` | Initial product fetch for new stores | Chained from store_discovery |
| `analytics_refresh` | Refresh MVs | Single global task |
### Payload/Refresh Separation (2024-12-10)
The crawl workflow is now split into two phases:
```
payload_fetch (scheduled every 4h)
└─► Hit Dutchie GraphQL API
└─► Save raw JSON to /storage/payloads/{year}/{month}/{day}/store_{id}_{ts}.json.gz
└─► Record metadata in raw_crawl_payloads table
└─► Queue product_refresh task with payload_id
product_refresh (chained from payload_fetch)
└─► Load payload from filesystem (NOT from API)
└─► Normalize via DutchieNormalizer
└─► Upsert to store_products
└─► Create snapshots
└─► Track missing products
└─► Download images
```
**Benefits:**
- **Retry-friendly**: If normalize fails, re-run product_refresh without re-crawling
- **Replay-able**: Run product_refresh against any historical payload
- **Faster refreshes**: Local file read vs network call
- **Historical diffs**: Compare payloads to see what changed between crawls
- **Less API pressure**: Only payload_fetch hits Dutchie
---
## Task Chaining
Tasks automatically queue follow-up tasks upon successful completion. This creates two main flows:
### Discovery Flow (New Stores)
When `store_discovery` finds new dispensaries, they automatically get their initial product data:
```
store_discovery
└─► Discovers new locations via Dutchie GraphQL
└─► Auto-promotes valid locations to dispensaries table
└─► Collects newDispensaryIds[] from promotions
└─► Returns { newStoreIds: [...] } in result
chainNextTask() detects newStoreIds
└─► Creates product_discovery task for each new store
product_discovery
└─► Calls handlePayloadFetch() internally
└─► payload_fetch hits Dutchie API
└─► Saves raw JSON to /storage/payloads/
└─► Queues product_refresh task with payload_id
product_refresh
└─► Loads payload from filesystem
└─► Normalizes and upserts to store_products
└─► Creates snapshots, downloads images
```
**Complete Discovery Chain:**
```
store_discovery → product_discovery → payload_fetch → product_refresh
(internal call) (queues next)
```
### Scheduled Flow (Existing Stores)
For existing stores, `payload_fetch_all` schedule runs every 4 hours:
```
TaskScheduler (every 60s)
└─► Checks task_schedules for due schedules
└─► payload_fetch_all is due
└─► Generates payload_fetch task for each dispensary
payload_fetch
└─► Hits Dutchie GraphQL API
└─► Saves raw JSON to /storage/payloads/
└─► Queues product_refresh task with payload_id
product_refresh
└─► Loads payload from filesystem (NOT API)
└─► Normalizes via DutchieNormalizer
└─► Upserts to store_products
└─► Creates snapshots
```
**Complete Scheduled Chain:**
```
payload_fetch → product_refresh
(queues) (reads local)
```
### Chaining Implementation
Task chaining is handled in three places:
1. **Internal chaining (handler calls handler):**
- `product_discovery` calls `handlePayloadFetch()` directly
2. **External chaining (chainNextTask() in task-service.ts):**
- Called after task completion
- `store_discovery` → queues `product_discovery` for each newStoreId
3. **Queue-based chaining (taskService.createTask):**
- `payload_fetch` queues `product_refresh` with `payload: { payload_id }`
---
## Payload API Endpoints
Raw crawl payloads can be accessed via the Payloads API:
| Endpoint | Method | Description |
|----------|--------|-------------|
| `GET /api/payloads` | GET | List payload metadata (paginated) |
| `GET /api/payloads/:id` | GET | Get payload metadata by ID |
| `GET /api/payloads/:id/data` | GET | Get full payload JSON (decompressed) |
| `GET /api/payloads/store/:dispensaryId` | GET | List payloads for a store |
| `GET /api/payloads/store/:dispensaryId/latest` | GET | Get latest payload for a store |
| `GET /api/payloads/store/:dispensaryId/diff` | GET | Diff two payloads for changes |
### Payload Diff Response
The diff endpoint returns:
```json
{
"success": true,
"from": { "id": 123, "fetchedAt": "...", "productCount": 100 },
"to": { "id": 456, "fetchedAt": "...", "productCount": 105 },
"diff": {
"added": 10,
"removed": 5,
"priceChanges": 8,
"stockChanges": 12
},
"details": {
"added": [...],
"removed": [...],
"priceChanges": [...],
"stockChanges": [...]
}
}
```
---
## API Endpoints
### Schedules (NEW)
| Endpoint | Method | Description |
|----------|--------|-------------|
| `GET /api/schedules` | GET | List all schedules |
| `PUT /api/schedules/:id` | PUT | Update schedule |
| `POST /api/schedules/:id/trigger` | POST | Run schedule immediately |
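For example, triggering a schedule immediately (a sketch; schedule IDs come from `GET /api/schedules`):
```javascript
// Run schedule 1 now instead of waiting for next_run_at
const res = await fetch('/api/schedules/1/trigger', { method: 'POST' });
const result = await res.json();
```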
### Task Creation (rewired 2024-12-10)
| Endpoint | Method | Description |
|----------|--------|-------------|
| `POST /api/job-queue/enqueue` | POST | Create single task |
| `POST /api/job-queue/enqueue-batch` | POST | Create batch tasks |
| `POST /api/job-queue/enqueue-state` | POST | Create tasks for state |
| `POST /api/tasks` | POST | Direct task creation |
### Task Management
| Endpoint | Method | Description |
|----------|--------|-------------|
| `GET /api/tasks` | GET | List tasks |
| `GET /api/tasks/:id` | GET | Get single task |
| `GET /api/tasks/counts` | GET | Task counts by status |
| `POST /api/tasks/recover-stale` | POST | Recover stale tasks |
---
## Key Files
| File | Purpose |
|------|---------|
| `src/services/task-scheduler.ts` | **NEW** - DB-driven scheduler |
| `src/tasks/task-worker.ts` | Worker that processes tasks |
| `src/tasks/task-service.ts` | Task CRUD operations |
| `src/tasks/handlers/payload-fetch.ts` | **NEW** - Fetches from API, saves to disk |
| `src/tasks/handlers/product-refresh.ts` | **CHANGED** - Reads from disk, processes to DB |
| `src/utils/payload-storage.ts` | **NEW** - Payload save/load utilities |
| `src/routes/tasks.ts` | Task API endpoints |
| `src/routes/job-queue.ts` | Job Queue UI endpoints (rewired) |
| `migrations/079_task_schedules.sql` | Schedule table |
| `migrations/080_raw_crawl_payloads.sql` | Payload metadata table |
| `migrations/081_payload_fetch_columns.sql` | payload, last_fetch_at columns |
| `migrations/074_worker_task_queue.sql` | Task queue table |
---
## Legacy Code (DEPRECATED)
| File | Status | Replacement |
|------|--------|-------------|
| `src/services/scheduler.ts` | DEPRECATED | `task-scheduler.ts` |
| `dispensary_crawl_jobs` table | ORPHANED | `worker_tasks` |
| `job_schedules` table | LEGACY | `task_schedules` |
---
## Dashboard Integration
Both pages remain wired to the dashboard:
| Page | Data Source | Actions |
|------|-------------|---------|
| **Job Queue** | `worker_tasks`, `task_schedules` | Create tasks, view schedules |
| **Task Queue** | `worker_tasks` | View tasks, recover stale |
---
## Multi-Replica Safety
The scheduler uses `SELECT FOR UPDATE SKIP LOCKED` to ensure:
1. **Only one replica** executes a schedule at a time
2. **No duplicate tasks** created
3. **Survives pod restarts** - state in DB, not memory
4. **Self-healing** - recovers stale tasks on startup
```sql
-- This query is atomic across all API server replicas
SELECT * FROM task_schedules
WHERE enabled = true AND next_run_at <= NOW()
FOR UPDATE SKIP LOCKED
```
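A minimal sketch of the claim-and-run loop built on that query, assuming the `pg` driver and an `interval_minutes` column on `task_schedules` (both assumptions; the real logic lives in `src/services/task-scheduler.ts`):
```typescript
import { Pool } from 'pg';

const pool = new Pool(); // connection settings from DATABASE_URL / PG* env vars

// Claim and run all due schedules. SKIP LOCKED makes this safe across
// replicas: rows locked by another replica's transaction are skipped,
// never waited on, so each schedule fires once per tick.
async function tickSchedules(runSchedule: (schedule: any) => Promise<void>) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    const { rows } = await client.query(
      `SELECT * FROM task_schedules
       WHERE enabled = true AND next_run_at <= NOW()
       FOR UPDATE SKIP LOCKED`
    );
    for (const schedule of rows) {
      await runSchedule(schedule); // typically enqueues worker_tasks
      await client.query(
        `UPDATE task_schedules
         SET last_run_at = NOW(),
             next_run_at = NOW() + (interval_minutes || ' minutes')::interval
         WHERE id = $1`, // interval_minutes is an assumed column name
        [schedule.id]
      );
    }
    await client.query('COMMIT'); // releases the row locks
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```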
---
## Worker Scaling (K8s)
Workers run as a StatefulSet in Kubernetes and can be scaled from the admin UI, the HTTP API, or the CLI.
### From Admin UI
The Workers page (`/admin/workers`) provides:
- Current replica count display
- Scale up/down buttons
- Target replica input
### API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `GET /api/workers/k8s/replicas` | GET | Get current/desired replica counts |
| `POST /api/workers/k8s/scale` | POST | Scale to N replicas (body: `{ replicas: N }`) |
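Under the hood, the scale endpoint only needs to PATCH `.spec.replicas` on the StatefulSet. A sketch using the in-cluster REST API directly (not the actual handler; assumes Node 18+ `fetch` and that the cluster CA is trusted, e.g. via `NODE_EXTRA_CA_CERTS`):
```typescript
import { readFileSync } from 'fs';

const SA = '/var/run/secrets/kubernetes.io/serviceaccount';
const token = readFileSync(`${SA}/token`, 'utf8').trim();
const namespace = process.env.K8S_NAMESPACE ?? 'dispensary-scraper';
const name = process.env.K8S_WORKER_STATEFULSET ?? 'scraper-worker';

// Merge-patch .spec.replicas on the StatefulSet via the in-cluster API.
// Requires the get/patch RBAC shown in the RBAC Requirements section.
async function scaleWorkers(replicas: number): Promise<void> {
  const url =
    `https://kubernetes.default.svc/apis/apps/v1` +
    `/namespaces/${namespace}/statefulsets/${name}`;
  const res = await fetch(url, {
    method: 'PATCH',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/merge-patch+json',
    },
    body: JSON.stringify({ spec: { replicas } }),
  });
  if (!res.ok) throw new Error(`Scale failed: ${res.status} ${await res.text()}`);
}
```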
### From CLI
```bash
# View current replicas
kubectl get statefulset scraper-worker -n dispensary-scraper
# Scale to 10 workers
kubectl scale statefulset scraper-worker -n dispensary-scraper --replicas=10
# Scale down to 3 workers
kubectl scale statefulset scraper-worker -n dispensary-scraper --replicas=3
```
### Configuration
Environment variables for the API server:
| Variable | Default | Description |
|----------|---------|-------------|
| `K8S_NAMESPACE` | `dispensary-scraper` | Kubernetes namespace |
| `K8S_WORKER_STATEFULSET` | `scraper-worker` | StatefulSet name |
### RBAC Requirements
The API server pod needs these K8s permissions:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: worker-scaler
namespace: dispensary-scraper
rules:
- apiGroups: ["apps"]
resources: ["statefulsets"]
verbs: ["get", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: scraper-worker-scaler
namespace: dispensary-scraper
subjects:
- kind: ServiceAccount
name: default
namespace: dispensary-scraper
roleRef:
kind: Role
name: worker-scaler
apiGroup: rbac.authorization.k8s.io
```

View File

@@ -0,0 +1,639 @@
# Worker Task Architecture
This document describes the unified task-based worker system that replaces the legacy fragmented job systems.
## Overview
The task worker architecture provides a single, unified system for managing all background work in CannaiQ:
- **Store discovery** - Find new dispensaries on platforms
- **Entry point discovery** - Resolve platform IDs from menu URLs
- **Product discovery** - Initial product fetch for new stores
- **Product resync** - Regular price/stock updates for existing stores
- **Analytics refresh** - Refresh materialized views and analytics
## Architecture
### Database Tables
**`worker_tasks`** - Central task queue
```sql
CREATE TABLE worker_tasks (
id SERIAL PRIMARY KEY,
role task_role NOT NULL, -- What type of work
dispensary_id INTEGER, -- Which store (if applicable)
platform VARCHAR(50), -- Which platform (dutchie, etc.)
status task_status DEFAULT 'pending',
priority INTEGER DEFAULT 0, -- Higher = process first
scheduled_for TIMESTAMP, -- Don't process before this time
worker_id VARCHAR(100), -- Which worker claimed it
claimed_at TIMESTAMP,
started_at TIMESTAMP,
completed_at TIMESTAMP,
last_heartbeat_at TIMESTAMP, -- For stale detection
result JSONB, -- Output from handler
error_message TEXT,
retry_count INTEGER DEFAULT 0,
max_retries INTEGER DEFAULT 3,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
```
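For reference, a row of this table as a TypeScript shape (the camelCase mapping is illustrative; the service layer may keep snake_case):
```typescript
type TaskRole =
  | 'store_discovery'
  | 'entry_point_discovery'
  | 'product_discovery'
  | 'product_resync'
  | 'analytics_refresh';

// 'stale' is detected via last_heartbeat_at rather than stored directly.
type TaskStatus = 'pending' | 'claimed' | 'running' | 'completed' | 'failed';

interface WorkerTask {
  id: number;
  role: TaskRole;
  dispensaryId: number | null;   // which store, if applicable
  platform: string | null;       // 'dutchie', etc.
  status: TaskStatus;
  priority: number;              // higher = process first
  scheduledFor: Date | null;     // don't process before this time
  workerId: string | null;       // which worker claimed it
  claimedAt: Date | null;
  startedAt: Date | null;
  completedAt: Date | null;
  lastHeartbeatAt: Date | null;  // for stale detection
  result: unknown;               // JSONB output from the handler
  errorMessage: string | null;
  retryCount: number;
  maxRetries: number;
  createdAt: Date;
  updatedAt: Date;
}
```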
**Key indexes:**
- `idx_worker_tasks_pending_priority` - For efficient task claiming
- `idx_worker_tasks_active_dispensary` - Prevents concurrent tasks per store (partial unique index)
### Task Roles
| Role | Purpose | Per-Store | Scheduled |
|------|---------|-----------|-----------|
| `store_discovery` | Find new stores on a platform | No | Daily |
| `entry_point_discovery` | Resolve platform IDs | Yes | On-demand |
| `product_discovery` | Initial product fetch | Yes | After entry_point |
| `product_resync` | Price/stock updates | Yes | Every 4 hours |
| `analytics_refresh` | Refresh MVs | No | Daily |
### Task Lifecycle
```
pending → claimed → running → completed
                            ↘ failed
```
1. **pending** - Task is waiting to be picked up
2. **claimed** - Worker has claimed it (atomic via SELECT FOR UPDATE SKIP LOCKED)
3. **running** - Worker is actively processing
4. **completed** - Task finished successfully
5. **failed** - Task encountered an error
6. **stale** - Task lost its worker (recovered automatically)
## Files
### Core Files
| File | Purpose |
|------|---------|
| `src/tasks/task-service.ts` | TaskService - CRUD, claiming, capacity metrics |
| `src/tasks/task-worker.ts` | TaskWorker - Main worker loop |
| `src/tasks/index.ts` | Module exports |
| `src/routes/tasks.ts` | API endpoints |
| `migrations/074_worker_task_queue.sql` | Database schema |
### Task Handlers
| File | Role |
|------|------|
| `src/tasks/handlers/store-discovery.ts` | `store_discovery` |
| `src/tasks/handlers/entry-point-discovery.ts` | `entry_point_discovery` |
| `src/tasks/handlers/product-discovery.ts` | `product_discovery` |
| `src/tasks/handlers/product-resync.ts` | `product_resync` |
| `src/tasks/handlers/analytics-refresh.ts` | `analytics_refresh` |
## Running Workers
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `WORKER_ROLE` | (required) | Which task role to process |
| `WORKER_ID` | auto-generated | Custom worker identifier |
| `POLL_INTERVAL_MS` | 5000 | How often to check for tasks |
| `HEARTBEAT_INTERVAL_MS` | 30000 | How often to update heartbeat |
### Starting a Worker
```bash
# Start a product resync worker
WORKER_ROLE=product_resync npx tsx src/tasks/task-worker.ts
# Start with custom ID
WORKER_ROLE=product_resync WORKER_ID=resync-1 npx tsx src/tasks/task-worker.ts
# Start multiple workers for different roles
WORKER_ROLE=store_discovery npx tsx src/tasks/task-worker.ts &
WORKER_ROLE=product_resync npx tsx src/tasks/task-worker.ts &
```
### Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: task-worker-resync
spec:
replicas: 3
template:
spec:
containers:
- name: worker
image: code.cannabrands.app/creationshop/dispensary-scraper:latest
command: ["npx", "tsx", "src/tasks/task-worker.ts"]
env:
- name: WORKER_ROLE
value: "product_resync"
```
## API Endpoints
### Task Management
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/tasks` | GET | List tasks with filters |
| `/api/tasks` | POST | Create a new task |
| `/api/tasks/:id` | GET | Get task by ID |
| `/api/tasks/counts` | GET | Get counts by status |
| `/api/tasks/capacity` | GET | Get capacity metrics |
| `/api/tasks/capacity/:role` | GET | Get role-specific capacity |
| `/api/tasks/recover-stale` | POST | Recover tasks from dead workers |
### Task Generation
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/tasks/generate/resync` | POST | Generate daily resync tasks |
| `/api/tasks/generate/discovery` | POST | Create store discovery task |
### Migration (from legacy systems)
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/tasks/migration/status` | GET | Compare old vs new systems |
| `/api/tasks/migration/disable-old-schedules` | POST | Disable job_schedules |
| `/api/tasks/migration/cancel-pending-crawl-jobs` | POST | Cancel old crawl jobs |
| `/api/tasks/migration/create-resync-tasks` | POST | Create tasks for all stores |
| `/api/tasks/migration/full-migrate` | POST | One-click migration |
### Role-Specific Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/tasks/role/:role/last-completion` | GET | Last completion time |
| `/api/tasks/role/:role/recent` | GET | Recent completions |
| `/api/tasks/store/:id/active` | GET | Check if store has active task |
## Capacity Planning
The `v_worker_capacity` view provides real-time metrics:
```sql
SELECT * FROM v_worker_capacity;
```
Returns:
- `pending_tasks` - Tasks waiting to be claimed
- `ready_tasks` - Tasks ready now (scheduled_for is null or past)
- `claimed_tasks` - Tasks claimed but not started
- `running_tasks` - Tasks actively processing
- `completed_last_hour` - Recent completions
- `failed_last_hour` - Recent failures
- `active_workers` - Workers with recent heartbeats
- `avg_duration_sec` - Average task duration
- `tasks_per_worker_hour` - Throughput estimate
- `estimated_hours_to_drain` - Time to clear queue
### Scaling Recommendations
```javascript
// API: GET /api/tasks/capacity/:role
{
"role": "product_resync",
"pending_tasks": 500,
"active_workers": 3,
"workers_needed": {
"for_1_hour": 10,
"for_4_hours": 3,
"for_8_hours": 2
}
}
```
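The `workers_needed` figures follow from simple queue math; one plausible formulation (the exact formula in `task-service.ts` may differ):
```typescript
// How many workers does it take to drain `pendingTasks` within `hours`,
// given the measured per-worker throughput from v_worker_capacity?
function workersNeeded(
  pendingTasks: number,
  tasksPerWorkerHour: number,
  hours: number,
): number {
  if (tasksPerWorkerHour <= 0) return pendingTasks > 0 ? Infinity : 0;
  return Math.ceil(pendingTasks / (tasksPerWorkerHour * hours));
}

// With 500 pending and ~50 tasks/worker/hour (as in the example above):
workersNeeded(500, 50, 1); // => 10
workersNeeded(500, 50, 4); // => 3  (2.5 rounded up)
workersNeeded(500, 50, 8); // => 2  (1.25 rounded up)
```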
## Task Chaining
Tasks can automatically create follow-up tasks:
```
store_discovery → entry_point_discovery → product_discovery
                  (store has platform_dispensary_id)
product_discovery → daily resync tasks
```
The `chainNextTask()` method handles this automatically.
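A sketch of what that chaining can look like, assuming the `taskService.createTask` API shown later in this document:
```typescript
import { taskService } from '../tasks'; // assumption: see "Module exports" in Files

// Per-store chain links. store_discovery is not per-store; its handler
// creates entry_point_discovery tasks for each newly found store instead.
const NEXT_ROLE: Record<string, string> = {
  entry_point_discovery: 'product_discovery',
};

// Called after a task completes successfully.
async function chainNextTask(task: {
  role: string;
  dispensary_id: number | null;
}): Promise<void> {
  const nextRole = NEXT_ROLE[task.role];
  if (!nextRole || task.dispensary_id == null) return;
  await taskService.createTask({
    role: nextRole,
    dispensary_id: task.dispensary_id,
    priority: 10, // new stores are high priority (see Task Priority below)
  });
}
```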
## Stale Task Recovery
Tasks are considered stale if `last_heartbeat_at` is older than the threshold (default 10 minutes).
```sql
SELECT recover_stale_tasks(10); -- 10 minute threshold
```
Or via API:
```bash
curl -X POST /api/tasks/recover-stale \
-H 'Content-Type: application/json' \
-d '{"threshold_minutes": 10}'
```
## Migration from Legacy Systems
### Legacy Systems Replaced
1. **job_schedules + job_run_logs** - Scheduled job definitions
2. **dispensary_crawl_jobs** - Per-dispensary crawl queue
3. **SyncOrchestrator + HydrationWorker** - Raw payload processing
### Migration Steps
**Option 1: One-Click Migration**
```bash
curl -X POST /api/tasks/migration/full-migrate
```
This will:
1. Disable all job_schedules
2. Cancel pending dispensary_crawl_jobs
3. Generate resync tasks for all stores
4. Create discovery and analytics tasks
**Option 2: Manual Migration**
```bash
# 1. Check current status
curl /api/tasks/migration/status
# 2. Disable old schedules
curl -X POST /api/tasks/migration/disable-old-schedules
# 3. Cancel pending crawl jobs
curl -X POST /api/tasks/migration/cancel-pending-crawl-jobs
# 4. Create resync tasks
curl -X POST /api/tasks/migration/create-resync-tasks \
-H 'Content-Type: application/json' \
-d '{"state_code": "AZ"}'
# 5. Generate daily resync schedule
curl -X POST /api/tasks/generate/resync \
-H 'Content-Type: application/json' \
-d '{"batches_per_day": 6}'
```
## Per-Store Locking
The system prevents concurrent tasks for the same store using a partial unique index:
```sql
CREATE UNIQUE INDEX idx_worker_tasks_active_dispensary
ON worker_tasks (dispensary_id)
WHERE dispensary_id IS NOT NULL
AND status IN ('claimed', 'running');
```
This ensures only one task can be active per store at any time.
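Because Postgres enforces the index, two workers racing to claim tasks for the same store cannot both win; the loser sees `unique_violation` (SQLSTATE `23505`) and should simply move on. A sketch:
```typescript
import { Pool } from 'pg';

const pool = new Pool();

// Move a pending task to 'claimed'. If another worker already holds an
// active task for the same dispensary, the partial unique index raises
// unique_violation and we skip this task instead of failing the worker.
async function tryClaim(taskId: number, workerId: string): Promise<boolean> {
  try {
    const res = await pool.query(
      `UPDATE worker_tasks
       SET status = 'claimed', worker_id = $2, claimed_at = NOW()
       WHERE id = $1 AND status = 'pending'`,
      [taskId, workerId]
    );
    return (res.rowCount ?? 0) > 0;
  } catch (err: any) {
    if (err?.code === '23505') return false; // store already has an active task
    throw err;
  }
}
```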
## Task Priority
Tasks are claimed in priority order (higher first), then by creation time:
```sql
ORDER BY priority DESC, created_at ASC
```
Default priorities:
- `store_discovery`: 0
- `entry_point_discovery`: 10 (high - new stores)
- `product_discovery`: 10 (high - new stores)
- `product_resync`: 0
- `analytics_refresh`: 0
## Scheduled Tasks
Tasks can be scheduled for future execution:
```javascript
await taskService.createTask({
role: 'product_resync',
dispensary_id: 123,
scheduled_for: new Date('2025-01-10T06:00:00Z'),
});
```
The `generate_resync_tasks()` function creates staggered tasks throughout the day:
```sql
SELECT generate_resync_tasks(6, '2025-01-10'); -- 6 batches = every 4 hours
```
## Dashboard Integration
The admin dashboard shows task queue status in the main overview:
```
Task Queue Summary
------------------
Pending: 45
Running: 3
Completed: 1,234
Failed: 12
```
Full task management is available at `/admin/tasks`.
## Error Handling
Failed tasks include the error message in `error_message` and can be retried:
```sql
-- View failed tasks
SELECT id, role, dispensary_id, error_message, retry_count
FROM worker_tasks
WHERE status = 'failed'
ORDER BY completed_at DESC
LIMIT 20;
-- Retry failed tasks
UPDATE worker_tasks
SET status = 'pending', retry_count = retry_count + 1
WHERE status = 'failed' AND retry_count < max_retries;
```
## Concurrent Task Processing (Added 2024-12)
Workers can now process multiple tasks concurrently within a single worker instance. This improves throughput by utilizing async I/O efficiently.
### Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ Pod (K8s) │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ TaskWorker │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Task 1 │ │ Task 2 │ │ Task 3 │ (concurrent)│ │
│ │ └─────────┘ └─────────┘ └─────────┘ │ │
│ │ │ │
│ │ Resource Monitor │ │
│ │ ├── Memory: 65% (threshold: 85%) │ │
│ │ ├── CPU: 45% (threshold: 90%) │ │
│ │ └── Status: Normal │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `MAX_CONCURRENT_TASKS` | 3 | Maximum tasks a worker will run concurrently |
| `MEMORY_BACKOFF_THRESHOLD` | 0.85 | Back off when heap memory exceeds 85% |
| `CPU_BACKOFF_THRESHOLD` | 0.90 | Back off when CPU exceeds 90% |
| `BACKOFF_DURATION_MS` | 10000 | How long to wait when backing off (10s) |
### How It Works
1. **Main Loop**: Worker continuously tries to fill up to `MAX_CONCURRENT_TASKS`
2. **Resource Monitoring**: Before claiming a new task, worker checks memory and CPU
3. **Backoff**: If resources exceed thresholds, worker pauses and stops claiming new tasks
4. **Concurrent Execution**: Tasks run as independent, unawaited Promises, so they don't block each other (see the sketch after this list)
5. **Graceful Shutdown**: On SIGTERM/decommission, worker stops claiming but waits for active tasks
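A condensed sketch of that loop (stand-in names; the real implementation is in `src/tasks/task-worker.ts`, see Code References below):
```typescript
// Stand-ins for the real TaskWorker methods.
declare function claimNextTask(): Promise<{ id: number } | null>;
declare function runTask(task: { id: number }): Promise<void>;
declare function shouldBackOff(): string | null; // reason, or null if healthy

const MAX_CONCURRENT_TASKS = Number(process.env.MAX_CONCURRENT_TASKS ?? 3);
const BACKOFF_DURATION_MS = Number(process.env.BACKOFF_DURATION_MS ?? 10_000);
const POLL_INTERVAL_MS = Number(process.env.POLL_INTERVAL_MS ?? 5_000);
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

const active = new Set<Promise<void>>();
let shuttingDown = false;
process.on('SIGTERM', () => { shuttingDown = true; }); // stop claiming, keep draining

async function mainLoop(): Promise<void> {
  while (!shuttingDown) {
    const reason = shouldBackOff();
    if (reason) {
      console.warn(`[TaskWorker] backing off: ${reason}`);
      await sleep(BACKOFF_DURATION_MS); // existing tasks keep running
      continue;
    }
    // Fill free slots; each task runs as an independent, unawaited Promise.
    while (active.size < MAX_CONCURRENT_TASKS) {
      const task = await claimNextTask();
      if (!task) break; // queue is empty for now
      const p: Promise<void> = runTask(task)
        .catch((err) => console.error('[TaskWorker] task failed:', err))
        .finally(() => active.delete(p));
      active.add(p);
    }
    await sleep(POLL_INTERVAL_MS);
  }
  await Promise.allSettled(active); // graceful shutdown: wait for active tasks
}
```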
### Resource Monitoring
```typescript
// ResourceStats interface
interface ResourceStats {
memoryPercent: number; // Current heap usage as decimal (0.0-1.0)
memoryMb: number; // Current heap used in MB
memoryTotalMb: number; // Total heap available in MB
cpuPercent: number; // CPU usage as percentage (0-100)
isBackingOff: boolean; // True if worker is in backoff state
backoffReason: string; // Why the worker is backing off
}
```
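Node's built-in metrics are enough to populate the memory and CPU fields; a sketch of `getResourceStats()` / `shouldBackOff()` using `v8.getHeapStatistics()` and `process.cpuUsage()` (thresholds hardcoded here; the worker reads them from the env vars above):
```typescript
import * as v8 from 'v8';

let lastCpu = process.cpuUsage();
let lastCpuAt = Date.now();

// Heap usage from V8; CPU from the delta of cumulative usage between calls.
function getResourceStats(): { memoryPercent: number; cpuPercent: number } {
  const heap = v8.getHeapStatistics();
  const memoryPercent = heap.used_heap_size / heap.heap_size_limit;

  const elapsedMicros = (Date.now() - lastCpuAt) * 1000;
  const cpu = process.cpuUsage(lastCpu); // user+system micros since last sample
  const cpuPercent = (100 * (cpu.user + cpu.system)) / Math.max(elapsedMicros, 1);
  lastCpu = process.cpuUsage();
  lastCpuAt = Date.now();

  return { memoryPercent, cpuPercent };
}

// Returns a human-readable reason when over threshold, else null.
// 0.85 / 90 mirror the MEMORY_BACKOFF_THRESHOLD / CPU_BACKOFF_THRESHOLD defaults.
function shouldBackOff(): string | null {
  const { memoryPercent, cpuPercent } = getResourceStats();
  if (memoryPercent > 0.85)
    return `Memory at ${(memoryPercent * 100).toFixed(1)}% (threshold: 85%)`;
  if (cpuPercent > 90)
    return `CPU at ${cpuPercent.toFixed(1)}% (threshold: 90%)`;
  return null;
}
```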
### Heartbeat Data
Workers report the following in their heartbeat:
```json
{
"worker_id": "worker-abc123",
"current_task_id": 456,
"current_task_ids": [456, 457, 458],
"active_task_count": 3,
"max_concurrent_tasks": 3,
"status": "active",
"resources": {
"memory_mb": 256,
"memory_total_mb": 512,
"memory_rss_mb": 320,
"memory_percent": 50,
"cpu_user_ms": 12500,
"cpu_system_ms": 3200,
"cpu_percent": 45,
"is_backing_off": false,
"backoff_reason": null
}
}
```
### Backoff Behavior
When resources exceed thresholds:
1. Worker logs the backoff reason:
```
[TaskWorker] MyWorker backing off: Memory at 87.3% (threshold: 85%)
```
2. Worker stops claiming new tasks but continues existing tasks
3. After `BACKOFF_DURATION_MS`, worker rechecks resources
4. When resources return to normal:
```
[TaskWorker] MyWorker resuming normal operation
```
### UI Display
The Workers Dashboard shows:
- **Tasks Column**: `2/3 tasks` (active/max concurrent)
- **Resources Column**: Memory % and CPU % with color coding
- Green: < 50%
- Yellow: 50-74%
- Amber: 75-89%
- Red: 90%+
- **Backing Off**: Orange warning badge when worker is in backoff state
### Task Count Badge Details
```
┌─────────────────────────────────────────────┐
│ Worker: "MyWorker" │
│ Tasks: 2/3 tasks #456, #457 │
│ Resources: 🧠 65% 💻 45% │
│ Status: ● Active │
└─────────────────────────────────────────────┘
```
### Best Practices
1. **Start Conservative**: Use `MAX_CONCURRENT_TASKS=3` initially
2. **Monitor Resources**: Watch for frequent backoffs in logs
3. **Tune Per Workload**: I/O-bound tasks benefit from higher concurrency
4. **Scale Horizontally**: Add more pods rather than cranking concurrency too high
### Code References
| File | Purpose |
|------|---------|
| `src/tasks/task-worker.ts:68-71` | Concurrency environment variables |
| `src/tasks/task-worker.ts:104-111` | ResourceStats interface |
| `src/tasks/task-worker.ts:149-179` | getResourceStats() method |
| `src/tasks/task-worker.ts:184-196` | shouldBackOff() method |
| `src/tasks/task-worker.ts:462-516` | mainLoop() with concurrent claiming |
| `src/routes/worker-registry.ts:148-195` | Heartbeat endpoint handling |
| `cannaiq/src/pages/WorkersDashboard.tsx:233-305` | UI components for resources |
## Browser Task Memory Limits (Updated 2025-12)
Browser-based tasks (Puppeteer/Chrome) have strict memory constraints that limit concurrency.
### Why Browser Tasks Are Different
Each browser task launches a Chrome process. Unlike I/O-bound API calls, browsers consume significant RAM:
| Component | RAM Usage |
|-----------|-----------|
| Node.js runtime | ~150 MB |
| Chrome browser (base) | ~200-250 MB |
| Dutchie menu page (loaded) | ~100-150 MB |
| **Per browser total** | **~350-450 MB** |
### Memory Math for Pod Limits
```
Pod memory limit: 2 GB (2000 MB)
Node.js runtime: -150 MB
Safety buffer: -100 MB
────────────────────────────────
Available for browsers: 1750 MB
Per browser + page: ~400 MB
Max browsers: 1750 ÷ 400 = ~4 browsers
Recommended: 3 browsers (leaves headroom for spikes)
```
### MAX_CONCURRENT_TASKS for Browser Tasks
| Browsers per Pod | RAM Used | Risk Level |
|------------------|----------|------------|
| 1 | ~500 MB | Very safe |
| 2 | ~900 MB | Safe |
| **3** | **~1.3 GB** | **Recommended** |
| 4 | ~1.7 GB | Tight (may OOM) |
| 5+ | >2 GB | Will OOM crash |
**CRITICAL**: `MAX_CONCURRENT_TASKS=3` is the maximum safe value for browser tasks with current pod limits.
### Scaling Strategy
Scale **horizontally** (more pods) rather than vertically (more concurrency per pod):
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Cluster: 8 pods × 3 browsers = 24 concurrent tasks │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Pod 0 │ │ Pod 1 │ │ Pod 2 │ │ Pod 3 │ │
│ │ 3 browsers │ │ 3 browsers │ │ 3 browsers │ │ 3 browsers │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Pod 4 │ │ Pod 5 │ │ Pod 6 │ │ Pod 7 │ │
│ │ 3 browsers │ │ 3 browsers │ │ 3 browsers │ │ 3 browsers │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
```
### Browser Lifecycle Per Task
Each task gets a fresh browser with a fresh IP and identity (a code sketch follows the list below):
```
1. Claim task from queue
2. Get fresh proxy from pool
3. Launch browser with proxy
4. Run preflight (verify IP)
5. Execute scrape
6. Close browser
7. Repeat
```
This ensures:
- Fresh IP per task (proxy rotation)
- Fresh fingerprint per task (UA rotation)
- No cookie/session bleed between tasks
- Predictable memory usage
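A sketch of that lifecycle, assuming `puppeteer` and hypothetical `getFreshProxy`, `verifyEgressIp`, and `scrapeMenu` helpers (none of these names are confirmed project APIs):
```typescript
import puppeteer, { type Page } from 'puppeteer';

// Hypothetical helpers - stand-ins for the real proxy/scrape utilities.
declare function getFreshProxy(): Promise<{ host: string; port: number }>;
declare function verifyEgressIp(page: Page): Promise<void>;
declare function scrapeMenu(page: Page, taskId: number): Promise<void>;

async function runBrowserTask(taskId: number): Promise<void> {
  const proxy = await getFreshProxy();           // 2. fresh IP from the pool
  const browser = await puppeteer.launch({       // 3. launch with proxy
    args: [`--proxy-server=${proxy.host}:${proxy.port}`],
  });
  try {
    const page = await browser.newPage();
    await verifyEgressIp(page);                  // 4. preflight
    await scrapeMenu(page, taskId);              // 5. execute scrape
  } finally {
    await browser.close();                       // 6. always reclaim ~400 MB
  }
}
```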
### Increasing Capacity
To handle more concurrent tasks:
1. **Add more pods** (up to 8 per CLAUDE.md limit)
2. **Increase pod memory** (allows 4 browsers per pod):
```yaml
resources:
limits:
memory: "2.5Gi" # from 2Gi
```
**DO NOT** simply increase `MAX_CONCURRENT_TASKS` without also increasing pod memory limits.
## Monitoring
### Logs
Workers log to stdout:
```
[TaskWorker] Starting worker worker-product_resync-a1b2c3d4 for role: product_resync
[TaskWorker] Claimed task 123 (product_resync) for dispensary 456
[TaskWorker] Task 123 completed successfully
```
### Health Check
Check if workers are active:
```sql
SELECT worker_id, role, COUNT(*), MAX(last_heartbeat_at)
FROM worker_tasks
WHERE last_heartbeat_at > NOW() - INTERVAL '5 minutes'
GROUP BY worker_id, role;
```
### Metrics
```sql
-- Tasks by status
SELECT status, COUNT(*) FROM worker_tasks GROUP BY status;
-- Tasks by role
SELECT role, status, COUNT(*) FROM worker_tasks GROUP BY role, status;
-- Average duration by role
SELECT role, AVG(EXTRACT(EPOCH FROM (completed_at - started_at))) as avg_seconds
FROM worker_tasks
WHERE status = 'completed' AND completed_at > NOW() - INTERVAL '24 hours'
GROUP BY role;
```

View File

@@ -0,0 +1,69 @@
apiVersion: batch/v1
kind: CronJob
metadata:
name: ip2location-update
namespace: default
spec:
# Run on the 1st of every month at 3am UTC
schedule: "0 3 1 * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
spec:
containers:
- name: ip2location-updater
image: curlimages/curl:latest
command:
- /bin/sh
- -c
- |
set -e
echo "Downloading IP2Location LITE DB5..."
# Download to temp
cd /tmp
curl -L -o ip2location.zip "https://www.ip2location.com/download/?token=${IP2LOCATION_TOKEN}&file=DB5LITEBIN"
# Extract
unzip -o ip2location.zip
# Find and copy the BIN file
BIN_FILE=$(ls *.BIN 2>/dev/null | head -1)
if [ -z "$BIN_FILE" ]; then
echo "ERROR: No BIN file found"
exit 1
fi
# Copy to shared volume
cp "$BIN_FILE" /data/IP2LOCATION-LITE-DB5.BIN
echo "Done! Database updated: /data/IP2LOCATION-LITE-DB5.BIN"
env:
- name: IP2LOCATION_TOKEN
valueFrom:
secretKeyRef:
name: dutchie-backend-secret
key: IP2LOCATION_TOKEN
volumeMounts:
- name: ip2location-data
mountPath: /data
restartPolicy: OnFailure
volumes:
- name: ip2location-data
persistentVolumeClaim:
claimName: ip2location-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: ip2location-pvc
namespace: default
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Mi

View File

@@ -26,6 +26,12 @@ spec:
             name: dutchie-backend-config
         - secretRef:
             name: dutchie-backend-secret
+        env:
+        - name: IP2LOCATION_DB_PATH
+          value: /data/ip2location/IP2LOCATION-LITE-DB5.BIN
+        volumeMounts:
+        - name: ip2location-data
+          mountPath: /data/ip2location
         resources:
           requests:
             memory: "256Mi"
@@ -45,3 +51,7 @@ spec:
             port: 3010
           initialDelaySeconds: 5
           periodSeconds: 5
+      volumes:
+      - name: ip2location-data
+        persistentVolumeClaim:
+          claimName: ip2location-pvc

View File

@@ -0,0 +1,77 @@
apiVersion: v1
kind: Service
metadata:
name: scraper-worker
namespace: cannaiq
labels:
app: scraper-worker
spec:
clusterIP: None # Headless service required for StatefulSet
selector:
app: scraper-worker
ports:
- port: 3010
name: http
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: scraper-worker
namespace: cannaiq
spec:
serviceName: scraper-worker
replicas: 8
podManagementPolicy: Parallel # Start all pods at once
updateStrategy:
type: OnDelete # Pods only update when manually deleted - no automatic restarts
selector:
matchLabels:
app: scraper-worker
template:
metadata:
labels:
app: scraper-worker
spec:
terminationGracePeriodSeconds: 60
imagePullSecrets:
- name: regcred
containers:
- name: worker
image: git.spdy.io/creationshop/cannaiq:latest
imagePullPolicy: Always
command: ["node"]
args: ["dist/tasks/task-worker.js"]
env:
- name: WORKER_MODE
value: "true"
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: MAX_CONCURRENT_TASKS
value: "50"
- name: API_BASE_URL
value: http://scraper
- name: NODE_OPTIONS
value: --max-old-space-size=1500
envFrom:
- configMapRef:
name: scraper-config
- secretRef:
name: scraper-secrets
resources:
requests:
cpu: 100m
memory: 1Gi
limits:
cpu: 500m
memory: 2Gi
livenessProbe:
exec:
command:
- /bin/sh
- -c
- pgrep -f 'task-worker' > /dev/null
initialDelaySeconds: 10
periodSeconds: 30
failureThreshold: 3

View File

@@ -1,18 +1,18 @@
 -- Add location columns to proxies table
 ALTER TABLE proxies
-ADD COLUMN city VARCHAR(100),
-ADD COLUMN state VARCHAR(100),
-ADD COLUMN country VARCHAR(100),
-ADD COLUMN country_code VARCHAR(2),
-ADD COLUMN location_updated_at TIMESTAMP;
+ADD COLUMN IF NOT EXISTS city VARCHAR(100),
+ADD COLUMN IF NOT EXISTS state VARCHAR(100),
+ADD COLUMN IF NOT EXISTS country VARCHAR(100),
+ADD COLUMN IF NOT EXISTS country_code VARCHAR(2),
+ADD COLUMN IF NOT EXISTS location_updated_at TIMESTAMP;
 -- Add index for location-based queries
-CREATE INDEX idx_proxies_location ON proxies(country_code, state, city);
+CREATE INDEX IF NOT EXISTS idx_proxies_location ON proxies(country_code, state, city);
 -- Add the same to failed_proxies table
 ALTER TABLE failed_proxies
-ADD COLUMN city VARCHAR(100),
-ADD COLUMN state VARCHAR(100),
-ADD COLUMN country VARCHAR(100),
-ADD COLUMN country_code VARCHAR(2),
-ADD COLUMN location_updated_at TIMESTAMP;
+ADD COLUMN IF NOT EXISTS city VARCHAR(100),
+ADD COLUMN IF NOT EXISTS state VARCHAR(100),
+ADD COLUMN IF NOT EXISTS country VARCHAR(100),
+ADD COLUMN IF NOT EXISTS country_code VARCHAR(2),
+ADD COLUMN IF NOT EXISTS location_updated_at TIMESTAMP;

View File

@@ -1,6 +1,6 @@
 -- Create dispensaries table as single source of truth
 -- This consolidates azdhs_list (official data) + stores (menu data) into one table
-CREATE TABLE dispensaries (
+CREATE TABLE IF NOT EXISTS dispensaries (
   -- Primary key
   id SERIAL PRIMARY KEY,
@@ -43,11 +43,11 @@ CREATE TABLE dispensaries (
 );
 -- Create indexes for common queries
-CREATE INDEX idx_dispensaries_city ON dispensaries(city);
-CREATE INDEX idx_dispensaries_state ON dispensaries(state);
-CREATE INDEX idx_dispensaries_slug ON dispensaries(slug);
-CREATE INDEX idx_dispensaries_azdhs_id ON dispensaries(azdhs_id);
-CREATE INDEX idx_dispensaries_menu_status ON dispensaries(menu_scrape_status);
+CREATE INDEX IF NOT EXISTS idx_dispensaries_city ON dispensaries(city);
+CREATE INDEX IF NOT EXISTS idx_dispensaries_state ON dispensaries(state);
+CREATE INDEX IF NOT EXISTS idx_dispensaries_slug ON dispensaries(slug);
+CREATE INDEX IF NOT EXISTS idx_dispensaries_azdhs_id ON dispensaries(azdhs_id);
+CREATE INDEX IF NOT EXISTS idx_dispensaries_menu_status ON dispensaries(menu_scrape_status);
 -- Create index for location-based queries
-CREATE INDEX idx_dispensaries_location ON dispensaries(latitude, longitude) WHERE latitude IS NOT NULL AND longitude IS NOT NULL;
+CREATE INDEX IF NOT EXISTS idx_dispensaries_location ON dispensaries(latitude, longitude) WHERE latitude IS NOT NULL AND longitude IS NOT NULL;

View File

@@ -1,6 +1,6 @@
 -- Create dispensary_changes table for change approval workflow
 -- This protects against accidental data destruction by requiring manual review
-CREATE TABLE dispensary_changes (
+CREATE TABLE IF NOT EXISTS dispensary_changes (
   id SERIAL PRIMARY KEY,
   dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE,
@@ -26,10 +26,10 @@ CREATE TABLE dispensary_changes (
 );
 -- Create indexes for common queries
-CREATE INDEX idx_dispensary_changes_status ON dispensary_changes(status);
-CREATE INDEX idx_dispensary_changes_dispensary_status ON dispensary_changes(dispensary_id, status);
-CREATE INDEX idx_dispensary_changes_created_at ON dispensary_changes(created_at DESC);
-CREATE INDEX idx_dispensary_changes_requires_recrawl ON dispensary_changes(requires_recrawl) WHERE requires_recrawl = TRUE;
+CREATE INDEX IF NOT EXISTS idx_dispensary_changes_status ON dispensary_changes(status);
+CREATE INDEX IF NOT EXISTS idx_dispensary_changes_dispensary_status ON dispensary_changes(dispensary_id, status);
+CREATE INDEX IF NOT EXISTS idx_dispensary_changes_created_at ON dispensary_changes(created_at DESC);
+CREATE INDEX IF NOT EXISTS idx_dispensary_changes_requires_recrawl ON dispensary_changes(requires_recrawl) WHERE requires_recrawl = TRUE;
 -- Create function to automatically set requires_recrawl for website/menu_url changes
 CREATE OR REPLACE FUNCTION set_requires_recrawl()
@@ -42,7 +42,8 @@ BEGIN
 END;
 $$ LANGUAGE plpgsql;
--- Create trigger to call the function
+-- Create trigger to call the function (drop first to make idempotent)
+DROP TRIGGER IF EXISTS trigger_set_requires_recrawl ON dispensary_changes;
 CREATE TRIGGER trigger_set_requires_recrawl
 BEFORE INSERT ON dispensary_changes
 FOR EACH ROW

View File

@@ -1,6 +1,7 @@
 -- Populate dispensaries table from azdhs_list
 -- This migrates all 182 AZDHS records with their enriched Google Maps data
 -- For multi-location dispensaries with duplicate slugs, append city name to make unique
+-- IDEMPOTENT: Uses ON CONFLICT DO NOTHING to skip already-imported records
 WITH ranked_dispensaries AS (
   SELECT
@@ -78,9 +79,10 @@ SELECT
   created_at,
   updated_at
 FROM ranked_dispensaries
-ORDER BY id;
+ORDER BY id
+ON CONFLICT (azdhs_id) DO NOTHING;
--- Verify the migration
+-- Verify the migration (idempotent - just logs, doesn't fail)
 DO $$
 DECLARE
   source_count INTEGER;
@@ -89,9 +91,11 @@ BEGIN
   SELECT COUNT(*) INTO source_count FROM azdhs_list;
   SELECT COUNT(*) INTO dest_count FROM dispensaries;
-  RAISE NOTICE 'Migration complete: % records from azdhs_list % records in dispensaries', source_count, dest_count;
-  IF source_count != dest_count THEN
-    RAISE EXCEPTION 'Record count mismatch! Expected %, got %', source_count, dest_count;
+  RAISE NOTICE 'Migration status: % records in azdhs_list, % records in dispensaries', source_count, dest_count;
+  IF dest_count >= source_count THEN
+    RAISE NOTICE 'OK: dispensaries table has expected records';
+  ELSE
+    RAISE WARNING 'dispensaries has fewer records than azdhs_list (% vs %)', dest_count, source_count;
   END IF;
 END $$;

View File

@@ -3,15 +3,15 @@
 -- Add dispensary_id to products table
 ALTER TABLE products
-ADD COLUMN dispensary_id INTEGER REFERENCES dispensaries(id) ON DELETE CASCADE;
+ADD COLUMN IF NOT EXISTS dispensary_id INTEGER REFERENCES dispensaries(id) ON DELETE CASCADE;
 -- Add dispensary_id to categories table
 ALTER TABLE categories
-ADD COLUMN dispensary_id INTEGER REFERENCES dispensaries(id) ON DELETE CASCADE;
+ADD COLUMN IF NOT EXISTS dispensary_id INTEGER REFERENCES dispensaries(id) ON DELETE CASCADE;
 -- Create indexes for the new foreign keys
-CREATE INDEX idx_products_dispensary_id ON products(dispensary_id);
-CREATE INDEX idx_categories_dispensary_id ON categories(dispensary_id);
+CREATE INDEX IF NOT EXISTS idx_products_dispensary_id ON products(dispensary_id);
+CREATE INDEX IF NOT EXISTS idx_categories_dispensary_id ON categories(dispensary_id);
 -- NOTE: We'll populate these FKs and migrate data from stores in a separate data migration
 -- For now, new scrapers should use dispensary_id, but old store_id still works

View File

@@ -0,0 +1,119 @@
-- Migration 051: Worker Definitions
-- Creates a dedicated workers table for named workers with roles and assignments
-- Workers table - defines named workers with roles
CREATE TABLE IF NOT EXISTS workers (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL UNIQUE,
role VARCHAR(100) NOT NULL,
description TEXT,
enabled BOOLEAN DEFAULT TRUE,
-- Schedule configuration (for dedicated crawl workers)
schedule_type VARCHAR(50) DEFAULT 'interval', -- 'interval', 'cron', 'manual'
interval_minutes INTEGER DEFAULT 240,
cron_expression VARCHAR(100), -- e.g., '0 */4 * * *'
jitter_minutes INTEGER DEFAULT 30,
-- Assignment scope
assignment_type VARCHAR(50) DEFAULT 'all', -- 'all', 'state', 'dispensary', 'chain'
assigned_state_codes TEXT[], -- e.g., ['AZ', 'CA']
assigned_dispensary_ids INTEGER[],
assigned_chain_ids INTEGER[],
-- Job configuration
job_type VARCHAR(50) NOT NULL DEFAULT 'dutchie_product_crawl',
job_config JSONB DEFAULT '{}',
priority INTEGER DEFAULT 0,
max_concurrent INTEGER DEFAULT 1,
-- Status tracking
status VARCHAR(50) DEFAULT 'idle', -- 'idle', 'running', 'paused', 'error'
last_run_at TIMESTAMPTZ,
last_status VARCHAR(50),
last_error TEXT,
last_duration_ms INTEGER,
next_run_at TIMESTAMPTZ,
current_job_id INTEGER,
-- Metrics
total_runs INTEGER DEFAULT 0,
successful_runs INTEGER DEFAULT 0,
failed_runs INTEGER DEFAULT 0,
avg_duration_ms INTEGER,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Worker run history
CREATE TABLE IF NOT EXISTS worker_runs (
id SERIAL PRIMARY KEY,
worker_id INTEGER NOT NULL REFERENCES workers(id) ON DELETE CASCADE,
started_at TIMESTAMPTZ DEFAULT NOW(),
completed_at TIMESTAMPTZ,
status VARCHAR(50) DEFAULT 'running', -- 'running', 'success', 'error', 'cancelled'
duration_ms INTEGER,
-- What was processed
jobs_created INTEGER DEFAULT 0,
jobs_completed INTEGER DEFAULT 0,
jobs_failed INTEGER DEFAULT 0,
dispensaries_crawled INTEGER DEFAULT 0,
products_found INTEGER DEFAULT 0,
error_message TEXT,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Index for efficient lookups
CREATE INDEX IF NOT EXISTS idx_workers_enabled ON workers(enabled) WHERE enabled = TRUE;
CREATE INDEX IF NOT EXISTS idx_workers_next_run ON workers(next_run_at) WHERE enabled = TRUE;
CREATE INDEX IF NOT EXISTS idx_workers_status ON workers(status);
CREATE INDEX IF NOT EXISTS idx_worker_runs_worker_id ON worker_runs(worker_id);
CREATE INDEX IF NOT EXISTS idx_worker_runs_started_at ON worker_runs(started_at DESC);
-- Add worker_id to dispensary_crawl_jobs if not exists
DO $$
BEGIN
IF NOT EXISTS (
SELECT 1 FROM information_schema.columns
WHERE table_name = 'dispensary_crawl_jobs' AND column_name = 'assigned_worker_id'
) THEN
ALTER TABLE dispensary_crawl_jobs ADD COLUMN assigned_worker_id INTEGER REFERENCES workers(id);
END IF;
END $$;
-- Migrate existing job_schedules workers to new workers table
INSERT INTO workers (name, role, description, enabled, interval_minutes, jitter_minutes, job_type, job_config, last_run_at, last_status, last_error, last_duration_ms, next_run_at)
SELECT
worker_name,
worker_role,
description,
enabled,
base_interval_minutes,
jitter_minutes,
job_name,
job_config,
last_run_at,
last_status,
last_error_message,
last_duration_ms,
next_run_at
FROM job_schedules
WHERE worker_name IS NOT NULL
ON CONFLICT (name) DO UPDATE SET
updated_at = NOW();
-- Available worker roles (reference)
COMMENT ON TABLE workers IS 'Named workers with specific roles and assignments. Roles include:
- product_sync: Crawls products from dispensary menus
- store_discovery: Discovers new dispensary locations
- entry_point_finder: Detects menu providers and resolves platform IDs
- analytics_refresh: Refreshes materialized views and analytics
- price_monitor: Monitors price changes and triggers alerts
- inventory_sync: Syncs inventory levels
- image_processor: Downloads and processes product images
- data_validator: Validates data integrity';

View File

@@ -0,0 +1,49 @@
-- Migration 052: SEO Settings Table
-- Key/value store for SEO Orchestrator configuration
CREATE TABLE IF NOT EXISTS seo_settings (
id SERIAL PRIMARY KEY,
key TEXT UNIQUE NOT NULL,
value JSONB NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Create index on key for fast lookups
CREATE INDEX IF NOT EXISTS idx_seo_settings_key ON seo_settings(key);
-- Seed with default settings
INSERT INTO seo_settings (key, value) VALUES
-- Section 1: Global Content Generation Settings
('primary_prompt_template', '"You are a cannabis industry content expert. Generate SEO-optimized content for {{page_type}} pages about {{subject}}. Focus on: {{focus_areas}}. Maintain a {{tone}} tone and keep content {{length}}."'),
('regeneration_prompt_template', '"Regenerate the following SEO content with fresh perspectives. Original topic: {{subject}}. Improve upon: {{improvement_areas}}. Maintain compliance with cannabis industry standards."'),
('default_content_length', '"medium"'),
('tone_voice', '"informational"'),
-- Section 2: Automatic Refresh Rules
('auto_refresh_interval', '"weekly"'),
('trigger_pct_product_change', 'true'),
('trigger_pct_brand_change', 'true'),
('trigger_new_stores', 'true'),
('trigger_market_shift', 'false'),
('webhook_url', '""'),
('notify_on_trigger', 'false'),
-- Section 3: Page-Level Defaults
('default_title_template', '"{{state_name}} Dispensaries | Find Cannabis Near You | CannaiQ"'),
('default_meta_description_template', '"Discover the best dispensaries in {{state_name}}. Browse {{dispensary_count}}+ licensed retailers, compare prices, and find cannabis products near you."'),
('default_slug_template', '"dispensaries-{{state_code_lower}}"'),
('default_og_image_template', '"/images/seo/og-{{state_code_lower}}.jpg"'),
('enable_ai_images', 'false'),
-- Section 4: Crawl / Dataset Configuration
('primary_data_provider', '"cannaiq"'),
('fallback_data_provider', '"dutchie"'),
('min_data_freshness_hours', '24'),
('stale_data_behavior', '"allow_with_warning"')
ON CONFLICT (key) DO NOTHING;
-- Record migration
INSERT INTO schema_migrations (version, name, applied_at)
VALUES ('052', 'seo_settings', NOW())
ON CONFLICT (version) DO NOTHING;

View File

@@ -0,0 +1,42 @@
-- Migration 057: Add crawl_enabled and dutchie_verified fields to dispensaries
--
-- Purpose:
-- 1. Add crawl_enabled to control which dispensaries get crawled
-- 2. Add dutchie_verified to track Dutchie source-of-truth verification
-- 3. Default existing records to crawl_enabled = TRUE to preserve behavior
--
-- After this migration, run the harmonization script to:
-- - Match dispensaries to Dutchie discoveries
-- - Update platform_dispensary_id from Dutchie
-- - Set dutchie_verified = TRUE for matches
-- - Set crawl_enabled = FALSE for unverified records
-- Add crawl_enabled column (defaults to true to not break existing crawls)
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS crawl_enabled BOOLEAN DEFAULT TRUE;
-- Add dutchie_verified column to track if record is verified against Dutchie
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS dutchie_verified BOOLEAN DEFAULT FALSE;
-- Add dutchie_verified_at timestamp
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS dutchie_verified_at TIMESTAMP WITH TIME ZONE;
-- Add dutchie_discovery_id to link back to the discovery record
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS dutchie_discovery_id BIGINT REFERENCES dutchie_discovery_locations(id);
-- Create index for crawl queries (only crawl enabled dispensaries)
CREATE INDEX IF NOT EXISTS idx_dispensaries_crawl_enabled
ON dispensaries(crawl_enabled, state)
WHERE crawl_enabled = TRUE;
-- Create index for dutchie verification status
CREATE INDEX IF NOT EXISTS idx_dispensaries_dutchie_verified
ON dispensaries(dutchie_verified, state);
COMMENT ON COLUMN dispensaries.crawl_enabled IS 'Whether this dispensary should be included in crawl jobs. Set to FALSE for unverified or problematic records.';
COMMENT ON COLUMN dispensaries.dutchie_verified IS 'Whether this dispensary has been verified against Dutchie source of truth (matched by slug or manually linked).';
COMMENT ON COLUMN dispensaries.dutchie_verified_at IS 'Timestamp when Dutchie verification was completed.';
COMMENT ON COLUMN dispensaries.dutchie_discovery_id IS 'Link to the dutchie_discovery_locations record this was matched/verified against.';

View File

@@ -0,0 +1,56 @@
-- Migration 065: Slug verification and data source tracking
-- Adds columns to track when slug/menu data was verified and from what source
-- Add slug verification columns to dispensaries
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS slug_source VARCHAR(50),
ADD COLUMN IF NOT EXISTS slug_verified_at TIMESTAMPTZ,
ADD COLUMN IF NOT EXISTS slug_status VARCHAR(20) DEFAULT 'unverified',
ADD COLUMN IF NOT EXISTS menu_url_source VARCHAR(50),
ADD COLUMN IF NOT EXISTS menu_url_verified_at TIMESTAMPTZ,
ADD COLUMN IF NOT EXISTS platform_id_source VARCHAR(50),
ADD COLUMN IF NOT EXISTS platform_id_verified_at TIMESTAMPTZ,
ADD COLUMN IF NOT EXISTS country VARCHAR(2) DEFAULT 'US';
-- Add index for finding unverified stores
CREATE INDEX IF NOT EXISTS idx_dispensaries_slug_status
ON dispensaries(slug_status)
WHERE slug_status != 'verified';
-- Add index for country
CREATE INDEX IF NOT EXISTS idx_dispensaries_country
ON dispensaries(country);
-- Comment on columns
COMMENT ON COLUMN dispensaries.slug_source IS 'Source of slug data: dutchie_api, manual, azdhs, discovery, etc.';
COMMENT ON COLUMN dispensaries.slug_verified_at IS 'When the slug was last verified against the source';
COMMENT ON COLUMN dispensaries.slug_status IS 'Status: unverified, verified, invalid, changed';
COMMENT ON COLUMN dispensaries.menu_url_source IS 'Source of menu_url: dutchie_api, website_scrape, manual, etc.';
COMMENT ON COLUMN dispensaries.menu_url_verified_at IS 'When the menu_url was last verified';
COMMENT ON COLUMN dispensaries.platform_id_source IS 'Source of platform_dispensary_id: dutchie_api, graphql_resolution, etc.';
COMMENT ON COLUMN dispensaries.platform_id_verified_at IS 'When the platform_dispensary_id was last verified';
COMMENT ON COLUMN dispensaries.country IS 'ISO 2-letter country code: US, CA, etc.';
-- Update Green Pharms Mesa with verified Dutchie data
UPDATE dispensaries
SET
slug = 'green-pharms-mesa',
menu_url = 'https://dutchie.com/embedded-menu/green-pharms-mesa',
menu_type = 'dutchie',
platform_dispensary_id = '68dc47a2af90f2e653f8df30',
slug_source = 'dutchie_api',
slug_verified_at = NOW(),
slug_status = 'verified',
menu_url_source = 'dutchie_api',
menu_url_verified_at = NOW(),
platform_id_source = 'dutchie_api',
platform_id_verified_at = NOW(),
updated_at = NOW()
WHERE id = 232;
-- Mark all other AZ dispensaries as needing verification
UPDATE dispensaries
SET slug_status = 'unverified'
WHERE state = 'AZ'
AND id != 232
AND (slug_status IS NULL OR slug_status = 'unverified');

View File

@@ -0,0 +1,140 @@
-- Migration 066: Align dispensaries and discovery_locations tables with Dutchie field names
-- Uses snake_case convention (Postgres standard) mapped from Dutchie's camelCase
--
-- Changes:
-- 1. dispensaries: rename address→address1, zip→zipcode, remove company_name
-- 2. dispensaries: add missing Dutchie fields
-- 3. dutchie_discovery_locations: add missing Dutchie fields
-- ============================================================================
-- DISPENSARIES TABLE
-- ============================================================================
-- Rename address to address1 (matches Dutchie's address1)
ALTER TABLE dispensaries RENAME COLUMN address TO address1;
-- Rename zip to zipcode (matches Dutchie's zip, but we use zipcode for clarity)
ALTER TABLE dispensaries RENAME COLUMN zip TO zipcode;
-- Drop company_name (redundant with name)
ALTER TABLE dispensaries DROP COLUMN IF EXISTS company_name;
-- Add address2
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS address2 VARCHAR(255);
-- Add country
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS country VARCHAR(100) DEFAULT 'United States';
-- Add timezone
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS timezone VARCHAR(50);
-- Add email
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS email VARCHAR(255);
-- Add description
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS description TEXT;
-- Add logo_image (Dutchie: logoImage)
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS logo_image TEXT;
-- Add banner_image (Dutchie: bannerImage)
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS banner_image TEXT;
-- Add offer_pickup (Dutchie: offerPickup)
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS offer_pickup BOOLEAN DEFAULT TRUE;
-- Add offer_delivery (Dutchie: offerDelivery)
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS offer_delivery BOOLEAN DEFAULT FALSE;
-- Add offer_curbside_pickup (Dutchie: offerCurbsidePickup)
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS offer_curbside_pickup BOOLEAN DEFAULT FALSE;
-- Add is_medical (Dutchie: isMedical)
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS is_medical BOOLEAN DEFAULT FALSE;
-- Add is_recreational (Dutchie: isRecreational)
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS is_recreational BOOLEAN DEFAULT FALSE;
-- Add chain_slug (Dutchie: chain)
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS chain_slug VARCHAR(255);
-- Add enterprise_id (Dutchie: retailer.enterpriseId)
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS enterprise_id VARCHAR(100);
-- Add status (Dutchie: status - open/closed)
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS status VARCHAR(50);
-- Add c_name (Dutchie: cName - the URL slug used in embedded menus)
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS c_name VARCHAR(255);
-- ============================================================================
-- DUTCHIE_DISCOVERY_LOCATIONS TABLE
-- ============================================================================
-- Add phone
ALTER TABLE dutchie_discovery_locations ADD COLUMN IF NOT EXISTS phone VARCHAR(50);
-- Add website (Dutchie: embedBackUrl)
ALTER TABLE dutchie_discovery_locations ADD COLUMN IF NOT EXISTS website TEXT;
-- Add email
ALTER TABLE dutchie_discovery_locations ADD COLUMN IF NOT EXISTS email VARCHAR(255);
-- Add description
ALTER TABLE dutchie_discovery_locations ADD COLUMN IF NOT EXISTS description TEXT;
-- Add logo_image
ALTER TABLE dutchie_discovery_locations ADD COLUMN IF NOT EXISTS logo_image TEXT;
-- Add banner_image
ALTER TABLE dutchie_discovery_locations ADD COLUMN IF NOT EXISTS banner_image TEXT;
-- Add chain_slug
ALTER TABLE dutchie_discovery_locations ADD COLUMN IF NOT EXISTS chain_slug VARCHAR(255);
-- Add enterprise_id
ALTER TABLE dutchie_discovery_locations ADD COLUMN IF NOT EXISTS enterprise_id VARCHAR(100);
-- Add c_name
ALTER TABLE dutchie_discovery_locations ADD COLUMN IF NOT EXISTS c_name VARCHAR(255);
-- Add country
ALTER TABLE dutchie_discovery_locations ADD COLUMN IF NOT EXISTS country VARCHAR(100) DEFAULT 'United States';
-- Add store status
ALTER TABLE dutchie_discovery_locations ADD COLUMN IF NOT EXISTS store_status VARCHAR(50);
-- ============================================================================
-- INDEXES
-- ============================================================================
-- Index for chain lookups
CREATE INDEX IF NOT EXISTS idx_dispensaries_chain_slug ON dispensaries(chain_slug) WHERE chain_slug IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_discovery_locations_chain_slug ON dutchie_discovery_locations(chain_slug) WHERE chain_slug IS NOT NULL;
-- Index for enterprise lookups (for multi-location chains)
CREATE INDEX IF NOT EXISTS idx_dispensaries_enterprise_id ON dispensaries(enterprise_id) WHERE enterprise_id IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_discovery_locations_enterprise_id ON dutchie_discovery_locations(enterprise_id) WHERE enterprise_id IS NOT NULL;
-- Index for c_name lookups
CREATE INDEX IF NOT EXISTS idx_dispensaries_c_name ON dispensaries(c_name) WHERE c_name IS NOT NULL;
-- ============================================================================
-- COMMENTS
-- ============================================================================
COMMENT ON COLUMN dispensaries.address1 IS 'Street address line 1 (Dutchie: address1)';
COMMENT ON COLUMN dispensaries.address2 IS 'Street address line 2 (Dutchie: address2)';
COMMENT ON COLUMN dispensaries.zipcode IS 'ZIP/postal code (Dutchie: zip)';
COMMENT ON COLUMN dispensaries.c_name IS 'Dutchie URL slug for embedded menus (Dutchie: cName)';
COMMENT ON COLUMN dispensaries.chain_slug IS 'Chain identifier slug (Dutchie: chain)';
COMMENT ON COLUMN dispensaries.enterprise_id IS 'Parent enterprise UUID (Dutchie: retailer.enterpriseId)';
COMMENT ON COLUMN dispensaries.logo_image IS 'Logo image URL (Dutchie: logoImage)';
COMMENT ON COLUMN dispensaries.banner_image IS 'Banner image URL (Dutchie: bannerImage)';
COMMENT ON COLUMN dispensaries.offer_pickup IS 'Offers in-store pickup (Dutchie: offerPickup)';
COMMENT ON COLUMN dispensaries.offer_delivery IS 'Offers delivery (Dutchie: offerDelivery)';
COMMENT ON COLUMN dispensaries.offer_curbside_pickup IS 'Offers curbside pickup (Dutchie: offerCurbsidePickup)';
COMMENT ON COLUMN dispensaries.is_medical IS 'Licensed for medical sales (Dutchie: isMedical)';
COMMENT ON COLUMN dispensaries.is_recreational IS 'Licensed for recreational sales (Dutchie: isRecreational)';
SELECT 'Migration 066 completed: Dutchie field alignment' as status;

View File

@@ -0,0 +1,24 @@
-- Promotion log table for tracking discovery → dispensary promotions
-- Tracks validation and promotion actions for audit/review
CREATE TABLE IF NOT EXISTS dutchie_promotion_log (
id SERIAL PRIMARY KEY,
discovery_id INTEGER REFERENCES dutchie_discovery_locations(id) ON DELETE SET NULL,
dispensary_id INTEGER REFERENCES dispensaries(id) ON DELETE SET NULL,
action VARCHAR(50) NOT NULL, -- 'validated', 'rejected', 'promoted_create', 'promoted_update', 'skipped'
state_code VARCHAR(10),
store_name VARCHAR(255),
validation_errors TEXT[], -- Array of error messages if rejected
field_changes JSONB, -- Before/after snapshot of changed fields
triggered_by VARCHAR(100) DEFAULT 'auto', -- 'auto', 'manual', 'api'
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
-- Indexes for efficient querying
CREATE INDEX IF NOT EXISTS idx_promotion_log_discovery_id ON dutchie_promotion_log(discovery_id);
CREATE INDEX IF NOT EXISTS idx_promotion_log_dispensary_id ON dutchie_promotion_log(dispensary_id);
CREATE INDEX IF NOT EXISTS idx_promotion_log_action ON dutchie_promotion_log(action);
CREATE INDEX IF NOT EXISTS idx_promotion_log_state_code ON dutchie_promotion_log(state_code);
CREATE INDEX IF NOT EXISTS idx_promotion_log_created_at ON dutchie_promotion_log(created_at DESC);
COMMENT ON TABLE dutchie_promotion_log IS 'Audit log for discovery location validation and promotion to dispensaries';

View File

@@ -0,0 +1,95 @@
-- Migration 068: Crawler Status Alerts
-- Creates status_alerts table for dashboard notifications and status change logging
-- ============================================================
-- STATUS ALERTS TABLE
-- ============================================================
CREATE TABLE IF NOT EXISTS crawler_status_alerts (
id SERIAL PRIMARY KEY,
-- References
dispensary_id INTEGER REFERENCES dispensaries(id),
profile_id INTEGER REFERENCES dispensary_crawler_profiles(id),
-- Alert info
alert_type VARCHAR(50) NOT NULL, -- 'status_change', 'crawl_error', 'validation_failed', 'promoted', 'demoted'
severity VARCHAR(20) DEFAULT 'info', -- 'info', 'warning', 'error', 'critical'
-- Status transition
previous_status VARCHAR(50),
new_status VARCHAR(50),
-- Context
message TEXT,
error_details JSONB,
metadata JSONB, -- Additional context (product counts, error codes, etc.)
-- Tracking
acknowledged BOOLEAN DEFAULT FALSE,
acknowledged_at TIMESTAMP WITH TIME ZONE,
acknowledged_by VARCHAR(100),
-- Timestamps
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
-- Indexes for common queries
CREATE INDEX IF NOT EXISTS idx_crawler_status_alerts_dispensary ON crawler_status_alerts(dispensary_id);
CREATE INDEX IF NOT EXISTS idx_crawler_status_alerts_type ON crawler_status_alerts(alert_type);
CREATE INDEX IF NOT EXISTS idx_crawler_status_alerts_severity ON crawler_status_alerts(severity);
CREATE INDEX IF NOT EXISTS idx_crawler_status_alerts_unack ON crawler_status_alerts(acknowledged) WHERE acknowledged = FALSE;
CREATE INDEX IF NOT EXISTS idx_crawler_status_alerts_created ON crawler_status_alerts(created_at DESC);
-- ============================================================
-- STATUS DEFINITIONS (for reference/validation)
-- ============================================================
COMMENT ON TABLE crawler_status_alerts IS 'Crawler status change notifications for dashboard alerting';
COMMENT ON COLUMN crawler_status_alerts.alert_type IS 'Type: status_change, crawl_error, validation_failed, promoted, demoted';
COMMENT ON COLUMN crawler_status_alerts.severity IS 'Severity: info, warning, error, critical';
COMMENT ON COLUMN crawler_status_alerts.previous_status IS 'Previous crawler status before change';
COMMENT ON COLUMN crawler_status_alerts.new_status IS 'New crawler status after change';
-- ============================================================
-- STATUS TRACKING ON PROFILES
-- ============================================================
-- Add columns for status tracking if not exists
DO $$
BEGIN
-- Consecutive success count for auto-promotion
IF NOT EXISTS (SELECT 1 FROM information_schema.columns
WHERE table_name = 'dispensary_crawler_profiles' AND column_name = 'consecutive_successes') THEN
ALTER TABLE dispensary_crawler_profiles ADD COLUMN consecutive_successes INTEGER DEFAULT 0;
END IF;
-- Consecutive failure count for auto-demotion
IF NOT EXISTS (SELECT 1 FROM information_schema.columns
WHERE table_name = 'dispensary_crawler_profiles' AND column_name = 'consecutive_failures') THEN
ALTER TABLE dispensary_crawler_profiles ADD COLUMN consecutive_failures INTEGER DEFAULT 0;
END IF;
-- Last status change timestamp
IF NOT EXISTS (SELECT 1 FROM information_schema.columns
WHERE table_name = 'dispensary_crawler_profiles' AND column_name = 'status_changed_at') THEN
ALTER TABLE dispensary_crawler_profiles ADD COLUMN status_changed_at TIMESTAMP WITH TIME ZONE;
END IF;
-- Status change reason
IF NOT EXISTS (SELECT 1 FROM information_schema.columns
WHERE table_name = 'dispensary_crawler_profiles' AND column_name = 'status_reason') THEN
ALTER TABLE dispensary_crawler_profiles ADD COLUMN status_reason TEXT;
END IF;
END $$;
-- ============================================================
-- VALID STATUS VALUES
-- ============================================================
-- Status values for dispensary_crawler_profiles.status:
-- 'sandbox' - Newly created, being validated
-- 'production' - Healthy, actively crawled
-- 'needs_manual' - Requires human intervention
-- 'failing' - Multiple consecutive failures
-- 'disabled' - Manually disabled
-- 'legacy' - No profile, uses default method (virtual status)

View File

@@ -0,0 +1,163 @@
-- Migration 069: Seven-Stage Status System
--
-- Implements explicit 7-stage pipeline for store lifecycle:
-- 1. discovered - Found via Dutchie API, raw data
-- 2. validated - Passed field checks, ready for promotion
-- 3. promoted - In dispensaries table, has crawler profile
-- 4. sandbox - First crawl attempted, testing
-- 5. hydrating - Products are being loaded/updated
-- 6. production - Healthy, scheduled crawls via Horizon
-- 7. failing - Crawl errors, needs attention
-- ============================================================
-- STAGE ENUM TYPE
-- ============================================================
DO $$
BEGIN
-- Create enum if not exists
IF NOT EXISTS (SELECT 1 FROM pg_type WHERE typname = 'store_stage') THEN
CREATE TYPE store_stage AS ENUM (
'discovered',
'validated',
'promoted',
'sandbox',
'hydrating',
'production',
'failing'
);
END IF;
END $$;
-- ============================================================
-- UPDATE DISCOVERY LOCATIONS TABLE
-- ============================================================
-- Add stage column to discovery locations (replaces status)
DO $$
BEGIN
IF NOT EXISTS (SELECT 1 FROM information_schema.columns
WHERE table_name = 'dutchie_discovery_locations' AND column_name = 'stage') THEN
ALTER TABLE dutchie_discovery_locations ADD COLUMN stage VARCHAR(20) DEFAULT 'discovered';
END IF;
END $$;
-- Migrate existing status values to stage
UPDATE dutchie_discovery_locations
SET stage = CASE
WHEN status = 'discovered' THEN 'discovered'
WHEN status = 'verified' THEN 'validated'
WHEN status = 'rejected' THEN 'failing'
WHEN status = 'merged' THEN 'validated'
ELSE 'discovered'
END
WHERE stage IS NULL OR stage = '';
-- ============================================================
-- UPDATE CRAWLER PROFILES TABLE
-- ============================================================
-- Ensure status column exists and update to new values
UPDATE dispensary_crawler_profiles
SET status = CASE
WHEN status = 'sandbox' THEN 'sandbox'
WHEN status = 'production' THEN 'production'
WHEN status = 'needs_manual' THEN 'failing'
WHEN status = 'failing' THEN 'failing'
WHEN status = 'disabled' THEN 'failing'
WHEN status IS NULL THEN 'promoted'
ELSE 'promoted'
END;
-- ============================================================
-- ADD STAGE TRACKING TO DISPENSARIES
-- ============================================================
DO $$
BEGIN
-- Add stage column to dispensaries for quick filtering
IF NOT EXISTS (SELECT 1 FROM information_schema.columns
WHERE table_name = 'dispensaries' AND column_name = 'stage') THEN
ALTER TABLE dispensaries ADD COLUMN stage VARCHAR(20) DEFAULT 'promoted';
END IF;
-- Add stage_changed_at for tracking
IF NOT EXISTS (SELECT 1 FROM information_schema.columns
WHERE table_name = 'dispensaries' AND column_name = 'stage_changed_at') THEN
ALTER TABLE dispensaries ADD COLUMN stage_changed_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP;
END IF;
-- Add first_crawl_at to track sandbox → production transition
IF NOT EXISTS (SELECT 1 FROM information_schema.columns
WHERE table_name = 'dispensaries' AND column_name = 'first_crawl_at') THEN
ALTER TABLE dispensaries ADD COLUMN first_crawl_at TIMESTAMP WITH TIME ZONE;
END IF;
-- Add last_successful_crawl_at
IF NOT EXISTS (SELECT 1 FROM information_schema.columns
WHERE table_name = 'dispensaries' AND column_name = 'last_successful_crawl_at') THEN
ALTER TABLE dispensaries ADD COLUMN last_successful_crawl_at TIMESTAMP WITH TIME ZONE;
END IF;
END $$;
-- Set initial stage for existing dispensaries based on their crawler profile status.
-- Note: the ADD COLUMN above backfills existing rows with the 'promoted' default,
-- so match that default here as well as NULL/empty.
UPDATE dispensaries d
SET stage = COALESCE(
(SELECT dcp.status FROM dispensary_crawler_profiles dcp
WHERE dcp.dispensary_id = d.id AND dcp.enabled = true
ORDER BY dcp.updated_at DESC LIMIT 1),
'promoted'
)
WHERE d.stage IS NULL OR d.stage = '' OR d.stage = 'promoted';
-- ============================================================
-- INDEXES FOR STAGE-BASED QUERIES
-- ============================================================
CREATE INDEX IF NOT EXISTS idx_dispensaries_stage ON dispensaries(stage);
CREATE INDEX IF NOT EXISTS idx_dispensaries_stage_state ON dispensaries(stage, state);
CREATE INDEX IF NOT EXISTS idx_discovery_locations_stage ON dutchie_discovery_locations(stage);
CREATE INDEX IF NOT EXISTS idx_crawler_profiles_status ON dispensary_crawler_profiles(status);
-- ============================================================
-- STAGE TRANSITION LOG
-- ============================================================
CREATE TABLE IF NOT EXISTS stage_transitions (
id SERIAL PRIMARY KEY,
-- What changed
entity_type VARCHAR(20) NOT NULL, -- 'discovery_location' or 'dispensary'
entity_id INTEGER NOT NULL,
-- Stage change
from_stage VARCHAR(20),
to_stage VARCHAR(20) NOT NULL,
-- Context
trigger_type VARCHAR(50) NOT NULL, -- 'api', 'scheduler', 'manual', 'auto'
trigger_endpoint VARCHAR(200),
-- Outcome
success BOOLEAN DEFAULT TRUE,
error_message TEXT,
metadata JSONB,
-- Timing
duration_ms INTEGER,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_stage_transitions_entity ON stage_transitions(entity_type, entity_id);
CREATE INDEX IF NOT EXISTS idx_stage_transitions_to_stage ON stage_transitions(to_stage);
CREATE INDEX IF NOT EXISTS idx_stage_transitions_created ON stage_transitions(created_at DESC);
-- ============================================================
-- COMMENTS
-- ============================================================
COMMENT ON TABLE stage_transitions IS 'Audit log for all stage transitions in the pipeline';
COMMENT ON COLUMN dispensaries.stage IS 'Current pipeline stage: discovered, validated, promoted, sandbox, production, failing';
COMMENT ON COLUMN dispensaries.stage_changed_at IS 'When the stage was last changed';
COMMENT ON COLUMN dispensaries.first_crawl_at IS 'When the first crawl was attempted (sandbox stage)';
COMMENT ON COLUMN dispensaries.last_successful_crawl_at IS 'When the last successful crawl completed';
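-- Usage sketch (not executed by this migration; the dispensary id and metadata
-- are illustrative): promoting a store from sandbox to production updates the
-- dispensary and records the move in the audit log.
--   UPDATE dispensaries SET stage = 'production', stage_changed_at = NOW() WHERE id = 42;
--   INSERT INTO stage_transitions (entity_type, entity_id, from_stage, to_stage, trigger_type, metadata)
--   VALUES ('dispensary', 42, 'sandbox', 'production', 'auto', '{"reason": "first crawl succeeded"}');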


@@ -0,0 +1,239 @@
-- ============================================================================
-- Migration 070: Product Variants Tables
-- ============================================================================
--
-- Purpose: Store variant-level pricing and inventory as first-class entities
-- to enable time-series analytics, price comparisons, and sale tracking.
--
-- Enables queries like:
-- - Price history for a specific variant (1g Blue Dream over time)
-- - Sale frequency analysis (how often is this on special?)
-- - Cross-store price comparison (who has cheapest 1g flower?)
-- - Current specials across all stores
--
-- RULES:
-- - STRICTLY ADDITIVE (no DROP, DELETE, TRUNCATE)
-- - All new tables use IF NOT EXISTS
-- - All indexes use IF NOT EXISTS
--
-- ============================================================================
-- ============================================================================
-- SECTION 1: PRODUCT_VARIANTS TABLE (Current State)
-- ============================================================================
-- One row per product+option combination. Tracks current pricing/inventory.
CREATE TABLE IF NOT EXISTS product_variants (
id SERIAL PRIMARY KEY,
store_product_id INTEGER NOT NULL REFERENCES store_products(id) ON DELETE CASCADE,
dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE,
-- Variant identity (from Dutchie POSMetaData.children)
option VARCHAR(100) NOT NULL, -- "1g", "3.5g", "1/8oz", "100mg"
canonical_sku VARCHAR(100), -- Dutchie canonicalSKU
canonical_id VARCHAR(100), -- Dutchie canonicalID
canonical_name VARCHAR(500), -- Dutchie canonicalName
-- Current pricing (in dollars, not cents)
price_rec NUMERIC(10,2),
price_med NUMERIC(10,2),
price_rec_special NUMERIC(10,2),
price_med_special NUMERIC(10,2),
-- Current inventory
quantity INTEGER,
quantity_available INTEGER,
in_stock BOOLEAN DEFAULT TRUE,
-- Special/sale status
is_on_special BOOLEAN DEFAULT FALSE,
-- Weight/size parsing (for analytics)
weight_value NUMERIC(10,2), -- 1, 3.5, 28, etc.
weight_unit VARCHAR(20), -- g, oz, mg, ml, etc.
-- Timestamps
first_seen_at TIMESTAMPTZ DEFAULT NOW(),
last_seen_at TIMESTAMPTZ DEFAULT NOW(),
last_price_change_at TIMESTAMPTZ,
last_stock_change_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(store_product_id, option)
);
-- Indexes for common queries
CREATE INDEX IF NOT EXISTS idx_variants_store_product ON product_variants(store_product_id);
CREATE INDEX IF NOT EXISTS idx_variants_dispensary ON product_variants(dispensary_id);
CREATE INDEX IF NOT EXISTS idx_variants_option ON product_variants(option);
CREATE INDEX IF NOT EXISTS idx_variants_in_stock ON product_variants(dispensary_id, in_stock) WHERE in_stock = TRUE;
CREATE INDEX IF NOT EXISTS idx_variants_on_special ON product_variants(dispensary_id, is_on_special) WHERE is_on_special = TRUE;
CREATE INDEX IF NOT EXISTS idx_variants_canonical_sku ON product_variants(canonical_sku) WHERE canonical_sku IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_variants_price_rec ON product_variants(price_rec) WHERE price_rec IS NOT NULL;
COMMENT ON TABLE product_variants IS 'Current state of each product variant (weight/size option). One row per product+option.';
COMMENT ON COLUMN product_variants.option IS 'Weight/size option string from Dutchie (e.g., "1g", "3.5g", "1/8oz")';
COMMENT ON COLUMN product_variants.canonical_sku IS 'Dutchie POS SKU for cross-store matching';
-- ============================================================================
-- SECTION 2: PRODUCT_VARIANT_SNAPSHOTS TABLE (Historical Data)
-- ============================================================================
-- Time-series data for variant pricing. One row per variant per crawl.
-- CRITICAL: NEVER DELETE from this table.
CREATE TABLE IF NOT EXISTS product_variant_snapshots (
id SERIAL PRIMARY KEY,
product_variant_id INTEGER NOT NULL REFERENCES product_variants(id) ON DELETE CASCADE,
store_product_id INTEGER REFERENCES store_products(id) ON DELETE SET NULL,
dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE,
crawl_run_id INTEGER REFERENCES crawl_runs(id) ON DELETE SET NULL,
-- Variant identity (denormalized for query performance)
option VARCHAR(100) NOT NULL,
-- Pricing at time of capture
price_rec NUMERIC(10,2),
price_med NUMERIC(10,2),
price_rec_special NUMERIC(10,2),
price_med_special NUMERIC(10,2),
-- Inventory at time of capture
quantity INTEGER,
in_stock BOOLEAN DEFAULT TRUE,
-- Special status at time of capture
is_on_special BOOLEAN DEFAULT FALSE,
-- Feed presence (FALSE = variant missing from crawl)
is_present_in_feed BOOLEAN DEFAULT TRUE,
-- Capture timestamp
captured_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Indexes for time-series queries
CREATE INDEX IF NOT EXISTS idx_variant_snapshots_variant ON product_variant_snapshots(product_variant_id, captured_at DESC);
CREATE INDEX IF NOT EXISTS idx_variant_snapshots_dispensary ON product_variant_snapshots(dispensary_id, captured_at DESC);
CREATE INDEX IF NOT EXISTS idx_variant_snapshots_crawl ON product_variant_snapshots(crawl_run_id) WHERE crawl_run_id IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_variant_snapshots_captured ON product_variant_snapshots(captured_at DESC);
CREATE INDEX IF NOT EXISTS idx_variant_snapshots_special ON product_variant_snapshots(is_on_special, captured_at DESC) WHERE is_on_special = TRUE;
CREATE INDEX IF NOT EXISTS idx_variant_snapshots_option ON product_variant_snapshots(option, captured_at DESC);
COMMENT ON TABLE product_variant_snapshots IS 'Historical variant pricing/inventory. One row per variant per crawl. NEVER DELETE.';
-- ============================================================================
-- SECTION 3: USEFUL VIEWS
-- ============================================================================
-- View: Current specials across all stores
CREATE OR REPLACE VIEW v_current_specials AS
SELECT
pv.id as variant_id,
sp.id as product_id,
sp.name_raw as product_name,
sp.brand_name_raw as brand_name,
sp.category_raw as category,
d.id as dispensary_id,
d.name as dispensary_name,
d.city,
d.state,
pv.option,
pv.price_rec,
pv.price_rec_special,
ROUND(((pv.price_rec - pv.price_rec_special) / NULLIF(pv.price_rec, 0)) * 100, 1) as discount_percent,
pv.quantity,
pv.in_stock,
pv.last_seen_at
FROM product_variants pv
JOIN store_products sp ON sp.id = pv.store_product_id
JOIN dispensaries d ON d.id = pv.dispensary_id
WHERE pv.is_on_special = TRUE
AND pv.in_stock = TRUE
AND pv.price_rec_special IS NOT NULL
AND pv.price_rec_special < pv.price_rec;
COMMENT ON VIEW v_current_specials IS 'All products currently on special across all stores';
-- View: Price comparison for a product across stores
CREATE OR REPLACE VIEW v_price_comparison AS
SELECT
sp.name_raw as product_name,
sp.brand_name_raw as brand_name,
sp.category_raw as category,
pv.option,
d.id as dispensary_id,
d.name as dispensary_name,
d.city,
pv.price_rec,
pv.price_rec_special,
pv.is_on_special,
pv.in_stock,
pv.quantity,
RANK() OVER (PARTITION BY sp.name_raw, pv.option ORDER BY COALESCE(pv.price_rec_special, pv.price_rec) ASC) as price_rank
FROM product_variants pv
JOIN store_products sp ON sp.id = pv.store_product_id
JOIN dispensaries d ON d.id = pv.dispensary_id
WHERE pv.in_stock = TRUE
AND (pv.price_rec IS NOT NULL OR pv.price_rec_special IS NOT NULL);
COMMENT ON VIEW v_price_comparison IS 'Compare prices for same product across stores, ranked by price';
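-- Example (illustrative; category value assumed): "who has the cheapest 1g
-- flower?" becomes a filter on price_rank, since rank 1 is the lowest
-- effective price per product+option.
--   SELECT product_name, dispensary_name, city, price_rec, price_rec_special
--   FROM v_price_comparison
--   WHERE option = '1g' AND category = 'Flower' AND price_rank = 1;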
-- View: Latest snapshot per variant
CREATE OR REPLACE VIEW v_latest_variant_snapshots AS
SELECT DISTINCT ON (product_variant_id)
pvs.*
FROM product_variant_snapshots pvs
ORDER BY product_variant_id, captured_at DESC;
-- ============================================================================
-- SECTION 4: HELPER FUNCTION FOR SALE FREQUENCY
-- ============================================================================
-- Function to calculate sale frequency for a variant
CREATE OR REPLACE FUNCTION get_variant_sale_stats(p_variant_id INTEGER, p_days INTEGER DEFAULT 30)
RETURNS TABLE (
total_snapshots BIGINT,
times_on_special BIGINT,
special_frequency_pct NUMERIC,
avg_discount_pct NUMERIC,
min_price NUMERIC,
max_price NUMERIC,
avg_price NUMERIC
) AS $$
BEGIN
RETURN QUERY
SELECT
COUNT(*)::BIGINT as total_snapshots,
COUNT(*) FILTER (WHERE is_on_special)::BIGINT as times_on_special,
ROUND((COUNT(*) FILTER (WHERE is_on_special)::NUMERIC / NULLIF(COUNT(*), 0)) * 100, 1) as special_frequency_pct,
ROUND(AVG(
CASE WHEN is_on_special AND price_rec_special IS NOT NULL AND price_rec IS NOT NULL
THEN ((price_rec - price_rec_special) / NULLIF(price_rec, 0)) * 100
END
), 1) as avg_discount_pct,
MIN(COALESCE(price_rec_special, price_rec)) as min_price,
MAX(price_rec) as max_price,
ROUND(AVG(COALESCE(price_rec_special, price_rec)), 2) as avg_price
FROM product_variant_snapshots
WHERE product_variant_id = p_variant_id
AND captured_at >= NOW() - (p_days || ' days')::INTERVAL;
END;
$$ LANGUAGE plpgsql;
COMMENT ON FUNCTION get_variant_sale_stats IS 'Get sale frequency and price stats for a variant over N days';
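-- Usage sketch (variant id 123 is hypothetical):
--   SELECT special_frequency_pct, avg_discount_pct, min_price, avg_price
--   FROM get_variant_sale_stats(123, 90); -- stats over the last 90 days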
-- ============================================================================
-- DONE
-- ============================================================================
SELECT 'Migration 070 completed. Product variants tables ready for time-series analytics.' AS status;


@@ -0,0 +1,53 @@
-- Migration 071: Harmonize store_products with dutchie_products
-- Adds missing columns to store_products to consolidate on a single canonical table
-- Product details
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS description TEXT;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS weight VARCHAR(50);
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS weights JSONB;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS measurements JSONB;
-- Cannabinoid/terpene data
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS effects JSONB;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS terpenes JSONB;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS cannabinoids_v2 JSONB;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS thc_content NUMERIC(10,4);
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS cbd_content NUMERIC(10,4);
-- Images
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS images JSONB;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS primary_image_url TEXT;
-- Inventory
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS total_quantity_available INTEGER DEFAULT 0;
-- Status/flags
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS status VARCHAR(50);
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS featured BOOLEAN DEFAULT FALSE;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS coming_soon BOOLEAN DEFAULT FALSE;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS visibility_lost BOOLEAN DEFAULT FALSE;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS visibility_lost_at TIMESTAMP WITH TIME ZONE;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS visibility_restored_at TIMESTAMP WITH TIME ZONE;
-- Threshold flags (Dutchie-specific)
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS is_below_threshold BOOLEAN DEFAULT FALSE;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS is_below_kiosk_threshold BOOLEAN DEFAULT FALSE;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS options_below_threshold BOOLEAN DEFAULT FALSE;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS options_below_kiosk_threshold BOOLEAN DEFAULT FALSE;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS certificate_of_analysis_enabled BOOLEAN DEFAULT FALSE;
-- Platform metadata
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS external_product_id VARCHAR(100);
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS c_name VARCHAR(500);
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS past_c_names TEXT[];
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS latest_raw_payload JSONB;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS created_at_platform TIMESTAMP WITH TIME ZONE;
ALTER TABLE store_products ADD COLUMN IF NOT EXISTS updated_at_platform TIMESTAMP WITH TIME ZONE;
-- Indexes for common queries
CREATE INDEX IF NOT EXISTS idx_store_products_external_id ON store_products(external_product_id);
CREATE INDEX IF NOT EXISTS idx_store_products_visibility_lost ON store_products(visibility_lost) WHERE visibility_lost = TRUE;
CREATE INDEX IF NOT EXISTS idx_store_products_status ON store_products(status);
-- Add comment
COMMENT ON TABLE store_products IS 'Canonical product table - consolidated from dutchie_products';


@@ -0,0 +1,74 @@
-- Migration 072: Create compatibility views for store_products and store_product_snapshots
-- These views provide backward-compatible column names for API routes
-- v_products view - aliases store_products columns to match legacy dutchie_products naming
CREATE OR REPLACE VIEW v_products AS
SELECT
id,
dispensary_id,
provider_product_id as external_product_id,
provider_product_id as dutchie_id,
name_raw as name,
brand_name_raw as brand_name,
category_raw as type,
subcategory_raw as subcategory,
strain_type,
thc_percent as thc,
cbd_percent as cbd,
stock_status,
is_in_stock,
stock_quantity,
image_url,
primary_image_url,
images,
effects,
description,
is_on_special,
featured,
medical_only,
rec_only,
external_product_id as external_id,
provider,
created_at,
updated_at
FROM store_products;
-- v_product_snapshots view - aliases store_product_snapshots columns to match legacy naming
CREATE OR REPLACE VIEW v_product_snapshots AS
SELECT
id,
store_product_id,
dispensary_id,
provider,
provider_product_id,
crawl_run_id,
captured_at as crawled_at,
name_raw,
brand_name_raw,
category_raw,
subcategory_raw,
-- Convert price_rec (dollars) to rec_min_price_cents (cents)
CASE WHEN price_rec IS NOT NULL THEN (price_rec * 100)::integer END as rec_min_price_cents,
CASE WHEN price_rec IS NOT NULL THEN (price_rec * 100)::integer END as rec_max_price_cents,
CASE WHEN price_rec_special IS NOT NULL THEN (price_rec_special * 100)::integer END as rec_min_special_price_cents,
CASE WHEN price_med IS NOT NULL THEN (price_med * 100)::integer END as med_min_price_cents,
CASE WHEN price_med IS NOT NULL THEN (price_med * 100)::integer END as med_max_price_cents,
CASE WHEN price_med_special IS NOT NULL THEN (price_med_special * 100)::integer END as med_min_special_price_cents,
is_on_special as special,
discount_percent,
is_in_stock,
stock_quantity,
stock_status,
stock_quantity as total_quantity_available,
thc_percent,
cbd_percent,
image_url,
raw_data as options,
created_at
FROM store_product_snapshots;
-- Add indexes for the views' underlying tables
CREATE INDEX IF NOT EXISTS idx_store_products_dispensary ON store_products(dispensary_id);
CREATE INDEX IF NOT EXISTS idx_store_products_stock ON store_products(stock_status);
CREATE INDEX IF NOT EXISTS idx_store_snapshots_product ON store_product_snapshots(store_product_id);
CREATE INDEX IF NOT EXISTS idx_store_snapshots_captured ON store_product_snapshots(captured_at DESC);
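-- Example (illustrative): legacy API code keeps working against the old column
-- names by querying the view instead of the table.
--   SELECT id, name, brand_name, thc, cbd
--   FROM v_products
--   WHERE dispensary_id = 42 AND is_in_stock = TRUE;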


@@ -0,0 +1,12 @@
-- Add timezone column to proxies table for geo-consistent fingerprinting
-- This allows matching Accept-Language and other headers to proxy location
ALTER TABLE proxies
ADD COLUMN IF NOT EXISTS timezone VARCHAR(50);
-- Add timezone to failed_proxies as well
ALTER TABLE failed_proxies
ADD COLUMN IF NOT EXISTS timezone VARCHAR(50);
-- Comment explaining usage
COMMENT ON COLUMN proxies.timezone IS 'IANA timezone (e.g., America/Phoenix) for geo-consistent fingerprinting';
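-- Example (proxy id illustrative): backfill a proxy's timezone after an IP lookup.
--   UPDATE proxies SET timezone = 'America/Phoenix' WHERE id = 7;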


@@ -0,0 +1,27 @@
-- Migration: Worker Commands Table
-- Purpose: Store commands for workers (decommission, etc.)
-- Workers poll this table after each task to check for commands
CREATE TABLE IF NOT EXISTS worker_commands (
id SERIAL PRIMARY KEY,
worker_id TEXT NOT NULL,
command TEXT NOT NULL, -- 'decommission', 'pause', 'resume'
reason TEXT,
issued_by TEXT,
issued_at TIMESTAMPTZ DEFAULT NOW(),
acknowledged_at TIMESTAMPTZ,
executed_at TIMESTAMPTZ,
status TEXT DEFAULT 'pending' -- 'pending', 'acknowledged', 'executed', 'cancelled'
);
-- Index for worker lookups
CREATE INDEX IF NOT EXISTS idx_worker_commands_worker_id ON worker_commands(worker_id);
CREATE INDEX IF NOT EXISTS idx_worker_commands_pending ON worker_commands(worker_id, status) WHERE status = 'pending';
-- Add decommission_requested column to worker_registry for quick checks
ALTER TABLE worker_registry ADD COLUMN IF NOT EXISTS decommission_requested BOOLEAN DEFAULT FALSE;
ALTER TABLE worker_registry ADD COLUMN IF NOT EXISTS decommission_reason TEXT;
ALTER TABLE worker_registry ADD COLUMN IF NOT EXISTS decommission_requested_at TIMESTAMPTZ;
-- Comment
COMMENT ON TABLE worker_commands IS 'Commands issued to workers (decommission after task, pause, etc.)';
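-- Usage sketch (worker id hypothetical): issue a decommission, which the worker
-- acknowledges on its next poll and executes once its current task finishes.
--   INSERT INTO worker_commands (worker_id, command, reason, issued_by)
--   VALUES ('pod-abc123', 'decommission', 'scale-down', 'admin');
--   UPDATE worker_commands SET status = 'acknowledged', acknowledged_at = NOW()
--   WHERE worker_id = 'pod-abc123' AND status = 'pending';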


@@ -0,0 +1,322 @@
-- Migration 074: Worker Task Queue System
-- Implements role-based task queue with per-store locking and capacity tracking
-- Task queue table
CREATE TABLE IF NOT EXISTS worker_tasks (
id SERIAL PRIMARY KEY,
-- Task identification
role VARCHAR(50) NOT NULL, -- store_discovery, entry_point_discovery, product_discovery, product_resync, analytics_refresh
dispensary_id INTEGER REFERENCES dispensaries(id) ON DELETE CASCADE,
platform VARCHAR(20), -- dutchie, jane, treez, etc.
-- Task state
status VARCHAR(20) NOT NULL DEFAULT 'pending',
priority INTEGER DEFAULT 0, -- Higher = more urgent
-- Scheduling
scheduled_for TIMESTAMPTZ, -- For batch scheduling (e.g., every 4 hours)
-- Ownership
worker_id VARCHAR(100), -- Pod name or worker ID
claimed_at TIMESTAMPTZ,
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
last_heartbeat_at TIMESTAMPTZ,
-- Results
result JSONB, -- Task output data
error_message TEXT,
retry_count INTEGER DEFAULT 0,
max_retries INTEGER DEFAULT 3,
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
-- Constraints
CONSTRAINT valid_status CHECK (status IN ('pending', 'claimed', 'running', 'completed', 'failed', 'stale'))
);
-- Indexes for efficient task claiming
CREATE INDEX IF NOT EXISTS idx_worker_tasks_pending
ON worker_tasks(role, priority DESC, created_at ASC)
WHERE status = 'pending';
CREATE INDEX IF NOT EXISTS idx_worker_tasks_claimed
ON worker_tasks(worker_id, claimed_at)
WHERE status = 'claimed';
CREATE INDEX IF NOT EXISTS idx_worker_tasks_running
ON worker_tasks(worker_id, last_heartbeat_at)
WHERE status = 'running';
CREATE INDEX IF NOT EXISTS idx_worker_tasks_dispensary
ON worker_tasks(dispensary_id)
WHERE dispensary_id IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_worker_tasks_scheduled
ON worker_tasks(scheduled_for)
WHERE status = 'pending' AND scheduled_for IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_worker_tasks_history
ON worker_tasks(role, completed_at DESC)
WHERE status IN ('completed', 'failed');
-- Partial unique index to prevent duplicate active tasks per store
-- Only one task can be claimed/running for a given dispensary at a time
CREATE UNIQUE INDEX IF NOT EXISTS idx_worker_tasks_unique_active_store
ON worker_tasks(dispensary_id)
WHERE status IN ('claimed', 'running') AND dispensary_id IS NOT NULL;
-- Worker registration table (tracks active workers)
CREATE TABLE IF NOT EXISTS worker_registry (
id SERIAL PRIMARY KEY,
worker_id VARCHAR(100) UNIQUE NOT NULL,
role VARCHAR(50) NOT NULL,
pod_name VARCHAR(100),
hostname VARCHAR(100),
started_at TIMESTAMPTZ DEFAULT NOW(),
last_heartbeat_at TIMESTAMPTZ DEFAULT NOW(),
tasks_completed INTEGER DEFAULT 0,
tasks_failed INTEGER DEFAULT 0,
status VARCHAR(20) DEFAULT 'active',
CONSTRAINT valid_worker_status CHECK (status IN ('active', 'idle', 'offline'))
);
CREATE INDEX IF NOT EXISTS idx_worker_registry_role
ON worker_registry(role, status);
CREATE INDEX IF NOT EXISTS idx_worker_registry_heartbeat
ON worker_registry(last_heartbeat_at)
WHERE status = 'active';
-- Task completion tracking (summarized history)
CREATE TABLE IF NOT EXISTS task_completion_log (
id SERIAL PRIMARY KEY,
role VARCHAR(50) NOT NULL,
date DATE NOT NULL DEFAULT CURRENT_DATE,
hour INTEGER NOT NULL DEFAULT EXTRACT(HOUR FROM NOW()),
tasks_created INTEGER DEFAULT 0,
tasks_completed INTEGER DEFAULT 0,
tasks_failed INTEGER DEFAULT 0,
avg_duration_sec NUMERIC(10,2),
min_duration_sec NUMERIC(10,2),
max_duration_sec NUMERIC(10,2),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(role, date, hour)
);
-- Capacity planning view
CREATE OR REPLACE VIEW v_worker_capacity AS
SELECT
role,
COUNT(*) FILTER (WHERE status = 'pending') as pending_tasks,
COUNT(*) FILTER (WHERE status = 'pending' AND (scheduled_for IS NULL OR scheduled_for <= NOW())) as ready_tasks,
COUNT(*) FILTER (WHERE status = 'claimed') as claimed_tasks,
COUNT(*) FILTER (WHERE status = 'running') as running_tasks,
COUNT(*) FILTER (WHERE status = 'completed' AND completed_at > NOW() - INTERVAL '1 hour') as completed_last_hour,
COUNT(*) FILTER (WHERE status = 'failed' AND completed_at > NOW() - INTERVAL '1 hour') as failed_last_hour,
COUNT(DISTINCT worker_id) FILTER (WHERE status IN ('claimed', 'running')) as active_workers,
AVG(EXTRACT(EPOCH FROM (completed_at - started_at)))
FILTER (WHERE status = 'completed' AND completed_at > NOW() - INTERVAL '1 hour') as avg_duration_sec,
-- Capacity planning metrics
CASE
WHEN COUNT(*) FILTER (WHERE status = 'completed' AND completed_at > NOW() - INTERVAL '1 hour') > 0
THEN 3600.0 / NULLIF(AVG(EXTRACT(EPOCH FROM (completed_at - started_at)))
FILTER (WHERE status = 'completed' AND completed_at > NOW() - INTERVAL '1 hour'), 0)
ELSE NULL
END as tasks_per_worker_hour,
-- Estimated time to drain queue
CASE
WHEN COUNT(DISTINCT worker_id) FILTER (WHERE status IN ('claimed', 'running')) > 0
AND COUNT(*) FILTER (WHERE status = 'completed' AND completed_at > NOW() - INTERVAL '1 hour') > 0
THEN COUNT(*) FILTER (WHERE status = 'pending') / NULLIF(
COUNT(DISTINCT worker_id) FILTER (WHERE status IN ('claimed', 'running')) *
(3600.0 / NULLIF(AVG(EXTRACT(EPOCH FROM (completed_at - started_at)))
FILTER (WHERE status = 'completed' AND completed_at > NOW() - INTERVAL '1 hour'), 0)),
0
)
ELSE NULL
END as estimated_hours_to_drain
FROM worker_tasks
GROUP BY role;
-- Task history view (for UI)
CREATE OR REPLACE VIEW v_task_history AS
SELECT
t.id,
t.role,
t.dispensary_id,
d.name as dispensary_name,
t.platform,
t.status,
t.priority,
t.worker_id,
t.scheduled_for,
t.claimed_at,
t.started_at,
t.completed_at,
t.error_message,
t.retry_count,
t.created_at,
EXTRACT(EPOCH FROM (t.completed_at - t.started_at)) as duration_sec
FROM worker_tasks t
LEFT JOIN dispensaries d ON d.id = t.dispensary_id
ORDER BY t.created_at DESC;
-- Function to claim a task atomically
CREATE OR REPLACE FUNCTION claim_task(
p_role VARCHAR(50),
p_worker_id VARCHAR(100)
) RETURNS worker_tasks AS $$
DECLARE
claimed_task worker_tasks;
BEGIN
UPDATE worker_tasks
SET
status = 'claimed',
worker_id = p_worker_id,
claimed_at = NOW(),
updated_at = NOW()
WHERE id = (
SELECT id FROM worker_tasks
WHERE role = p_role
AND status = 'pending'
AND (scheduled_for IS NULL OR scheduled_for <= NOW())
-- Exclude stores that already have an active task
AND (dispensary_id IS NULL OR dispensary_id NOT IN (
SELECT dispensary_id FROM worker_tasks
WHERE status IN ('claimed', 'running')
AND dispensary_id IS NOT NULL
))
ORDER BY priority DESC, created_at ASC
LIMIT 1
FOR UPDATE SKIP LOCKED
)
RETURNING * INTO claimed_task;
RETURN claimed_task;
END;
$$ LANGUAGE plpgsql;
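-- Usage sketch (worker id hypothetical): a worker polls like this; when nothing
-- is claimable the returned row is all NULLs.
--   SELECT id, dispensary_id, platform FROM claim_task('product_resync', 'pod-abc123');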
-- Function to mark stale tasks (workers that died)
CREATE OR REPLACE FUNCTION recover_stale_tasks(
stale_threshold_minutes INTEGER DEFAULT 10
) RETURNS INTEGER AS $$
DECLARE
recovered_count INTEGER;
BEGIN
WITH stale AS (
UPDATE worker_tasks
SET
status = 'pending',
worker_id = NULL,
claimed_at = NULL,
started_at = NULL,
retry_count = retry_count + 1,
updated_at = NOW()
WHERE status IN ('claimed', 'running')
AND last_heartbeat_at < NOW() - (stale_threshold_minutes || ' minutes')::INTERVAL
AND retry_count < max_retries
RETURNING id
)
SELECT COUNT(*) INTO recovered_count FROM stale;
-- Mark tasks that exceeded retries as failed
UPDATE worker_tasks
SET
status = 'failed',
error_message = 'Exceeded max retries after worker failures',
completed_at = NOW(),
updated_at = NOW()
WHERE status IN ('claimed', 'running')
AND last_heartbeat_at < NOW() - (stale_threshold_minutes || ' minutes')::INTERVAL
AND retry_count >= max_retries;
RETURN recovered_count;
END;
$$ LANGUAGE plpgsql;
-- Function to generate daily resync tasks
CREATE OR REPLACE FUNCTION generate_resync_tasks(
p_batches_per_day INTEGER DEFAULT 6, -- Every 4 hours
p_date DATE DEFAULT CURRENT_DATE
) RETURNS INTEGER AS $$
DECLARE
store_count INTEGER;
stores_per_batch INTEGER;
batch_num INTEGER;
scheduled_time TIMESTAMPTZ;
created_count INTEGER := 0;
batch_count INTEGER;
BEGIN
-- Count active stores that need resync
SELECT COUNT(*) INTO store_count
FROM dispensaries
WHERE crawl_enabled = true
AND menu_type = 'dutchie'
AND platform_dispensary_id IS NOT NULL;
IF store_count = 0 THEN
RETURN 0;
END IF;
stores_per_batch := CEIL(store_count::NUMERIC / p_batches_per_day);
FOR batch_num IN 0..(p_batches_per_day - 1) LOOP
-- Space batches evenly across the day (4 hours apart with the default of 6)
scheduled_time := p_date + ((batch_num * 24.0 / p_batches_per_day) || ' hours')::INTERVAL;
INSERT INTO worker_tasks (role, dispensary_id, platform, scheduled_for, priority)
SELECT
'product_resync',
d.id,
'dutchie',
scheduled_time,
0
FROM (
SELECT id, ROW_NUMBER() OVER (ORDER BY id) as rn
FROM dispensaries
WHERE crawl_enabled = true
AND menu_type = 'dutchie'
AND platform_dispensary_id IS NOT NULL
) d
WHERE d.rn > (batch_num * stores_per_batch)
AND d.rn <= ((batch_num + 1) * stores_per_batch)
ON CONFLICT DO NOTHING;
GET DIAGNOSTICS batch_count = ROW_COUNT;
created_count := created_count + batch_count;
END LOOP;
RETURN created_count;
END;
$$ LANGUAGE plpgsql;
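-- Scheduling sketch: something periodic (cron, or the DB-driven scheduler from
-- migration 079) is assumed to call these.
--   SELECT generate_resync_tasks();   -- once per day: queue the day's batches
--   SELECT recover_stale_tasks(10);   -- every few minutes: requeue dead workers' tasks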
-- Trigger to update timestamp
CREATE OR REPLACE FUNCTION update_worker_tasks_timestamp()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS worker_tasks_updated_at ON worker_tasks;
CREATE TRIGGER worker_tasks_updated_at
BEFORE UPDATE ON worker_tasks
FOR EACH ROW
EXECUTE FUNCTION update_worker_tasks_timestamp();
-- Comments
COMMENT ON TABLE worker_tasks IS 'Central task queue for all worker roles';
COMMENT ON TABLE worker_registry IS 'Registry of active workers and their stats';
COMMENT ON TABLE task_completion_log IS 'Hourly aggregated task completion metrics';
COMMENT ON VIEW v_worker_capacity IS 'Real-time capacity planning metrics per role';
COMMENT ON VIEW v_task_history IS 'Task history with dispensary details for UI';
COMMENT ON FUNCTION claim_task IS 'Atomically claim a task for a worker, respecting per-store locking';
COMMENT ON FUNCTION recover_stale_tasks IS 'Release tasks from dead workers back to pending';
COMMENT ON FUNCTION generate_resync_tasks IS 'Generate daily product resync tasks in batches';


@@ -0,0 +1,13 @@
-- Migration 075: Add consecutive_misses column to store_products
-- Used to track how many consecutive crawls a product has been missing from the feed
-- After 3 consecutive misses, product is marked as OOS
ALTER TABLE store_products
ADD COLUMN IF NOT EXISTS consecutive_misses INTEGER NOT NULL DEFAULT 0;
-- Index for finding products that need OOS check
CREATE INDEX IF NOT EXISTS idx_store_products_consecutive_misses
ON store_products (dispensary_id, consecutive_misses)
WHERE consecutive_misses > 0;
COMMENT ON COLUMN store_products.consecutive_misses IS 'Number of consecutive crawls where product was not in feed. Reset to 0 when seen. At 3, mark OOS.';
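-- Bookkeeping sketch (dispensary/product ids illustrative; the real logic lives
-- in the crawler): reset misses for products seen in the feed, increment the
-- rest, then mark anything at 3+ misses out of stock.
--   UPDATE store_products SET consecutive_misses = 0
--   WHERE dispensary_id = 42 AND id IN (101, 102, 103);
--   UPDATE store_products SET consecutive_misses = consecutive_misses + 1
--   WHERE dispensary_id = 42 AND id NOT IN (101, 102, 103);
--   UPDATE store_products SET is_in_stock = FALSE
--   WHERE dispensary_id = 42 AND consecutive_misses >= 3 AND is_in_stock = TRUE;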


@@ -0,0 +1,71 @@
-- Visitor location analytics for Findagram
-- Tracks visitor locations to understand popular areas
CREATE TABLE IF NOT EXISTS visitor_locations (
id SERIAL PRIMARY KEY,
-- Location data (from IP lookup)
ip_hash VARCHAR(64), -- Hashed IP for privacy (SHA256)
city VARCHAR(100),
state VARCHAR(100),
state_code VARCHAR(10),
country VARCHAR(100),
country_code VARCHAR(10),
latitude DECIMAL(10, 7),
longitude DECIMAL(10, 7),
-- Visit metadata
domain VARCHAR(50) NOT NULL, -- 'findagram.co', 'findadispo.com', etc.
page_path VARCHAR(255), -- '/products', '/dispensaries/123', etc.
referrer VARCHAR(500),
user_agent VARCHAR(500),
-- Session tracking
session_id VARCHAR(64), -- For grouping page views in a session
-- Timestamps
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Indexes for analytics queries
CREATE INDEX IF NOT EXISTS idx_visitor_locations_domain ON visitor_locations(domain);
CREATE INDEX IF NOT EXISTS idx_visitor_locations_city_state ON visitor_locations(city, state_code);
CREATE INDEX IF NOT EXISTS idx_visitor_locations_created_at ON visitor_locations(created_at);
CREATE INDEX IF NOT EXISTS idx_visitor_locations_session ON visitor_locations(session_id);
-- Aggregated daily stats (materialized for performance)
CREATE TABLE IF NOT EXISTS visitor_location_stats (
id SERIAL PRIMARY KEY,
date DATE NOT NULL,
domain VARCHAR(50) NOT NULL,
city VARCHAR(100),
state VARCHAR(100),
state_code VARCHAR(10),
country_code VARCHAR(10),
-- Metrics
visit_count INTEGER DEFAULT 0,
unique_sessions INTEGER DEFAULT 0,
UNIQUE(date, domain, city, state_code, country_code)
);
CREATE INDEX IF NOT EXISTS idx_visitor_stats_date ON visitor_location_stats(date);
CREATE INDEX IF NOT EXISTS idx_visitor_stats_domain ON visitor_location_stats(domain);
CREATE INDEX IF NOT EXISTS idx_visitor_stats_state ON visitor_location_stats(state_code);
-- View for easy querying of top locations
CREATE OR REPLACE VIEW v_top_visitor_locations AS
SELECT
domain,
city,
state,
state_code,
country_code,
COUNT(*) as total_visits,
COUNT(DISTINCT session_id) as unique_sessions,
MAX(created_at) as last_visit
FROM visitor_locations
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY domain, city, state, state_code, country_code
ORDER BY total_visits DESC;
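-- Rollup sketch (illustrative; a scheduled job is assumed to own this, and rows
-- with NULL city will not conflict-match): aggregate yesterday's raw visits
-- into the daily stats table.
--   INSERT INTO visitor_location_stats
--     (date, domain, city, state, state_code, country_code, visit_count, unique_sessions)
--   SELECT created_at::date, domain, city, state, state_code, country_code,
--          COUNT(*), COUNT(DISTINCT session_id)
--   FROM visitor_locations
--   WHERE created_at::date = CURRENT_DATE - 1
--   GROUP BY created_at::date, domain, city, state, state_code, country_code
--   ON CONFLICT (date, domain, city, state_code, country_code)
--   DO UPDATE SET visit_count = EXCLUDED.visit_count,
--                 unique_sessions = EXCLUDED.unique_sessions;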


@@ -0,0 +1,141 @@
-- Migration 076: Worker Registry for Dynamic Workers
-- Workers register on startup, receive a friendly name, and report heartbeats
-- Name pool for workers (expandable, no hardcoding)
CREATE TABLE IF NOT EXISTS worker_name_pool (
id SERIAL PRIMARY KEY,
name VARCHAR(50) UNIQUE NOT NULL,
in_use BOOLEAN DEFAULT FALSE,
assigned_to VARCHAR(100), -- worker_id
assigned_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Seed with initial names (can add more via API)
INSERT INTO worker_name_pool (name) VALUES
('Alice'), ('Bella'), ('Clara'), ('Diana'), ('Elena'),
('Fiona'), ('Grace'), ('Hazel'), ('Iris'), ('Julia'),
('Katie'), ('Luna'), ('Mia'), ('Nora'), ('Olive'),
('Pearl'), ('Quinn'), ('Rosa'), ('Sara'), ('Tara'),
('Uma'), ('Vera'), ('Wendy'), ('Xena'), ('Yuki'), ('Zara'),
('Amber'), ('Blake'), ('Coral'), ('Dawn'), ('Echo'),
('Fleur'), ('Gem'), ('Haven'), ('Ivy'), ('Jade'),
('Kira'), ('Lotus'), ('Maple'), ('Nova'), ('Onyx'),
('Pixel'), ('Quest'), ('Raven'), ('Sage'), ('Terra'),
('Unity'), ('Violet'), ('Willow'), ('Xylo'), ('Yara'), ('Zen')
ON CONFLICT (name) DO NOTHING;
-- Worker registry - tracks active workers
CREATE TABLE IF NOT EXISTS worker_registry (
id SERIAL PRIMARY KEY,
worker_id VARCHAR(100) UNIQUE NOT NULL, -- e.g., "pod-abc123" or uuid
friendly_name VARCHAR(50), -- assigned from pool
role VARCHAR(50) NOT NULL, -- task role
pod_name VARCHAR(100), -- k8s pod name
hostname VARCHAR(100), -- machine hostname
ip_address VARCHAR(50), -- worker IP
status VARCHAR(20) DEFAULT 'starting', -- starting, active, idle, offline, terminated
started_at TIMESTAMPTZ DEFAULT NOW(),
last_heartbeat_at TIMESTAMPTZ DEFAULT NOW(),
last_task_at TIMESTAMPTZ,
tasks_completed INTEGER DEFAULT 0,
tasks_failed INTEGER DEFAULT 0,
current_task_id INTEGER,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- worker_registry may already exist from migration 074, in which case the
-- CREATE above is a no-op; add the new columns idempotently as well
ALTER TABLE worker_registry ADD COLUMN IF NOT EXISTS friendly_name VARCHAR(50);
ALTER TABLE worker_registry ADD COLUMN IF NOT EXISTS ip_address VARCHAR(50);
ALTER TABLE worker_registry ADD COLUMN IF NOT EXISTS last_task_at TIMESTAMPTZ;
ALTER TABLE worker_registry ADD COLUMN IF NOT EXISTS current_task_id INTEGER;
ALTER TABLE worker_registry ADD COLUMN IF NOT EXISTS metadata JSONB DEFAULT '{}';
ALTER TABLE worker_registry ADD COLUMN IF NOT EXISTS updated_at TIMESTAMPTZ DEFAULT NOW();
-- Indexes for worker registry
CREATE INDEX IF NOT EXISTS idx_worker_registry_status ON worker_registry(status);
CREATE INDEX IF NOT EXISTS idx_worker_registry_role ON worker_registry(role);
CREATE INDEX IF NOT EXISTS idx_worker_registry_heartbeat ON worker_registry(last_heartbeat_at);
-- Function to assign a name to a new worker
CREATE OR REPLACE FUNCTION assign_worker_name(p_worker_id VARCHAR(100))
RETURNS VARCHAR(50) AS $$
DECLARE
v_name VARCHAR(50);
BEGIN
-- Try to get an unused name
UPDATE worker_name_pool
SET in_use = TRUE, assigned_to = p_worker_id, assigned_at = NOW()
WHERE id = (
SELECT id FROM worker_name_pool
WHERE in_use = FALSE
ORDER BY RANDOM()
LIMIT 1
FOR UPDATE SKIP LOCKED
)
RETURNING name INTO v_name;
-- If no names available, generate one
IF v_name IS NULL THEN
v_name := 'Worker-' || SUBSTRING(p_worker_id FROM 1 FOR 8);
END IF;
RETURN v_name;
END;
$$ LANGUAGE plpgsql;
-- Function to release a worker's name back to the pool
CREATE OR REPLACE FUNCTION release_worker_name(p_worker_id VARCHAR(100))
RETURNS VOID AS $$
BEGIN
UPDATE worker_name_pool
SET in_use = FALSE, assigned_to = NULL, assigned_at = NULL
WHERE assigned_to = p_worker_id;
END;
$$ LANGUAGE plpgsql;
-- Function to mark stale workers as offline
CREATE OR REPLACE FUNCTION mark_stale_workers(stale_threshold_minutes INTEGER DEFAULT 5)
RETURNS INTEGER AS $$
DECLARE
v_count INTEGER;
BEGIN
UPDATE worker_registry
SET status = 'offline', updated_at = NOW()
WHERE status IN ('active', 'idle', 'starting')
AND last_heartbeat_at < NOW() - (stale_threshold_minutes || ' minutes')::INTERVAL;
GET DIAGNOSTICS v_count = ROW_COUNT;
-- Release names from offline workers
PERFORM release_worker_name(worker_id)
FROM worker_registry
WHERE status = 'offline'
AND last_heartbeat_at < NOW() - INTERVAL '30 minutes';
RETURN COALESCE(v_count, 0);
END;
$$ LANGUAGE plpgsql;
-- View for dashboard
CREATE OR REPLACE VIEW v_active_workers AS
SELECT
wr.id,
wr.worker_id,
wr.friendly_name,
wr.role,
wr.status,
wr.pod_name,
wr.hostname,
wr.started_at,
wr.last_heartbeat_at,
wr.last_task_at,
wr.tasks_completed,
wr.tasks_failed,
wr.current_task_id,
EXTRACT(EPOCH FROM (NOW() - wr.last_heartbeat_at)) as seconds_since_heartbeat,
CASE
WHEN wr.status = 'offline' THEN 'offline'
WHEN wr.last_heartbeat_at < NOW() - INTERVAL '2 minutes' THEN 'stale'
WHEN wr.current_task_id IS NOT NULL THEN 'busy'
ELSE 'ready'
END as health_status
FROM worker_registry wr
WHERE wr.status != 'terminated'
ORDER BY wr.status = 'active' DESC, wr.last_heartbeat_at DESC;
COMMENT ON TABLE worker_registry IS 'Tracks all workers that have registered with the system';
COMMENT ON TABLE worker_name_pool IS 'Pool of friendly names for workers - expandable via API';
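-- Registration sketch (pod name hypothetical): on startup a worker takes a name
-- from the pool and registers; on shutdown the name is returned.
--   INSERT INTO worker_registry (worker_id, friendly_name, role, pod_name, status)
--   VALUES ('pod-abc123', assign_worker_name('pod-abc123'), 'product_refresh', 'pod-abc123', 'active');
--   SELECT release_worker_name('pod-abc123'); -- at shutdown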


@@ -0,0 +1,35 @@
-- Migration: Add visitor location and dispensary name to click events
-- Captures where visitors are clicking from and which dispensary
-- Add visitor location columns
ALTER TABLE product_click_events
ADD COLUMN IF NOT EXISTS visitor_city VARCHAR(100);
ALTER TABLE product_click_events
ADD COLUMN IF NOT EXISTS visitor_state VARCHAR(10);
ALTER TABLE product_click_events
ADD COLUMN IF NOT EXISTS visitor_lat DECIMAL(10, 7);
ALTER TABLE product_click_events
ADD COLUMN IF NOT EXISTS visitor_lng DECIMAL(10, 7);
-- Add dispensary name for easier reporting
ALTER TABLE product_click_events
ADD COLUMN IF NOT EXISTS dispensary_name VARCHAR(255);
-- Create index for location-based analytics
CREATE INDEX IF NOT EXISTS idx_product_click_events_visitor_state
ON product_click_events(visitor_state)
WHERE visitor_state IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_product_click_events_visitor_city
ON product_click_events(visitor_city)
WHERE visitor_city IS NOT NULL;
-- Add comments
COMMENT ON COLUMN product_click_events.visitor_city IS 'City where the visitor is located (from IP geolocation)';
COMMENT ON COLUMN product_click_events.visitor_state IS 'State where the visitor is located (from IP geolocation)';
COMMENT ON COLUMN product_click_events.visitor_lat IS 'Visitor latitude (from IP geolocation)';
COMMENT ON COLUMN product_click_events.visitor_lng IS 'Visitor longitude (from IP geolocation)';
COMMENT ON COLUMN product_click_events.dispensary_name IS 'Name of the dispensary (denormalized for easier reporting)';


@@ -0,0 +1,8 @@
-- Migration 078: Add consecutive_403_count to proxies table
-- Per workflow-12102025.md: Track consecutive 403s per proxy
-- After 3 consecutive 403s with different fingerprints → disable proxy
ALTER TABLE proxies ADD COLUMN IF NOT EXISTS consecutive_403_count INTEGER DEFAULT 0;
-- Add comment explaining the column
COMMENT ON COLUMN proxies.consecutive_403_count IS 'Tracks consecutive 403 blocks. Reset to 0 on success. Proxy disabled at 3.';
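-- Bookkeeping sketch (proxy id illustrative): increment on a 403, reset on
-- success, disable once the counter reaches 3.
--   UPDATE proxies SET consecutive_403_count = consecutive_403_count + 1 WHERE id = 7;
--   UPDATE proxies SET consecutive_403_count = 0 WHERE id = 7; -- on success
--   UPDATE proxies SET active = false WHERE id = 7 AND consecutive_403_count >= 3;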


@@ -0,0 +1,49 @@
-- Migration 079: Task Schedules for Database-Driven Scheduler
-- Per TASK_WORKFLOW_2024-12-10.md: Replaces node-cron with DB-driven scheduling
--
-- 2024-12-10: Created for reliable, multi-replica-safe task scheduling
-- task_schedules: Stores schedule definitions and state
CREATE TABLE IF NOT EXISTS task_schedules (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL UNIQUE,
role VARCHAR(50) NOT NULL, -- TaskRole: product_refresh, store_discovery, etc.
description TEXT,
-- Schedule configuration
enabled BOOLEAN DEFAULT TRUE,
interval_hours INTEGER NOT NULL DEFAULT 4,
priority INTEGER DEFAULT 0,
-- Optional scope filters
state_code VARCHAR(2), -- NULL = all states
platform VARCHAR(50), -- NULL = all platforms
-- Execution state (updated by scheduler)
last_run_at TIMESTAMPTZ,
next_run_at TIMESTAMPTZ,
last_task_count INTEGER DEFAULT 0,
last_error TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Indexes for scheduler queries
CREATE INDEX IF NOT EXISTS idx_task_schedules_enabled ON task_schedules(enabled) WHERE enabled = TRUE;
CREATE INDEX IF NOT EXISTS idx_task_schedules_next_run ON task_schedules(next_run_at) WHERE enabled = TRUE;
-- Insert default schedules
INSERT INTO task_schedules (name, role, interval_hours, priority, description, next_run_at)
VALUES
('product_refresh_all', 'product_refresh', 4, 0, 'Generate product refresh tasks for all crawl-enabled stores every 4 hours', NOW()),
('store_discovery_dutchie', 'store_discovery', 24, 5, 'Discover new Dutchie stores daily', NOW()),
('analytics_refresh', 'analytics_refresh', 6, 0, 'Refresh analytics materialized views every 6 hours', NOW())
ON CONFLICT (name) DO NOTHING;
-- Comment for documentation
COMMENT ON TABLE task_schedules IS 'Database-driven task scheduler configuration. Per TASK_WORKFLOW_2024-12-10.md:
- Schedules persist in DB (survive restarts)
- Uses SELECT FOR UPDATE SKIP LOCKED for multi-replica safety
- Scheduler polls every 60s and executes due schedules
- Creates tasks in worker_tasks for task-worker.ts to process';
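-- Poll sketch (illustrative): each scheduler replica can claim due schedules
-- like this; SKIP LOCKED keeps replicas from firing the same schedule twice.
--   WITH due AS (
--     SELECT id FROM task_schedules
--     WHERE enabled = TRUE AND (next_run_at IS NULL OR next_run_at <= NOW())
--     FOR UPDATE SKIP LOCKED
--   )
--   UPDATE task_schedules ts
--   SET last_run_at = NOW(),
--       next_run_at = NOW() + (ts.interval_hours || ' hours')::INTERVAL,
--       updated_at = NOW()
--   FROM due WHERE ts.id = due.id
--   RETURNING ts.name, ts.role;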


@@ -0,0 +1,58 @@
-- Migration 080: Raw Crawl Payloads Metadata Table
-- Per TASK_WORKFLOW_2024-12-10.md: Store full GraphQL payloads for historical analysis
--
-- Design Pattern: Metadata/Payload Separation
-- - Metadata (this table): Small, indexed, queryable
-- - Payload (filesystem): Gzipped JSON at storage_path
--
-- Benefits:
-- - Compare any two crawls to see what changed
-- - Replay/re-normalize historical data if logic changes
-- - Debug issues by seeing exactly what the API returned
-- - DB stays small, backups stay fast
--
-- Storage location: /storage/payloads/{year}/{month}/{day}/store_{id}_{timestamp}.json.gz
-- Compression: ~90% reduction (1.5MB -> 150KB per crawl)
CREATE TABLE IF NOT EXISTS raw_crawl_payloads (
id SERIAL PRIMARY KEY,
-- Links to crawl tracking
crawl_run_id INTEGER REFERENCES crawl_runs(id) ON DELETE SET NULL,
dispensary_id INTEGER NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE,
-- File location (gzipped JSON)
storage_path TEXT NOT NULL,
-- Metadata for quick queries without loading file
product_count INTEGER NOT NULL DEFAULT 0,
size_bytes INTEGER, -- Compressed size
size_bytes_raw INTEGER, -- Uncompressed size
-- Timestamps
fetched_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Optional: checksum for integrity verification
checksum_sha256 VARCHAR(64)
);
-- Indexes for common queries
CREATE INDEX IF NOT EXISTS idx_raw_crawl_payloads_dispensary
ON raw_crawl_payloads(dispensary_id);
CREATE INDEX IF NOT EXISTS idx_raw_crawl_payloads_dispensary_fetched
ON raw_crawl_payloads(dispensary_id, fetched_at DESC);
CREATE INDEX IF NOT EXISTS idx_raw_crawl_payloads_fetched
ON raw_crawl_payloads(fetched_at DESC);
CREATE INDEX IF NOT EXISTS idx_raw_crawl_payloads_crawl_run
ON raw_crawl_payloads(crawl_run_id)
WHERE crawl_run_id IS NOT NULL;
-- Comments
COMMENT ON TABLE raw_crawl_payloads IS 'Metadata for raw GraphQL payloads stored on filesystem. Per TASK_WORKFLOW_2024-12-10.md: Full payloads enable historical diffs and replay.';
COMMENT ON COLUMN raw_crawl_payloads.storage_path IS 'Path to gzipped JSON file, e.g. /storage/payloads/2024/12/10/store_123_1702234567.json.gz';
COMMENT ON COLUMN raw_crawl_payloads.size_bytes IS 'Compressed file size in bytes';
COMMENT ON COLUMN raw_crawl_payloads.size_bytes_raw IS 'Uncompressed payload size in bytes';
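-- Usage sketch (ids and sizes illustrative): the crawler records one row per
-- saved payload; the latest payload for a store is then a cheap indexed lookup.
--   INSERT INTO raw_crawl_payloads
--     (dispensary_id, crawl_run_id, storage_path, product_count, size_bytes, size_bytes_raw)
--   VALUES (123, 456, '/storage/payloads/2024/12/10/store_123_1702234567.json.gz', 842, 153600, 1572864);
--   SELECT storage_path FROM raw_crawl_payloads
--   WHERE dispensary_id = 123 ORDER BY fetched_at DESC LIMIT 1;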


@@ -0,0 +1,37 @@
-- Migration 081: Payload Fetch Columns
-- Per TASK_WORKFLOW_2024-12-10.md: Separates API fetch from data processing
--
-- New architecture:
-- - payload_fetch: Hits Dutchie API, saves raw payload to disk
-- - product_refresh: Reads local payload, normalizes, upserts to DB
--
-- This migration adds:
-- 1. payload column to worker_tasks (for task chaining data)
-- 2. processed_at column to raw_crawl_payloads (track when payload was processed)
-- 3. last_fetch_at column to dispensaries (track when last payload was fetched)
-- Add payload column to worker_tasks for task chaining
-- Used by payload_fetch to pass payload_id to product_refresh
ALTER TABLE worker_tasks
ADD COLUMN IF NOT EXISTS payload JSONB DEFAULT NULL;
COMMENT ON COLUMN worker_tasks.payload IS 'Per TASK_WORKFLOW_2024-12-10.md: Task chaining data (e.g., payload_id from payload_fetch to product_refresh)';
-- Add processed_at to raw_crawl_payloads
-- Tracks when the payload was processed by product_refresh
ALTER TABLE raw_crawl_payloads
ADD COLUMN IF NOT EXISTS processed_at TIMESTAMPTZ DEFAULT NULL;
COMMENT ON COLUMN raw_crawl_payloads.processed_at IS 'When this payload was processed by product_refresh handler';
-- Index for finding unprocessed payloads
CREATE INDEX IF NOT EXISTS idx_raw_crawl_payloads_unprocessed
ON raw_crawl_payloads(dispensary_id, fetched_at DESC)
WHERE processed_at IS NULL;
-- Add last_fetch_at to dispensaries
-- Tracks when the last payload was fetched (separate from last_crawl_at which is when processing completed)
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS last_fetch_at TIMESTAMPTZ DEFAULT NULL;
COMMENT ON COLUMN dispensaries.last_fetch_at IS 'Per TASK_WORKFLOW_2024-12-10.md: When last payload was fetched from API (separate from last_crawl_at which is when processing completed)';
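-- Chaining sketch (ids illustrative): payload_fetch enqueues a product_refresh
-- task carrying the payload id, and the refresh handler stamps the payload done.
--   INSERT INTO worker_tasks (role, dispensary_id, platform, payload)
--   VALUES ('product_refresh', 123, 'dutchie', '{"payload_id": 456}');
--   UPDATE raw_crawl_payloads SET processed_at = NOW() WHERE id = 456;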


@@ -0,0 +1,27 @@
-- Migration: 082_proxy_notification_trigger
-- Date: 2024-12-11
-- Description: Add PostgreSQL NOTIFY trigger to alert workers when proxies are added
-- Create function to notify workers when active proxy is added/activated
CREATE OR REPLACE FUNCTION notify_proxy_added()
RETURNS TRIGGER AS $$
BEGIN
-- Only notify if proxy is active
IF NEW.active = true THEN
PERFORM pg_notify('proxy_added', NEW.id::text);
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Drop existing trigger if any
DROP TRIGGER IF EXISTS proxy_added_trigger ON proxies;
-- Create trigger on insert and update of active column
CREATE TRIGGER proxy_added_trigger
AFTER INSERT OR UPDATE OF active ON proxies
FOR EACH ROW
EXECUTE FUNCTION notify_proxy_added();
COMMENT ON FUNCTION notify_proxy_added() IS
'Sends PostgreSQL NOTIFY to proxy_added channel when an active proxy is added or activated. Workers LISTEN on this channel to wake up immediately.';
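-- Worker-side sketch: a long-lived session subscribes once, then receives the
-- new proxy id whenever the trigger fires (proxy id illustrative).
--   LISTEN proxy_added;
--   -- from another session, this wakes the listener:
--   UPDATE proxies SET active = true WHERE id = 7;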


@@ -0,0 +1,88 @@
-- Migration 083: Discovery Run Tracking
-- Tracks progress of store discovery runs step-by-step
-- Main discovery runs table
CREATE TABLE IF NOT EXISTS discovery_runs (
id SERIAL PRIMARY KEY,
platform VARCHAR(50) NOT NULL DEFAULT 'dutchie',
status VARCHAR(20) NOT NULL DEFAULT 'running', -- running, completed, failed
started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
finished_at TIMESTAMPTZ,
task_id INTEGER REFERENCES worker_tasks(id),
-- Totals
states_total INTEGER DEFAULT 0,
states_completed INTEGER DEFAULT 0,
locations_discovered INTEGER DEFAULT 0,
locations_promoted INTEGER DEFAULT 0,
new_store_ids INTEGER[] DEFAULT '{}',
-- Error info
error_message TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Per-state progress within a run
CREATE TABLE IF NOT EXISTS discovery_run_states (
id SERIAL PRIMARY KEY,
run_id INTEGER NOT NULL REFERENCES discovery_runs(id) ON DELETE CASCADE,
state_code VARCHAR(2) NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'pending', -- pending, running, completed, failed
started_at TIMESTAMPTZ,
finished_at TIMESTAMPTZ,
-- Results
cities_found INTEGER DEFAULT 0,
locations_found INTEGER DEFAULT 0,
locations_upserted INTEGER DEFAULT 0,
new_dispensary_ids INTEGER[] DEFAULT '{}',
-- Error info
error_message TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(run_id, state_code)
);
-- Step-by-step log for detailed progress tracking
CREATE TABLE IF NOT EXISTS discovery_run_steps (
id SERIAL PRIMARY KEY,
run_id INTEGER NOT NULL REFERENCES discovery_runs(id) ON DELETE CASCADE,
state_code VARCHAR(2),
step_name VARCHAR(100) NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'started', -- started, completed, failed
started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
finished_at TIMESTAMPTZ,
-- Details (JSON for flexibility)
details JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Indexes for querying
CREATE INDEX IF NOT EXISTS idx_discovery_runs_status ON discovery_runs(status);
CREATE INDEX IF NOT EXISTS idx_discovery_runs_platform ON discovery_runs(platform);
CREATE INDEX IF NOT EXISTS idx_discovery_runs_started_at ON discovery_runs(started_at DESC);
CREATE INDEX IF NOT EXISTS idx_discovery_run_states_run_id ON discovery_run_states(run_id);
CREATE INDEX IF NOT EXISTS idx_discovery_run_steps_run_id ON discovery_run_steps(run_id);
-- View for latest run status per platform
CREATE OR REPLACE VIEW v_latest_discovery_runs AS
SELECT DISTINCT ON (platform)
id,
platform,
status,
started_at,
finished_at,
states_total,
states_completed,
locations_discovered,
locations_promoted,
array_length(new_store_ids, 1) as new_stores_count,
error_message,
EXTRACT(EPOCH FROM (COALESCE(finished_at, NOW()) - started_at)) as duration_seconds
FROM discovery_runs
ORDER BY platform, started_at DESC;
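-- Recording sketch (values, including the run id, illustrative): a run writes
-- one header row, one row per state as it progresses, plus fine-grained steps
-- for the UI.
--   INSERT INTO discovery_runs (platform, states_total) VALUES ('dutchie', 2) RETURNING id;
--   INSERT INTO discovery_run_states (run_id, state_code, status) VALUES (1, 'AZ', 'running');
--   INSERT INTO discovery_run_steps (run_id, state_code, step_name, details)
--   VALUES (1, 'AZ', 'fetch_cities', '{"cities": 31}');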


@@ -0,0 +1,253 @@
-- Migration 084: Dual Transport Preflight System
-- Workers run both curl and http (Puppeteer) preflights on startup
-- Tasks can require a specific transport method
-- ===================================================================
-- PART 1: Add preflight columns to worker_registry
-- ===================================================================
-- Preflight status for curl/axios transport (proxy-based)
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS preflight_curl_status VARCHAR(20) DEFAULT 'pending';
-- Preflight status for http/Puppeteer transport (browser-based)
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS preflight_http_status VARCHAR(20) DEFAULT 'pending';
-- Timestamps for when each preflight completed
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS preflight_curl_at TIMESTAMPTZ;
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS preflight_http_at TIMESTAMPTZ;
-- Error messages for failed preflights
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS preflight_curl_error TEXT;
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS preflight_http_error TEXT;
-- Response time for successful preflights (ms)
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS preflight_curl_ms INTEGER;
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS preflight_http_ms INTEGER;
-- Constraints for preflight status values
ALTER TABLE worker_registry
DROP CONSTRAINT IF EXISTS valid_preflight_curl_status;
ALTER TABLE worker_registry
ADD CONSTRAINT valid_preflight_curl_status
CHECK (preflight_curl_status IN ('pending', 'passed', 'failed', 'skipped'));
ALTER TABLE worker_registry
DROP CONSTRAINT IF EXISTS valid_preflight_http_status;
ALTER TABLE worker_registry
ADD CONSTRAINT valid_preflight_http_status
CHECK (preflight_http_status IN ('pending', 'passed', 'failed', 'skipped'));
-- ===================================================================
-- PART 2: Add method column to worker_tasks
-- ===================================================================
-- Transport method requirement for the task
-- NULL = no preference (any worker can claim)
-- 'curl' = requires curl/axios transport (proxy-based, fast)
-- 'http' = requires http/Puppeteer transport (browser-based, anti-detect)
ALTER TABLE worker_tasks
ADD COLUMN IF NOT EXISTS method VARCHAR(10);
-- Constraint for valid method values
ALTER TABLE worker_tasks
DROP CONSTRAINT IF EXISTS valid_task_method;
ALTER TABLE worker_tasks
ADD CONSTRAINT valid_task_method
CHECK (method IS NULL OR method IN ('curl', 'http'));
-- Index for method-based task claiming
CREATE INDEX IF NOT EXISTS idx_worker_tasks_method
ON worker_tasks(method)
WHERE status = 'pending';
-- Set method = 'http' for all existing tasks that have none:
-- ALL current tasks require Puppeteer/browser-based transport
UPDATE worker_tasks
SET method = 'http'
WHERE method IS NULL;
-- ===================================================================
-- PART 3: Update claim_task function for method compatibility
-- ===================================================================
-- Drop the two-argument claim_task from migration 074 first; otherwise the new
-- defaulted signature would coexist as an overload and make two-argument calls ambiguous
DROP FUNCTION IF EXISTS claim_task(VARCHAR, VARCHAR);
CREATE OR REPLACE FUNCTION claim_task(
p_role VARCHAR(50),
p_worker_id VARCHAR(100),
p_curl_passed BOOLEAN DEFAULT TRUE,
p_http_passed BOOLEAN DEFAULT FALSE
) RETURNS worker_tasks AS $$
DECLARE
claimed_task worker_tasks;
BEGIN
UPDATE worker_tasks
SET
status = 'claimed',
worker_id = p_worker_id,
claimed_at = NOW(),
updated_at = NOW()
WHERE id = (
SELECT id FROM worker_tasks
WHERE role = p_role
AND status = 'pending'
AND (scheduled_for IS NULL OR scheduled_for <= NOW())
-- Method compatibility: worker must have passed the required preflight
AND (
method IS NULL -- No preference, any worker can claim
OR (method = 'curl' AND p_curl_passed = TRUE)
OR (method = 'http' AND p_http_passed = TRUE)
)
-- Exclude stores that already have an active task
AND (dispensary_id IS NULL OR dispensary_id NOT IN (
SELECT dispensary_id FROM worker_tasks
WHERE status IN ('claimed', 'running')
AND dispensary_id IS NOT NULL
))
ORDER BY priority DESC, created_at ASC
LIMIT 1
FOR UPDATE SKIP LOCKED
)
RETURNING * INTO claimed_task;
RETURN claimed_task;
END;
$$ LANGUAGE plpgsql;
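-- Usage sketch (worker id hypothetical): a browser-capable worker passes its
-- preflight results so it can claim 'http' tasks but not 'curl' ones.
--   SELECT id, method FROM claim_task('product_refresh', 'pod-abc123', FALSE, TRUE);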
-- ===================================================================
-- PART 4: Update v_active_workers view
-- ===================================================================
DROP VIEW IF EXISTS v_active_workers;
CREATE VIEW v_active_workers AS
SELECT
wr.id,
wr.worker_id,
wr.friendly_name,
wr.role,
wr.status,
wr.pod_name,
wr.hostname,
wr.started_at,
wr.last_heartbeat_at,
wr.last_task_at,
wr.tasks_completed,
wr.tasks_failed,
wr.current_task_id,
-- Preflight status
wr.preflight_curl_status,
wr.preflight_http_status,
wr.preflight_curl_at,
wr.preflight_http_at,
wr.preflight_curl_error,
wr.preflight_http_error,
wr.preflight_curl_ms,
wr.preflight_http_ms,
-- Computed fields
EXTRACT(EPOCH FROM (NOW() - wr.last_heartbeat_at)) as seconds_since_heartbeat,
CASE
WHEN wr.status = 'offline' THEN 'offline'
WHEN wr.last_heartbeat_at < NOW() - INTERVAL '2 minutes' THEN 'stale'
WHEN wr.current_task_id IS NOT NULL THEN 'busy'
ELSE 'ready'
END as health_status,
-- Capability flags (can this worker handle curl/http tasks?)
(wr.preflight_curl_status = 'passed') as can_curl,
(wr.preflight_http_status = 'passed') as can_http
FROM worker_registry wr
WHERE wr.status != 'terminated'
ORDER BY wr.status = 'active' DESC, wr.last_heartbeat_at DESC;
-- ===================================================================
-- PART 5: View for task queue with method info
-- ===================================================================
DROP VIEW IF EXISTS v_task_history;
CREATE VIEW v_task_history AS
SELECT
t.id,
t.role,
t.dispensary_id,
d.name as dispensary_name,
t.platform,
t.status,
t.priority,
t.method,
t.worker_id,
t.scheduled_for,
t.claimed_at,
t.started_at,
t.completed_at,
t.error_message,
t.retry_count,
t.created_at,
EXTRACT(EPOCH FROM (t.completed_at - t.started_at)) as duration_sec
FROM worker_tasks t
LEFT JOIN dispensaries d ON d.id = t.dispensary_id
ORDER BY t.created_at DESC;
-- ===================================================================
-- PART 6: Helper function to update worker preflight status
-- ===================================================================
CREATE OR REPLACE FUNCTION update_worker_preflight(
p_worker_id VARCHAR(100),
p_transport VARCHAR(10), -- 'curl' or 'http'
p_status VARCHAR(20), -- 'passed', 'failed', 'skipped'
p_response_ms INTEGER DEFAULT NULL,
p_error TEXT DEFAULT NULL
) RETURNS VOID AS $$
BEGIN
IF p_transport = 'curl' THEN
UPDATE worker_registry
SET
preflight_curl_status = p_status,
preflight_curl_at = NOW(),
preflight_curl_ms = p_response_ms,
preflight_curl_error = p_error,
updated_at = NOW()
WHERE worker_id = p_worker_id;
ELSIF p_transport = 'http' THEN
UPDATE worker_registry
SET
preflight_http_status = p_status,
preflight_http_at = NOW(),
preflight_http_ms = p_response_ms,
preflight_http_error = p_error,
updated_at = NOW()
WHERE worker_id = p_worker_id;
END IF;
END;
$$ LANGUAGE plpgsql;
-- ===================================================================
-- Comments
-- ===================================================================
COMMENT ON COLUMN worker_registry.preflight_curl_status IS 'Status of curl/axios preflight: pending, passed, failed, skipped';
COMMENT ON COLUMN worker_registry.preflight_http_status IS 'Status of http/Puppeteer preflight: pending, passed, failed, skipped';
COMMENT ON COLUMN worker_registry.preflight_curl_at IS 'When curl preflight completed';
COMMENT ON COLUMN worker_registry.preflight_http_at IS 'When http preflight completed';
COMMENT ON COLUMN worker_registry.preflight_curl_error IS 'Error message if curl preflight failed';
COMMENT ON COLUMN worker_registry.preflight_http_error IS 'Error message if http preflight failed';
COMMENT ON COLUMN worker_registry.preflight_curl_ms IS 'Response time of successful curl preflight (ms)';
COMMENT ON COLUMN worker_registry.preflight_http_ms IS 'Response time of successful http preflight (ms)';
COMMENT ON COLUMN worker_tasks.method IS 'Transport method required: NULL=any, curl=proxy-based, http=browser-based';
COMMENT ON FUNCTION claim_task IS 'Atomically claim a task, respecting method requirements and per-store locking';
COMMENT ON FUNCTION update_worker_preflight IS 'Update a worker''s preflight status for a given transport';


@@ -0,0 +1,168 @@
-- Migration 085: Add IP and fingerprint columns for preflight reporting
-- These columns were missing from migration 084
-- ===================================================================
-- PART 1: Add IP address columns to worker_registry
-- ===================================================================
-- IP address detected during curl/axios preflight
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS curl_ip VARCHAR(45);
-- IP address detected during http/Puppeteer preflight
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS http_ip VARCHAR(45);
-- ===================================================================
-- PART 2: Add fingerprint data column
-- ===================================================================
-- Browser fingerprint data captured during Puppeteer preflight
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS fingerprint_data JSONB;
-- ===================================================================
-- PART 3: Add combined preflight status/timestamp for convenience
-- ===================================================================
-- Overall preflight status (computed from both transports)
-- Values: 'pending', 'passed', 'partial', 'failed'
-- - 'pending': neither transport tested
-- - 'passed': both transports passed
-- - 'partial': at least one passed
-- - 'failed': no transport passed
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS preflight_status VARCHAR(20) DEFAULT 'pending';
-- Most recent preflight completion timestamp
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS preflight_at TIMESTAMPTZ;
-- ===================================================================
-- PART 4: Update function to set preflight status
-- ===================================================================
-- Drop the five-argument version from migration 084 first: with a different
-- parameter list, CREATE OR REPLACE would add an overload instead of
-- replacing it, making short positional calls ambiguous.
DROP FUNCTION IF EXISTS update_worker_preflight(VARCHAR, VARCHAR, VARCHAR, INTEGER, TEXT);
CREATE OR REPLACE FUNCTION update_worker_preflight(
p_worker_id VARCHAR(100),
p_transport VARCHAR(10), -- 'curl' or 'http'
p_status VARCHAR(20), -- 'passed', 'failed', 'skipped'
p_ip VARCHAR(45) DEFAULT NULL,
p_response_ms INTEGER DEFAULT NULL,
p_error TEXT DEFAULT NULL,
p_fingerprint JSONB DEFAULT NULL
) RETURNS VOID AS $$
DECLARE
v_curl_status VARCHAR(20);
v_http_status VARCHAR(20);
v_overall_status VARCHAR(20);
BEGIN
IF p_transport = 'curl' THEN
UPDATE worker_registry
SET
preflight_curl_status = p_status,
preflight_curl_at = NOW(),
preflight_curl_ms = p_response_ms,
preflight_curl_error = p_error,
curl_ip = p_ip,
updated_at = NOW()
WHERE worker_id = p_worker_id;
ELSIF p_transport = 'http' THEN
UPDATE worker_registry
SET
preflight_http_status = p_status,
preflight_http_at = NOW(),
preflight_http_ms = p_response_ms,
preflight_http_error = p_error,
http_ip = p_ip,
fingerprint_data = COALESCE(p_fingerprint, fingerprint_data),
updated_at = NOW()
WHERE worker_id = p_worker_id;
END IF;
-- Update overall preflight status
SELECT preflight_curl_status, preflight_http_status
INTO v_curl_status, v_http_status
FROM worker_registry
WHERE worker_id = p_worker_id;
-- Compute overall status
IF v_curl_status = 'passed' AND v_http_status = 'passed' THEN
v_overall_status := 'passed';
ELSIF v_curl_status = 'passed' OR v_http_status = 'passed' THEN
v_overall_status := 'partial';
ELSIF v_curl_status = 'failed' OR v_http_status = 'failed' THEN
v_overall_status := 'failed';
ELSE
v_overall_status := 'pending';
END IF;
UPDATE worker_registry
SET
preflight_status = v_overall_status,
preflight_at = NOW()
WHERE worker_id = p_worker_id;
END;
$$ LANGUAGE plpgsql;
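-- Usage sketch (illustrative): record a passing http preflight with its exit
-- IP and browser fingerprint. All literal values are hypothetical.
--
--   SELECT update_worker_preflight(
--     'worker-az-01', 'http', 'passed',
--     p_ip => '203.0.113.7',
--     p_response_ms => 842,
--     p_fingerprint => '{"userAgent": "Mozilla/5.0"}'::jsonb
--   );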
-- ===================================================================
-- PART 5: Update v_active_workers view
-- ===================================================================
DROP VIEW IF EXISTS v_active_workers;
CREATE VIEW v_active_workers AS
SELECT
wr.id,
wr.worker_id,
wr.friendly_name,
wr.role,
wr.status,
wr.pod_name,
wr.hostname,
wr.started_at,
wr.last_heartbeat_at,
wr.last_task_at,
wr.tasks_completed,
wr.tasks_failed,
wr.current_task_id,
-- IP addresses from preflights
wr.curl_ip,
wr.http_ip,
-- Combined preflight status
wr.preflight_status,
wr.preflight_at,
-- Detailed preflight status per transport
wr.preflight_curl_status,
wr.preflight_http_status,
wr.preflight_curl_at,
wr.preflight_http_at,
wr.preflight_curl_error,
wr.preflight_http_error,
wr.preflight_curl_ms,
wr.preflight_http_ms,
-- Fingerprint data
wr.fingerprint_data,
-- Computed fields
EXTRACT(EPOCH FROM (NOW() - wr.last_heartbeat_at)) as seconds_since_heartbeat,
CASE
WHEN wr.status = 'offline' THEN 'offline'
WHEN wr.last_heartbeat_at < NOW() - INTERVAL '2 minutes' THEN 'stale'
WHEN wr.current_task_id IS NOT NULL THEN 'busy'
ELSE 'ready'
END as health_status,
-- Capability flags (can this worker handle curl/http tasks?)
(wr.preflight_curl_status = 'passed') as can_curl,
(wr.preflight_http_status = 'passed') as can_http
FROM worker_registry wr
WHERE wr.status != 'terminated'
ORDER BY wr.status = 'active' DESC, wr.last_heartbeat_at DESC;
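-- Example query (illustrative): workers currently able to claim http tasks,
-- with their detected exit IPs.
--
--   SELECT worker_id, friendly_name, http_ip, preflight_status, health_status
--   FROM v_active_workers
--   WHERE can_http;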
-- ===================================================================
-- Comments
-- ===================================================================
COMMENT ON COLUMN worker_registry.curl_ip IS 'IP address detected during curl/axios preflight';
COMMENT ON COLUMN worker_registry.http_ip IS 'IP address detected during Puppeteer preflight';
COMMENT ON COLUMN worker_registry.fingerprint_data IS 'Browser fingerprint captured during Puppeteer preflight';
COMMENT ON COLUMN worker_registry.preflight_status IS 'Overall preflight status: pending, passed, partial, failed';
COMMENT ON COLUMN worker_registry.preflight_at IS 'Most recent preflight completion timestamp';


@@ -0,0 +1,59 @@
-- Migration 085: Trusted Origins Management
-- Allows admin to manage trusted IPs and domains via UI instead of hardcoded values
-- Trusted origins table (IPs and domains that bypass API key auth)
CREATE TABLE IF NOT EXISTS trusted_origins (
id SERIAL PRIMARY KEY,
-- Origin type: 'ip', 'domain', 'pattern'
origin_type VARCHAR(20) NOT NULL CHECK (origin_type IN ('ip', 'domain', 'pattern')),
-- The actual value
-- For ip: '127.0.0.1', '::1', '192.168.1.0/24'
-- For domain: 'cannaiq.co', 'findadispo.com'
-- For pattern: '^https://.*\.cannabrands\.app$' (regex)
origin_value VARCHAR(255) NOT NULL,
-- Description for admin reference
description TEXT,
-- Active flag
active BOOLEAN DEFAULT true,
-- Audit
created_at TIMESTAMPTZ DEFAULT NOW(),
created_by INTEGER REFERENCES users(id),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(origin_type, origin_value)
);
-- Index for quick lookups
CREATE INDEX IF NOT EXISTS idx_trusted_origins_active ON trusted_origins(active) WHERE active = true;
CREATE INDEX IF NOT EXISTS idx_trusted_origins_type ON trusted_origins(origin_type, active);
-- Seed with current hardcoded values
INSERT INTO trusted_origins (origin_type, origin_value, description) VALUES
-- Trusted IPs (localhost)
('ip', '127.0.0.1', 'Localhost IPv4'),
('ip', '::1', 'Localhost IPv6'),
('ip', '::ffff:127.0.0.1', 'Localhost IPv4-mapped IPv6'),
-- Trusted domains
('domain', 'cannaiq.co', 'CannaiQ production'),
('domain', 'www.cannaiq.co', 'CannaiQ production (www)'),
('domain', 'findadispo.com', 'FindADispo production'),
('domain', 'www.findadispo.com', 'FindADispo production (www)'),
('domain', 'findagram.co', 'Findagram production'),
('domain', 'www.findagram.co', 'Findagram production (www)'),
('domain', 'localhost:3010', 'Local backend dev'),
('domain', 'localhost:8080', 'Local admin dev'),
('domain', 'localhost:5173', 'Local Vite dev'),
-- Pattern-based (regex)
('pattern', '^https://.*\.cannabrands\.app$', 'All cannabrands.app subdomains'),
('pattern', '^https://.*\.cannaiq\.co$', 'All cannaiq.co subdomains')
ON CONFLICT (origin_type, origin_value) DO NOTHING;
-- Add comment
COMMENT ON TABLE trusted_origins IS 'IPs and domains that bypass API key authentication. Managed via /admin.';
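-- Lookup sketch (illustrative): how an auth layer might test an incoming
-- request against this table. IP entries compare as inet so CIDR rows like
-- '192.168.1.0/24' match contained addresses; pattern rows apply as regexes.
-- CASE guarantees the inet cast only runs on 'ip' rows. Literals are
-- hypothetical.
--
--   SELECT EXISTS (
--     SELECT 1 FROM trusted_origins
--     WHERE active
--       AND CASE origin_type
--             WHEN 'ip'      THEN '127.0.0.1'::inet <<= origin_value::inet
--             WHEN 'domain'  THEN origin_value = 'cannaiq.co'
--             WHEN 'pattern' THEN 'https://app.cannaiq.co' ~ origin_value
--           END
--   ) AS trusted;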


@@ -0,0 +1,10 @@
-- Migration 086: Add proxy_url column for alternative URL formats
-- Some proxy providers use non-standard URL formats (e.g., host:port:user:pass)
-- This column allows storing the raw URL directly
-- Add proxy_url column - if set, used directly instead of constructing from parts
ALTER TABLE proxies
ADD COLUMN IF NOT EXISTS proxy_url TEXT;
-- Add comment
COMMENT ON COLUMN proxies.proxy_url IS 'Raw proxy URL (if provider uses non-standard format). Takes precedence over constructed URL from host/port/user/pass.';
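-- Resolution sketch (illustrative): how a consumer might derive the effective
-- URL, preferring the raw value when present. The constructed format is an
-- assumption about how the parts are normally combined.
--
--   SELECT id,
--          COALESCE(
--            proxy_url,
--            protocol || '://' || username || ':' || password || '@' || host || ':' || port
--          ) AS effective_url
--   FROM proxies
--   WHERE active;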


@@ -0,0 +1,30 @@
-- Migration 088: Extend raw_crawl_payloads for discovery payloads
--
-- Enables saving raw store data from Dutchie discovery crawls.
-- Store discovery returns raw dispensary objects - save them for historical analysis.
-- Add payload_type to distinguish product crawls from discovery crawls
ALTER TABLE raw_crawl_payloads
ADD COLUMN IF NOT EXISTS payload_type VARCHAR(32) NOT NULL DEFAULT 'product';
-- Add state_code for discovery payloads (null for product payloads)
ALTER TABLE raw_crawl_payloads
ADD COLUMN IF NOT EXISTS state_code VARCHAR(10);
-- Add store_count for discovery payloads (alternative to product_count)
ALTER TABLE raw_crawl_payloads
ADD COLUMN IF NOT EXISTS store_count INTEGER;
-- Make dispensary_id nullable for discovery payloads
ALTER TABLE raw_crawl_payloads
ALTER COLUMN dispensary_id DROP NOT NULL;
-- Add index for discovery payload queries
CREATE INDEX IF NOT EXISTS idx_raw_crawl_payloads_type_state
ON raw_crawl_payloads(payload_type, state_code)
WHERE payload_type = 'store_discovery';
-- Comments
COMMENT ON COLUMN raw_crawl_payloads.payload_type IS 'Type: product (default), store_discovery';
COMMENT ON COLUMN raw_crawl_payloads.state_code IS 'State code for discovery payloads (e.g., AZ, MI)';
COMMENT ON COLUMN raw_crawl_payloads.store_count IS 'Number of stores in discovery payload';
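-- Insert sketch (illustrative): the shape of a discovery row versus the
-- default product row. The payload column name is assumed from the existing
-- table; values are hypothetical.
--
--   INSERT INTO raw_crawl_payloads (payload_type, state_code, store_count, payload)
--   VALUES ('store_discovery', 'AZ', 142, '{"stores": []}'::jsonb);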


@@ -0,0 +1,105 @@
-- Migration 089: Immutable Schedules with Per-State Product Discovery
--
-- Key changes:
-- 1. Add is_immutable column - schedules can be edited but not deleted
-- 2. Add method column - all tasks use 'http' (Puppeteer transport)
-- 3. Store discovery weekly (168h)
-- 4. Per-state product_discovery schedules (4h default)
-- 5. Remove old payload_fetch schedules
-- =====================================================
-- 1) Add new columns to task_schedules
-- =====================================================
ALTER TABLE task_schedules
ADD COLUMN IF NOT EXISTS is_immutable BOOLEAN DEFAULT FALSE;
ALTER TABLE task_schedules
ADD COLUMN IF NOT EXISTS method VARCHAR(10) DEFAULT 'http';
-- =====================================================
-- 2) Update store_discovery to weekly and immutable
-- =====================================================
UPDATE task_schedules
SET interval_hours = 168, -- 7 days
is_immutable = TRUE,
method = 'http',
description = 'Discover new Dutchie stores weekly (HTTP transport)'
WHERE name IN ('store_discovery_dutchie', 'Store Discovery');
-- Insert the schedule if it doesn't already exist
INSERT INTO task_schedules (name, role, interval_hours, priority, description, is_immutable, method, platform, next_run_at)
VALUES ('Store Discovery', 'store_discovery', 168, 5, 'Discover new Dutchie stores weekly (HTTP transport)', TRUE, 'http', 'dutchie', NOW())
ON CONFLICT (name) DO UPDATE SET
interval_hours = 168,
is_immutable = TRUE,
method = 'http',
description = 'Discover new Dutchie stores weekly (HTTP transport)';
-- =====================================================
-- 3) Remove old payload_fetch and product_refresh_all schedules
-- =====================================================
DELETE FROM task_schedules WHERE name IN ('payload_fetch_all', 'product_refresh_all');
-- =====================================================
-- 4) Create per-state product_discovery schedules
-- =====================================================
-- One schedule per state that has dispensaries with active cannabis programs
INSERT INTO task_schedules (name, role, state_code, interval_hours, priority, description, is_immutable, method, enabled, next_run_at)
SELECT
'product_discovery_' || lower(s.code) AS name,
'product_discovery' AS role,
s.code AS state_code,
4 AS interval_hours, -- 4 hours default, editable
10 AS priority,
'Product discovery for ' || s.name || ' dispensaries (HTTP transport)' AS description,
TRUE AS is_immutable, -- Can edit but not delete
'http' AS method,
CASE WHEN s.is_active THEN TRUE ELSE FALSE END AS enabled,
-- Stagger start times: each state starts 5 minutes after the previous
NOW() + (ROW_NUMBER() OVER (ORDER BY s.code) * INTERVAL '5 minutes') AS next_run_at
FROM states s
WHERE EXISTS (
SELECT 1 FROM dispensaries d
WHERE d.state_id = s.id AND d.crawl_enabled = true
)
ON CONFLICT (name) DO UPDATE SET
is_immutable = TRUE,
method = 'http',
description = EXCLUDED.description;
-- Also create schedules for states that might have stores discovered later
INSERT INTO task_schedules (name, role, state_code, interval_hours, priority, description, is_immutable, method, enabled, next_run_at)
SELECT
'product_discovery_' || lower(s.code) AS name,
'product_discovery' AS role,
s.code AS state_code,
4 AS interval_hours,
10 AS priority,
'Product discovery for ' || s.name || ' dispensaries (HTTP transport)' AS description,
TRUE AS is_immutable,
'http' AS method,
FALSE AS enabled, -- Disabled until stores exist
NOW() + INTERVAL '1 hour'
FROM states s
WHERE NOT EXISTS (
SELECT 1 FROM task_schedules ts WHERE ts.name = 'product_discovery_' || lower(s.code)
)
ON CONFLICT (name) DO NOTHING;
-- =====================================================
-- 5) Make analytics_refresh immutable
-- =====================================================
UPDATE task_schedules
SET is_immutable = TRUE, method = 'http'
WHERE name = 'analytics_refresh';
-- =====================================================
-- 6) Add index for schedule lookups
-- =====================================================
CREATE INDEX IF NOT EXISTS idx_task_schedules_state_code
ON task_schedules(state_code)
WHERE state_code IS NOT NULL;
-- Comments
COMMENT ON COLUMN task_schedules.is_immutable IS 'If TRUE, schedule cannot be deleted (only edited)';
COMMENT ON COLUMN task_schedules.method IS 'Transport method: http (Puppeteer/browser) or curl (axios)';
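-- Verification sketch (illustrative): preview the staggered per-state
-- schedules this migration creates.
--
--   SELECT name, state_code, enabled, interval_hours, next_run_at
--   FROM task_schedules
--   WHERE role = 'product_discovery'
--   ORDER BY next_run_at;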

View File

@@ -0,0 +1,66 @@
-- Migration 090: Add modification tracking columns
--
-- Tracks when records were last modified and by which task.
-- Enables debugging, auditing, and understanding data freshness.
--
-- Columns added:
-- last_modified_at - When the record was last modified by a task
-- last_modified_by_task - Which task role modified it (e.g., 'product_refresh')
-- last_modified_task_id - The specific task ID that modified it
-- ============================================================
-- dispensaries table
-- ============================================================
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS last_modified_at TIMESTAMPTZ;
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS last_modified_by_task VARCHAR(50);
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS last_modified_task_id INTEGER;
-- Index for querying recently modified records
CREATE INDEX IF NOT EXISTS idx_dispensaries_last_modified
ON dispensaries(last_modified_at DESC)
WHERE last_modified_at IS NOT NULL;
-- Index for querying by task type
CREATE INDEX IF NOT EXISTS idx_dispensaries_modified_by_task
ON dispensaries(last_modified_by_task)
WHERE last_modified_by_task IS NOT NULL;
COMMENT ON COLUMN dispensaries.last_modified_at IS 'Timestamp when this record was last modified by a task';
COMMENT ON COLUMN dispensaries.last_modified_by_task IS 'Task role that last modified this record (e.g., store_discovery_state, entry_point_discovery)';
COMMENT ON COLUMN dispensaries.last_modified_task_id IS 'ID of the worker_tasks record that last modified this';
-- ============================================================
-- store_products table
-- ============================================================
ALTER TABLE store_products
ADD COLUMN IF NOT EXISTS last_modified_at TIMESTAMPTZ;
ALTER TABLE store_products
ADD COLUMN IF NOT EXISTS last_modified_by_task VARCHAR(50);
ALTER TABLE store_products
ADD COLUMN IF NOT EXISTS last_modified_task_id INTEGER;
-- Index for querying recently modified products
CREATE INDEX IF NOT EXISTS idx_store_products_last_modified
ON store_products(last_modified_at DESC)
WHERE last_modified_at IS NOT NULL;
-- Index for querying by task type
CREATE INDEX IF NOT EXISTS idx_store_products_modified_by_task
ON store_products(last_modified_by_task)
WHERE last_modified_by_task IS NOT NULL;
-- Composite index for finding products modified by a specific task
CREATE INDEX IF NOT EXISTS idx_store_products_task_modified
ON store_products(dispensary_id, last_modified_at DESC)
WHERE last_modified_at IS NOT NULL;
COMMENT ON COLUMN store_products.last_modified_at IS 'Timestamp when this record was last modified by a task';
COMMENT ON COLUMN store_products.last_modified_by_task IS 'Task role that last modified this record (e.g., product_refresh, product_discovery)';
COMMENT ON COLUMN store_products.last_modified_task_id IS 'ID of the worker_tasks record that last modified this';


@@ -0,0 +1,26 @@
-- Migration 091: Add store discovery tracking columns
-- Per auto-healing scheme (2025-12-12):
-- Track when store_discovery last updated each dispensary
-- Track when last payload was saved
-- Add last_store_discovery_at to track when store_discovery updated this record
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS last_store_discovery_at TIMESTAMPTZ;
-- Add last_payload_at to track when last product payload was saved
-- (Complements last_fetch_at which tracks API fetch time)
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS last_payload_at TIMESTAMPTZ;
-- Add index for finding stale discovery data
CREATE INDEX IF NOT EXISTS idx_dispensaries_store_discovery_at
ON dispensaries (last_store_discovery_at DESC NULLS LAST)
WHERE crawl_enabled = true;
-- Add index for finding dispensaries without recent payloads
CREATE INDEX IF NOT EXISTS idx_dispensaries_payload_at
ON dispensaries (last_payload_at DESC NULLS LAST)
WHERE crawl_enabled = true;
COMMENT ON COLUMN dispensaries.last_store_discovery_at IS 'When store_discovery task last updated this record';
COMMENT ON COLUMN dispensaries.last_payload_at IS 'When last product payload was saved for this dispensary';


@@ -0,0 +1,30 @@
-- Fix 3 Trulieve/Harvest stores with incorrect menu URLs
-- These records have a NULL or mismatched platform_dispensary_id, so the
-- store_discovery ON CONFLICT upsert can't update them automatically
UPDATE dispensaries
SET
menu_url = 'https://dutchie.com/dispensary/svaccha-llc-nirvana-center-apache-junction',
updated_at = NOW()
WHERE id = 224;
UPDATE dispensaries
SET
menu_url = 'https://dutchie.com/dispensary/trulieve-of-phoenix-tatum',
updated_at = NOW()
WHERE id = 76;
UPDATE dispensaries
SET
menu_url = 'https://dutchie.com/dispensary/harvest-of-havasu',
updated_at = NOW()
WHERE id = 403;
-- Queue entry_point_discovery tasks to resolve their platform_dispensary_id
-- method='http' ensures only workers that passed http preflight can claim these
INSERT INTO worker_tasks (role, dispensary_id, priority, scheduled_for, method)
VALUES
('entry_point_discovery', 224, 5, NOW(), 'http'),
('entry_point_discovery', 76, 5, NOW(), 'http'),
('entry_point_discovery', 403, 5, NOW(), 'http')
ON CONFLICT DO NOTHING;
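-- Verification sketch (illustrative): once the queued tasks complete, these
-- rows should have platform_dispensary_id resolved from the new menu URLs.
--
--   SELECT id, menu_url, platform_dispensary_id
--   FROM dispensaries
--   WHERE id IN (224, 76, 403);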


@@ -0,0 +1,35 @@
-- Migration 092: Store Intelligence Cache
-- Pre-computed store intelligence data refreshed by analytics_refresh task
-- Eliminates costly aggregation queries on /intelligence/stores endpoint
CREATE TABLE IF NOT EXISTS store_intelligence_cache (
dispensary_id INTEGER PRIMARY KEY REFERENCES dispensaries(id) ON DELETE CASCADE,
-- Basic counts
sku_count INTEGER NOT NULL DEFAULT 0,
brand_count INTEGER NOT NULL DEFAULT 0,
snapshot_count INTEGER NOT NULL DEFAULT 0,
-- Pricing
avg_price_rec NUMERIC(10,2),
avg_price_med NUMERIC(10,2),
min_price NUMERIC(10,2),
max_price NUMERIC(10,2),
-- Category breakdown (JSONB for flexibility)
category_counts JSONB DEFAULT '{}',
-- Timestamps
last_crawl_at TIMESTAMPTZ,
last_refresh_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Metadata
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Index for fast lookups
CREATE INDEX IF NOT EXISTS idx_store_intelligence_cache_refresh
ON store_intelligence_cache (last_refresh_at DESC);
COMMENT ON TABLE store_intelligence_cache IS 'Pre-computed store intelligence metrics, refreshed by analytics_refresh task';
COMMENT ON COLUMN store_intelligence_cache.category_counts IS 'JSON object mapping category_raw to product count';
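-- Refresh sketch (illustrative): the kind of upsert analytics_refresh might
-- run per refresh cycle; the aggregation details are assumptions, not the
-- task's actual query.
--
--   INSERT INTO store_intelligence_cache (dispensary_id, sku_count, brand_count, last_refresh_at)
--   SELECT sp.dispensary_id, COUNT(*), COUNT(DISTINCT sp.brand_name_raw), NOW()
--   FROM store_products sp
--   GROUP BY sp.dispensary_id
--   ON CONFLICT (dispensary_id) DO UPDATE SET
--     sku_count = EXCLUDED.sku_count,
--     brand_count = EXCLUDED.brand_count,
--     last_refresh_at = EXCLUDED.last_refresh_at;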


@@ -0,0 +1,43 @@
-- Migration: 093_fix_mv_state_metrics.sql
-- Purpose: Fix mv_state_metrics to use brand_name_raw and show correct store counts
-- Issues fixed:
-- 1. unique_brands used brand_id (often NULL), now uses brand_name_raw
-- 2. Added out_of_stock_products column
-- 3. dispensary_count now correctly named
-- Drop and recreate the materialized view with correct definition
DROP MATERIALIZED VIEW IF EXISTS mv_state_metrics;
CREATE MATERIALIZED VIEW mv_state_metrics AS
SELECT
d.state,
s.name AS state_name,
COUNT(DISTINCT d.id) AS dispensary_count,
COUNT(DISTINCT CASE WHEN d.menu_type = 'dutchie' THEN d.id END) AS dutchie_stores,
COUNT(DISTINCT CASE WHEN d.crawl_enabled = true THEN d.id END) AS active_stores,
COUNT(sp.id) AS total_products,
COUNT(CASE WHEN COALESCE(sp.is_in_stock, true) THEN sp.id END) AS in_stock_products,
COUNT(CASE WHEN sp.is_in_stock = false THEN sp.id END) AS out_of_stock_products,
COUNT(CASE WHEN sp.is_on_special THEN sp.id END) AS on_special_products,
COUNT(DISTINCT sp.brand_name_raw) FILTER (WHERE sp.brand_name_raw IS NOT NULL AND sp.brand_name_raw != '') AS unique_brands,
COUNT(DISTINCT sp.category_raw) FILTER (WHERE sp.category_raw IS NOT NULL) AS unique_categories,
ROUND(AVG(sp.price_rec) FILTER (WHERE sp.price_rec > 0)::NUMERIC, 2) AS avg_price_rec,
MIN(sp.price_rec) FILTER (WHERE sp.price_rec > 0) AS min_price_rec,
MAX(sp.price_rec) FILTER (WHERE sp.price_rec > 0) AS max_price_rec,
NOW() AS refreshed_at
FROM dispensaries d
LEFT JOIN states s ON d.state = s.code
LEFT JOIN store_products sp ON d.id = sp.dispensary_id
WHERE d.state IS NOT NULL
GROUP BY d.state, s.name;
-- Create unique index for CONCURRENTLY refresh support
CREATE UNIQUE INDEX idx_mv_state_metrics_state ON mv_state_metrics(state);
-- Update refresh function
CREATE OR REPLACE FUNCTION refresh_state_metrics()
RETURNS void AS $$
BEGIN
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_state_metrics;
END;
$$ LANGUAGE plpgsql;
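-- Usage sketch: call after bulk data changes. CONCURRENTLY relies on the
-- unique index created above and keeps the view readable during refresh.
--
--   SELECT refresh_state_metrics();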


@@ -0,0 +1,516 @@
-- Migration: Import 500 Evomi residential proxies
-- These are sticky-session rotating proxies; the session ID is embedded in the password
-- active is set to false - run Test All to verify connectivity and activate
-- First, drop the old unique constraint that doesn't account for username/password
ALTER TABLE proxies DROP CONSTRAINT IF EXISTS proxies_host_port_protocol_key;
-- Add new unique constraint that includes username and password
-- This allows multiple entries for the same host:port with different credentials (sessions)
ALTER TABLE proxies ADD CONSTRAINT proxies_host_port_protocol_username_password_key
UNIQUE(host, port, protocol, username, password);
-- Now insert all 500 proxies
INSERT INTO proxies (host, port, protocol, username, password, active, max_connections)
VALUES
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-4XRRPF1UQ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-5UNGX7N7K', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-9PSKYP1GU', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-GZBKKYL2S', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-YHJHM0XZU', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ESDYQ34CJ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-GAXUMFKQI', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-2FF66K4CI', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-SUYM0R49B', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-A8VHZMEFP', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-WNRLH6NXR', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-SPSB3IUX6', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-85N76UU5Q', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-189P3LH2F', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-47DQOAGWY', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-IBT0QO7M2', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-UPXOUOH8X', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-BFQ1PH75D', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-KNTFKRY1J', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-5L8IG6DZX', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-9YE13X0BA', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-6KBHCHF0I', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-CETHHFHZ6', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-A06J8ST3I', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-YFS93P1YR', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-RB74B3R6C', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-2JW27O3EU', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-KCUX84BL0', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1A2KSG6HO', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-4QW8ILV0E', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-0Q09GH2VL', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-16BRXBCYC', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-9W02B3R4L', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-CVAEH76YT', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-CATOG0Q5I', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-F81625L74', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-DO4AVTPK4', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-SBZPXORD5', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-JA1AWOX03', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-0FUJTRSYT', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-CM1R2RSTB', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-EHPJZCK1S', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ZYLKORNAF', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-05A8BUD25', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-RHM1Q6O4M', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ES5VPCE6Z', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-P0JEGLP4O', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-OC4AX88D0', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-3BN54IEBV', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ABSC7S550', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-LNIJU6R2V', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-OYGQPPCOV', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-32YBOHQWR', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-7KGEMK4SL', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-FAW8T2EBW', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-GPV69KI9T', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-JPBHSN8M2', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-VZ1JQOF15', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-7DJXXPK1E', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-JXKQ7JVZ1', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-88Q5UQX3B', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-HAI5K0JFO', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-65SUKG0QH', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1XFJETX1F', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-7ZNUCVCBW', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-O1DCK15LA', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-WLTEA65WB', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-KCHAFNK2P', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-6ODSZ6CUT', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-SZ8R2EFH4', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-9EPPYQREC', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-MPCBES7UI', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-FCCPL0XWZ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-GJ23UYEGI', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-RQT80689I', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-TDQO2AP5E', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-D5Q5SEUEO', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-DZN4ZTENM', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-4HVQ33VK9', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-F1HJ7GPHA', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-RM708QD2Y', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-K36N27GM5', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-O73TS0DAE', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-54QXRWEA8', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1P6LP0365', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-WMZ2ST34E', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-175UYF58T', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-W0HTK6F28', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-D5275CTIM', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-IH2IWVZOH', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-C4VFW7GSA', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-O9XGULSNA', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-PJ1W1P5L9', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-MQQU30KPC', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-BNPIBZTYV', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-7BNRCH922', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-5AZLU117B', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-3PPJ49VJC', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-FMC8CQO74', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-VCHW23CXJ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1S4749PCB', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-0T9DJFZPK', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-L0RMV65W3', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-FZ1ZZUQNA', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-6IFJD23DI', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ZKUEP5XM0', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-Z8KU62CLT', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-LO77J78X1', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-27FBKYRJ4', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-0TDQTESGW', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-IMKI89WQ1', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ANS65MIJS', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-O3T2OTT0Y', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-MWW6Z1QVM', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-TT47MX0BB', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-59CFKTM14', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-DOD61TVZN', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-RH9Y298WS', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-X98AATJ7B', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-C3UMES1W8', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-8O3J7G3PT', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-3K4OH78OJ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-N4A3JMVL1', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-HK1SRLAC9', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-Y9VLJJXVU', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-KTTH7R0EC', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-JKVX01E8T', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-HW2VPAHJO', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-7WZ9UHBH8', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-JTKFK0CP7', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-G3F27NXG5', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-K7I2JWYSP', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-CTUU8UQ0T', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ISHMAP6RQ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-LVWNZ1LHP', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-N5CQ1YG2Z', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-XL2XY2SLZ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-UCRZVFIV1', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-VLGQFYNEL', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-YPCDM9O5Y', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-R6VA2S25E', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-4W8X8BBUL', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-5INDC8M80', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-Q8RKKOF29', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-B5ED3EFBC', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-8IC5ZXAX1', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-KCGM25D75', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1MO06IRID', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-4QWGUGN6W', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-5T9M5KEHT', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-9KG7W7NZF', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-NYGN5R2CL', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-H61OXFCJ2', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-30WSQ4EFH', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-J36NG6MY2', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-TZU34ZA7A', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-LPWNYL74G', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-DDJTXOS4Z', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-HFOS4S185', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-2MLGIFL1M', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-CI5AHX0TC', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-WSXVCH1WN', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-8F0C3D06T', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-3YZR0664F', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1L2VMWTM0', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-KPMCB57O7', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-N6QXQDZV3', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-35FAYFWDP', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-TVZWE2JR8', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-0WK86IKLF', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-8WBU6ESHJ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-XGU6UNM01', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-86CXNEQZC', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-NZ4LFCHE3', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ZKB6D72RF', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-BKXNG77NS', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-3MJ332POD', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-SL9VEYNJ0', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-LY8KO43Z8', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-8KGF1XR1L', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-WT6FB54HW', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-7UQ9JMG5E', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-KX3L2040U', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-HL809F9WU', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-T9GU40ERH', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-I5O2NX3G9', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-RVOUYU3NO', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-2T3ETNUKS', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-SW0B93DZZ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-PQ55UF3K6', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-VNRWWHHJB', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-8Q26FZ7EP', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ZWD9FA90J', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-QSGMQX3RZ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-83NZ9MEAC', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-Q9QQ4AL37', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-QBE9KD60Z', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-NRNUXUO44', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-8F0XKQ9P8', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-095JV1CJN', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-WRRSIRUTZ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-DTUD7IDQI', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ASCEAI9LD', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-YOUM7BJZH', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-PEG2ZH9J3', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-WAUW31F78', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-GIBZ6U7AQ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-63TD9LFBG', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-0MH1N9MJB', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-YFP9RNQIK', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-SW4N5162D', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-53MWFB2MP', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-QWLUKBMIN', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-JHS6QIX9G', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-6R04HZ5UD', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-OUJLT31VN', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-6BMKW933S', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-R4GG84E4Q', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-00XAP630X', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-AK97MC2A0', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-NBS2GKGO5', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-NVFEWK4S5', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-MTV3WSYS1', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-JS8RM4JGW', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-6NL4QR1XN', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-4BUUQVSN6', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-56WEAAU3M', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-WCA56PFTF', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-TK1QAZP0B', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-SYZ5ADFXP', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-S3VLOUW6G', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-V2K1V1JWJ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-MZ6VHV5PQ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-DRZDQDPN3', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-231VVRYYA', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-06G3MC88G', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-WS52I2ZVD', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-3QTNQD55U', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-EX7ALECU3', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-DQN8TVQY6', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-FJT54OQFI', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-BLTYUF7QR', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-8DL2JXDSO', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-KBAOXIJ4Y', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ZYL28R5UW', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-NCRDA8LYB', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-BQYKXQLXU', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-PSHCS65MR', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-90Y1WFVYZ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-4GG33NUPW', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-5Y0A79GED', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-RMZHTAD6J', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-XBSOJ5I36', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-AAJW53VNE', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-9NYSPSEL6', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-94WMY337S', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-35Y3BJQFW', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-R7WY3TMRC', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-RXAQVH0F3', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-EFQ2AVFSB', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-XPOUJSAVD', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-RSHPF5NTT', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-Z9402336V', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-OI36C5WOJ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-XEOGV1LVS', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-QIQDXG9NC', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-9IY242GGT', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-PQTEUT52E', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-18NKI3WPS', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-34U3QAA49', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-S05TYKBBF', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-B4J8WCWDD', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-HR377WC28', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-PNRR7S1T2', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-UNR0N0KJ9', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-NARQQANBE', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-8PUL1MYUU', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-KJPCT1FP3', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-XGC80N0AM', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-8Y1JN8DH3', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-Y56M31T07', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-NHYHXQSV1', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-V30RZVG7L', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-CR6V2GSOU', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-VSAF5O0LJ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-4F4BF2LFH', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ERSMQHXNX', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-Q0TFLZQWS', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ZXCS6SMHD', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-JHXYAUGRA', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-IT2XYWES2', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-22UCD94OG', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-VGDLQ3K35', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-O8AFL8RGX', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-9RBIZ8G9X', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-9JIU0SVBV', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-PWRBG0GWU', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ZME1MX12T', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-A7LWRKSJP', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-5XISX0HD4', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-5T6EXKD3Z', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-10ILV351B', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-FDULBZDIY', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-SFVR6I980', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-FKV8DCZGT', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ECRK3M3IZ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-WMKSLOF39', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-HGE60O6AL', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-RGCWDJOT8', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-DESWK5KVN', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-RD593HJ92', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-XWNCAO39B', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-AQ4XGDLX8', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-2ZOVEA1PL', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-JF4FUX83X', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-CQ228GK3B', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-XCTMU9I7U', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-M3F37T22W', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ASZUXM9M9', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-CJVHX24WW', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-KZT4T898V', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-RI128R5TE', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-HCAG6X9MJ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-XOQENWBP7', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1LTQGM497', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ZLVZT4O1G', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-FTIXTXCIA', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-O2YE6QNHY', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-0JPDDBF47', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-H1FP1IFJI', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-FYBPBMY5B', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-F7BWDVC97', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-MLENB1LQ4', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-FT9YNU8UP', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-5W21Q2O5L', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-YM61QWPR3', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-XXFQJJHZM', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-H52YKCM9X', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-NT56ZNZ54', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-DRJY7BMB5', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-P6886RPXX', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-PBXW2EY5K', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-5VQCJTM36', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-NMM3GGM1J', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1JQQ0CDSA', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-R89YI91K4', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-7L7L9MXOT', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-50Z7MXKZS', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-EGADRZTIB', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1DR7H46H6', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-O28QZL994', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-EYTRWVERM', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-HAJZAUWJV', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-AGYO3AB89', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-V224329ZM', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-4YTMSFWYK', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-QP40RL1N1', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-CB1BVAMAH', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-9VGXUY02O', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-BCPVVKCZ3', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-VDC3CWZX7', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-7HWLI21FA', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-5QWIUJEFM', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-4C3PBMAIZ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-3QC7DM7PH', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-A6R5G3FWV', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-3A6WDE12Z', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-0F2LZA9RU', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-XGBJXMXRX', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-5YOGR8PQ1', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-LPBFBUF3N', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-TUSPGR2AY', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-G05I8M2FQ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-H5NDXJIAQ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-X8FJL8WQZ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-KIB2FQRUP', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-VNV0OYWR7', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-GKBPM3PB2', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-XVPI30KE7', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-Y3PRMJP51', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-KEPP5SBML', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-0PDUZ6QEQ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1GHWWFLLE', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-149S2TO8O', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1ZB6FSIGE', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-VCRQTXDZL', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-645JVC3XL', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-2HJ00JBSR', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-7FZDG2W65', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-HD6ANE3LN', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-2HS1B1J8V', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-IHOHYMDF5', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ZYZMAFEKF', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-JO85WX5JE', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-RURJDCURW', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-FZC3BLXPJ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-B0YR2LOZ1', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-6ZFP58ZRK', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-UMZDLHQ78', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-8A2IHDXY3', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-EDYEPWUMT', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-X3TM99R12', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-DLV0UTQ72', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-SFU0ZYIM0', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-YAJ6A66NH', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-X8CFU41AU', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-CJ3Z4WP32', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-UJBLRQKXA', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-T78R8EBGH', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-DDIH55GNZ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-F1SSD4NWF', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-4BE55FKRD', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-BG2DFBL46', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-MKVMNR7W4', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-C3Z4JUGU5', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-NVP8EEEGQ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-MQFWP2LU7', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-BH873JG6H', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-3D76651SM', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-KZ7V6KWMP', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-CD8NEJFJN', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-PWXE9L30H', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1RT95F5LR', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-Q7CEEROE5', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-Q08APOAEG', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-NNKREGLXE', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-YQEG33MKX', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-VRD9G7H5K', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-68R86GQ1G', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-BXZUKQL2M', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-QM13UD73C', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-I7OOGJLNS', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-GXDBO1IQJ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-JJZPRFMWN', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-DBTDFITGW', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-VYHL6ASIJ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-F61NNU332', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-6Z9H72KMC', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-WVOONDMA9', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-CXTSTBXN3', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-CSMZLC921', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-3FTBSARZJ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ESHGKBXLY', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-E0YLXW5H4', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-3QFI6UMWE', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-23VOWHO88', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-02Q9U5QCH', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-3POMNSMB0', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-NTT8OWUFQ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-MT5XEHJWX', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ILDOY0PCQ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-MN9HU4DGO', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1YOPU7GLL', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-ZC5BM5MYB', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-UD3FXK3I9', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-LMDJOV52Y', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-N45X16BSL', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1CBY3Z7QC', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-F0D3AO9E6', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-YQA8GUOD1', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-2EE999233', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-D6GD5WT2Y', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-7DFBMLTMY', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-J6TJKC6VJ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-2AWQ3ZRF4', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-4KOVIF5W3', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-3489SXI1U', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-F37VKUHVE', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-GHBMAVCE4', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-W64U46547', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-1GUJV1MGQ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-M13IOZVI9', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-TX7EVZN1Z', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-2PTS2ML8J', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-VTG83RVX7', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-2IOE6BR66', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-I68XZMR23', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-Q940UN6MU', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-8Y9NFR0N0', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-MYP341DZ8', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-WJ68VGKAZ', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-819MSDR9H', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-27CGND4VG', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-YYDOD47BF', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-YU7F6J8G5', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-HMY16WTCA', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-FPWEBRLG2', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-FGE79X0DE', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-551LMZ84R', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-UWMBDCTX4', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-BNHQXW9HY', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-WB0P5LCN6', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-Z4P9E1SVG', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-UVW2G9IRN', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-OO93WVLB0', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-NTRIK82TG', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-8TXV42S74', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-Z74LKL50G', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-QQEXNIPTR', false, 1),
('core-residential.evomi.com', 1000, 'http', 'kl8', 'ogh9U1Xe7Gzxzozo4rmP_country-US_session-WGK2VD34L', false, 1)
ON CONFLICT DO NOTHING;

View File

@@ -0,0 +1,81 @@
-- Migration: Auto-retry failed proxies after cooldown period
-- Proxies that fail will be retried after a configurable interval
-- Add last_failed_at column to track when proxy last failed
ALTER TABLE proxies ADD COLUMN IF NOT EXISTS last_failed_at TIMESTAMP;
-- Add retry settings
INSERT INTO settings (key, value, description)
VALUES
('proxy_retry_interval_hours', '4', 'Hours to wait before retrying a failed proxy'),
('proxy_max_failures_before_permanent', '10', 'Max failures before proxy is permanently disabled')
ON CONFLICT (key) DO NOTHING;
-- Create function to get eligible proxies (active OR failed but past retry interval)
CREATE OR REPLACE FUNCTION get_eligible_proxy_ids()
RETURNS TABLE(proxy_id INT) AS $$
DECLARE
retry_hours INT;
max_failures INT;
BEGIN
-- Get settings; SELECT INTO leaves the variable NULL when the row is missing,
-- so apply the defaults (4 hours, 10 failures) afterwards
SELECT value::int INTO retry_hours
FROM settings WHERE key = 'proxy_retry_interval_hours';
retry_hours := COALESCE(retry_hours, 4);
SELECT value::int INTO max_failures
FROM settings WHERE key = 'proxy_max_failures_before_permanent';
max_failures := COALESCE(max_failures, 10);
RETURN QUERY
SELECT p.id
FROM proxies p
WHERE p.active = true
OR (
p.active = false
AND p.last_failed_at IS NOT NULL
AND p.last_failed_at < NOW() - (retry_hours || ' hours')::interval
AND p.failure_count < max_failures -- Don't retry past the configured limit
)
ORDER BY
p.active DESC, -- Prefer active proxies
p.failure_count ASC, -- Then prefer proxies with fewer failures
RANDOM();
END;
$$ LANGUAGE plpgsql;
-- Function for an external scheduled job to re-enable proxies past their retry window
-- (this migration creates only the function; call it hourly from a scheduler)
CREATE OR REPLACE FUNCTION auto_reenable_proxies()
RETURNS INT AS $$
DECLARE
retry_hours INT;
max_failures INT;
reenabled_count INT;
BEGIN
-- Get settings; SELECT INTO leaves the variable NULL on a missing row, so apply defaults after
SELECT value::int INTO retry_hours
FROM settings WHERE key = 'proxy_retry_interval_hours';
retry_hours := COALESCE(retry_hours, 4);
SELECT value::int INTO max_failures
FROM settings WHERE key = 'proxy_max_failures_before_permanent';
max_failures := COALESCE(max_failures, 10);
-- Re-enable proxies that have cooled down
UPDATE proxies
SET active = true,
updated_at = NOW()
WHERE active = false
AND last_failed_at IS NOT NULL
AND last_failed_at < NOW() - (retry_hours || ' hours')::interval
AND failure_count < max_failures;
GET DIAGNOSTICS reenabled_count = ROW_COUNT;
IF reenabled_count > 0 THEN
RAISE NOTICE 'Auto-reenabled % proxies after % hour cooldown', reenabled_count, retry_hours;
END IF;
RETURN reenabled_count;
END;
$$ LANGUAGE plpgsql;
-- Add index for efficient querying
CREATE INDEX IF NOT EXISTS idx_proxies_retry
ON proxies(active, last_failed_at, failure_count);
COMMENT ON COLUMN proxies.last_failed_at IS 'Timestamp of last failure - used for auto-retry logic';
COMMENT ON FUNCTION auto_reenable_proxies() IS 'Call periodically to re-enable failed proxies that have cooled down';
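-- Usage sketch (illustrative, not part of the migration). The function only takes
-- effect when something calls it on a schedule; pg_cron is one option, assuming that
-- extension is installed:
--   SELECT cron.schedule('reenable-proxies', '0 * * * *', 'SELECT auto_reenable_proxies()');
-- Manual invocation; returns the number of proxies re-enabled:
SELECT auto_reenable_proxies();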

View File

@@ -0,0 +1,20 @@
-- Migration: Add trigram indexes for fast ILIKE product searches
-- Enables fast searches on name_raw, brand_name_raw, and description
-- Enable pg_trgm extension if not already enabled
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Create GIN trigram indexes for fast ILIKE searches
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_store_products_name_trgm
ON store_products USING gin (name_raw gin_trgm_ops);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_store_products_brand_name_trgm
ON store_products USING gin (brand_name_raw gin_trgm_ops);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_store_products_description_trgm
ON store_products USING gin (description gin_trgm_ops);
-- Add comment
COMMENT ON INDEX idx_store_products_name_trgm IS 'Trigram index for fast ILIKE searches on product name';
COMMENT ON INDEX idx_store_products_brand_name_trgm IS 'Trigram index for fast ILIKE searches on brand name';
COMMENT ON INDEX idx_store_products_description_trgm IS 'Trigram index for fast ILIKE searches on description';
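-- Example (illustrative, not part of the migration): the query shape these indexes
-- accelerate. The search term and column list are assumptions; EXPLAIN should show a
-- bitmap scan on the trigram index instead of a sequential scan.
EXPLAIN SELECT id, name_raw, brand_name_raw
FROM store_products
WHERE name_raw ILIKE '%blue dream%';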

View File

@@ -0,0 +1,11 @@
-- Migration: Add indexes for dashboard performance
-- Speeds up the tasks listing query with ORDER BY and JOIN
-- Index for JOIN with worker_registry
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_worker_tasks_worker_id
ON worker_tasks(worker_id)
WHERE worker_id IS NOT NULL;
-- Index for ORDER BY created_at DESC (dashboard listing)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_worker_tasks_created_at_desc
ON worker_tasks(created_at DESC);
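-- Example (illustrative, not part of the migration): the dashboard listing these
-- indexes serve. The selected columns are assumptions; the JOIN and ORDER BY match
-- the indexed access paths.
SELECT wt.id, wt.status, wt.created_at, wr.status AS worker_status
FROM worker_tasks wt
LEFT JOIN worker_registry wr ON wr.worker_id = wt.worker_id
ORDER BY wt.created_at DESC
LIMIT 50;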

View File

@@ -0,0 +1,13 @@
-- Migration: Add stage tracking columns to dispensaries table
-- Required for stage checkpoint feature in task handlers
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS consecutive_successes INTEGER DEFAULT 0;
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS consecutive_failures INTEGER DEFAULT 0;
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS last_successful_crawl_at TIMESTAMPTZ;
-- Indexes for finding stores by status
CREATE INDEX IF NOT EXISTS idx_dispensaries_consecutive_successes
ON dispensaries(consecutive_successes) WHERE consecutive_successes > 0;
CREATE INDEX IF NOT EXISTS idx_dispensaries_consecutive_failures
ON dispensaries(consecutive_failures) WHERE consecutive_failures > 0;
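-- Example (illustrative, not part of the migration): find stores that keep failing,
-- served by the partial index on consecutive_failures.
SELECT id, consecutive_failures, last_successful_crawl_at
FROM dispensaries
WHERE consecutive_failures > 0
ORDER BY consecutive_failures DESC;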

View File

@@ -0,0 +1,68 @@
-- Migration: 099_working_hours.sql
-- Description: Working hours profiles for natural traffic pattern simulation
-- Created: 2025-12-13
-- Working hours table: defines hourly activity weights to mimic natural traffic
CREATE TABLE IF NOT EXISTS working_hours (
id SERIAL PRIMARY KEY,
name VARCHAR(50) UNIQUE NOT NULL,
description TEXT,
-- Hour weights: {"0": 15, "1": 5, ..., "18": 100, ...}
-- Value = percent chance to trigger activity that hour (0-100)
hour_weights JSONB NOT NULL,
-- Day-of-week multipliers (0=Sunday, 6=Saturday)
-- Optional adjustment for weekend vs weekday patterns
dow_weights JSONB DEFAULT '{"0": 90, "1": 100, "2": 100, "3": 100, "4": 100, "5": 110, "6": 95}',
timezone VARCHAR(50) DEFAULT 'America/Phoenix',
enabled BOOLEAN DEFAULT true,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Seed: Natural traffic pattern based on internet usage research
-- Optimized for cannabis dispensary browsing (lunch + after-work peaks)
INSERT INTO working_hours (name, description, timezone, hour_weights) VALUES (
'natural_traffic',
'Mimics natural user browsing patterns - peaks at lunch and 5-7 PM',
'America/Phoenix',
'{
"0": 15,
"1": 5,
"2": 5,
"3": 5,
"4": 5,
"5": 10,
"6": 20,
"7": 30,
"8": 35,
"9": 45,
"10": 50,
"11": 60,
"12": 75,
"13": 65,
"14": 60,
"15": 70,
"16": 80,
"17": 95,
"18": 100,
"19": 100,
"20": 90,
"21": 70,
"22": 45,
"23": 25
}'::jsonb
) ON CONFLICT (name) DO UPDATE SET
hour_weights = EXCLUDED.hour_weights,
description = EXCLUDED.description,
updated_at = NOW();
-- Index for quick lookups
CREATE INDEX IF NOT EXISTS idx_working_hours_name ON working_hours(name);
CREATE INDEX IF NOT EXISTS idx_working_hours_enabled ON working_hours(enabled);
COMMENT ON TABLE working_hours IS 'Activity profiles for natural traffic simulation. Hour weights are percent chance (0-100) to trigger activity.';
COMMENT ON COLUMN working_hours.hour_weights IS 'JSON object mapping hour (0-23) to percent chance (0-100). 100 = always run, 0 = never run.';
COMMENT ON COLUMN working_hours.dow_weights IS 'Optional day-of-week multipliers. 0=Sunday. Applied as (hour_weight * dow_weight / 100).';
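-- Example (illustrative, not part of the migration): effective weight for 6 PM on a
-- Friday under the seeded profile: hour 18 weight 100 * Friday multiplier 110 / 100 = 110.
SELECT (hour_weights->>'18')::int * (dow_weights->>'5')::int / 100 AS effective_weight
FROM working_hours
WHERE name = 'natural_traffic';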

View File

@@ -0,0 +1,19 @@
-- Migration: 100_worker_timezone.sql
-- Description: Add timezone column to worker_registry for working hours support
-- Created: 2025-12-13
-- Add timezone column to worker_registry
-- Populated from preflight IP geolocation (e.g., 'America/New_York')
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS timezone VARCHAR(50);
-- Add working_hours_id to link worker to a specific working hours profile
-- NULL means use default 'natural_traffic' profile
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS working_hours_id INTEGER REFERENCES working_hours(id);
-- Index for workers by timezone (useful for capacity planning)
CREATE INDEX IF NOT EXISTS idx_worker_registry_timezone ON worker_registry(timezone);
COMMENT ON COLUMN worker_registry.timezone IS 'IANA timezone from preflight IP geolocation (e.g., America/New_York)';
COMMENT ON COLUMN worker_registry.working_hours_id IS 'Reference to working_hours profile. NULL uses default natural_traffic.';
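-- Example (illustrative, not part of the migration): list workers with their detected
-- timezone and resolved profile (NULL working_hours_id falls back to natural_traffic).
SELECT wr.worker_id, wr.timezone, COALESCE(wh.name, 'natural_traffic') AS profile
FROM worker_registry wr
LEFT JOIN working_hours wh ON wh.id = wr.working_hours_id;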

View File

@@ -0,0 +1,78 @@
-- Migration: 101_worker_preflight_timezone.sql
-- Description: Update update_worker_preflight to extract timezone from fingerprint
-- Created: 2025-12-13
CREATE OR REPLACE FUNCTION public.update_worker_preflight(
p_worker_id character varying,
p_transport character varying,
p_status character varying,
p_ip character varying DEFAULT NULL,
p_response_ms integer DEFAULT NULL,
p_error text DEFAULT NULL,
p_fingerprint jsonb DEFAULT NULL
)
RETURNS void
LANGUAGE plpgsql
AS $function$
DECLARE
v_curl_status VARCHAR(20);
v_http_status VARCHAR(20);
v_overall_status VARCHAR(20);
v_timezone VARCHAR(50);
BEGIN
IF p_transport = 'curl' THEN
UPDATE worker_registry
SET
preflight_curl_status = p_status,
preflight_curl_at = NOW(),
preflight_curl_ms = p_response_ms,
preflight_curl_error = p_error,
curl_ip = p_ip,
updated_at = NOW()
WHERE worker_id = p_worker_id;
ELSIF p_transport = 'http' THEN
-- Extract timezone from fingerprint JSON if present
v_timezone := p_fingerprint->>'detectedTimezone';
UPDATE worker_registry
SET
preflight_http_status = p_status,
preflight_http_at = NOW(),
preflight_http_ms = p_response_ms,
preflight_http_error = p_error,
http_ip = p_ip,
fingerprint_data = COALESCE(p_fingerprint, fingerprint_data),
-- Save extracted timezone
timezone = COALESCE(v_timezone, timezone),
updated_at = NOW()
WHERE worker_id = p_worker_id;
END IF;
-- Update overall preflight status
SELECT preflight_curl_status, preflight_http_status
INTO v_curl_status, v_http_status
FROM worker_registry
WHERE worker_id = p_worker_id;
-- Compute overall status
IF v_curl_status = 'passed' AND v_http_status = 'passed' THEN
v_overall_status := 'passed';
ELSIF v_curl_status = 'passed' OR v_http_status = 'passed' THEN
v_overall_status := 'partial';
ELSIF v_curl_status = 'failed' OR v_http_status = 'failed' THEN
v_overall_status := 'failed';
ELSE
v_overall_status := 'pending';
END IF;
UPDATE worker_registry
SET
preflight_status = v_overall_status,
preflight_at = NOW()
WHERE worker_id = p_worker_id;
END;
$function$;
COMMENT ON FUNCTION update_worker_preflight(varchar, varchar, varchar, varchar, integer, text, jsonb)
IS 'Updates worker preflight status and extracts timezone from fingerprint for working hours';
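-- Example call (illustrative, not part of the migration; the worker id, IP, and
-- fingerprint values are made up):
SELECT update_worker_preflight(
    'worker-abc123', 'http', 'passed',
    p_ip => '203.0.113.7',
    p_response_ms => 412,
    p_fingerprint => '{"detectedTimezone": "America/New_York"}'::jsonb
);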

View File

@@ -0,0 +1,114 @@
-- Migration: 102_check_working_hours.sql
-- Description: Function to check if worker should be available based on working hours
-- Created: 2025-12-13
-- Function to check if a worker should be available for work
-- Returns TRUE if worker passes the probability check for current hour
-- Returns FALSE if worker should sleep/skip this cycle
CREATE OR REPLACE FUNCTION check_working_hours(
p_worker_id VARCHAR,
p_profile_name VARCHAR DEFAULT 'natural_traffic'
)
RETURNS TABLE (
is_available BOOLEAN,
current_hour INTEGER,
hour_weight INTEGER,
worker_timezone VARCHAR,
roll INTEGER,
reason TEXT
)
LANGUAGE plpgsql
AS $function$
DECLARE
v_timezone VARCHAR(50);
v_hour INTEGER;
v_weight INTEGER;
v_dow INTEGER;
v_dow_weight INTEGER;
v_final_weight INTEGER;
v_roll INTEGER;
v_hour_weights JSONB;
v_dow_weights JSONB;
v_profile_enabled BOOLEAN;
BEGIN
-- Get worker's timezone (from preflight)
SELECT wr.timezone INTO v_timezone
FROM worker_registry wr
WHERE wr.worker_id = p_worker_id;
-- Default to America/Phoenix if no timezone set
v_timezone := COALESCE(v_timezone, 'America/Phoenix');
-- Get current hour in worker's timezone
v_hour := EXTRACT(HOUR FROM NOW() AT TIME ZONE v_timezone)::INTEGER;
-- Get day of week (0=Sunday)
v_dow := EXTRACT(DOW FROM NOW() AT TIME ZONE v_timezone)::INTEGER;
-- Get working hours profile
SELECT wh.hour_weights, wh.dow_weights, wh.enabled
INTO v_hour_weights, v_dow_weights, v_profile_enabled
FROM working_hours wh
WHERE wh.name = p_profile_name AND wh.enabled = true;
-- If profile not found or disabled, always available
IF v_hour_weights IS NULL THEN
RETURN QUERY SELECT
TRUE::BOOLEAN,
v_hour,
100::INTEGER,
v_timezone,
0::INTEGER,
'Profile not found or disabled - defaulting to available'::TEXT;
RETURN;
END IF;
-- Get hour weight (default to 50 if hour not specified)
v_weight := COALESCE((v_hour_weights->>v_hour::TEXT)::INTEGER, 50);
-- Get day-of-week weight (default to 100)
v_dow_weight := COALESCE((v_dow_weights->>v_dow::TEXT)::INTEGER, 100);
-- Calculate final weight (hour_weight * dow_weight / 100)
v_final_weight := (v_weight * v_dow_weight / 100);
-- Roll the dice (0-99)
v_roll := floor(random() * 100)::INTEGER;
-- Return result
RETURN QUERY SELECT
(v_roll < v_final_weight)::BOOLEAN AS is_available,
v_hour AS current_hour,
v_final_weight AS hour_weight,
v_timezone AS worker_timezone,
v_roll AS roll,
CASE
WHEN v_roll < v_final_weight THEN
format('Available: rolled %s < %s%% threshold', v_roll, v_final_weight)
ELSE
format('Sleeping: rolled %s >= %s%% threshold', v_roll, v_final_weight)
END AS reason;
END;
$function$;
-- Simplified version that just returns boolean
CREATE OR REPLACE FUNCTION is_worker_available(
p_worker_id VARCHAR,
p_profile_name VARCHAR DEFAULT 'natural_traffic'
)
RETURNS BOOLEAN
LANGUAGE plpgsql
AS $function$
DECLARE
v_result BOOLEAN;
BEGIN
SELECT is_available INTO v_result
FROM check_working_hours(p_worker_id, p_profile_name);
RETURN COALESCE(v_result, TRUE);
END;
$function$;
COMMENT ON FUNCTION check_working_hours(VARCHAR, VARCHAR) IS
'Check if worker should be available based on working hours profile. Returns detailed info.';
COMMENT ON FUNCTION is_worker_available(VARCHAR, VARCHAR) IS
'Simple boolean check if worker passes working hours probability roll.';
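-- Example (illustrative, not part of the migration; the worker id is made up):
SELECT * FROM check_working_hours('worker-abc123');
-- Or the simple form, e.g. gating a claim loop:
SELECT is_worker_available('worker-abc123');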

View File

@@ -0,0 +1,12 @@
-- Migration: 103_schedule_dispensary_id.sql
-- Description: Add dispensary_id to task_schedules for per-store schedules
-- Created: 2025-12-13
-- Add dispensary_id column for single-store schedules
ALTER TABLE task_schedules
ADD COLUMN IF NOT EXISTS dispensary_id INTEGER REFERENCES dispensaries(id);
-- Index for quick lookups
CREATE INDEX IF NOT EXISTS idx_task_schedules_dispensary_id ON task_schedules(dispensary_id);
COMMENT ON COLUMN task_schedules.dispensary_id IS 'For single-store schedules. If set, only this store is refreshed. If NULL, uses state_code for all stores in state.';
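-- Example (illustrative, not part of the migration): list single-store schedules.
-- The name and state_code columns are assumed from their use elsewhere in these migrations.
SELECT id, name, dispensary_id, state_code
FROM task_schedules
WHERE dispensary_id IS NOT NULL;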

View File

@@ -0,0 +1,25 @@
-- Migration 104: Add source tracking to worker_tasks
-- Purpose: Track WHERE tasks are created from (schedule vs API endpoint)
--
-- All automated task creation should be visible in task_schedules.
-- This column helps identify "phantom" tasks created outside the schedule system.
-- Add source column to worker_tasks
ALTER TABLE worker_tasks
ADD COLUMN IF NOT EXISTS source VARCHAR(100);
-- Add source_schedule_id column (references the schedule if created from one)
ALTER TABLE worker_tasks
ADD COLUMN IF NOT EXISTS source_schedule_id INTEGER REFERENCES task_schedules(id);
-- Add request metadata (IP, user agent) for debugging
ALTER TABLE worker_tasks
ADD COLUMN IF NOT EXISTS source_metadata JSONB;
-- Create index for querying by source
CREATE INDEX IF NOT EXISTS idx_worker_tasks_source ON worker_tasks(source);
-- Comment explaining source values
COMMENT ON COLUMN worker_tasks.source IS 'Task creation source: schedule, api_run_now, api_crawl_state, api_batch_staggered, api_batch_az_stores, task_chain, manual';
COMMENT ON COLUMN worker_tasks.source_schedule_id IS 'ID of the schedule that created this task (if source=schedule or source=api_run_now)';
COMMENT ON COLUMN worker_tasks.source_metadata IS 'Request metadata: {ip, user_agent, endpoint, timestamp}';
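-- Example audit query (illustrative, not part of the migration): break down the last
-- 24 hours of task creation by source; NULL source indicates pre-migration or phantom tasks.
SELECT source, COUNT(*) AS tasks
FROM worker_tasks
WHERE created_at >= NOW() - INTERVAL '24 hours'
GROUP BY source
ORDER BY tasks DESC;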

View File

@@ -0,0 +1,25 @@
-- Migration 105: Add indexes for dashboard performance
-- Purpose: Speed up the /dashboard and /national/summary endpoints
--
-- These queries were identified as slow:
-- 1. COUNT(*) FROM store_product_snapshots WHERE captured_at >= NOW() - INTERVAL '24 hours'
-- 2. National summary aggregate queries
-- Index for snapshot counts by time (used in dashboard)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_store_product_snapshots_captured_at
ON store_product_snapshots(captured_at DESC);
-- Index for crawl traces by time and success (used in dashboard)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_crawl_traces_started_success
ON crawl_orchestration_traces(started_at DESC, success);
-- Partial index for recent failed crawls (faster for dashboard alerts)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_crawl_traces_recent_failures
ON crawl_orchestration_traces(started_at DESC)
WHERE success = false;
-- Composite index for store_products aggregations by dispensary
-- Helps with national summary state metrics query
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_store_products_dispensary_brand
ON store_products(dispensary_id, brand_name_raw)
WHERE brand_name_raw IS NOT NULL;
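-- Example (illustrative, not part of the migration): the slow dashboard count this
-- index targets; EXPLAIN should now show an index scan on captured_at.
EXPLAIN SELECT COUNT(*)
FROM store_product_snapshots
WHERE captured_at >= NOW() - INTERVAL '24 hours';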

View File

@@ -0,0 +1,10 @@
-- Migration: 106_rename_store_discovery_schedule.sql
-- Description: Rename store_discovery_dutchie to 'Store Discovery'
-- Created: 2025-12-13
-- Update the schedule name for better display
-- The platform='dutchie' field is preserved for badge display in UI
UPDATE task_schedules
SET name = 'Store Discovery',
updated_at = NOW()
WHERE name = 'store_discovery_dutchie';

View File

@@ -0,0 +1,23 @@
-- Migration: 107_proxy_tracking.sql
-- Description: Add proxy tracking columns to worker_tasks for geo-targeting visibility
-- Created: 2025-12-13
-- Add proxy tracking columns to worker_tasks
ALTER TABLE worker_tasks
ADD COLUMN IF NOT EXISTS proxy_ip VARCHAR(45);
ALTER TABLE worker_tasks
ADD COLUMN IF NOT EXISTS proxy_geo VARCHAR(100);
ALTER TABLE worker_tasks
ADD COLUMN IF NOT EXISTS proxy_source VARCHAR(10);
-- Comments
COMMENT ON COLUMN worker_tasks.proxy_ip IS 'IP address of proxy used for this task';
COMMENT ON COLUMN worker_tasks.proxy_geo IS 'Geo target used (e.g., "arizona", "phoenix, arizona")';
COMMENT ON COLUMN worker_tasks.proxy_source IS 'Source of proxy: "api" (Evomi dynamic) or "static" (fallback table)';
-- Index for proxy analysis
CREATE INDEX IF NOT EXISTS idx_worker_tasks_proxy_ip
ON worker_tasks(proxy_ip)
WHERE proxy_ip IS NOT NULL;
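-- Example analysis query (illustrative, not part of the migration): task volume by
-- proxy geo and source over the last day.
SELECT proxy_geo, proxy_source, COUNT(*) AS tasks
FROM worker_tasks
WHERE proxy_ip IS NOT NULL
  AND created_at >= NOW() - INTERVAL '24 hours'
GROUP BY proxy_geo, proxy_source
ORDER BY tasks DESC;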

View File

@@ -0,0 +1,231 @@
-- Migration: 108_worker_geo_sessions.sql
-- Description: Add geo session tracking to worker_registry for state-based task assignment
-- Created: 2025-12-13
-- Worker geo session columns
-- Worker qualifies with a geo (state/city), then only claims tasks matching that geo
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS current_state VARCHAR(2);
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS current_city VARCHAR(100);
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS geo_session_started_at TIMESTAMPTZ;
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS session_task_count INT DEFAULT 0;
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS session_max_tasks INT DEFAULT 7;
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS proxy_geo VARCHAR(100);
-- Comments
COMMENT ON COLUMN worker_registry.current_state IS 'Worker''s current geo assignment (US state code, e.g., AZ)';
COMMENT ON COLUMN worker_registry.current_city IS 'Worker''s current city assignment (optional, e.g., phoenix)';
COMMENT ON COLUMN worker_registry.geo_session_started_at IS 'When worker''s current geo session started';
COMMENT ON COLUMN worker_registry.session_task_count IS 'Number of tasks completed in current geo session';
COMMENT ON COLUMN worker_registry.session_max_tasks IS 'Max tasks per geo session before re-qualification (default 7)';
COMMENT ON COLUMN worker_registry.proxy_geo IS 'Geo target string used for proxy (e.g., "arizona" or "phoenix, arizona")';
-- Index for finding workers by state
CREATE INDEX IF NOT EXISTS idx_worker_registry_current_state
ON worker_registry(current_state)
WHERE current_state IS NOT NULL;
-- ============================================================
-- UPDATED claim_task FUNCTION
-- Now filters by worker's geo session state
-- ============================================================
CREATE OR REPLACE FUNCTION claim_task(
p_role VARCHAR(50),
p_worker_id VARCHAR(100),
p_curl_passed BOOLEAN DEFAULT TRUE,
p_http_passed BOOLEAN DEFAULT FALSE
) RETURNS worker_tasks AS $$
DECLARE
claimed_task worker_tasks;
worker_state VARCHAR(2);
session_valid BOOLEAN;
session_tasks INT;
max_tasks INT;
BEGIN
-- Get worker's current geo session info
SELECT
current_state,
session_task_count,
session_max_tasks,
(geo_session_started_at IS NOT NULL AND geo_session_started_at > NOW() - INTERVAL '60 minutes')
INTO worker_state, session_tasks, max_tasks, session_valid
FROM worker_registry
WHERE worker_id = p_worker_id;
-- If no valid geo session, or session exhausted, worker can't claim tasks
-- Worker must re-qualify first
IF worker_state IS NULL OR NOT session_valid OR session_tasks >= COALESCE(max_tasks, 7) THEN
RETURN NULL;
END IF;
-- Claim task matching worker's state
UPDATE worker_tasks
SET
status = 'claimed',
worker_id = p_worker_id,
claimed_at = NOW(),
updated_at = NOW()
WHERE id = (
SELECT wt.id FROM worker_tasks wt
JOIN dispensaries d ON wt.dispensary_id = d.id
WHERE wt.role = p_role
AND wt.status = 'pending'
AND (wt.scheduled_for IS NULL OR wt.scheduled_for <= NOW())
-- GEO FILTER: Task's dispensary must match worker's state
AND d.state = worker_state
-- Method compatibility: worker must have passed the required preflight
AND (
wt.method IS NULL -- No preference, any worker can claim
OR (wt.method = 'curl' AND p_curl_passed = TRUE)
OR (wt.method = 'http' AND p_http_passed = TRUE)
)
-- Exclude stores that already have an active task
AND (wt.dispensary_id IS NULL OR wt.dispensary_id NOT IN (
SELECT dispensary_id FROM worker_tasks
WHERE status IN ('claimed', 'running')
AND dispensary_id IS NOT NULL
))
ORDER BY wt.priority DESC, wt.created_at ASC
LIMIT 1
FOR UPDATE SKIP LOCKED
)
RETURNING * INTO claimed_task;
-- If task claimed, increment session task count
-- Note: Use claimed_task.id IS NOT NULL (not claimed_task IS NOT NULL)
-- PostgreSQL composite type NULL check quirk
IF claimed_task.id IS NOT NULL THEN
UPDATE worker_registry
SET session_task_count = session_task_count + 1
WHERE worker_id = p_worker_id;
END IF;
RETURN claimed_task;
END;
$$ LANGUAGE plpgsql;
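-- Example claim (illustrative, not part of the migration; role and worker id are
-- made up). Returns a full worker_tasks row, or a row of NULLs when the worker has
-- no valid geo session or nothing matches.
SELECT * FROM claim_task('crawl_store', 'worker-abc123', p_http_passed => TRUE);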
-- ============================================================
-- FUNCTION: assign_worker_geo
-- Assigns a geo session to a worker based on demand
-- Returns the assigned state, or NULL if no tasks available
-- ============================================================
CREATE OR REPLACE FUNCTION assign_worker_geo(
p_worker_id VARCHAR(100)
) RETURNS VARCHAR(2) AS $$
DECLARE
assigned_state VARCHAR(2);
BEGIN
-- Find state with highest demand (pending tasks) and lowest coverage (workers)
SELECT d.state INTO assigned_state
FROM dispensaries d
JOIN worker_tasks wt ON wt.dispensary_id = d.id
LEFT JOIN worker_registry wr ON wr.current_state = d.state
AND wr.status = 'active'
AND wr.geo_session_started_at > NOW() - INTERVAL '60 minutes'
WHERE wt.status = 'pending'
AND d.platform_dispensary_id IS NOT NULL
GROUP BY d.state
ORDER BY
COUNT(wt.id) DESC, -- Most pending tasks first
COUNT(DISTINCT wr.worker_id) ASC -- Fewest workers second
LIMIT 1;
-- If no pending tasks anywhere, return NULL
IF assigned_state IS NULL THEN
RETURN NULL;
END IF;
-- Assign the state to this worker
UPDATE worker_registry
SET
current_state = assigned_state,
current_city = NULL, -- City assigned later if available
geo_session_started_at = NOW(),
session_task_count = 0
WHERE worker_id = p_worker_id;
RETURN assigned_state;
END;
$$ LANGUAGE plpgsql;
-- ============================================================
-- FUNCTION: check_worker_geo_session
-- Returns info about worker's current geo session
-- ============================================================
CREATE OR REPLACE FUNCTION check_worker_geo_session(
p_worker_id VARCHAR(100)
) RETURNS TABLE (
current_state VARCHAR(2),
current_city VARCHAR(100),
session_valid BOOLEAN,
session_tasks_remaining INT,
session_minutes_remaining INT
) AS $$
BEGIN
RETURN QUERY
SELECT
wr.current_state,
wr.current_city,
(wr.geo_session_started_at IS NOT NULL AND wr.geo_session_started_at > NOW() - INTERVAL '60 minutes') as session_valid,
GREATEST(0, wr.session_max_tasks - wr.session_task_count) as session_tasks_remaining,
GREATEST(0, EXTRACT(EPOCH FROM (wr.geo_session_started_at + INTERVAL '60 minutes' - NOW())) / 60)::INT as session_minutes_remaining
FROM worker_registry wr
WHERE wr.worker_id = p_worker_id;
END;
$$ LANGUAGE plpgsql;
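-- Example qualification flow (illustrative, not part of the migration; the worker id
-- is made up):
SELECT assign_worker_geo('worker-abc123');            -- e.g. returns 'AZ'
SELECT * FROM check_worker_geo_session('worker-abc123');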
-- View for worker thinness per state
-- Derives states from dispensaries table - no external states table dependency
CREATE OR REPLACE VIEW worker_state_capacity AS
WITH active_states AS (
-- Get unique states from dispensaries with valid platform IDs
SELECT DISTINCT state as code
FROM dispensaries
WHERE state IS NOT NULL
AND platform_dispensary_id IS NOT NULL
),
pending_by_state AS (
SELECT d.state, COUNT(*) as count
FROM worker_tasks t
JOIN dispensaries d ON t.dispensary_id = d.id
WHERE t.status = 'pending'
AND d.state IS NOT NULL
GROUP BY d.state
),
workers_by_state AS (
SELECT
current_state,
COUNT(*) as count,
SUM(GREATEST(0, session_max_tasks - session_task_count)) as remaining_capacity
FROM worker_registry
WHERE status IN ('active', 'idle') -- Include both active and idle workers
AND preflight_http_status = 'passed'
AND current_state IS NOT NULL
AND geo_session_started_at > NOW() - INTERVAL '60 minutes'
GROUP BY current_state
)
SELECT
s.code as state,
s.code as state_name, -- Use code as name since we don't have a states lookup table
COALESCE(p.count, 0) as pending_tasks,
COALESCE(w.count, 0) as workers_on_state,
COALESCE(w.remaining_capacity, 0) as remaining_capacity,
CASE
WHEN COALESCE(w.remaining_capacity, 0) = 0 AND COALESCE(p.count, 0) > 0 THEN 'no_coverage'
WHEN COALESCE(w.remaining_capacity, 0) < COALESCE(p.count, 0) THEN 'thin'
ELSE 'ok'
END as status
FROM active_states s
LEFT JOIN pending_by_state p ON p.state = s.code
LEFT JOIN workers_by_state w ON w.current_state = s.code
ORDER BY COALESCE(p.count, 0) DESC;
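-- Example (illustrative, not part of the migration): surface states that need more
-- worker coverage.
SELECT * FROM worker_state_capacity WHERE status IN ('thin', 'no_coverage');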

View File

@@ -0,0 +1,354 @@
-- Migration: 109_worker_identity_pool.sql
-- Description: Identity pool for diverse IP/fingerprint rotation
-- Created: 2025-12-14
--
-- Workers claim identities (IP + fingerprint) from pool.
-- Each identity used for 3-5 tasks, then cools down 2-3 hours.
-- This creates natural browsing patterns - the same apparent visitor never hits 20 stores in a row.
-- ============================================================
-- IDENTITY POOL TABLE
-- ============================================================
CREATE TABLE IF NOT EXISTS worker_identities (
id SERIAL PRIMARY KEY,
-- Evomi session controls the IP
session_id VARCHAR(100) UNIQUE NOT NULL,
-- Detected IP from this session
ip_address INET,
-- Geo targeting
state_code VARCHAR(2) NOT NULL,
city VARCHAR(100), -- City-level targeting for diversity
-- Fingerprint data (UA, timezone, locale, device, etc.)
fingerprint JSONB NOT NULL,
-- Timestamps
created_at TIMESTAMPTZ DEFAULT NOW(),
last_used_at TIMESTAMPTZ,
cooldown_until TIMESTAMPTZ, -- Can't reuse until this time
-- Usage stats
total_tasks_completed INT DEFAULT 0,
total_sessions INT DEFAULT 1, -- How many times this identity has been used
-- Current state
is_active BOOLEAN DEFAULT FALSE, -- Currently claimed by a worker
active_worker_id VARCHAR(100), -- Which worker has it
-- Health tracking
consecutive_failures INT DEFAULT 0,
is_healthy BOOLEAN DEFAULT TRUE -- Set false if IP gets blocked
);
-- Indexes for efficient lookups
CREATE INDEX IF NOT EXISTS idx_worker_identities_state_city
ON worker_identities(state_code, city);
CREATE INDEX IF NOT EXISTS idx_worker_identities_available
ON worker_identities(state_code, is_active, cooldown_until)
WHERE is_healthy = TRUE;
CREATE INDEX IF NOT EXISTS idx_worker_identities_cooldown
ON worker_identities(cooldown_until)
WHERE is_healthy = TRUE AND is_active = FALSE;
-- ============================================================
-- METRO AREA MAPPING
-- For fallback when exact city not available
-- ============================================================
CREATE TABLE IF NOT EXISTS metro_areas (
id SERIAL PRIMARY KEY,
metro_name VARCHAR(100) NOT NULL,
state_code VARCHAR(2) NOT NULL,
city VARCHAR(100) NOT NULL,
is_primary BOOLEAN DEFAULT FALSE, -- Primary city of the metro
UNIQUE(state_code, city)
);
-- Phoenix Metro Area
INSERT INTO metro_areas (metro_name, state_code, city, is_primary) VALUES
('Phoenix Metro', 'AZ', 'Phoenix', TRUE),
('Phoenix Metro', 'AZ', 'Mesa', FALSE),
('Phoenix Metro', 'AZ', 'Glendale', FALSE),
('Phoenix Metro', 'AZ', 'Tempe', FALSE),
('Phoenix Metro', 'AZ', 'Scottsdale', FALSE),
('Phoenix Metro', 'AZ', 'Chandler', FALSE),
('Phoenix Metro', 'AZ', 'Peoria', FALSE),
('Phoenix Metro', 'AZ', 'El Mirage', FALSE),
('Phoenix Metro', 'AZ', 'Tolleson', FALSE),
('Phoenix Metro', 'AZ', 'Sun City', FALSE),
('Phoenix Metro', 'AZ', 'Apache Junction', FALSE),
('Phoenix Metro', 'AZ', 'Cave Creek', FALSE),
('Phoenix Metro', 'AZ', 'Gilbert', FALSE),
('Phoenix Metro', 'AZ', 'Surprise', FALSE),
('Phoenix Metro', 'AZ', 'Avondale', FALSE),
('Phoenix Metro', 'AZ', 'Goodyear', FALSE),
('Phoenix Metro', 'AZ', 'Buckeye', FALSE),
('Phoenix Metro', 'AZ', 'Queen Creek', FALSE)
ON CONFLICT (state_code, city) DO NOTHING;
-- Tucson Metro Area
INSERT INTO metro_areas (metro_name, state_code, city, is_primary) VALUES
('Tucson Metro', 'AZ', 'Tucson', TRUE),
('Tucson Metro', 'AZ', 'Oro Valley', FALSE),
('Tucson Metro', 'AZ', 'Marana', FALSE),
('Tucson Metro', 'AZ', 'Sahuarita', FALSE),
('Tucson Metro', 'AZ', 'South Tucson', FALSE)
ON CONFLICT (state_code, city) DO NOTHING;
-- Flagstaff Area
INSERT INTO metro_areas (metro_name, state_code, city, is_primary) VALUES
('Flagstaff Area', 'AZ', 'Flagstaff', TRUE),
('Flagstaff Area', 'AZ', 'Sedona', FALSE)
ON CONFLICT (state_code, city) DO NOTHING;
-- Prescott Area
INSERT INTO metro_areas (metro_name, state_code, city, is_primary) VALUES
('Prescott Area', 'AZ', 'Prescott', TRUE),
('Prescott Area', 'AZ', 'Prescott Valley', FALSE)
ON CONFLICT (state_code, city) DO NOTHING;
-- ============================================================
-- FUNCTION: claim_identity
-- Claims an available identity for a worker
-- Tries: exact city -> metro area -> any in state -> create new
-- ============================================================
CREATE OR REPLACE FUNCTION claim_identity(
p_worker_id VARCHAR(100),
p_state_code VARCHAR(2),
p_city VARCHAR(100) DEFAULT NULL
) RETURNS worker_identities AS $$
DECLARE
claimed_identity worker_identities;
metro_name_val VARCHAR(100);
primary_city VARCHAR(100);
BEGIN
-- 1. Try exact city match (if city provided)
IF p_city IS NOT NULL THEN
UPDATE worker_identities
SET is_active = TRUE,
active_worker_id = p_worker_id,
last_used_at = NOW()
WHERE id = (
SELECT id FROM worker_identities
WHERE state_code = p_state_code
AND city = p_city
AND is_active = FALSE
AND is_healthy = TRUE
AND (cooldown_until IS NULL OR cooldown_until < NOW())
ORDER BY last_used_at ASC NULLS FIRST
LIMIT 1
FOR UPDATE SKIP LOCKED
)
RETURNING * INTO claimed_identity;
IF claimed_identity.id IS NOT NULL THEN
RETURN claimed_identity;
END IF;
END IF;
-- 2. Try metro area fallback
IF p_city IS NOT NULL THEN
-- Find the metro area for this city
SELECT ma.metro_name INTO metro_name_val
FROM metro_areas ma
WHERE ma.state_code = p_state_code AND ma.city = p_city;
IF metro_name_val IS NOT NULL THEN
-- Get primary city of metro
SELECT ma.city INTO primary_city
FROM metro_areas ma
WHERE ma.metro_name = metro_name_val AND ma.is_primary = TRUE;
-- Try any city in same metro
UPDATE worker_identities wi
SET is_active = TRUE,
active_worker_id = p_worker_id,
last_used_at = NOW()
WHERE wi.id = (
SELECT wi2.id FROM worker_identities wi2
JOIN metro_areas ma ON wi2.city = ma.city AND wi2.state_code = ma.state_code
WHERE ma.metro_name = metro_name_val
AND wi2.is_active = FALSE
AND wi2.is_healthy = TRUE
AND (wi2.cooldown_until IS NULL OR wi2.cooldown_until < NOW())
ORDER BY wi2.last_used_at ASC NULLS FIRST
LIMIT 1
FOR UPDATE SKIP LOCKED
)
RETURNING * INTO claimed_identity;
IF claimed_identity.id IS NOT NULL THEN
RETURN claimed_identity;
END IF;
END IF;
END IF;
-- 3. Try any identity in state
UPDATE worker_identities
SET is_active = TRUE,
active_worker_id = p_worker_id,
last_used_at = NOW()
WHERE id = (
SELECT id FROM worker_identities
WHERE state_code = p_state_code
AND is_active = FALSE
AND is_healthy = TRUE
AND (cooldown_until IS NULL OR cooldown_until < NOW())
ORDER BY last_used_at ASC NULLS FIRST
LIMIT 1
FOR UPDATE SKIP LOCKED
)
RETURNING * INTO claimed_identity;
-- Return whatever we got (NULL if nothing available - caller should create new)
RETURN claimed_identity;
END;
$$ LANGUAGE plpgsql;
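-- Example (illustrative only; worker id invented, not executed by this
-- migration): a worker targeting Tempe walks the fallback chain
-- Tempe -> Phoenix Metro -> any AZ identity:
--   SELECT id, city, active_worker_id
--   FROM claim_identity('worker-az-01', 'AZ', 'Tempe');
-- An all-NULL row means nothing was available and the caller should
-- provision a fresh identity.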
-- ============================================================
-- FUNCTION: release_identity
-- Releases an identity back to pool with cooldown
-- ============================================================
CREATE OR REPLACE FUNCTION release_identity(
p_identity_id INT,
p_tasks_completed INT DEFAULT 0,
p_failed BOOLEAN DEFAULT FALSE
) RETURNS VOID AS $$
DECLARE
cooldown_hours FLOAT;
BEGIN
-- Random cooldown between 2-3 hours for diversity
cooldown_hours := 2 + random(); -- 2.0 to 3.0 hours
UPDATE worker_identities
SET is_active = FALSE,
active_worker_id = NULL,
total_tasks_completed = total_tasks_completed + p_tasks_completed,
total_sessions = total_sessions + 1,
cooldown_until = NOW() + (cooldown_hours || ' hours')::INTERVAL,
consecutive_failures = CASE WHEN p_failed THEN consecutive_failures + 1 ELSE 0 END,
-- NB: an UPDATE's right-hand side sees the OLD column values, so judge
-- health against the post-update failure count
is_healthy = CASE WHEN p_failed AND consecutive_failures + 1 >= 3 THEN FALSE ELSE TRUE END
WHERE id = p_identity_id;
END;
$$ LANGUAGE plpgsql;
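-- Example (illustrative; identity id invented): release identity 42
-- after 5 completed tasks and no failure; it re-enters the pool once
-- the random 2-3h cooldown above expires:
--   SELECT release_identity(42, p_tasks_completed => 5, p_failed => FALSE);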
-- ============================================================
-- FUNCTION: get_pending_tasks_by_geo
-- Gets pending tasks grouped by state/city for identity assignment
-- ============================================================
CREATE OR REPLACE FUNCTION get_pending_tasks_by_geo(
p_limit INT DEFAULT 10
) RETURNS TABLE (
state_code VARCHAR(2),
city VARCHAR(100),
pending_count BIGINT,
available_identities BIGINT
) AS $$
BEGIN
RETURN QUERY
SELECT
d.state as state_code,
d.city,
COUNT(t.id) as pending_count,
(
SELECT COUNT(*) FROM worker_identities wi
WHERE wi.state_code = d.state
AND (wi.city = d.city OR wi.city IS NULL)
AND wi.is_active = FALSE
AND wi.is_healthy = TRUE
AND (wi.cooldown_until IS NULL OR wi.cooldown_until < NOW())
) as available_identities
FROM worker_tasks t
JOIN dispensaries d ON t.dispensary_id = d.id
WHERE t.status = 'pending'
AND d.state IS NOT NULL
GROUP BY d.state, d.city
ORDER BY COUNT(t.id) DESC
LIMIT p_limit;
END;
$$ LANGUAGE plpgsql;
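-- Example (illustrative): where should new identities be provisioned?
--   SELECT * FROM get_pending_tasks_by_geo(5);
-- Geos where pending_count far exceeds available_identities are the
-- ones starving for identities.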
-- ============================================================
-- FUNCTION: get_tasks_for_identity
-- Gets tasks matching an identity's geo (same city or metro)
-- ============================================================
CREATE OR REPLACE FUNCTION get_tasks_for_identity(
p_state_code VARCHAR(2),
p_city VARCHAR(100),
p_limit INT DEFAULT 5
) RETURNS TABLE (
task_id INT,
dispensary_id INT,
dispensary_name VARCHAR(255),
dispensary_city VARCHAR(100),
role VARCHAR(50)
) AS $$
DECLARE
metro_name_val VARCHAR(100);
BEGIN
-- Find metro area for this city
SELECT ma.metro_name INTO metro_name_val
FROM metro_areas ma
WHERE ma.state_code = p_state_code AND ma.city = p_city;
RETURN QUERY
SELECT
t.id as task_id,
d.id as dispensary_id,
d.name as dispensary_name,
d.city as dispensary_city,
t.role
FROM worker_tasks t
JOIN dispensaries d ON t.dispensary_id = d.id
WHERE t.status = 'pending'
AND d.state = p_state_code
AND (
-- Exact city match
d.city = p_city
-- Or same metro area
OR (metro_name_val IS NOT NULL AND d.city IN (
SELECT ma.city FROM metro_areas ma WHERE ma.metro_name = metro_name_val
))
-- Or any in state if no metro
OR (metro_name_val IS NULL)
)
ORDER BY
CASE WHEN d.city = p_city THEN 0 ELSE 1 END, -- Prefer exact city
t.priority DESC,
t.created_at ASC
LIMIT p_limit;
END;
$$ LANGUAGE plpgsql;
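-- Example (illustrative): tasks for an identity pinned to Mesa, AZ;
-- exact-city matches sort ahead of the rest of Phoenix Metro:
--   SELECT * FROM get_tasks_for_identity('AZ', 'Mesa', 5);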
-- ============================================================
-- VIEW: identity_pool_status
-- Overview of identity pool health and availability
-- ============================================================
CREATE OR REPLACE VIEW identity_pool_status AS
SELECT
state_code,
city,
COUNT(*) as total_identities,
COUNT(*) FILTER (WHERE is_active) as active,
COUNT(*) FILTER (WHERE NOT is_active AND is_healthy AND (cooldown_until IS NULL OR cooldown_until < NOW())) as available,
COUNT(*) FILTER (WHERE NOT is_active AND cooldown_until > NOW()) as cooling_down,
COUNT(*) FILTER (WHERE NOT is_healthy) as unhealthy,
SUM(total_tasks_completed) as total_tasks,
AVG(total_tasks_completed)::INT as avg_tasks_per_identity
FROM worker_identities
GROUP BY state_code, city
ORDER BY state_code, city;
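-- Example (illustrative): spot exhausted geos at a glance:
--   SELECT * FROM identity_pool_status
--   WHERE available = 0 AND total_identities > 0;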
-- ============================================================
-- Comments
-- ============================================================
COMMENT ON TABLE worker_identities IS 'Pool of IP/fingerprint identities for worker rotation';
COMMENT ON TABLE metro_areas IS 'City groupings for geographic fallback matching';
COMMENT ON FUNCTION claim_identity IS 'Claim an available identity: exact city -> metro -> state -> NULL (create new)';
COMMENT ON FUNCTION release_identity IS 'Release identity with 2-3 hour random cooldown';
COMMENT ON FUNCTION get_pending_tasks_by_geo IS 'Get pending task counts by state/city';
COMMENT ON FUNCTION get_tasks_for_identity IS 'Get tasks matching identity geo (city or metro area)';


@@ -0,0 +1,92 @@
-- Migration: 110_trusted_origins.sql
-- Description: Trusted origins for API access without token
-- Created: 2024-12-14
--
-- Manages which domains, IPs, and patterns can access the API without a Bearer token.
-- Used by auth middleware to grant 'internal' role to trusted requests.
-- ============================================================
-- TRUSTED ORIGINS TABLE
-- ============================================================
CREATE TABLE IF NOT EXISTS trusted_origins (
id SERIAL PRIMARY KEY,
-- Origin identification
name VARCHAR(100) NOT NULL, -- Friendly name (e.g., "CannaIQ Production")
origin_type VARCHAR(20) NOT NULL, -- 'domain', 'ip', or 'pattern'
origin_value VARCHAR(255) NOT NULL, -- The actual value to match
-- Metadata
description TEXT, -- Optional notes
active BOOLEAN DEFAULT TRUE,
-- Tracking
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
created_by INTEGER REFERENCES users(id),
-- Constraints
CONSTRAINT valid_origin_type CHECK (origin_type IN ('domain', 'ip', 'pattern')),
UNIQUE(origin_type, origin_value)
);
-- Index for active lookups (used by auth middleware)
CREATE INDEX IF NOT EXISTS idx_trusted_origins_active
ON trusted_origins(active) WHERE active = TRUE;
-- Updated at trigger
CREATE OR REPLACE FUNCTION update_trusted_origins_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS trusted_origins_updated_at ON trusted_origins;
CREATE TRIGGER trusted_origins_updated_at
BEFORE UPDATE ON trusted_origins
FOR EACH ROW
EXECUTE FUNCTION update_trusted_origins_updated_at();
-- ============================================================
-- SEED DEFAULT TRUSTED ORIGINS
-- These match the hardcoded fallbacks in middleware.ts
-- ============================================================
-- Production domains
INSERT INTO trusted_origins (name, origin_type, origin_value, description) VALUES
('CannaIQ Production', 'domain', 'https://cannaiq.co', 'Main CannaIQ dashboard'),
('CannaIQ Production (www)', 'domain', 'https://www.cannaiq.co', 'Main CannaIQ dashboard with www'),
('FindADispo Production', 'domain', 'https://findadispo.com', 'Consumer dispensary finder'),
('FindADispo Production (www)', 'domain', 'https://www.findadispo.com', 'Consumer dispensary finder with www'),
('Findagram Production', 'domain', 'https://findagram.co', 'Instagram-style cannabis discovery'),
('Findagram Production (www)', 'domain', 'https://www.findagram.co', 'Instagram-style cannabis discovery with www')
ON CONFLICT (origin_type, origin_value) DO NOTHING;
-- Wildcard patterns
INSERT INTO trusted_origins (name, origin_type, origin_value, description) VALUES
-- Note: with standard_conforming_strings (the PostgreSQL default), '\.'
-- stores a single backslash-dot, the intended regex escape
('CannaBrands Subdomains', 'pattern', '^https://.*\.cannabrands\.app$', 'All *.cannabrands.app subdomains'),
('CannaIQ Subdomains', 'pattern', '^https://.*\.cannaiq\.co$', 'All *.cannaiq.co subdomains')
ON CONFLICT (origin_type, origin_value) DO NOTHING;
-- Local development
INSERT INTO trusted_origins (name, origin_type, origin_value, description) VALUES
('Local API', 'domain', 'http://localhost:3010', 'Local backend API'),
('Local Admin', 'domain', 'http://localhost:8080', 'Local admin dashboard'),
('Local Vite Dev', 'domain', 'http://localhost:5173', 'Vite dev server')
ON CONFLICT (origin_type, origin_value) DO NOTHING;
-- Trusted IPs (localhost)
INSERT INTO trusted_origins (name, origin_type, origin_value, description) VALUES
('Localhost IPv4', 'ip', '127.0.0.1', 'Local machine'),
('Localhost IPv6', 'ip', '::1', 'Local machine IPv6'),
('Localhost IPv6 Mapped', 'ip', '::ffff:127.0.0.1', 'IPv6-mapped IPv4 localhost')
ON CONFLICT (origin_type, origin_value) DO NOTHING;
-- ============================================================
-- COMMENTS
-- ============================================================
COMMENT ON TABLE trusted_origins IS 'Domains, IPs, and patterns that can access API without token';
COMMENT ON COLUMN trusted_origins.origin_type IS 'domain = exact URL match, ip = IP address, pattern = regex pattern';
COMMENT ON COLUMN trusted_origins.origin_value IS 'For domain: full URL. For ip: IP address. For pattern: regex string';
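-- Example (illustrative): the kind of lookup the auth middleware can
-- run per request ($1 = Origin header, $2 = client IP; the real check
-- lives in middleware.ts and may differ):
--   SELECT EXISTS (
--     SELECT 1 FROM trusted_origins
--     WHERE active
--       AND ((origin_type = 'domain'  AND origin_value = $1)
--         OR (origin_type = 'ip'      AND origin_value = $2)
--         OR (origin_type = 'pattern' AND $1 ~ origin_value))
--   ) AS trusted;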


@@ -0,0 +1,35 @@
-- Migration: 111_system_settings.sql
-- Description: System settings table for runtime configuration
-- Created: 2024-12-14
CREATE TABLE IF NOT EXISTS system_settings (
key VARCHAR(100) PRIMARY KEY,
value TEXT NOT NULL,
description TEXT,
updated_at TIMESTAMPTZ DEFAULT NOW(),
updated_by INTEGER REFERENCES users(id)
);
-- Task pool gate - controls whether workers can claim tasks
INSERT INTO system_settings (key, value, description) VALUES
('task_pool_open', 'true', 'When false, workers cannot claim new tasks from the pool')
ON CONFLICT (key) DO NOTHING;
-- Updated at trigger
CREATE OR REPLACE FUNCTION update_system_settings_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS system_settings_updated_at ON system_settings;
CREATE TRIGGER system_settings_updated_at
BEFORE UPDATE ON system_settings
FOR EACH ROW
EXECUTE FUNCTION update_system_settings_updated_at();
COMMENT ON TABLE system_settings IS 'Runtime configuration settings';
COMMENT ON COLUMN system_settings.key IS 'Setting name (e.g., task_pool_open)';
COMMENT ON COLUMN system_settings.value IS 'Setting value as string';
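-- Example (illustrative): close the task pool for maintenance, then
-- read the gate back (values are stored as text, so cast on read):
--   UPDATE system_settings SET value = 'false' WHERE key = 'task_pool_open';
--   SELECT value::boolean AS pool_open
--   FROM system_settings WHERE key = 'task_pool_open';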


@@ -0,0 +1,390 @@
-- Migration 112: Worker Session Pool
-- Tracks IP/fingerprint sessions with exclusive locks and cooldowns
-- Each worker claims up to 6 tasks, uses one IP/fingerprint for those tasks,
-- then retires the session (8hr cooldown before IP can be reused)
-- Drop old identity pool tables if they exist (replacing with simpler session model)
DROP TABLE IF EXISTS worker_identity_claims CASCADE;
DROP TABLE IF EXISTS worker_identities CASCADE;
-- Worker sessions: tracks active and cooling down IP/fingerprint pairs
CREATE TABLE IF NOT EXISTS worker_sessions (
id SERIAL PRIMARY KEY,
-- IP and fingerprint for this session
ip_address VARCHAR(45) NOT NULL,
fingerprint_hash VARCHAR(64) NOT NULL,
fingerprint_data JSONB,
-- Geo this session is locked to
state_code VARCHAR(2) NOT NULL,
city VARCHAR(100),
-- Ownership
worker_id VARCHAR(255), -- NULL if in cooldown
-- Status: 'active' (locked to worker), 'cooldown' (8hr wait), 'available'
status VARCHAR(20) NOT NULL DEFAULT 'available',
-- Task tracking
tasks_claimed INTEGER NOT NULL DEFAULT 0,
tasks_completed INTEGER NOT NULL DEFAULT 0,
tasks_failed INTEGER NOT NULL DEFAULT 0,
max_tasks INTEGER NOT NULL DEFAULT 6,
-- Timestamps
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
locked_at TIMESTAMPTZ, -- When worker locked this session
retired_at TIMESTAMPTZ, -- When session was retired (cooldown starts)
cooldown_until TIMESTAMPTZ, -- When session becomes available again
-- Constraints
CONSTRAINT valid_status CHECK (status IN ('active', 'cooldown', 'available'))
);
-- Indexes for fast lookups
CREATE INDEX IF NOT EXISTS idx_worker_sessions_ip ON worker_sessions(ip_address);
CREATE INDEX IF NOT EXISTS idx_worker_sessions_status ON worker_sessions(status);
CREATE INDEX IF NOT EXISTS idx_worker_sessions_worker ON worker_sessions(worker_id) WHERE worker_id IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_worker_sessions_geo ON worker_sessions(state_code, city);
CREATE INDEX IF NOT EXISTS idx_worker_sessions_cooldown ON worker_sessions(cooldown_until) WHERE status = 'cooldown';
-- Unique constraint: only one active session per IP
CREATE UNIQUE INDEX IF NOT EXISTS idx_worker_sessions_active_ip
ON worker_sessions(ip_address)
WHERE status = 'active';
-- Function: Check if IP is available (not active, not in cooldown)
CREATE OR REPLACE FUNCTION is_ip_available(check_ip VARCHAR(45))
RETURNS BOOLEAN AS $$
BEGIN
-- Check if any session has this IP and is either active or in cooldown
RETURN NOT EXISTS (
SELECT 1 FROM worker_sessions
WHERE ip_address = check_ip
AND (status = 'active' OR (status = 'cooldown' AND cooldown_until > NOW()))
);
END;
$$ LANGUAGE plpgsql;
-- Function: Lock a session to a worker
-- Returns the session if successful, NULL if IP not available
CREATE OR REPLACE FUNCTION lock_worker_session(
p_worker_id VARCHAR(255),
p_ip_address VARCHAR(45),
p_state_code VARCHAR(2),
p_city VARCHAR(100) DEFAULT NULL,
p_fingerprint_hash VARCHAR(64) DEFAULT NULL,
p_fingerprint_data JSONB DEFAULT NULL
) RETURNS worker_sessions AS $$
DECLARE
v_session worker_sessions;
BEGIN
-- First check if IP is available
IF NOT is_ip_available(p_ip_address) THEN
RETURN NULL;
END IF;
-- Try to find an existing available session for this IP
SELECT * INTO v_session
FROM worker_sessions
WHERE ip_address = p_ip_address
AND status = 'available'
FOR UPDATE SKIP LOCKED
LIMIT 1;
IF v_session.id IS NOT NULL THEN
-- Reuse existing session
UPDATE worker_sessions SET
worker_id = p_worker_id,
status = 'active',
state_code = p_state_code,
city = p_city,
fingerprint_hash = COALESCE(p_fingerprint_hash, fingerprint_hash),
fingerprint_data = COALESCE(p_fingerprint_data, fingerprint_data),
tasks_claimed = 0,
tasks_completed = 0,
tasks_failed = 0,
locked_at = NOW(),
retired_at = NULL,
cooldown_until = NULL
WHERE id = v_session.id
RETURNING * INTO v_session;
ELSE
-- Create new session
INSERT INTO worker_sessions (
ip_address, fingerprint_hash, fingerprint_data,
state_code, city, worker_id, status, locked_at
) VALUES (
p_ip_address, COALESCE(p_fingerprint_hash, md5(random()::text)),
p_fingerprint_data, p_state_code, p_city, p_worker_id, 'active', NOW()
)
RETURNING * INTO v_session;
END IF;
RETURN v_session;
END;
$$ LANGUAGE plpgsql;
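-- Example (illustrative; worker id and IP invented): lock an exclusive
-- session; a NULL result means the IP is active elsewhere or still
-- cooling down:
--   SELECT id, status, max_tasks
--   FROM lock_worker_session('worker-7', '203.0.113.14', 'AZ', 'Phoenix');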
-- Function: Retire a session (start 8hr cooldown)
CREATE OR REPLACE FUNCTION retire_worker_session(p_worker_id VARCHAR(255))
RETURNS BOOLEAN AS $$
DECLARE
v_updated INTEGER;
BEGIN
UPDATE worker_sessions SET
status = 'cooldown',
worker_id = NULL,
retired_at = NOW(),
cooldown_until = NOW() + INTERVAL '8 hours'
WHERE worker_id = p_worker_id
AND status = 'active';
GET DIAGNOSTICS v_updated = ROW_COUNT;
RETURN v_updated > 0;
END;
$$ LANGUAGE plpgsql;
-- Function: Release expired cooldowns
CREATE OR REPLACE FUNCTION release_expired_sessions()
RETURNS INTEGER AS $$
DECLARE
v_released INTEGER;
BEGIN
UPDATE worker_sessions SET
status = 'available'
WHERE status = 'cooldown'
AND cooldown_until <= NOW();
GET DIAGNOSTICS v_released = ROW_COUNT;
RETURN v_released;
END;
$$ LANGUAGE plpgsql;
-- Function: Get session for worker
CREATE OR REPLACE FUNCTION get_worker_session(p_worker_id VARCHAR(255))
RETURNS worker_sessions AS $$
SELECT * FROM worker_sessions
WHERE worker_id = p_worker_id AND status = 'active'
LIMIT 1;
$$ LANGUAGE sql;
-- Function: Increment task counters
CREATE OR REPLACE FUNCTION session_task_completed(p_worker_id VARCHAR(255))
RETURNS BOOLEAN AS $$
BEGIN
UPDATE worker_sessions SET
tasks_completed = tasks_completed + 1
WHERE worker_id = p_worker_id AND status = 'active';
RETURN FOUND;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION session_task_failed(p_worker_id VARCHAR(255))
RETURNS BOOLEAN AS $$
BEGIN
UPDATE worker_sessions SET
tasks_failed = tasks_failed + 1
WHERE worker_id = p_worker_id AND status = 'active';
RETURN FOUND;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION session_task_claimed(p_worker_id VARCHAR(255), p_count INTEGER DEFAULT 1)
RETURNS BOOLEAN AS $$
BEGIN
UPDATE worker_sessions SET
tasks_claimed = tasks_claimed + p_count
WHERE worker_id = p_worker_id AND status = 'active';
RETURN FOUND;
END;
$$ LANGUAGE plpgsql;
-- Scheduled job hint: Run release_expired_sessions() every 5 minutes
COMMENT ON FUNCTION release_expired_sessions() IS
'Run periodically to release sessions from cooldown. Suggest: every 5 minutes.';
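-- Example (illustrative; assumes the pg_cron extension is installed --
-- any external scheduler works just as well):
--   SELECT cron.schedule('release-worker-sessions', '*/5 * * * *',
--                        'SELECT release_expired_sessions()');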
-- =============================================================================
-- ATOMIC TASK CLAIMING
-- Worker claims up to 6 tasks for same geo in one transaction
-- =============================================================================
-- Function: Claim up to N tasks for same geo
-- Returns claimed tasks with dispensary geo info
CREATE OR REPLACE FUNCTION claim_tasks_batch(
p_worker_id VARCHAR(255),
p_max_tasks INTEGER DEFAULT 6,
p_role VARCHAR(50) DEFAULT NULL -- Optional role filter
) RETURNS TABLE (
task_id INTEGER,
role VARCHAR(50),
dispensary_id INTEGER,
dispensary_name VARCHAR(255),
city VARCHAR(100),
state_code VARCHAR(2),
platform VARCHAR(50),
method VARCHAR(20)
) AS $$
DECLARE
v_target_state VARCHAR(2);
v_target_city VARCHAR(100);
v_claimed_count INTEGER := 0;
BEGIN
-- First, find the geo with most pending tasks to target
SELECT d.state, d.city INTO v_target_state, v_target_city
FROM worker_tasks t
JOIN dispensaries d ON t.dispensary_id = d.id
WHERE t.status = 'pending'
AND (p_role IS NULL OR t.role = p_role)
GROUP BY d.state, d.city
ORDER BY COUNT(*) DESC
LIMIT 1;
-- No pending tasks
IF v_target_state IS NULL THEN
RETURN;
END IF;
-- Claim up to p_max_tasks for this geo
RETURN QUERY
WITH claimed AS (
UPDATE worker_tasks t SET
status = 'claimed',
worker_id = p_worker_id,
claimed_at = NOW()
FROM (
SELECT t2.id
FROM worker_tasks t2
JOIN dispensaries d ON t2.dispensary_id = d.id
WHERE t2.status = 'pending'
AND d.state = v_target_state
AND (v_target_city IS NULL OR d.city = v_target_city)
AND (p_role IS NULL OR t2.role = p_role)
ORDER BY t2.priority DESC, t2.created_at ASC
FOR UPDATE SKIP LOCKED
LIMIT p_max_tasks
) sub
WHERE t.id = sub.id
RETURNING t.id, t.role, t.dispensary_id, t.method
)
SELECT
c.id as task_id,
c.role,
c.dispensary_id,
d.name as dispensary_name,
d.city,
d.state as state_code,
d.platform,
c.method
FROM claimed c
JOIN dispensaries d ON c.dispensary_id = d.id;
END;
$$ LANGUAGE plpgsql;
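-- Example (illustrative; worker id and role value invented): claim up
-- to 6 pending tasks, all from the single busiest geo, in one call:
--   SELECT * FROM claim_tasks_batch('worker-7', 6, 'menu_crawl');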
-- Function: Release claimed tasks back to pending (for failed worker or cleanup)
CREATE OR REPLACE FUNCTION release_claimed_tasks(p_worker_id VARCHAR(255))
RETURNS INTEGER AS $$
DECLARE
v_released INTEGER;
BEGIN
UPDATE worker_tasks SET
status = 'pending',
worker_id = NULL,
claimed_at = NULL
WHERE worker_id = p_worker_id
AND status IN ('claimed', 'running');
GET DIAGNOSTICS v_released = ROW_COUNT;
RETURN v_released;
END;
$$ LANGUAGE plpgsql;
-- Function: Mark task as running
CREATE OR REPLACE FUNCTION start_task(p_task_id INTEGER, p_worker_id VARCHAR(255))
RETURNS BOOLEAN AS $$
BEGIN
UPDATE worker_tasks SET
status = 'running',
started_at = NOW()
WHERE id = p_task_id
AND worker_id = p_worker_id
AND status = 'claimed';
RETURN FOUND;
END;
$$ LANGUAGE plpgsql;
-- Function: Mark task as completed (leaves pool)
CREATE OR REPLACE FUNCTION complete_task(
p_task_id INTEGER,
p_worker_id VARCHAR(255),
p_result JSONB DEFAULT NULL
) RETURNS BOOLEAN AS $$
BEGIN
UPDATE worker_tasks SET
status = 'completed',
completed_at = NOW(),
result = p_result
WHERE id = p_task_id
AND worker_id = p_worker_id
AND status = 'running';
RETURN FOUND;
END;
$$ LANGUAGE plpgsql;
-- Function: Mark task as failed (returns to pending for retry)
CREATE OR REPLACE FUNCTION fail_task(
p_task_id INTEGER,
p_worker_id VARCHAR(255),
p_error TEXT DEFAULT NULL,
p_max_retries INTEGER DEFAULT 3
) RETURNS BOOLEAN AS $$
DECLARE
v_retry_count INTEGER;
BEGIN
-- Get current retry count
SELECT COALESCE(retry_count, 0) INTO v_retry_count
FROM worker_tasks WHERE id = p_task_id;
IF v_retry_count >= p_max_retries THEN
-- Max retries exceeded - mark as permanently failed
UPDATE worker_tasks SET
status = 'failed',
completed_at = NOW(),
error_message = p_error,
retry_count = v_retry_count + 1
WHERE id = p_task_id
AND worker_id = p_worker_id;
ELSE
-- Return to pending for retry
UPDATE worker_tasks SET
status = 'pending',
worker_id = NULL,
claimed_at = NULL,
started_at = NULL,
error_message = p_error,
retry_count = v_retry_count + 1
WHERE id = p_task_id
AND worker_id = p_worker_id;
END IF;
RETURN FOUND;
END;
$$ LANGUAGE plpgsql;
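-- Example (illustrative; ids invented): report a failure; the task
-- returns to 'pending' until retry_count reaches p_max_retries, after
-- which the next failure marks it permanently 'failed':
--   SELECT fail_task(1234, 'worker-7', 'proxy timeout');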
-- Add retry_count column if not exists
DO $$
BEGIN
IF NOT EXISTS (
SELECT 1 FROM information_schema.columns
WHERE table_name = 'worker_tasks' AND column_name = 'retry_count'
) THEN
ALTER TABLE worker_tasks ADD COLUMN retry_count INTEGER NOT NULL DEFAULT 0;
END IF;
IF NOT EXISTS (
SELECT 1 FROM information_schema.columns
WHERE table_name = 'worker_tasks' AND column_name = 'claimed_at'
) THEN
ALTER TABLE worker_tasks ADD COLUMN claimed_at TIMESTAMPTZ;
END IF;
END $$;


@@ -0,0 +1,381 @@
-- Task Pools: Group tasks by geo area for worker assignment
-- Workers claim a pool, get proxy for that geo, then pull tasks from pool
-- ============================================================================
-- TASK POOLS TABLE
-- ============================================================================
-- Each pool represents a metro area (e.g., Phoenix AZ = 100mi radius)
-- Dispensaries are assigned to pools based on location
-- Workers claim a pool, not individual tasks
CREATE TABLE IF NOT EXISTS task_pools (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL UNIQUE, -- e.g., 'phoenix_az'
display_name VARCHAR(100) NOT NULL, -- e.g., 'Phoenix, AZ'
state_code VARCHAR(2) NOT NULL, -- e.g., 'AZ'
city VARCHAR(100) NOT NULL, -- e.g., 'Phoenix'
latitude DECIMAL(10, 6) NOT NULL, -- pool center lat
longitude DECIMAL(10, 6) NOT NULL, -- pool center lng
radius_miles INTEGER DEFAULT 100, -- pool radius (100mi default)
timezone VARCHAR(50) NOT NULL, -- e.g., 'America/Phoenix'
is_active BOOLEAN DEFAULT true,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Index for active pools
CREATE INDEX IF NOT EXISTS idx_task_pools_active ON task_pools(is_active) WHERE is_active = true;
-- ============================================================================
-- LINK DISPENSARIES TO POOLS
-- ============================================================================
-- Add pool_id to dispensaries table
ALTER TABLE dispensaries
ADD COLUMN IF NOT EXISTS pool_id INTEGER REFERENCES task_pools(id);
-- Index for pool membership
CREATE INDEX IF NOT EXISTS idx_dispensaries_pool ON dispensaries(pool_id) WHERE pool_id IS NOT NULL;
-- ============================================================================
-- WORKER POOL ASSIGNMENT
-- ============================================================================
-- Track which pool a worker is currently assigned to
ALTER TABLE worker_registry
ADD COLUMN IF NOT EXISTS current_pool_id INTEGER REFERENCES task_pools(id),
ADD COLUMN IF NOT EXISTS pool_claimed_at TIMESTAMPTZ,
ADD COLUMN IF NOT EXISTS pool_stores_visited INTEGER DEFAULT 0,
ADD COLUMN IF NOT EXISTS pool_max_stores INTEGER DEFAULT 6;
-- ============================================================================
-- SEED INITIAL POOLS
-- ============================================================================
-- Major cannabis markets with approximate center coordinates
INSERT INTO task_pools (name, display_name, state_code, city, latitude, longitude, timezone, radius_miles) VALUES
-- Arizona
('phoenix_az', 'Phoenix, AZ', 'AZ', 'Phoenix', 33.4484, -112.0740, 'America/Phoenix', 100),
('tucson_az', 'Tucson, AZ', 'AZ', 'Tucson', 32.2226, -110.9747, 'America/Phoenix', 75),
-- California
('los_angeles_ca', 'Los Angeles, CA', 'CA', 'Los Angeles', 34.0522, -118.2437, 'America/Los_Angeles', 100),
('san_francisco_ca', 'San Francisco, CA', 'CA', 'San Francisco', 37.7749, -122.4194, 'America/Los_Angeles', 75),
('san_diego_ca', 'San Diego, CA', 'CA', 'San Diego', 32.7157, -117.1611, 'America/Los_Angeles', 75),
('sacramento_ca', 'Sacramento, CA', 'CA', 'Sacramento', 38.5816, -121.4944, 'America/Los_Angeles', 75),
-- Colorado
('denver_co', 'Denver, CO', 'CO', 'Denver', 39.7392, -104.9903, 'America/Denver', 100),
-- Illinois
('chicago_il', 'Chicago, IL', 'IL', 'Chicago', 41.8781, -87.6298, 'America/Chicago', 100),
-- Massachusetts
('boston_ma', 'Boston, MA', 'MA', 'Boston', 42.3601, -71.0589, 'America/New_York', 75),
-- Michigan
('detroit_mi', 'Detroit, MI', 'MI', 'Detroit', 42.3314, -83.0458, 'America/Detroit', 100),
-- Nevada
('las_vegas_nv', 'Las Vegas, NV', 'NV', 'Las Vegas', 36.1699, -115.1398, 'America/Los_Angeles', 75),
('reno_nv', 'Reno, NV', 'NV', 'Reno', 39.5296, -119.8138, 'America/Los_Angeles', 50),
-- New Jersey
('newark_nj', 'Newark, NJ', 'NJ', 'Newark', 40.7357, -74.1724, 'America/New_York', 75),
-- New York
('new_york_ny', 'New York, NY', 'NY', 'New York', 40.7128, -74.0060, 'America/New_York', 75),
-- Oklahoma
('oklahoma_city_ok', 'Oklahoma City, OK', 'OK', 'Oklahoma City', 35.4676, -97.5164, 'America/Chicago', 100),
('tulsa_ok', 'Tulsa, OK', 'OK', 'Tulsa', 36.1540, -95.9928, 'America/Chicago', 75),
-- Oregon
('portland_or', 'Portland, OR', 'OR', 'Portland', 45.5152, -122.6784, 'America/Los_Angeles', 75),
-- Washington
('seattle_wa', 'Seattle, WA', 'WA', 'Seattle', 47.6062, -122.3321, 'America/Los_Angeles', 100)
ON CONFLICT (name) DO NOTHING;
-- ============================================================================
-- FUNCTION: Assign dispensary to nearest pool
-- ============================================================================
CREATE OR REPLACE FUNCTION assign_dispensary_to_pool(disp_id INTEGER)
RETURNS INTEGER AS $$
DECLARE
disp_lat DECIMAL(10,6);
disp_lng DECIMAL(10,6);
nearest_pool_id INTEGER;
BEGIN
-- Get dispensary coordinates
SELECT latitude, longitude INTO disp_lat, disp_lng
FROM dispensaries WHERE id = disp_id;
IF disp_lat IS NULL OR disp_lng IS NULL THEN
RETURN NULL;
END IF;
-- Find nearest active pool within radius
-- Great-circle distance via the spherical law of cosines
-- (3959 = Earth's radius in miles; accurate enough at the ~100mi pool scale)
SELECT id INTO nearest_pool_id
FROM task_pools
WHERE is_active = true
AND (
3959 * acos(
cos(radians(latitude)) * cos(radians(disp_lat)) *
cos(radians(disp_lng) - radians(longitude)) +
sin(radians(latitude)) * sin(radians(disp_lat))
)
) <= radius_miles
ORDER BY (
3959 * acos(
cos(radians(latitude)) * cos(radians(disp_lat)) *
cos(radians(disp_lng) - radians(longitude)) +
sin(radians(latitude)) * sin(radians(disp_lat))
)
)
LIMIT 1;
-- Update dispensary
IF nearest_pool_id IS NOT NULL THEN
UPDATE dispensaries SET pool_id = nearest_pool_id WHERE id = disp_id;
END IF;
RETURN nearest_pool_id;
END;
$$ LANGUAGE plpgsql;
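-- Example (illustrative; dispensary id invented): assign one dispensary;
-- returns the pool id, or NULL when no active pool lies within
-- radius_miles:
--   SELECT assign_dispensary_to_pool(512);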
-- ============================================================================
-- FUNCTION: Assign all dispensaries to pools (batch)
-- ============================================================================
CREATE OR REPLACE FUNCTION assign_all_dispensaries_to_pools()
RETURNS TABLE(assigned INTEGER, unassigned INTEGER) AS $$
DECLARE
assigned_count INTEGER := 0;
unassigned_count INTEGER := 0;
disp RECORD;
pool_id INTEGER;
BEGIN
FOR disp IN SELECT id FROM dispensaries WHERE pool_id IS NULL AND latitude IS NOT NULL LOOP
pool_id := assign_dispensary_to_pool(disp.id);
IF pool_id IS NOT NULL THEN
assigned_count := assigned_count + 1;
ELSE
unassigned_count := unassigned_count + 1;
END IF;
END LOOP;
RETURN QUERY SELECT assigned_count, unassigned_count;
END;
$$ LANGUAGE plpgsql;
-- ============================================================================
-- FUNCTION: Get pools with pending tasks
-- ============================================================================
CREATE OR REPLACE FUNCTION get_pools_with_pending_tasks()
RETURNS TABLE(
pool_id INTEGER,
pool_name VARCHAR(100),
display_name VARCHAR(100),
state_code VARCHAR(2),
city VARCHAR(100),
timezone VARCHAR(50),
pending_count BIGINT,
store_count BIGINT
) AS $$
BEGIN
RETURN QUERY
SELECT
tp.id as pool_id,
tp.name as pool_name,
tp.display_name,
tp.state_code,
tp.city,
tp.timezone,
COUNT(DISTINCT t.id) as pending_count,
COUNT(DISTINCT d.id) as store_count
FROM task_pools tp
JOIN dispensaries d ON d.pool_id = tp.id
JOIN tasks t ON t.dispensary_id = d.id AND t.status = 'pending'
WHERE tp.is_active = true
GROUP BY tp.id, tp.name, tp.display_name, tp.state_code, tp.city, tp.timezone
HAVING COUNT(DISTINCT t.id) > 0
ORDER BY COUNT(DISTINCT t.id) DESC;
END;
$$ LANGUAGE plpgsql;
-- ============================================================================
-- FUNCTION: Worker claims a pool
-- ============================================================================
CREATE OR REPLACE FUNCTION worker_claim_pool(
p_worker_id VARCHAR(100),
p_pool_id INTEGER DEFAULT NULL
)
RETURNS TABLE(
pool_id INTEGER,
pool_name VARCHAR(100),
display_name VARCHAR(100),
state_code VARCHAR(2),
city VARCHAR(100),
latitude DECIMAL(10,6),
longitude DECIMAL(10,6),
timezone VARCHAR(50)
) AS $$
DECLARE
claimed_pool_id INTEGER;
BEGIN
-- If no pool specified, pick the one with most pending tasks
IF p_pool_id IS NULL THEN
SELECT tp.id INTO claimed_pool_id
FROM task_pools tp
JOIN dispensaries d ON d.pool_id = tp.id
JOIN tasks t ON t.dispensary_id = d.id AND t.status = 'pending'
WHERE tp.is_active = true
GROUP BY tp.id
ORDER BY COUNT(DISTINCT t.id) DESC
LIMIT 1;
ELSE
claimed_pool_id := p_pool_id;
END IF;
IF claimed_pool_id IS NULL THEN
RETURN;
END IF;
-- Update worker registry with pool assignment
UPDATE worker_registry
SET
current_pool_id = claimed_pool_id,
pool_claimed_at = NOW(),
pool_stores_visited = 0,
pool_max_stores = 6,
updated_at = NOW()
WHERE worker_id = p_worker_id;
-- Return pool info
RETURN QUERY
SELECT
tp.id,
tp.name,
tp.display_name,
tp.state_code,
tp.city,
tp.latitude,
tp.longitude,
tp.timezone
FROM task_pools tp
WHERE tp.id = claimed_pool_id;
END;
$$ LANGUAGE plpgsql;
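-- Example (illustrative; worker id invented): claim the busiest pool,
-- or pin a specific one by id:
--   SELECT * FROM worker_claim_pool('worker-7');     -- busiest pool
--   SELECT * FROM worker_claim_pool('worker-7', 1);  -- explicit pool id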
-- ============================================================================
-- FUNCTION: Pull tasks from worker's pool (up to 6 stores)
-- ============================================================================
CREATE OR REPLACE FUNCTION pull_tasks_from_pool(
p_worker_id VARCHAR(100),
p_max_stores INTEGER DEFAULT 6
)
RETURNS TABLE(
task_id INTEGER,
dispensary_id INTEGER,
dispensary_name VARCHAR(255),
role VARCHAR(50),
platform VARCHAR(50),
method VARCHAR(20)
) AS $$
DECLARE
worker_pool_id INTEGER;
stores_visited INTEGER;
max_stores INTEGER;
stores_remaining INTEGER;
BEGIN
-- Get worker's current pool and store count
SELECT current_pool_id, pool_stores_visited, pool_max_stores
INTO worker_pool_id, stores_visited, max_stores
FROM worker_registry
WHERE worker_id = p_worker_id;
IF worker_pool_id IS NULL THEN
RAISE EXCEPTION 'Worker % has no pool assigned', p_worker_id;
END IF;
stores_remaining := max_stores - stores_visited;
IF stores_remaining <= 0 THEN
RETURN; -- Worker exhausted
END IF;
-- Claim tasks from pool (one task per store, up to remaining capacity)
RETURN QUERY
WITH available_stores AS (
SELECT DISTINCT ON (d.id)
t.id as task_id,
d.id as dispensary_id,
d.name as dispensary_name,
t.role,
t.platform,
t.method
FROM tasks t
JOIN dispensaries d ON d.id = t.dispensary_id
WHERE d.pool_id = worker_pool_id
AND t.status = 'pending'
AND t.scheduled_for <= NOW()
ORDER BY d.id, t.priority DESC, t.created_at ASC
LIMIT stores_remaining
),
claimed AS (
UPDATE tasks
SET
status = 'claimed',
claimed_by = p_worker_id,
claimed_at = NOW()
WHERE id IN (SELECT task_id FROM available_stores)
RETURNING id
)
SELECT
av.task_id,
av.dispensary_id,
av.dispensary_name,
av.role,
av.platform,
av.method
FROM available_stores av
WHERE av.task_id IN (SELECT id FROM claimed);
-- Update worker store count. Filter on claimed_at = NOW() (stable within
-- this transaction) so only stores claimed by THIS call are counted;
-- counting every still-'claimed' task would double-count on repeat pulls.
UPDATE worker_registry
SET
pool_stores_visited = pool_stores_visited + (
SELECT COUNT(DISTINCT dispensary_id)
FROM tasks
WHERE claimed_by = p_worker_id
AND status = 'claimed'
AND claimed_at = NOW()
),
updated_at = NOW()
WHERE worker_id = p_worker_id;
END;
$$ LANGUAGE plpgsql;
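-- Example (illustrative; worker id invented): pull up to 6 tasks (one
-- per store) from the worker's claimed pool; returns no rows once the
-- store budget is spent:
--   SELECT * FROM pull_tasks_from_pool('worker-7', 6);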
-- ============================================================================
-- FUNCTION: Worker releases pool (exhausted or done)
-- ============================================================================
CREATE OR REPLACE FUNCTION worker_release_pool(p_worker_id VARCHAR(100))
RETURNS BOOLEAN AS $$
BEGIN
UPDATE worker_registry
SET
current_pool_id = NULL,
pool_claimed_at = NULL,
pool_stores_visited = 0,
current_state = NULL,
current_city = NULL,
updated_at = NOW()
WHERE worker_id = p_worker_id;
RETURN true;
END;
$$ LANGUAGE plpgsql;
-- ============================================================================
-- RUN: Assign existing dispensaries to pools
-- ============================================================================
SELECT * FROM assign_all_dispensaries_to_pools();


@@ -0,0 +1,10 @@
-- Migration 114: Add pool_id to task_schedules
-- Allows schedules to target specific geo pools
ALTER TABLE task_schedules
ADD COLUMN IF NOT EXISTS pool_id INTEGER REFERENCES task_pools(id);
-- Index for pool-based schedule queries
CREATE INDEX IF NOT EXISTS idx_task_schedules_pool ON task_schedules(pool_id) WHERE pool_id IS NOT NULL;
COMMENT ON COLUMN task_schedules.pool_id IS 'Optional geo pool filter. NULL = all pools/dispensaries matching state_code';


@@ -0,0 +1,17 @@
-- Migration: Add proxy_ip tracking to worker_tasks
-- Purpose: Prevent same IP from hitting multiple stores on same platform simultaneously
--
-- Anti-detection measure: Dutchie/Jane may flag if same IP makes requests
-- for multiple different stores. This column lets us track and prevent that.
-- Add proxy_ip column to track which proxy IP is being used for each task
ALTER TABLE worker_tasks ADD COLUMN IF NOT EXISTS proxy_ip VARCHAR(45);
-- Index for quick lookup of active tasks by proxy IP
-- Used to check: "Is this IP already hitting another store?"
CREATE INDEX IF NOT EXISTS idx_worker_tasks_proxy_ip_active
ON worker_tasks (proxy_ip, platform)
WHERE status IN ('claimed', 'running') AND proxy_ip IS NOT NULL;
-- Comment
COMMENT ON COLUMN worker_tasks.proxy_ip IS 'Proxy IP assigned to this task. Used to prevent same IP hitting multiple stores on same platform.';
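-- Example (illustrative; IP invented): the pre-assignment check this
-- index serves, i.e. "is this IP already hitting another store on the
-- same platform?":
--   SELECT EXISTS (
--     SELECT 1 FROM worker_tasks
--     WHERE proxy_ip = '203.0.113.14'
--       AND platform = 'dutchie'
--       AND status IN ('claimed', 'running')
--   ) AS ip_busy;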


@@ -0,0 +1,16 @@
-- Migration: Add source tracking columns to worker_tasks
-- Purpose: Track where tasks originated from (schedule, API, manual)
-- Add source tracking columns
ALTER TABLE worker_tasks ADD COLUMN IF NOT EXISTS source VARCHAR(50);
ALTER TABLE worker_tasks ADD COLUMN IF NOT EXISTS source_schedule_id INTEGER REFERENCES task_schedules(id);
ALTER TABLE worker_tasks ADD COLUMN IF NOT EXISTS source_metadata JSONB;
-- Index for tracking tasks by schedule
CREATE INDEX IF NOT EXISTS idx_worker_tasks_source_schedule
ON worker_tasks (source_schedule_id) WHERE source_schedule_id IS NOT NULL;
-- Comments
COMMENT ON COLUMN worker_tasks.source IS 'Origin of task: schedule, api, manual, chain';
COMMENT ON COLUMN worker_tasks.source_schedule_id IS 'ID of schedule that created this task';
COMMENT ON COLUMN worker_tasks.source_metadata IS 'Additional metadata about task origin';


@@ -0,0 +1,32 @@
-- Migration 117: Per-store crawl interval scheduling
-- Adds columns for configurable per-store crawl intervals
-- Part of Real-Time Inventory Tracking feature
-- Per-store crawl interval (NULL = use state schedule default 4h)
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS crawl_interval_minutes INT DEFAULT NULL;
-- When this store should next be crawled (used by high-frequency scheduler)
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS next_crawl_at TIMESTAMPTZ DEFAULT NULL;
-- Track last request time to enforce minimum spacing
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS last_crawl_started_at TIMESTAMPTZ DEFAULT NULL;
-- Change tracking for optimization
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS last_inventory_hash TEXT DEFAULT NULL;
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS last_price_hash TEXT DEFAULT NULL;
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS inventory_changes_24h INT DEFAULT 0;
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS price_changes_24h INT DEFAULT 0;
-- Index for scheduler query: find stores due for high-frequency crawl
CREATE INDEX IF NOT EXISTS idx_dispensaries_next_crawl
ON dispensaries(next_crawl_at)
WHERE crawl_interval_minutes IS NOT NULL AND crawl_enabled = TRUE;
-- Comment for documentation
COMMENT ON COLUMN dispensaries.crawl_interval_minutes IS 'Custom crawl interval in minutes. NULL = use state schedule (4h default). Set to 15/30/60 for high-frequency tracking.';
COMMENT ON COLUMN dispensaries.next_crawl_at IS 'When this store should next be crawled. Updated after each crawl with interval + jitter.';
COMMENT ON COLUMN dispensaries.last_crawl_started_at IS 'When the last crawl task was created. Used to enforce minimum spacing.';
COMMENT ON COLUMN dispensaries.last_inventory_hash IS 'Hash of inventory state from last crawl. Used to detect changes and skip unchanged payloads.';
COMMENT ON COLUMN dispensaries.last_price_hash IS 'Hash of price state from last crawl. Used to detect price changes.';
COMMENT ON COLUMN dispensaries.inventory_changes_24h IS 'Number of inventory changes detected in last 24h. Indicates store volatility.';
COMMENT ON COLUMN dispensaries.price_changes_24h IS 'Number of price changes detected in last 24h.';
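-- Example (illustrative): the due-store query this index serves, and a
-- post-crawl reschedule with +/-10% jitter (the exact jitter policy is
-- an assumption; the comments above only say "interval + jitter"):
--   SELECT id FROM dispensaries
--   WHERE crawl_interval_minutes IS NOT NULL
--     AND crawl_enabled = TRUE
--     AND (next_crawl_at IS NULL OR next_crawl_at <= NOW());
--   UPDATE dispensaries
--   SET last_crawl_started_at = NOW(),
--       next_crawl_at = NOW() +
--         ((crawl_interval_minutes * (0.9 + random() * 0.2))::text
--          || ' minutes')::INTERVAL
--   WHERE id = 512;  -- dispensary id invented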


@@ -0,0 +1,48 @@
-- Migration 118: Inventory snapshots table
-- Lightweight per-product tracking for sales velocity estimation
-- Part of Real-Time Inventory Tracking feature
CREATE TABLE IF NOT EXISTS inventory_snapshots (
id BIGSERIAL PRIMARY KEY,
dispensary_id INT NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE,
product_id TEXT NOT NULL, -- provider_product_id (normalized across platforms)
captured_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Platform (for debugging/filtering)
platform TEXT NOT NULL, -- 'dutchie' | 'jane' | 'treez'
-- Inventory fields (normalized from all platforms)
quantity_available INT, -- Dutchie: quantityAvailable, Jane: quantity, Treez: quantityAvailable
is_below_threshold BOOLEAN, -- Dutchie: isBelowThreshold, Jane: computed, Treez: lowInventory
status TEXT, -- Active/Inactive/available
-- Price fields (normalized)
price_rec NUMERIC(10,2), -- recreational price
price_med NUMERIC(10,2), -- medical price (if different)
-- Denormalized for fast queries
brand_name TEXT,
category TEXT,
product_name TEXT
);
-- Primary query: get snapshots for a store over time
CREATE INDEX IF NOT EXISTS idx_inv_snap_store_time ON inventory_snapshots(dispensary_id, captured_at DESC);
-- Delta calculation: get consecutive snapshots for a product
CREATE INDEX IF NOT EXISTS idx_inv_snap_product_time ON inventory_snapshots(dispensary_id, product_id, captured_at DESC);
-- Brand-level analytics
CREATE INDEX IF NOT EXISTS idx_inv_snap_brand_time ON inventory_snapshots(brand_name, captured_at DESC) WHERE brand_name IS NOT NULL;
-- Platform filtering
CREATE INDEX IF NOT EXISTS idx_inv_snap_platform ON inventory_snapshots(platform, captured_at DESC);
-- Retention cleanup (30 days) - simple index, cleanup job handles the WHERE
CREATE INDEX IF NOT EXISTS idx_inv_snap_cleanup ON inventory_snapshots(captured_at);
-- Comments
COMMENT ON TABLE inventory_snapshots IS 'Lightweight inventory snapshots for sales velocity tracking. Retained 30 days.';
COMMENT ON COLUMN inventory_snapshots.product_id IS 'Provider product ID, normalized across platforms';
COMMENT ON COLUMN inventory_snapshots.platform IS 'Menu platform: dutchie, jane, or treez';
COMMENT ON COLUMN inventory_snapshots.quantity_available IS 'Current quantity in stock (Dutchie: quantityAvailable, Jane: quantity)';
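-- Example (illustrative; dispensary id invented): estimate units moved
-- per product by diffing consecutive snapshots (positive = sold down,
-- negative = restock):
--   SELECT product_id, captured_at,
--          lag(quantity_available) OVER w - quantity_available AS units_moved
--   FROM inventory_snapshots
--   WHERE dispensary_id = 512
--   WINDOW w AS (PARTITION BY product_id ORDER BY captured_at);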


@@ -0,0 +1,53 @@
-- Migration 119: Product visibility events table
-- Tracks OOS, brand drops, and other notable events for alerts
-- Part of Real-Time Inventory Tracking feature
CREATE TABLE IF NOT EXISTS product_visibility_events (
id SERIAL PRIMARY KEY,
dispensary_id INT NOT NULL REFERENCES dispensaries(id) ON DELETE CASCADE,
-- Product identification (null for brand-level events)
product_id TEXT, -- provider_product_id
product_name TEXT, -- For display in alerts
-- Brand (always populated)
brand_name TEXT,
-- Event details
event_type TEXT NOT NULL, -- 'oos', 'back_in_stock', 'brand_dropped', 'brand_added', 'price_change'
detected_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Context
previous_quantity INT, -- For OOS events: what quantity was before
previous_price NUMERIC(10,2), -- For price change events
new_price NUMERIC(10,2), -- For price change events
price_change_pct NUMERIC(5,2), -- Percentage change (e.g., -15.5 for 15.5% decrease)
-- Platform
platform TEXT, -- 'dutchie' | 'jane' | 'treez'
-- Alert status
notified BOOLEAN DEFAULT FALSE, -- Has external system been notified?
acknowledged_at TIMESTAMPTZ, -- When user acknowledged the alert
acknowledged_by TEXT -- User who acknowledged
);
-- Primary query: recent events by store
CREATE INDEX IF NOT EXISTS idx_vis_events_store_time ON product_visibility_events(dispensary_id, detected_at DESC);
-- Alert queries: unnotified events
CREATE INDEX IF NOT EXISTS idx_vis_events_unnotified ON product_visibility_events(notified, detected_at DESC) WHERE notified = FALSE;
-- Event type filtering
CREATE INDEX IF NOT EXISTS idx_vis_events_type ON product_visibility_events(event_type, detected_at DESC);
-- Brand-level queries
CREATE INDEX IF NOT EXISTS idx_vis_events_brand ON product_visibility_events(brand_name, event_type, detected_at DESC) WHERE brand_name IS NOT NULL;
-- Cleanup (90 days retention) - simple index, cleanup job handles the WHERE
CREATE INDEX IF NOT EXISTS idx_vis_events_cleanup ON product_visibility_events(detected_at);
-- Comments
COMMENT ON TABLE product_visibility_events IS 'Notable inventory events for alerting. OOS, brand drops, significant price changes. Retained 90 days.';
COMMENT ON COLUMN product_visibility_events.event_type IS 'Event type: oos (out of stock), back_in_stock, brand_dropped, brand_added, price_change';
COMMENT ON COLUMN product_visibility_events.notified IS 'Whether external systems (other apps) have been notified of this event';
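-- Example (illustrative): the poll an alert dispatcher might run for
-- fresh out-of-stock events, followed by the notified flip:
--   SELECT id, dispensary_id, product_name, brand_name, detected_at
--   FROM product_visibility_events
--   WHERE notified = FALSE AND event_type = 'oos'
--   ORDER BY detected_at DESC
--   LIMIT 100;
--   UPDATE product_visibility_events SET notified = TRUE WHERE id = ANY($1);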


@@ -0,0 +1,13 @@
-- Migration 120: Daily baseline tracking
-- Track when each store's daily baseline payload was last saved
-- Part of Real-Time Inventory Tracking feature
-- Add column to track last baseline save time
ALTER TABLE dispensaries ADD COLUMN IF NOT EXISTS last_baseline_at TIMESTAMPTZ DEFAULT NULL;
-- Index for finding stores that need baselines
CREATE INDEX IF NOT EXISTS idx_dispensaries_baseline ON dispensaries(last_baseline_at)
WHERE crawl_enabled = TRUE;
-- Comment
COMMENT ON COLUMN dispensaries.last_baseline_at IS 'Timestamp of last daily baseline payload save. Baselines saved once per day between 12:01 AM - 3:00 AM.';
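-- Example (illustrative): stores still missing today's baseline,
-- suitable for the 12:01-3:00 AM window:
--   SELECT id FROM dispensaries
--   WHERE crawl_enabled = TRUE
--     AND (last_baseline_at IS NULL
--          OR last_baseline_at < date_trunc('day', NOW()));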


@@ -1,6 +1,6 @@
{
"name": "dutchie-menus-backend",
- "version": "1.5.1",
+ "version": "1.6.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
@@ -46,6 +46,97 @@
"resolved": "https://registry.npmjs.org/@ioredis/commands/-/commands-1.4.0.tgz", "resolved": "https://registry.npmjs.org/@ioredis/commands/-/commands-1.4.0.tgz",
"integrity": "sha512-aFT2yemJJo+TZCmieA7qnYGQooOS7QfNmYrzGtsYd3g9j5iDP8AimYYAesf79ohjbLG12XxC4nG5DyEnC88AsQ==" "integrity": "sha512-aFT2yemJJo+TZCmieA7qnYGQooOS7QfNmYrzGtsYd3g9j5iDP8AimYYAesf79ohjbLG12XxC4nG5DyEnC88AsQ=="
}, },
"node_modules/@jsep-plugin/assignment": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/@jsep-plugin/assignment/-/assignment-1.3.0.tgz",
"integrity": "sha512-VVgV+CXrhbMI3aSusQyclHkenWSAm95WaiKrMxRFam3JSUiIaQjoMIw2sEs/OX4XifnqeQUN4DYbJjlA8EfktQ==",
"engines": {
"node": ">= 10.16.0"
},
"peerDependencies": {
"jsep": "^0.4.0||^1.0.0"
}
},
"node_modules/@jsep-plugin/regex": {
"version": "1.0.4",
"resolved": "https://registry.npmjs.org/@jsep-plugin/regex/-/regex-1.0.4.tgz",
"integrity": "sha512-q7qL4Mgjs1vByCaTnDFcBnV9HS7GVPJX5vyVoCgZHNSC9rjwIlmbXG5sUuorR5ndfHAIlJ8pVStxvjXHbNvtUg==",
"engines": {
"node": ">= 10.16.0"
},
"peerDependencies": {
"jsep": "^0.4.0||^1.0.0"
}
},
"node_modules/@kubernetes/client-node": {
"version": "1.4.0",
"resolved": "https://registry.npmjs.org/@kubernetes/client-node/-/client-node-1.4.0.tgz",
"integrity": "sha512-Zge3YvF7DJi264dU1b3wb/GmzR99JhUpqTvp+VGHfwZT+g7EOOYNScDJNZwXy9cszyIGPIs0VHr+kk8e95qqrA==",
"dependencies": {
"@types/js-yaml": "^4.0.1",
"@types/node": "^24.0.0",
"@types/node-fetch": "^2.6.13",
"@types/stream-buffers": "^3.0.3",
"form-data": "^4.0.0",
"hpagent": "^1.2.0",
"isomorphic-ws": "^5.0.0",
"js-yaml": "^4.1.0",
"jsonpath-plus": "^10.3.0",
"node-fetch": "^2.7.0",
"openid-client": "^6.1.3",
"rfc4648": "^1.3.0",
"socks-proxy-agent": "^8.0.4",
"stream-buffers": "^3.0.2",
"tar-fs": "^3.0.9",
"ws": "^8.18.2"
}
},
"node_modules/@kubernetes/client-node/node_modules/@types/node": {
"version": "24.10.3",
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.10.3.tgz",
"integrity": "sha512-gqkrWUsS8hcm0r44yn7/xZeV1ERva/nLgrLxFRUGb7aoNMIJfZJ3AC261zDQuOAKC7MiXai1WCpYc48jAHoShQ==",
"dependencies": {
"undici-types": "~7.16.0"
}
},
"node_modules/@kubernetes/client-node/node_modules/tar-fs": {
"version": "3.1.1",
"resolved": "https://registry.npmjs.org/tar-fs/-/tar-fs-3.1.1.tgz",
"integrity": "sha512-LZA0oaPOc2fVo82Txf3gw+AkEd38szODlptMYejQUhndHMLQ9M059uXR+AfS7DNo0NpINvSqDsvyaCrBVkptWg==",
"dependencies": {
"pump": "^3.0.0",
"tar-stream": "^3.1.5"
},
"optionalDependencies": {
"bare-fs": "^4.0.1",
"bare-path": "^3.0.0"
}
},
"node_modules/@kubernetes/client-node/node_modules/undici-types": {
"version": "7.16.0",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-7.16.0.tgz",
"integrity": "sha512-Zz+aZWSj8LE6zoxD+xrjh4VfkIG8Ya6LvYkZqtUQGJPZjYl53ypCaUwWqo7eI0x66KBGeRo+mlBEkMSeSZ38Nw=="
},
"node_modules/@kubernetes/client-node/node_modules/ws": {
"version": "8.18.3",
"resolved": "https://registry.npmjs.org/ws/-/ws-8.18.3.tgz",
"integrity": "sha512-PEIGCY5tSlUt50cqyMXfCzX+oOPqN0vuGqWzbcJ2xvnkzkq46oOpz7dQaTDBdfICb4N14+GARUDw2XV2N4tvzg==",
"engines": {
"node": ">=10.0.0"
},
"peerDependencies": {
"bufferutil": "^4.0.1",
"utf-8-validate": ">=5.0.2"
},
"peerDependenciesMeta": {
"bufferutil": {
"optional": true
},
"utf-8-validate": {
"optional": true
}
}
},
"node_modules/@mapbox/node-pre-gyp": { "node_modules/@mapbox/node-pre-gyp": {
"version": "1.0.11", "version": "1.0.11",
"resolved": "https://registry.npmjs.org/@mapbox/node-pre-gyp/-/node-pre-gyp-1.0.11.tgz", "resolved": "https://registry.npmjs.org/@mapbox/node-pre-gyp/-/node-pre-gyp-1.0.11.tgz",
@@ -251,6 +342,11 @@
"integrity": "sha512-r8Tayk8HJnX0FztbZN7oVqGccWgw98T/0neJphO91KkmOzug1KkofZURD4UaD5uH8AqcFLfdPErnBod0u71/qg==", "integrity": "sha512-r8Tayk8HJnX0FztbZN7oVqGccWgw98T/0neJphO91KkmOzug1KkofZURD4UaD5uH8AqcFLfdPErnBod0u71/qg==",
"dev": true "dev": true
}, },
"node_modules/@types/js-yaml": {
"version": "4.0.9",
"resolved": "https://registry.npmjs.org/@types/js-yaml/-/js-yaml-4.0.9.tgz",
"integrity": "sha512-k4MGaQl5TGo/iipqb2UDG2UwjXziSWkh0uysQelTlJpX1qGlpUZYm8PnO4DxG1qBomtJUdYJ6qR6xdIah10JLg=="
},
"node_modules/@types/jsonwebtoken": { "node_modules/@types/jsonwebtoken": {
"version": "9.0.10", "version": "9.0.10",
"resolved": "https://registry.npmjs.org/@types/jsonwebtoken/-/jsonwebtoken-9.0.10.tgz", "resolved": "https://registry.npmjs.org/@types/jsonwebtoken/-/jsonwebtoken-9.0.10.tgz",
@@ -276,7 +372,6 @@
"version": "20.19.25", "version": "20.19.25",
"resolved": "https://registry.npmjs.org/@types/node/-/node-20.19.25.tgz", "resolved": "https://registry.npmjs.org/@types/node/-/node-20.19.25.tgz",
"integrity": "sha512-ZsJzA5thDQMSQO788d7IocwwQbI8B5OPzmqNvpf3NY/+MHDAS759Wo0gd2WQeXYt5AAAQjzcrTVC6SKCuYgoCQ==", "integrity": "sha512-ZsJzA5thDQMSQO788d7IocwwQbI8B5OPzmqNvpf3NY/+MHDAS759Wo0gd2WQeXYt5AAAQjzcrTVC6SKCuYgoCQ==",
"devOptional": true,
"dependencies": { "dependencies": {
"undici-types": "~6.21.0" "undici-types": "~6.21.0"
} }
@@ -287,6 +382,15 @@
"integrity": "sha512-0ikrnug3/IyneSHqCBeslAhlK2aBfYek1fGo4bP4QnZPmiqSGRK+Oy7ZMisLWkesffJvQ1cqAcBnJC+8+nxIAg==", "integrity": "sha512-0ikrnug3/IyneSHqCBeslAhlK2aBfYek1fGo4bP4QnZPmiqSGRK+Oy7ZMisLWkesffJvQ1cqAcBnJC+8+nxIAg==",
"dev": true "dev": true
}, },
"node_modules/@types/node-fetch": {
"version": "2.6.13",
"resolved": "https://registry.npmjs.org/@types/node-fetch/-/node-fetch-2.6.13.tgz",
"integrity": "sha512-QGpRVpzSaUs30JBSGPjOg4Uveu384erbHBoT1zeONvyCfwQxIkUshLAOqN/k9EjGviPRmWTTe6aH2qySWKTVSw==",
"dependencies": {
"@types/node": "*",
"form-data": "^4.0.4"
}
},
"node_modules/@types/pg": { "node_modules/@types/pg": {
"version": "8.15.6", "version": "8.15.6",
"resolved": "https://registry.npmjs.org/@types/pg/-/pg-8.15.6.tgz", "resolved": "https://registry.npmjs.org/@types/pg/-/pg-8.15.6.tgz",
@@ -340,6 +444,14 @@
"@types/node": "*" "@types/node": "*"
} }
}, },
"node_modules/@types/stream-buffers": {
"version": "3.0.8",
"resolved": "https://registry.npmjs.org/@types/stream-buffers/-/stream-buffers-3.0.8.tgz",
"integrity": "sha512-J+7VaHKNvlNPJPEJXX/fKa9DZtR/xPMwuIbe+yNOwp1YB+ApUOBv2aUpEoBJEi8nJgbgs1x8e73ttg0r1rSUdw==",
"dependencies": {
"@types/node": "*"
}
},
"node_modules/@types/uuid": { "node_modules/@types/uuid": {
"version": "9.0.8", "version": "9.0.8",
"resolved": "https://registry.npmjs.org/@types/uuid/-/uuid-9.0.8.tgz", "resolved": "https://registry.npmjs.org/@types/uuid/-/uuid-9.0.8.tgz",
@@ -520,6 +632,78 @@
} }
} }
}, },
"node_modules/bare-fs": {
"version": "4.5.2",
"resolved": "https://registry.npmjs.org/bare-fs/-/bare-fs-4.5.2.tgz",
"integrity": "sha512-veTnRzkb6aPHOvSKIOy60KzURfBdUflr5VReI+NSaPL6xf+XLdONQgZgpYvUuZLVQ8dCqxpBAudaOM1+KpAUxw==",
"optional": true,
"dependencies": {
"bare-events": "^2.5.4",
"bare-path": "^3.0.0",
"bare-stream": "^2.6.4",
"bare-url": "^2.2.2",
"fast-fifo": "^1.3.2"
},
"engines": {
"bare": ">=1.16.0"
},
"peerDependencies": {
"bare-buffer": "*"
},
"peerDependenciesMeta": {
"bare-buffer": {
"optional": true
}
}
},
"node_modules/bare-os": {
"version": "3.6.2",
"resolved": "https://registry.npmjs.org/bare-os/-/bare-os-3.6.2.tgz",
"integrity": "sha512-T+V1+1srU2qYNBmJCXZkUY5vQ0B4FSlL3QDROnKQYOqeiQR8UbjNHlPa+TIbM4cuidiN9GaTaOZgSEgsvPbh5A==",
"optional": true,
"engines": {
"bare": ">=1.14.0"
}
},
"node_modules/bare-path": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/bare-path/-/bare-path-3.0.0.tgz",
"integrity": "sha512-tyfW2cQcB5NN8Saijrhqn0Zh7AnFNsnczRcuWODH0eYAXBsJ5gVxAUuNr7tsHSC6IZ77cA0SitzT+s47kot8Mw==",
"optional": true,
"dependencies": {
"bare-os": "^3.0.1"
}
},
"node_modules/bare-stream": {
"version": "2.7.0",
"resolved": "https://registry.npmjs.org/bare-stream/-/bare-stream-2.7.0.tgz",
"integrity": "sha512-oyXQNicV1y8nc2aKffH+BUHFRXmx6VrPzlnaEvMhram0nPBrKcEdcyBg5r08D0i8VxngHFAiVyn1QKXpSG0B8A==",
"optional": true,
"dependencies": {
"streamx": "^2.21.0"
},
"peerDependencies": {
"bare-buffer": "*",
"bare-events": "*"
},
"peerDependenciesMeta": {
"bare-buffer": {
"optional": true
},
"bare-events": {
"optional": true
}
}
},
"node_modules/bare-url": {
"version": "2.3.2",
"resolved": "https://registry.npmjs.org/bare-url/-/bare-url-2.3.2.tgz",
"integrity": "sha512-ZMq4gd9ngV5aTMa5p9+UfY0b3skwhHELaDkhEHetMdX0LRkW9kzaym4oo/Eh+Ghm0CCDuMTsRIGM/ytUc1ZYmw==",
"optional": true,
"dependencies": {
"bare-path": "^3.0.0"
}
},
"node_modules/base64-js": { "node_modules/base64-js": {
"version": "1.5.1", "version": "1.5.1",
"resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz", "resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz",
@@ -1026,6 +1210,17 @@
"url": "https://github.com/sponsors/fb55" "url": "https://github.com/sponsors/fb55"
} }
}, },
"node_modules/csv-parser": {
"version": "3.2.0",
"resolved": "https://registry.npmjs.org/csv-parser/-/csv-parser-3.2.0.tgz",
"integrity": "sha512-fgKbp+AJbn1h2dcAHKIdKNSSjfp43BZZykXsCjzALjKy80VXQNHPFJ6T9Afwdzoj24aMkq8GwDS7KGcDPpejrA==",
"bin": {
"csv-parser": "bin/csv-parser"
},
"engines": {
"node": ">= 10"
}
},
"node_modules/data-uri-to-buffer": { "node_modules/data-uri-to-buffer": {
"version": "6.0.2", "version": "6.0.2",
"resolved": "https://registry.npmjs.org/data-uri-to-buffer/-/data-uri-to-buffer-6.0.2.tgz", "resolved": "https://registry.npmjs.org/data-uri-to-buffer/-/data-uri-to-buffer-6.0.2.tgz",
@@ -2008,6 +2203,14 @@
"node": ">=16.0.0" "node": ">=16.0.0"
} }
}, },
"node_modules/hpagent": {
"version": "1.2.0",
"resolved": "https://registry.npmjs.org/hpagent/-/hpagent-1.2.0.tgz",
"integrity": "sha512-A91dYTeIB6NoXG+PxTQpCCDDnfHsW9kc06Lvpu1TEe9gnd6ZFeiBoRO9JvzEv6xK7EX97/dUE8g/vBMTqTS3CA==",
"engines": {
"node": ">=14"
}
},
"node_modules/htmlparser2": { "node_modules/htmlparser2": {
"version": "10.0.0", "version": "10.0.0",
"resolved": "https://registry.npmjs.org/htmlparser2/-/htmlparser2-10.0.0.tgz", "resolved": "https://registry.npmjs.org/htmlparser2/-/htmlparser2-10.0.0.tgz",
@@ -2235,6 +2438,14 @@
"node": ">= 12" "node": ">= 12"
} }
}, },
"node_modules/ip2location-nodejs": {
"version": "9.7.0",
"resolved": "https://registry.npmjs.org/ip2location-nodejs/-/ip2location-nodejs-9.7.0.tgz",
"integrity": "sha512-eQ4T5TXm1cx0+pQcRycPiuaiRuoDEMd9O89Be7Ugk555qi9UY9enXSznkkqr3kQRyUaXx7zj5dORC5LGTPOttA==",
"dependencies": {
"csv-parser": "^3.0.0"
}
},
"node_modules/ipaddr.js": { "node_modules/ipaddr.js": {
"version": "2.2.0", "version": "2.2.0",
"resolved": "https://registry.npmjs.org/ipaddr.js/-/ipaddr.js-2.2.0.tgz", "resolved": "https://registry.npmjs.org/ipaddr.js/-/ipaddr.js-2.2.0.tgz",
@@ -2363,6 +2574,22 @@
"node": ">=0.10.0" "node": ">=0.10.0"
} }
}, },
"node_modules/isomorphic-ws": {
"version": "5.0.0",
"resolved": "https://registry.npmjs.org/isomorphic-ws/-/isomorphic-ws-5.0.0.tgz",
"integrity": "sha512-muId7Zzn9ywDsyXgTIafTry2sV3nySZeUDe6YedVd1Hvuuep5AsIlqK+XefWpYTyJG5e503F2xIuT2lcU6rCSw==",
"peerDependencies": {
"ws": "*"
}
},
"node_modules/jose": {
"version": "6.1.3",
"resolved": "https://registry.npmjs.org/jose/-/jose-6.1.3.tgz",
"integrity": "sha512-0TpaTfihd4QMNwrz/ob2Bp7X04yuxJkjRGi4aKmOqwhov54i6u79oCv7T+C7lo70MKH6BesI3vscD1yb/yzKXQ==",
"funding": {
"url": "https://github.com/sponsors/panva"
}
},
"node_modules/js-tokens": { "node_modules/js-tokens": {
"version": "4.0.0", "version": "4.0.0",
"resolved": "https://registry.npmjs.org/js-tokens/-/js-tokens-4.0.0.tgz", "resolved": "https://registry.npmjs.org/js-tokens/-/js-tokens-4.0.0.tgz",
@@ -2379,6 +2606,14 @@
"js-yaml": "bin/js-yaml.js" "js-yaml": "bin/js-yaml.js"
} }
}, },
"node_modules/jsep": {
"version": "1.4.0",
"resolved": "https://registry.npmjs.org/jsep/-/jsep-1.4.0.tgz",
"integrity": "sha512-B7qPcEVE3NVkmSJbaYxvv4cHkVW7DQsZz13pUMrfS8z8Q/BuShN+gcTXrUlPiGqM2/t/EEaI030bpxMqY8gMlw==",
"engines": {
"node": ">= 10.16.0"
}
},
"node_modules/json-parse-even-better-errors": { "node_modules/json-parse-even-better-errors": {
"version": "2.3.1", "version": "2.3.1",
"resolved": "https://registry.npmjs.org/json-parse-even-better-errors/-/json-parse-even-better-errors-2.3.1.tgz", "resolved": "https://registry.npmjs.org/json-parse-even-better-errors/-/json-parse-even-better-errors-2.3.1.tgz",
@@ -2400,6 +2635,23 @@
"graceful-fs": "^4.1.6" "graceful-fs": "^4.1.6"
} }
}, },
"node_modules/jsonpath-plus": {
"version": "10.3.0",
"resolved": "https://registry.npmjs.org/jsonpath-plus/-/jsonpath-plus-10.3.0.tgz",
"integrity": "sha512-8TNmfeTCk2Le33A3vRRwtuworG/L5RrgMvdjhKZxvyShO+mBu2fP50OWUjRLNtvw344DdDarFh9buFAZs5ujeA==",
"dependencies": {
"@jsep-plugin/assignment": "^1.3.0",
"@jsep-plugin/regex": "^1.0.4",
"jsep": "^1.4.0"
},
"bin": {
"jsonpath": "bin/jsonpath-cli.js",
"jsonpath-plus": "bin/jsonpath-cli.js"
},
"engines": {
"node": ">=18.0.0"
}
},
"node_modules/jsonwebtoken": { "node_modules/jsonwebtoken": {
"version": "9.0.2", "version": "9.0.2",
"resolved": "https://registry.npmjs.org/jsonwebtoken/-/jsonwebtoken-9.0.2.tgz", "resolved": "https://registry.npmjs.org/jsonwebtoken/-/jsonwebtoken-9.0.2.tgz",
@@ -2474,6 +2726,11 @@
"resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz", "resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
"integrity": "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg==" "integrity": "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg=="
}, },
"node_modules/lodash.clonedeep": {
"version": "4.5.0",
"resolved": "https://registry.npmjs.org/lodash.clonedeep/-/lodash.clonedeep-4.5.0.tgz",
"integrity": "sha512-H5ZhCF25riFd9uB5UCkVKo61m3S/xZk1x4wA6yp/L3RFP6Z/eHH1ymQcGLo7J3GMPfm0V/7m1tryHuGVxpqEBQ=="
},
"node_modules/lodash.defaults": { "node_modules/lodash.defaults": {
"version": "4.2.0", "version": "4.2.0",
"resolved": "https://registry.npmjs.org/lodash.defaults/-/lodash.defaults-4.2.0.tgz", "resolved": "https://registry.npmjs.org/lodash.defaults/-/lodash.defaults-4.2.0.tgz",
@@ -2923,6 +3180,14 @@
"url": "https://github.com/fb55/nth-check?sponsor=1" "url": "https://github.com/fb55/nth-check?sponsor=1"
} }
}, },
"node_modules/oauth4webapi": {
"version": "3.8.3",
"resolved": "https://registry.npmjs.org/oauth4webapi/-/oauth4webapi-3.8.3.tgz",
"integrity": "sha512-pQ5BsX3QRTgnt5HxgHwgunIRaDXBdkT23tf8dfzmtTIL2LTpdmxgbpbBm0VgFWAIDlezQvQCTgnVIUmHupXHxw==",
"funding": {
"url": "https://github.com/sponsors/panva"
}
},
"node_modules/object-assign": { "node_modules/object-assign": {
"version": "4.1.1", "version": "4.1.1",
"resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz", "resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz",
@@ -2961,6 +3226,18 @@
"wrappy": "1" "wrappy": "1"
} }
}, },
"node_modules/openid-client": {
"version": "6.8.1",
"resolved": "https://registry.npmjs.org/openid-client/-/openid-client-6.8.1.tgz",
"integrity": "sha512-VoYT6enBo6Vj2j3Q5Ec0AezS+9YGzQo1f5Xc42lreMGlfP4ljiXPKVDvCADh+XHCV/bqPu/wWSiCVXbJKvrODw==",
"dependencies": {
"jose": "^6.1.0",
"oauth4webapi": "^3.8.2"
},
"funding": {
"url": "https://github.com/sponsors/panva"
}
},
"node_modules/pac-proxy-agent": { "node_modules/pac-proxy-agent": {
"version": "7.2.0", "version": "7.2.0",
"resolved": "https://registry.npmjs.org/pac-proxy-agent/-/pac-proxy-agent-7.2.0.tgz", "resolved": "https://registry.npmjs.org/pac-proxy-agent/-/pac-proxy-agent-7.2.0.tgz",
@@ -3864,6 +4141,11 @@
"url": "https://github.com/privatenumber/resolve-pkg-maps?sponsor=1" "url": "https://github.com/privatenumber/resolve-pkg-maps?sponsor=1"
} }
}, },
"node_modules/rfc4648": {
"version": "1.5.4",
"resolved": "https://registry.npmjs.org/rfc4648/-/rfc4648-1.5.4.tgz",
"integrity": "sha512-rRg/6Lb+IGfJqO05HZkN50UtY7K/JhxJag1kP23+zyMfrvoB0B7RWv06MbOzoc79RgCdNTiUaNsTT1AJZ7Z+cg=="
},
"node_modules/rimraf": { "node_modules/rimraf": {
"version": "3.0.2", "version": "3.0.2",
"resolved": "https://registry.npmjs.org/rimraf/-/rimraf-3.0.2.tgz", "resolved": "https://registry.npmjs.org/rimraf/-/rimraf-3.0.2.tgz",
@@ -4294,6 +4576,14 @@
"node": ">= 0.8" "node": ">= 0.8"
} }
}, },
"node_modules/stream-buffers": {
"version": "3.0.3",
"resolved": "https://registry.npmjs.org/stream-buffers/-/stream-buffers-3.0.3.tgz",
"integrity": "sha512-pqMqwQCso0PBJt2PQmDO0cFj0lyqmiwOMiMSkVtRokl7e+ZTRYgDHKnuZNbqjiJXgsg4nuqtD/zxuo9KqTp0Yw==",
"engines": {
"node": ">= 0.10.0"
}
},
"node_modules/streamx": { "node_modules/streamx": {
"version": "2.23.0", "version": "2.23.0",
"resolved": "https://registry.npmjs.org/streamx/-/streamx-2.23.0.tgz", "resolved": "https://registry.npmjs.org/streamx/-/streamx-2.23.0.tgz",
@@ -4513,8 +4803,7 @@
"node_modules/undici-types": { "node_modules/undici-types": {
"version": "6.21.0", "version": "6.21.0",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz", "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz",
"integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ==", "integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ=="
"devOptional": true
}, },
"node_modules/universalify": { "node_modules/universalify": {
"version": "2.0.1", "version": "2.0.1",
@@ -4537,6 +4826,14 @@
"resolved": "https://registry.npmjs.org/urlpattern-polyfill/-/urlpattern-polyfill-10.0.0.tgz", "resolved": "https://registry.npmjs.org/urlpattern-polyfill/-/urlpattern-polyfill-10.0.0.tgz",
"integrity": "sha512-H/A06tKD7sS1O1X2SshBVeA5FLycRpjqiBeqGKmBwBDBy28EnRjORxTNe269KSSr5un5qyWi1iL61wLxpd+ZOg==" "integrity": "sha512-H/A06tKD7sS1O1X2SshBVeA5FLycRpjqiBeqGKmBwBDBy28EnRjORxTNe269KSSr5un5qyWi1iL61wLxpd+ZOg=="
}, },
"node_modules/user-agents": {
"version": "1.1.669",
"resolved": "https://registry.npmjs.org/user-agents/-/user-agents-1.1.669.tgz",
"integrity": "sha512-pbIzG+AOqCaIpySKJ4IAm1l0VyE4jMnK4y1thV8lm8PYxI+7X5uWcppOK7zY79TCKKTAnJH3/4gaVIZHsjrmJA==",
"dependencies": {
"lodash.clonedeep": "^4.5.0"
}
},
"node_modules/util": { "node_modules/util": {
"version": "0.12.5", "version": "0.12.5",
"resolved": "https://registry.npmjs.org/util/-/util-0.12.5.tgz", "resolved": "https://registry.npmjs.org/util/-/util-0.12.5.tgz",

View File

@@ -1,13 +1,14 @@
{
"name": "dutchie-menus-backend",
"version": "1.5.1", "version": "1.6.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "dutchie-menus-backend",
"version": "1.5.1", "version": "1.6.0",
"dependencies": {
"@kubernetes/client-node": "^1.4.0",
"@types/bcryptjs": "^3.0.0",
"axios": "^1.6.2",
"bcrypt": "^5.1.1",
@@ -21,6 +22,7 @@
"helmet": "^7.1.0", "helmet": "^7.1.0",
"https-proxy-agent": "^7.0.2", "https-proxy-agent": "^7.0.2",
"ioredis": "^5.8.2", "ioredis": "^5.8.2",
"ip2location-nodejs": "^9.7.0",
"ipaddr.js": "^2.2.0", "ipaddr.js": "^2.2.0",
"jsonwebtoken": "^9.0.2", "jsonwebtoken": "^9.0.2",
"minio": "^7.1.3", "minio": "^7.1.3",
@@ -33,6 +35,9 @@
"puppeteer-extra-plugin-stealth": "^2.11.2", "puppeteer-extra-plugin-stealth": "^2.11.2",
"sharp": "^0.32.0", "sharp": "^0.32.0",
"socks-proxy-agent": "^8.0.2", "socks-proxy-agent": "^8.0.2",
"swagger-jsdoc": "^6.2.8",
"swagger-ui-express": "^5.0.1",
"user-agents": "^1.1.669",
"uuid": "^9.0.1", "uuid": "^9.0.1",
"zod": "^3.22.4" "zod": "^3.22.4"
}, },
@@ -44,11 +49,53 @@
"@types/node": "^20.10.5", "@types/node": "^20.10.5",
"@types/node-cron": "^3.0.11", "@types/node-cron": "^3.0.11",
"@types/pg": "^8.15.6", "@types/pg": "^8.15.6",
"@types/swagger-jsdoc": "^6.0.4",
"@types/swagger-ui-express": "^4.1.8",
"@types/uuid": "^9.0.7", "@types/uuid": "^9.0.7",
"tsx": "^4.7.0", "tsx": "^4.7.0",
"typescript": "^5.3.3" "typescript": "^5.3.3"
} }
}, },
"node_modules/@apidevtools/json-schema-ref-parser": {
"version": "9.1.2",
"resolved": "https://registry.npmjs.org/@apidevtools/json-schema-ref-parser/-/json-schema-ref-parser-9.1.2.tgz",
"integrity": "sha512-r1w81DpR+KyRWd3f+rk6TNqMgedmAxZP5v5KWlXQWlgMUUtyEJch0DKEci1SorPMiSeM8XPl7MZ3miJ60JIpQg==",
"dependencies": {
"@jsdevtools/ono": "^7.1.3",
"@types/json-schema": "^7.0.6",
"call-me-maybe": "^1.0.1",
"js-yaml": "^4.1.0"
}
},
"node_modules/@apidevtools/openapi-schemas": {
"version": "2.1.0",
"resolved": "https://registry.npmjs.org/@apidevtools/openapi-schemas/-/openapi-schemas-2.1.0.tgz",
"integrity": "sha512-Zc1AlqrJlX3SlpupFGpiLi2EbteyP7fXmUOGup6/DnkRgjP9bgMM/ag+n91rsv0U1Gpz0H3VILA/o3bW7Ua6BQ==",
"engines": {
"node": ">=10"
}
},
"node_modules/@apidevtools/swagger-methods": {
"version": "3.0.2",
"resolved": "https://registry.npmjs.org/@apidevtools/swagger-methods/-/swagger-methods-3.0.2.tgz",
"integrity": "sha512-QAkD5kK2b1WfjDS/UQn/qQkbwF31uqRjPTrsCs5ZG9BQGAkjwvqGFjjPqAuzac/IYzpPtRzjCP1WrTuAIjMrXg=="
},
"node_modules/@apidevtools/swagger-parser": {
"version": "10.0.3",
"resolved": "https://registry.npmjs.org/@apidevtools/swagger-parser/-/swagger-parser-10.0.3.tgz",
"integrity": "sha512-sNiLY51vZOmSPFZA5TF35KZ2HbgYklQnTSDnkghamzLb3EkNtcQnrBQEj5AOCxHpTtXpqMCRM1CrmV2rG6nw4g==",
"dependencies": {
"@apidevtools/json-schema-ref-parser": "^9.0.6",
"@apidevtools/openapi-schemas": "^2.0.4",
"@apidevtools/swagger-methods": "^3.0.2",
"@jsdevtools/ono": "^7.1.3",
"call-me-maybe": "^1.0.1",
"z-schema": "^5.0.1"
},
"peerDependencies": {
"openapi-types": ">=7"
}
},
"node_modules/@babel/code-frame": { "node_modules/@babel/code-frame": {
"version": "7.27.1", "version": "7.27.1",
"resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.27.1.tgz", "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.27.1.tgz",
@@ -491,6 +538,102 @@
"resolved": "https://registry.npmjs.org/@ioredis/commands/-/commands-1.4.0.tgz", "resolved": "https://registry.npmjs.org/@ioredis/commands/-/commands-1.4.0.tgz",
"integrity": "sha512-aFT2yemJJo+TZCmieA7qnYGQooOS7QfNmYrzGtsYd3g9j5iDP8AimYYAesf79ohjbLG12XxC4nG5DyEnC88AsQ==" "integrity": "sha512-aFT2yemJJo+TZCmieA7qnYGQooOS7QfNmYrzGtsYd3g9j5iDP8AimYYAesf79ohjbLG12XxC4nG5DyEnC88AsQ=="
}, },
"node_modules/@jsdevtools/ono": {
"version": "7.1.3",
"resolved": "https://registry.npmjs.org/@jsdevtools/ono/-/ono-7.1.3.tgz",
"integrity": "sha512-4JQNk+3mVzK3xh2rqd6RB4J46qUR19azEHBneZyTZM+c456qOrbbM/5xcR8huNCCcbVt7+UmizG6GuUvPvKUYg=="
},
"node_modules/@jsep-plugin/assignment": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/@jsep-plugin/assignment/-/assignment-1.3.0.tgz",
"integrity": "sha512-VVgV+CXrhbMI3aSusQyclHkenWSAm95WaiKrMxRFam3JSUiIaQjoMIw2sEs/OX4XifnqeQUN4DYbJjlA8EfktQ==",
"engines": {
"node": ">= 10.16.0"
},
"peerDependencies": {
"jsep": "^0.4.0||^1.0.0"
}
},
"node_modules/@jsep-plugin/regex": {
"version": "1.0.4",
"resolved": "https://registry.npmjs.org/@jsep-plugin/regex/-/regex-1.0.4.tgz",
"integrity": "sha512-q7qL4Mgjs1vByCaTnDFcBnV9HS7GVPJX5vyVoCgZHNSC9rjwIlmbXG5sUuorR5ndfHAIlJ8pVStxvjXHbNvtUg==",
"engines": {
"node": ">= 10.16.0"
},
"peerDependencies": {
"jsep": "^0.4.0||^1.0.0"
}
},
"node_modules/@kubernetes/client-node": {
"version": "1.4.0",
"resolved": "https://registry.npmjs.org/@kubernetes/client-node/-/client-node-1.4.0.tgz",
"integrity": "sha512-Zge3YvF7DJi264dU1b3wb/GmzR99JhUpqTvp+VGHfwZT+g7EOOYNScDJNZwXy9cszyIGPIs0VHr+kk8e95qqrA==",
"dependencies": {
"@types/js-yaml": "^4.0.1",
"@types/node": "^24.0.0",
"@types/node-fetch": "^2.6.13",
"@types/stream-buffers": "^3.0.3",
"form-data": "^4.0.0",
"hpagent": "^1.2.0",
"isomorphic-ws": "^5.0.0",
"js-yaml": "^4.1.0",
"jsonpath-plus": "^10.3.0",
"node-fetch": "^2.7.0",
"openid-client": "^6.1.3",
"rfc4648": "^1.3.0",
"socks-proxy-agent": "^8.0.4",
"stream-buffers": "^3.0.2",
"tar-fs": "^3.0.9",
"ws": "^8.18.2"
}
},
"node_modules/@kubernetes/client-node/node_modules/@types/node": {
"version": "24.10.3",
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.10.3.tgz",
"integrity": "sha512-gqkrWUsS8hcm0r44yn7/xZeV1ERva/nLgrLxFRUGb7aoNMIJfZJ3AC261zDQuOAKC7MiXai1WCpYc48jAHoShQ==",
"dependencies": {
"undici-types": "~7.16.0"
}
},
"node_modules/@kubernetes/client-node/node_modules/tar-fs": {
"version": "3.1.1",
"resolved": "https://registry.npmjs.org/tar-fs/-/tar-fs-3.1.1.tgz",
"integrity": "sha512-LZA0oaPOc2fVo82Txf3gw+AkEd38szODlptMYejQUhndHMLQ9M059uXR+AfS7DNo0NpINvSqDsvyaCrBVkptWg==",
"dependencies": {
"pump": "^3.0.0",
"tar-stream": "^3.1.5"
},
"optionalDependencies": {
"bare-fs": "^4.0.1",
"bare-path": "^3.0.0"
}
},
"node_modules/@kubernetes/client-node/node_modules/undici-types": {
"version": "7.16.0",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-7.16.0.tgz",
"integrity": "sha512-Zz+aZWSj8LE6zoxD+xrjh4VfkIG8Ya6LvYkZqtUQGJPZjYl53ypCaUwWqo7eI0x66KBGeRo+mlBEkMSeSZ38Nw=="
},
"node_modules/@kubernetes/client-node/node_modules/ws": {
"version": "8.18.3",
"resolved": "https://registry.npmjs.org/ws/-/ws-8.18.3.tgz",
"integrity": "sha512-PEIGCY5tSlUt50cqyMXfCzX+oOPqN0vuGqWzbcJ2xvnkzkq46oOpz7dQaTDBdfICb4N14+GARUDw2XV2N4tvzg==",
"engines": {
"node": ">=10.0.0"
},
"peerDependencies": {
"bufferutil": "^4.0.1",
"utf-8-validate": ">=5.0.2"
},
"peerDependenciesMeta": {
"bufferutil": {
"optional": true
},
"utf-8-validate": {
"optional": true
}
}
},
"node_modules/@mapbox/node-pre-gyp": { "node_modules/@mapbox/node-pre-gyp": {
"version": "1.0.11", "version": "1.0.11",
"resolved": "https://registry.npmjs.org/@mapbox/node-pre-gyp/-/node-pre-gyp-1.0.11.tgz", "resolved": "https://registry.npmjs.org/@mapbox/node-pre-gyp/-/node-pre-gyp-1.0.11.tgz",
@@ -667,6 +810,12 @@
"resolved": "https://registry.npmjs.org/ms/-/ms-2.1.2.tgz", "resolved": "https://registry.npmjs.org/ms/-/ms-2.1.2.tgz",
"integrity": "sha512-sGkPx+VjMtmA6MX27oA4FBFELFCZZ4S4XqeGOXCv68tT+jb3vk/RyaKWP0PTKyWtmLSM0b+adUTEvbs1PEaH2w==" "integrity": "sha512-sGkPx+VjMtmA6MX27oA4FBFELFCZZ4S4XqeGOXCv68tT+jb3vk/RyaKWP0PTKyWtmLSM0b+adUTEvbs1PEaH2w=="
}, },
"node_modules/@scarf/scarf": {
"version": "1.4.0",
"resolved": "https://registry.npmjs.org/@scarf/scarf/-/scarf-1.4.0.tgz",
"integrity": "sha512-xxeapPiUXdZAE3che6f3xogoJPeZgig6omHEy1rIY5WVsB3H2BHNnZH+gHG6x91SCWyQCzWGsuL2Hh3ClO5/qQ==",
"hasInstallScript": true
},
"node_modules/@tootallnate/quickjs-emscripten": { "node_modules/@tootallnate/quickjs-emscripten": {
"version": "0.23.0", "version": "0.23.0",
"resolved": "https://registry.npmjs.org/@tootallnate/quickjs-emscripten/-/quickjs-emscripten-0.23.0.tgz", "resolved": "https://registry.npmjs.org/@tootallnate/quickjs-emscripten/-/quickjs-emscripten-0.23.0.tgz",
@@ -756,6 +905,16 @@
"integrity": "sha512-r8Tayk8HJnX0FztbZN7oVqGccWgw98T/0neJphO91KkmOzug1KkofZURD4UaD5uH8AqcFLfdPErnBod0u71/qg==", "integrity": "sha512-r8Tayk8HJnX0FztbZN7oVqGccWgw98T/0neJphO91KkmOzug1KkofZURD4UaD5uH8AqcFLfdPErnBod0u71/qg==",
"dev": true "dev": true
}, },
"node_modules/@types/js-yaml": {
"version": "4.0.9",
"resolved": "https://registry.npmjs.org/@types/js-yaml/-/js-yaml-4.0.9.tgz",
"integrity": "sha512-k4MGaQl5TGo/iipqb2UDG2UwjXziSWkh0uysQelTlJpX1qGlpUZYm8PnO4DxG1qBomtJUdYJ6qR6xdIah10JLg=="
},
"node_modules/@types/json-schema": {
"version": "7.0.15",
"resolved": "https://registry.npmjs.org/@types/json-schema/-/json-schema-7.0.15.tgz",
"integrity": "sha512-5+fP8P8MFNC+AyZCDxrB2pkZFPGzqQWUzpSeuuVLvm8VMcorNYavBqoFcxK8bQz4Qsbn4oUEEem4wDLfcysGHA=="
},
"node_modules/@types/jsonwebtoken": { "node_modules/@types/jsonwebtoken": {
"version": "9.0.10", "version": "9.0.10",
"resolved": "https://registry.npmjs.org/@types/jsonwebtoken/-/jsonwebtoken-9.0.10.tgz", "resolved": "https://registry.npmjs.org/@types/jsonwebtoken/-/jsonwebtoken-9.0.10.tgz",
@@ -781,7 +940,6 @@
"version": "20.19.25", "version": "20.19.25",
"resolved": "https://registry.npmjs.org/@types/node/-/node-20.19.25.tgz", "resolved": "https://registry.npmjs.org/@types/node/-/node-20.19.25.tgz",
"integrity": "sha512-ZsJzA5thDQMSQO788d7IocwwQbI8B5OPzmqNvpf3NY/+MHDAS759Wo0gd2WQeXYt5AAAQjzcrTVC6SKCuYgoCQ==", "integrity": "sha512-ZsJzA5thDQMSQO788d7IocwwQbI8B5OPzmqNvpf3NY/+MHDAS759Wo0gd2WQeXYt5AAAQjzcrTVC6SKCuYgoCQ==",
"devOptional": true,
"dependencies": { "dependencies": {
"undici-types": "~6.21.0" "undici-types": "~6.21.0"
} }
@@ -792,6 +950,15 @@
"integrity": "sha512-0ikrnug3/IyneSHqCBeslAhlK2aBfYek1fGo4bP4QnZPmiqSGRK+Oy7ZMisLWkesffJvQ1cqAcBnJC+8+nxIAg==", "integrity": "sha512-0ikrnug3/IyneSHqCBeslAhlK2aBfYek1fGo4bP4QnZPmiqSGRK+Oy7ZMisLWkesffJvQ1cqAcBnJC+8+nxIAg==",
"dev": true "dev": true
}, },
"node_modules/@types/node-fetch": {
"version": "2.6.13",
"resolved": "https://registry.npmjs.org/@types/node-fetch/-/node-fetch-2.6.13.tgz",
"integrity": "sha512-QGpRVpzSaUs30JBSGPjOg4Uveu384erbHBoT1zeONvyCfwQxIkUshLAOqN/k9EjGviPRmWTTe6aH2qySWKTVSw==",
"dependencies": {
"@types/node": "*",
"form-data": "^4.0.4"
}
},
"node_modules/@types/pg": { "node_modules/@types/pg": {
"version": "8.15.6", "version": "8.15.6",
"resolved": "https://registry.npmjs.org/@types/pg/-/pg-8.15.6.tgz", "resolved": "https://registry.npmjs.org/@types/pg/-/pg-8.15.6.tgz",
@@ -845,6 +1012,30 @@
"@types/node": "*" "@types/node": "*"
} }
}, },
"node_modules/@types/stream-buffers": {
"version": "3.0.8",
"resolved": "https://registry.npmjs.org/@types/stream-buffers/-/stream-buffers-3.0.8.tgz",
"integrity": "sha512-J+7VaHKNvlNPJPEJXX/fKa9DZtR/xPMwuIbe+yNOwp1YB+ApUOBv2aUpEoBJEi8nJgbgs1x8e73ttg0r1rSUdw==",
"dependencies": {
"@types/node": "*"
}
},
"node_modules/@types/swagger-jsdoc": {
"version": "6.0.4",
"resolved": "https://registry.npmjs.org/@types/swagger-jsdoc/-/swagger-jsdoc-6.0.4.tgz",
"integrity": "sha512-W+Xw5epcOZrF/AooUM/PccNMSAFOKWZA5dasNyMujTwsBkU74njSJBpvCCJhHAJ95XRMzQrrW844Btu0uoetwQ==",
"dev": true
},
"node_modules/@types/swagger-ui-express": {
"version": "4.1.8",
"resolved": "https://registry.npmjs.org/@types/swagger-ui-express/-/swagger-ui-express-4.1.8.tgz",
"integrity": "sha512-AhZV8/EIreHFmBV5wAs0gzJUNq9JbbSXgJLQubCC0jtIo6prnI9MIRRxnU4MZX9RB9yXxF1V4R7jtLl/Wcj31g==",
"dev": true,
"dependencies": {
"@types/express": "*",
"@types/serve-static": "*"
}
},
"node_modules/@types/uuid": { "node_modules/@types/uuid": {
"version": "9.0.8", "version": "9.0.8",
"resolved": "https://registry.npmjs.org/@types/uuid/-/uuid-9.0.8.tgz", "resolved": "https://registry.npmjs.org/@types/uuid/-/uuid-9.0.8.tgz",
@@ -1025,6 +1216,78 @@
} }
} }
}, },
"node_modules/bare-fs": {
"version": "4.5.2",
"resolved": "https://registry.npmjs.org/bare-fs/-/bare-fs-4.5.2.tgz",
"integrity": "sha512-veTnRzkb6aPHOvSKIOy60KzURfBdUflr5VReI+NSaPL6xf+XLdONQgZgpYvUuZLVQ8dCqxpBAudaOM1+KpAUxw==",
"optional": true,
"dependencies": {
"bare-events": "^2.5.4",
"bare-path": "^3.0.0",
"bare-stream": "^2.6.4",
"bare-url": "^2.2.2",
"fast-fifo": "^1.3.2"
},
"engines": {
"bare": ">=1.16.0"
},
"peerDependencies": {
"bare-buffer": "*"
},
"peerDependenciesMeta": {
"bare-buffer": {
"optional": true
}
}
},
"node_modules/bare-os": {
"version": "3.6.2",
"resolved": "https://registry.npmjs.org/bare-os/-/bare-os-3.6.2.tgz",
"integrity": "sha512-T+V1+1srU2qYNBmJCXZkUY5vQ0B4FSlL3QDROnKQYOqeiQR8UbjNHlPa+TIbM4cuidiN9GaTaOZgSEgsvPbh5A==",
"optional": true,
"engines": {
"bare": ">=1.14.0"
}
},
"node_modules/bare-path": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/bare-path/-/bare-path-3.0.0.tgz",
"integrity": "sha512-tyfW2cQcB5NN8Saijrhqn0Zh7AnFNsnczRcuWODH0eYAXBsJ5gVxAUuNr7tsHSC6IZ77cA0SitzT+s47kot8Mw==",
"optional": true,
"dependencies": {
"bare-os": "^3.0.1"
}
},
"node_modules/bare-stream": {
"version": "2.7.0",
"resolved": "https://registry.npmjs.org/bare-stream/-/bare-stream-2.7.0.tgz",
"integrity": "sha512-oyXQNicV1y8nc2aKffH+BUHFRXmx6VrPzlnaEvMhram0nPBrKcEdcyBg5r08D0i8VxngHFAiVyn1QKXpSG0B8A==",
"optional": true,
"dependencies": {
"streamx": "^2.21.0"
},
"peerDependencies": {
"bare-buffer": "*",
"bare-events": "*"
},
"peerDependenciesMeta": {
"bare-buffer": {
"optional": true
},
"bare-events": {
"optional": true
}
}
},
"node_modules/bare-url": {
"version": "2.3.2",
"resolved": "https://registry.npmjs.org/bare-url/-/bare-url-2.3.2.tgz",
"integrity": "sha512-ZMq4gd9ngV5aTMa5p9+UfY0b3skwhHELaDkhEHetMdX0LRkW9kzaym4oo/Eh+Ghm0CCDuMTsRIGM/ytUc1ZYmw==",
"optional": true,
"dependencies": {
"bare-path": "^3.0.0"
}
},
"node_modules/base64-js": { "node_modules/base64-js": {
"version": "1.5.1", "version": "1.5.1",
"resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz", "resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz",
@@ -1247,6 +1510,11 @@
"url": "https://github.com/sponsors/ljharb" "url": "https://github.com/sponsors/ljharb"
} }
}, },
"node_modules/call-me-maybe": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/call-me-maybe/-/call-me-maybe-1.0.2.tgz",
"integrity": "sha512-HpX65o1Hnr9HH25ojC1YGs7HCQLq0GCOibSaWER0eNpgJ/Z1MZv2mTc7+xh6WOPxbRVcmgbv4hGU+uSQ/2xFZQ=="
},
"node_modules/callsites": { "node_modules/callsites": {
"version": "3.1.0", "version": "3.1.0",
"resolved": "https://registry.npmjs.org/callsites/-/callsites-3.1.0.tgz", "resolved": "https://registry.npmjs.org/callsites/-/callsites-3.1.0.tgz",
@@ -1407,6 +1675,14 @@
"node": ">= 0.8" "node": ">= 0.8"
} }
}, },
"node_modules/commander": {
"version": "6.2.0",
"resolved": "https://registry.npmjs.org/commander/-/commander-6.2.0.tgz",
"integrity": "sha512-zP4jEKbe8SHzKJYQmq8Y9gYjtO/POJLgIdKgV7B9qNmABVFVc+ctqSX6iXh4mCpJfRBOabiZ2YKPg8ciDw6C+Q==",
"engines": {
"node": ">= 6"
}
},
"node_modules/concat-map": { "node_modules/concat-map": {
"version": "0.0.1", "version": "0.0.1",
"resolved": "https://registry.npmjs.org/concat-map/-/concat-map-0.0.1.tgz", "resolved": "https://registry.npmjs.org/concat-map/-/concat-map-0.0.1.tgz",
@@ -1531,6 +1807,17 @@
"url": "https://github.com/sponsors/fb55" "url": "https://github.com/sponsors/fb55"
} }
}, },
"node_modules/csv-parser": {
"version": "3.2.0",
"resolved": "https://registry.npmjs.org/csv-parser/-/csv-parser-3.2.0.tgz",
"integrity": "sha512-fgKbp+AJbn1h2dcAHKIdKNSSjfp43BZZykXsCjzALjKy80VXQNHPFJ6T9Afwdzoj24aMkq8GwDS7KGcDPpejrA==",
"bin": {
"csv-parser": "bin/csv-parser"
},
"engines": {
"node": ">= 10"
}
},
"node_modules/data-uri-to-buffer": { "node_modules/data-uri-to-buffer": {
"version": "6.0.2", "version": "6.0.2",
"resolved": "https://registry.npmjs.org/data-uri-to-buffer/-/data-uri-to-buffer-6.0.2.tgz", "resolved": "https://registry.npmjs.org/data-uri-to-buffer/-/data-uri-to-buffer-6.0.2.tgz",
@@ -1665,6 +1952,17 @@
"resolved": "https://registry.npmjs.org/devtools-protocol/-/devtools-protocol-0.0.1232444.tgz", "resolved": "https://registry.npmjs.org/devtools-protocol/-/devtools-protocol-0.0.1232444.tgz",
"integrity": "sha512-pM27vqEfxSxRkTMnF+XCmxSEb6duO5R+t8A9DEEJgy4Wz2RVanje2mmj99B6A3zv2r/qGfYlOvYznUhuokizmg==" "integrity": "sha512-pM27vqEfxSxRkTMnF+XCmxSEb6duO5R+t8A9DEEJgy4Wz2RVanje2mmj99B6A3zv2r/qGfYlOvYznUhuokizmg=="
}, },
"node_modules/doctrine": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/doctrine/-/doctrine-3.0.0.tgz",
"integrity": "sha512-yS+Q5i3hBf7GBkd4KG8a7eBNNWNGLTaEwwYWUijIYM7zrlYDM0BFXHjjPWlWZ1Rg7UaddZeIDmi9jF3HmqiQ2w==",
"dependencies": {
"esutils": "^2.0.2"
},
"engines": {
"node": ">=6.0.0"
}
},
"node_modules/dom-serializer": { "node_modules/dom-serializer": {
"version": "2.0.0", "version": "2.0.0",
"resolved": "https://registry.npmjs.org/dom-serializer/-/dom-serializer-2.0.0.tgz", "resolved": "https://registry.npmjs.org/dom-serializer/-/dom-serializer-2.0.0.tgz",
@@ -2527,6 +2825,14 @@
"node": ">=16.0.0" "node": ">=16.0.0"
} }
}, },
"node_modules/hpagent": {
"version": "1.2.0",
"resolved": "https://registry.npmjs.org/hpagent/-/hpagent-1.2.0.tgz",
"integrity": "sha512-A91dYTeIB6NoXG+PxTQpCCDDnfHsW9kc06Lvpu1TEe9gnd6ZFeiBoRO9JvzEv6xK7EX97/dUE8g/vBMTqTS3CA==",
"engines": {
"node": ">=14"
}
},
"node_modules/htmlparser2": { "node_modules/htmlparser2": {
"version": "10.0.0", "version": "10.0.0",
"resolved": "https://registry.npmjs.org/htmlparser2/-/htmlparser2-10.0.0.tgz", "resolved": "https://registry.npmjs.org/htmlparser2/-/htmlparser2-10.0.0.tgz",
@@ -2754,6 +3060,14 @@
"node": ">= 12" "node": ">= 12"
} }
}, },
"node_modules/ip2location-nodejs": {
"version": "9.7.0",
"resolved": "https://registry.npmjs.org/ip2location-nodejs/-/ip2location-nodejs-9.7.0.tgz",
"integrity": "sha512-eQ4T5TXm1cx0+pQcRycPiuaiRuoDEMd9O89Be7Ugk555qi9UY9enXSznkkqr3kQRyUaXx7zj5dORC5LGTPOttA==",
"dependencies": {
"csv-parser": "^3.0.0"
}
},
"node_modules/ipaddr.js": { "node_modules/ipaddr.js": {
"version": "2.2.0", "version": "2.2.0",
"resolved": "https://registry.npmjs.org/ipaddr.js/-/ipaddr.js-2.2.0.tgz", "resolved": "https://registry.npmjs.org/ipaddr.js/-/ipaddr.js-2.2.0.tgz",
@@ -2882,6 +3196,22 @@
"node": ">=0.10.0" "node": ">=0.10.0"
} }
}, },
"node_modules/isomorphic-ws": {
"version": "5.0.0",
"resolved": "https://registry.npmjs.org/isomorphic-ws/-/isomorphic-ws-5.0.0.tgz",
"integrity": "sha512-muId7Zzn9ywDsyXgTIafTry2sV3nySZeUDe6YedVd1Hvuuep5AsIlqK+XefWpYTyJG5e503F2xIuT2lcU6rCSw==",
"peerDependencies": {
"ws": "*"
}
},
"node_modules/jose": {
"version": "6.1.3",
"resolved": "https://registry.npmjs.org/jose/-/jose-6.1.3.tgz",
"integrity": "sha512-0TpaTfihd4QMNwrz/ob2Bp7X04yuxJkjRGi4aKmOqwhov54i6u79oCv7T+C7lo70MKH6BesI3vscD1yb/yzKXQ==",
"funding": {
"url": "https://github.com/sponsors/panva"
}
},
"node_modules/js-tokens": { "node_modules/js-tokens": {
"version": "4.0.0", "version": "4.0.0",
"resolved": "https://registry.npmjs.org/js-tokens/-/js-tokens-4.0.0.tgz", "resolved": "https://registry.npmjs.org/js-tokens/-/js-tokens-4.0.0.tgz",
@@ -2898,6 +3228,14 @@
"js-yaml": "bin/js-yaml.js" "js-yaml": "bin/js-yaml.js"
} }
}, },
"node_modules/jsep": {
"version": "1.4.0",
"resolved": "https://registry.npmjs.org/jsep/-/jsep-1.4.0.tgz",
"integrity": "sha512-B7qPcEVE3NVkmSJbaYxvv4cHkVW7DQsZz13pUMrfS8z8Q/BuShN+gcTXrUlPiGqM2/t/EEaI030bpxMqY8gMlw==",
"engines": {
"node": ">= 10.16.0"
}
},
"node_modules/json-parse-even-better-errors": { "node_modules/json-parse-even-better-errors": {
"version": "2.3.1", "version": "2.3.1",
"resolved": "https://registry.npmjs.org/json-parse-even-better-errors/-/json-parse-even-better-errors-2.3.1.tgz", "resolved": "https://registry.npmjs.org/json-parse-even-better-errors/-/json-parse-even-better-errors-2.3.1.tgz",
@@ -2919,6 +3257,23 @@
"graceful-fs": "^4.1.6" "graceful-fs": "^4.1.6"
} }
}, },
"node_modules/jsonpath-plus": {
"version": "10.3.0",
"resolved": "https://registry.npmjs.org/jsonpath-plus/-/jsonpath-plus-10.3.0.tgz",
"integrity": "sha512-8TNmfeTCk2Le33A3vRRwtuworG/L5RrgMvdjhKZxvyShO+mBu2fP50OWUjRLNtvw344DdDarFh9buFAZs5ujeA==",
"dependencies": {
"@jsep-plugin/assignment": "^1.3.0",
"@jsep-plugin/regex": "^1.0.4",
"jsep": "^1.4.0"
},
"bin": {
"jsonpath": "bin/jsonpath-cli.js",
"jsonpath-plus": "bin/jsonpath-cli.js"
},
"engines": {
"node": ">=18.0.0"
}
},
"node_modules/jsonwebtoken": { "node_modules/jsonwebtoken": {
"version": "9.0.2", "version": "9.0.2",
"resolved": "https://registry.npmjs.org/jsonwebtoken/-/jsonwebtoken-9.0.2.tgz", "resolved": "https://registry.npmjs.org/jsonwebtoken/-/jsonwebtoken-9.0.2.tgz",
@@ -2993,11 +3348,22 @@
"resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz", "resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
"integrity": "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg==" "integrity": "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg=="
}, },
"node_modules/lodash.clonedeep": {
"version": "4.5.0",
"resolved": "https://registry.npmjs.org/lodash.clonedeep/-/lodash.clonedeep-4.5.0.tgz",
"integrity": "sha512-H5ZhCF25riFd9uB5UCkVKo61m3S/xZk1x4wA6yp/L3RFP6Z/eHH1ymQcGLo7J3GMPfm0V/7m1tryHuGVxpqEBQ=="
},
"node_modules/lodash.defaults": { "node_modules/lodash.defaults": {
"version": "4.2.0", "version": "4.2.0",
"resolved": "https://registry.npmjs.org/lodash.defaults/-/lodash.defaults-4.2.0.tgz", "resolved": "https://registry.npmjs.org/lodash.defaults/-/lodash.defaults-4.2.0.tgz",
"integrity": "sha512-qjxPLHd3r5DnsdGacqOMU6pb/avJzdh9tFX2ymgoZE27BmjXrNy/y4LoaiTeAb+O3gL8AfpJGtqfX/ae2leYYQ==" "integrity": "sha512-qjxPLHd3r5DnsdGacqOMU6pb/avJzdh9tFX2ymgoZE27BmjXrNy/y4LoaiTeAb+O3gL8AfpJGtqfX/ae2leYYQ=="
}, },
"node_modules/lodash.get": {
"version": "4.4.2",
"resolved": "https://registry.npmjs.org/lodash.get/-/lodash.get-4.4.2.tgz",
"integrity": "sha512-z+Uw/vLuy6gQe8cfaFWD7p0wVv8fJl3mbzXh33RS+0oW2wvUqiRXiQ69gLWSLpgB5/6sU+r6BlQR0MBILadqTQ==",
"deprecated": "This package is deprecated. Use the optional chaining (?.) operator instead."
},
"node_modules/lodash.includes": { "node_modules/lodash.includes": {
"version": "4.3.0", "version": "4.3.0",
"resolved": "https://registry.npmjs.org/lodash.includes/-/lodash.includes-4.3.0.tgz", "resolved": "https://registry.npmjs.org/lodash.includes/-/lodash.includes-4.3.0.tgz",
@@ -3013,6 +3379,12 @@
"resolved": "https://registry.npmjs.org/lodash.isboolean/-/lodash.isboolean-3.0.3.tgz", "resolved": "https://registry.npmjs.org/lodash.isboolean/-/lodash.isboolean-3.0.3.tgz",
"integrity": "sha512-Bz5mupy2SVbPHURB98VAcw+aHh4vRV5IPNhILUCsOzRmsTmSQ17jIuqopAentWoehktxGd9e/hbIXq980/1QJg==" "integrity": "sha512-Bz5mupy2SVbPHURB98VAcw+aHh4vRV5IPNhILUCsOzRmsTmSQ17jIuqopAentWoehktxGd9e/hbIXq980/1QJg=="
}, },
"node_modules/lodash.isequal": {
"version": "4.5.0",
"resolved": "https://registry.npmjs.org/lodash.isequal/-/lodash.isequal-4.5.0.tgz",
"integrity": "sha512-pDo3lu8Jhfjqls6GkMgpahsF9kCyayhgykjyLMNFTKWrpVdAQtYyB4muAMWozBB4ig/dtWAmsMxLEI8wuz+DYQ==",
"deprecated": "This package is deprecated. Use require('node:util').isDeepStrictEqual instead."
},
"node_modules/lodash.isinteger": { "node_modules/lodash.isinteger": {
"version": "4.0.4", "version": "4.0.4",
"resolved": "https://registry.npmjs.org/lodash.isinteger/-/lodash.isinteger-4.0.4.tgz", "resolved": "https://registry.npmjs.org/lodash.isinteger/-/lodash.isinteger-4.0.4.tgz",
@@ -3033,6 +3405,11 @@
"resolved": "https://registry.npmjs.org/lodash.isstring/-/lodash.isstring-4.0.1.tgz", "resolved": "https://registry.npmjs.org/lodash.isstring/-/lodash.isstring-4.0.1.tgz",
"integrity": "sha512-0wJxfxH1wgO3GrbuP+dTTk7op+6L41QCXbGINEmD+ny/G/eCqGzxyCsh7159S+mgDDcoarnBw6PC1PS5+wUGgw==" "integrity": "sha512-0wJxfxH1wgO3GrbuP+dTTk7op+6L41QCXbGINEmD+ny/G/eCqGzxyCsh7159S+mgDDcoarnBw6PC1PS5+wUGgw=="
}, },
"node_modules/lodash.mergewith": {
"version": "4.6.2",
"resolved": "https://registry.npmjs.org/lodash.mergewith/-/lodash.mergewith-4.6.2.tgz",
"integrity": "sha512-GK3g5RPZWTRSeLSpgP8Xhra+pnjBC56q9FZYe1d5RN3TJ35dbkGy3YqBSMbyCrlbi+CM9Z3Jk5yTL7RCsqboyQ=="
},
"node_modules/lodash.once": { "node_modules/lodash.once": {
"version": "4.1.1", "version": "4.1.1",
"resolved": "https://registry.npmjs.org/lodash.once/-/lodash.once-4.1.1.tgz", "resolved": "https://registry.npmjs.org/lodash.once/-/lodash.once-4.1.1.tgz",
@@ -3442,6 +3819,14 @@
"url": "https://github.com/fb55/nth-check?sponsor=1" "url": "https://github.com/fb55/nth-check?sponsor=1"
} }
}, },
"node_modules/oauth4webapi": {
"version": "3.8.3",
"resolved": "https://registry.npmjs.org/oauth4webapi/-/oauth4webapi-3.8.3.tgz",
"integrity": "sha512-pQ5BsX3QRTgnt5HxgHwgunIRaDXBdkT23tf8dfzmtTIL2LTpdmxgbpbBm0VgFWAIDlezQvQCTgnVIUmHupXHxw==",
"funding": {
"url": "https://github.com/sponsors/panva"
}
},
"node_modules/object-assign": { "node_modules/object-assign": {
"version": "4.1.1", "version": "4.1.1",
"resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz", "resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz",
@@ -3480,6 +3865,24 @@
"wrappy": "1" "wrappy": "1"
} }
}, },
"node_modules/openapi-types": {
"version": "12.1.3",
"resolved": "https://registry.npmjs.org/openapi-types/-/openapi-types-12.1.3.tgz",
"integrity": "sha512-N4YtSYJqghVu4iek2ZUvcN/0aqH1kRDuNqzcycDxhOUpg7GdvLa2F3DgS6yBNhInhv2r/6I0Flkn7CqL8+nIcw==",
"peer": true
},
"node_modules/openid-client": {
"version": "6.8.1",
"resolved": "https://registry.npmjs.org/openid-client/-/openid-client-6.8.1.tgz",
"integrity": "sha512-VoYT6enBo6Vj2j3Q5Ec0AezS+9YGzQo1f5Xc42lreMGlfP4ljiXPKVDvCADh+XHCV/bqPu/wWSiCVXbJKvrODw==",
"dependencies": {
"jose": "^6.1.0",
"oauth4webapi": "^3.8.2"
},
"funding": {
"url": "https://github.com/sponsors/panva"
}
},
"node_modules/pac-proxy-agent": { "node_modules/pac-proxy-agent": {
"version": "7.2.0", "version": "7.2.0",
"resolved": "https://registry.npmjs.org/pac-proxy-agent/-/pac-proxy-agent-7.2.0.tgz", "resolved": "https://registry.npmjs.org/pac-proxy-agent/-/pac-proxy-agent-7.2.0.tgz",
@@ -4396,6 +4799,11 @@
"url": "https://github.com/privatenumber/resolve-pkg-maps?sponsor=1" "url": "https://github.com/privatenumber/resolve-pkg-maps?sponsor=1"
} }
}, },
"node_modules/rfc4648": {
"version": "1.5.4",
"resolved": "https://registry.npmjs.org/rfc4648/-/rfc4648-1.5.4.tgz",
"integrity": "sha512-rRg/6Lb+IGfJqO05HZkN50UtY7K/JhxJag1kP23+zyMfrvoB0B7RWv06MbOzoc79RgCdNTiUaNsTT1AJZ7Z+cg=="
},
"node_modules/rimraf": { "node_modules/rimraf": {
"version": "3.0.2", "version": "3.0.2",
"resolved": "https://registry.npmjs.org/rimraf/-/rimraf-3.0.2.tgz", "resolved": "https://registry.npmjs.org/rimraf/-/rimraf-3.0.2.tgz",
@@ -4826,6 +5234,14 @@
"node": ">= 0.8" "node": ">= 0.8"
} }
}, },
"node_modules/stream-buffers": {
"version": "3.0.3",
"resolved": "https://registry.npmjs.org/stream-buffers/-/stream-buffers-3.0.3.tgz",
"integrity": "sha512-pqMqwQCso0PBJt2PQmDO0cFj0lyqmiwOMiMSkVtRokl7e+ZTRYgDHKnuZNbqjiJXgsg4nuqtD/zxuo9KqTp0Yw==",
"engines": {
"node": ">= 0.10.0"
}
},
"node_modules/streamx": { "node_modules/streamx": {
"version": "2.23.0", "version": "2.23.0",
"resolved": "https://registry.npmjs.org/streamx/-/streamx-2.23.0.tgz", "resolved": "https://registry.npmjs.org/streamx/-/streamx-2.23.0.tgz",
@@ -4895,6 +5311,78 @@
} }
] ]
}, },
"node_modules/swagger-jsdoc": {
"version": "6.2.8",
"resolved": "https://registry.npmjs.org/swagger-jsdoc/-/swagger-jsdoc-6.2.8.tgz",
"integrity": "sha512-VPvil1+JRpmJ55CgAtn8DIcpBs0bL5L3q5bVQvF4tAW/k/9JYSj7dCpaYCAv5rufe0vcCbBRQXGvzpkWjvLklQ==",
"dependencies": {
"commander": "6.2.0",
"doctrine": "3.0.0",
"glob": "7.1.6",
"lodash.mergewith": "^4.6.2",
"swagger-parser": "^10.0.3",
"yaml": "2.0.0-1"
},
"bin": {
"swagger-jsdoc": "bin/swagger-jsdoc.js"
},
"engines": {
"node": ">=12.0.0"
}
},
"node_modules/swagger-jsdoc/node_modules/glob": {
"version": "7.1.6",
"resolved": "https://registry.npmjs.org/glob/-/glob-7.1.6.tgz",
"integrity": "sha512-LwaxwyZ72Lk7vZINtNNrywX0ZuLyStrdDtabefZKAY5ZGJhVtgdznluResxNmPitE0SAO+O26sWTHeKSI2wMBA==",
"deprecated": "Glob versions prior to v9 are no longer supported",
"dependencies": {
"fs.realpath": "^1.0.0",
"inflight": "^1.0.4",
"inherits": "2",
"minimatch": "^3.0.4",
"once": "^1.3.0",
"path-is-absolute": "^1.0.0"
},
"engines": {
"node": "*"
},
"funding": {
"url": "https://github.com/sponsors/isaacs"
}
},
"node_modules/swagger-parser": {
"version": "10.0.3",
"resolved": "https://registry.npmjs.org/swagger-parser/-/swagger-parser-10.0.3.tgz",
"integrity": "sha512-nF7oMeL4KypldrQhac8RyHerJeGPD1p2xDh900GPvc+Nk7nWP6jX2FcC7WmkinMoAmoO774+AFXcWsW8gMWEIg==",
"dependencies": {
"@apidevtools/swagger-parser": "10.0.3"
},
"engines": {
"node": ">=10"
}
},
"node_modules/swagger-ui-dist": {
"version": "5.31.0",
"resolved": "https://registry.npmjs.org/swagger-ui-dist/-/swagger-ui-dist-5.31.0.tgz",
"integrity": "sha512-zSUTIck02fSga6rc0RZP3b7J7wgHXwLea8ZjgLA3Vgnb8QeOl3Wou2/j5QkzSGeoz6HusP/coYuJl33aQxQZpg==",
"dependencies": {
"@scarf/scarf": "=1.4.0"
}
},
"node_modules/swagger-ui-express": {
"version": "5.0.1",
"resolved": "https://registry.npmjs.org/swagger-ui-express/-/swagger-ui-express-5.0.1.tgz",
"integrity": "sha512-SrNU3RiBGTLLmFU8GIJdOdanJTl4TOmT27tt3bWWHppqYmAZ6IDuEuBvMU6nZq0zLEe6b/1rACXCgLZqO6ZfrA==",
"dependencies": {
"swagger-ui-dist": ">=5.0.0"
},
"engines": {
"node": ">= v0.10.32"
},
"peerDependencies": {
"express": ">=4.0.0 || >=5.0.0-beta"
}
},
"node_modules/tar": { "node_modules/tar": {
"version": "6.2.1", "version": "6.2.1",
"resolved": "https://registry.npmjs.org/tar/-/tar-6.2.1.tgz", "resolved": "https://registry.npmjs.org/tar/-/tar-6.2.1.tgz",
@@ -5045,8 +5533,7 @@
"node_modules/undici-types": { "node_modules/undici-types": {
"version": "6.21.0", "version": "6.21.0",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz", "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz",
"integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ==", "integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ=="
"devOptional": true
}, },
"node_modules/universalify": { "node_modules/universalify": {
"version": "2.0.1", "version": "2.0.1",
@@ -5069,6 +5556,14 @@
"resolved": "https://registry.npmjs.org/urlpattern-polyfill/-/urlpattern-polyfill-10.0.0.tgz", "resolved": "https://registry.npmjs.org/urlpattern-polyfill/-/urlpattern-polyfill-10.0.0.tgz",
"integrity": "sha512-H/A06tKD7sS1O1X2SshBVeA5FLycRpjqiBeqGKmBwBDBy28EnRjORxTNe269KSSr5un5qyWi1iL61wLxpd+ZOg==" "integrity": "sha512-H/A06tKD7sS1O1X2SshBVeA5FLycRpjqiBeqGKmBwBDBy28EnRjORxTNe269KSSr5un5qyWi1iL61wLxpd+ZOg=="
}, },
"node_modules/user-agents": {
"version": "1.1.669",
"resolved": "https://registry.npmjs.org/user-agents/-/user-agents-1.1.669.tgz",
"integrity": "sha512-pbIzG+AOqCaIpySKJ4IAm1l0VyE4jMnK4y1thV8lm8PYxI+7X5uWcppOK7zY79TCKKTAnJH3/4gaVIZHsjrmJA==",
"dependencies": {
"lodash.clonedeep": "^4.5.0"
}
},
"node_modules/util": { "node_modules/util": {
"version": "0.12.5", "version": "0.12.5",
"resolved": "https://registry.npmjs.org/util/-/util-0.12.5.tgz", "resolved": "https://registry.npmjs.org/util/-/util-0.12.5.tgz",
@@ -5106,6 +5601,14 @@
"uuid": "dist/bin/uuid" "uuid": "dist/bin/uuid"
} }
}, },
"node_modules/validator": {
"version": "13.15.23",
"resolved": "https://registry.npmjs.org/validator/-/validator-13.15.23.tgz",
"integrity": "sha512-4yoz1kEWqUjzi5zsPbAS/903QXSYp0UOtHsPpp7p9rHAw/W+dkInskAE386Fat3oKRROwO98d9ZB0G4cObgUyw==",
"engines": {
"node": ">= 0.10"
}
},
"node_modules/vary": { "node_modules/vary": {
"version": "1.1.2", "version": "1.1.2",
"resolved": "https://registry.npmjs.org/vary/-/vary-1.1.2.tgz", "resolved": "https://registry.npmjs.org/vary/-/vary-1.1.2.tgz",
@@ -5284,6 +5787,14 @@
"resolved": "https://registry.npmjs.org/yallist/-/yallist-4.0.0.tgz", "resolved": "https://registry.npmjs.org/yallist/-/yallist-4.0.0.tgz",
"integrity": "sha512-3wdGidZyq5PB084XLES5TpOSRA3wjXAlIWMhum2kRcv/41Sn2emQ0dycQW4uZXLejwKvg6EsvbdlVL+FYEct7A==" "integrity": "sha512-3wdGidZyq5PB084XLES5TpOSRA3wjXAlIWMhum2kRcv/41Sn2emQ0dycQW4uZXLejwKvg6EsvbdlVL+FYEct7A=="
}, },
"node_modules/yaml": {
"version": "2.0.0-1",
"resolved": "https://registry.npmjs.org/yaml/-/yaml-2.0.0-1.tgz",
"integrity": "sha512-W7h5dEhywMKenDJh2iX/LABkbFnBxasD27oyXWDS/feDsxiw0dD5ncXdYXgkvAsXIY2MpW/ZKkr9IU30DBdMNQ==",
"engines": {
"node": ">= 6"
}
},
"node_modules/yargs": { "node_modules/yargs": {
"version": "17.7.2", "version": "17.7.2",
"resolved": "https://registry.npmjs.org/yargs/-/yargs-17.7.2.tgz", "resolved": "https://registry.npmjs.org/yargs/-/yargs-17.7.2.tgz",
@@ -5318,6 +5829,34 @@
"fd-slicer": "~1.1.0" "fd-slicer": "~1.1.0"
} }
}, },
"node_modules/z-schema": {
"version": "5.0.5",
"resolved": "https://registry.npmjs.org/z-schema/-/z-schema-5.0.5.tgz",
"integrity": "sha512-D7eujBWkLa3p2sIpJA0d1pr7es+a7m0vFAnZLlCEKq/Ij2k0MLi9Br2UPxoxdYystm5K1yeBGzub0FlYUEWj2Q==",
"dependencies": {
"lodash.get": "^4.4.2",
"lodash.isequal": "^4.5.0",
"validator": "^13.7.0"
},
"bin": {
"z-schema": "bin/z-schema"
},
"engines": {
"node": ">=8.0.0"
},
"optionalDependencies": {
"commander": "^9.4.1"
}
},
"node_modules/z-schema/node_modules/commander": {
"version": "9.5.0",
"resolved": "https://registry.npmjs.org/commander/-/commander-9.5.0.tgz",
"integrity": "sha512-KRs7WVDKg86PWiuAqhDrAQnTXZKraVcCc6vFdL14qrZ/DcWwuRo7VoiYXalXO7S5GKpqYiVEwCbgFDfxNHKJBQ==",
"optional": true,
"engines": {
"node": "^12.20.0 || >=14"
}
},
"node_modules/zod": { "node_modules/zod": {
"version": "3.25.76", "version": "3.25.76",
"resolved": "https://registry.npmjs.org/zod/-/zod-3.25.76.tgz", "resolved": "https://registry.npmjs.org/zod/-/zod-3.25.76.tgz",

View File

@@ -1,6 +1,6 @@
{
"name": "dutchie-menus-backend",
"version": "1.5.1", "version": "1.6.0",
"description": "Backend API for Dutchie Menus scraper and management",
"main": "dist/index.js",
"scripts": {
@@ -22,6 +22,7 @@
"seed:dt:cities:bulk": "tsx src/scripts/seed-dt-cities-bulk.ts" "seed:dt:cities:bulk": "tsx src/scripts/seed-dt-cities-bulk.ts"
}, },
"dependencies": { "dependencies": {
"@kubernetes/client-node": "^1.4.0",
"@types/bcryptjs": "^3.0.0", "@types/bcryptjs": "^3.0.0",
"axios": "^1.6.2", "axios": "^1.6.2",
"bcrypt": "^5.1.1", "bcrypt": "^5.1.1",
@@ -35,6 +36,7 @@
"helmet": "^7.1.0", "helmet": "^7.1.0",
"https-proxy-agent": "^7.0.2", "https-proxy-agent": "^7.0.2",
"ioredis": "^5.8.2", "ioredis": "^5.8.2",
"ip2location-nodejs": "^9.7.0",
"ipaddr.js": "^2.2.0", "ipaddr.js": "^2.2.0",
"jsonwebtoken": "^9.0.2", "jsonwebtoken": "^9.0.2",
"minio": "^7.1.3", "minio": "^7.1.3",
@@ -47,6 +49,9 @@
"puppeteer-extra-plugin-stealth": "^2.11.2", "puppeteer-extra-plugin-stealth": "^2.11.2",
"sharp": "^0.32.0", "sharp": "^0.32.0",
"socks-proxy-agent": "^8.0.2", "socks-proxy-agent": "^8.0.2",
"swagger-jsdoc": "^6.2.8",
"swagger-ui-express": "^5.0.1",
"user-agents": "^1.1.669",
"uuid": "^9.0.1", "uuid": "^9.0.1",
"zod": "^3.22.4" "zod": "^3.22.4"
}, },
@@ -58,6 +63,8 @@
"@types/node": "^20.10.5", "@types/node": "^20.10.5",
"@types/node-cron": "^3.0.11", "@types/node-cron": "^3.0.11",
"@types/pg": "^8.15.6", "@types/pg": "^8.15.6",
"@types/swagger-jsdoc": "^6.0.4",
"@types/swagger-ui-express": "^4.1.8",
"@types/uuid": "^9.0.7", "@types/uuid": "^9.0.7",
"tsx": "^4.7.0", "tsx": "^4.7.0",
"typescript": "^5.3.3" "typescript": "^5.3.3"

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@@ -0,0 +1 @@
cannaiq-menus-1.7.0.zip

View File

@@ -0,0 +1,130 @@
/**
* Count Jane stores - v2: Try Algolia store search
* Usage: npx ts-node scripts/count-jane-stores-v2.ts
*/
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
puppeteer.use(StealthPlugin());
// NOTE: declared for a per-state crawl but not referenced below; the script currently
// relies on intercepting store JSON while browsing the main directory page.
const STATES = [
'AZ', 'CA', 'CO', 'FL', 'IL', 'MA', 'MI', 'NV', 'NJ', 'NY', 'OH', 'PA', 'WA', 'OR'
];
async function main() {
console.log('Counting Jane stores by exploring state pages...\n');
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox'],
});
const page = await browser.newPage();
const allStores: Map<number, any> = new Map();
await page.setRequestInterception(true);
page.on('request', (req) => {
const type = req.resourceType();
if (['image', 'font', 'media', 'stylesheet'].includes(type)) {
req.abort();
} else {
req.continue();
}
});
page.on('response', async (response) => {
const url = response.url();
const contentType = response.headers()['content-type'] || '';
if (url.includes('iheartjane.com') && contentType.includes('json')) {
try {
const json = await response.json();
// Look for stores in any response
if (json.stores && Array.isArray(json.stores)) {
for (const s of json.stores) {
if (s.id) allStores.set(s.id, s);
}
}
// Also check hits (Algolia format)
if (json.hits && Array.isArray(json.hits)) {
for (const s of json.hits) {
if (s.id) allStores.set(s.id, s);
}
}
} catch {}
}
});
// First visit the main stores page
console.log('Visiting main stores page...');
await page.goto('https://www.iheartjane.com/stores', {
waitUntil: 'networkidle0',
timeout: 60000,
});
await new Promise(r => setTimeout(r, 3000));
// Try to scroll to load more stores
console.log('Scrolling to load more...');
for (let i = 0; i < 5; i++) {
await page.evaluate(() => window.scrollBy(0, 1000));
await new Promise(r => setTimeout(r, 1000));
}
// Try clicking "Load More" if it exists.
// NOTE: ':has-text()' is Playwright-only selector syntax; Puppeteer rejects it
// (and the catch below would swallow the error), so match the button text in-page instead.
try {
const clicked = await page.evaluate(() => {
const els = Array.from(document.querySelectorAll('button, [class*="load-more"]'));
const btn = els.find(el => /load more/i.test(el.textContent || ''));
if (btn) (btn as HTMLElement).click();
return Boolean(btn);
});
if (clicked) {
console.log('Clicking Load More...');
await new Promise(r => setTimeout(r, 3000));
}
} catch {}
// Extract stores from DOM as fallback
const domStores = await page.evaluate(() => {
const storeElements = document.querySelectorAll('[data-store-id], [class*="StoreCard"], [class*="store-card"]');
return storeElements.length;
});
console.log(`\nStores from DOM elements: ${domStores}`);
await browser.close();
// Count by state
const byState: Record<string, number> = {};
for (const store of allStores.values()) {
const state = store.state || 'Unknown';
byState[state] = (byState[state] || 0) + 1;
}
console.log('\n=== JANE STORE COUNTS ===\n');
console.log(`Unique stores captured: ${allStores.size}`);
if (allStores.size > 0) {
console.log('\nBy State:');
const sorted = Object.entries(byState).sort((a, b) => b[1] - a[1]);
for (const [state, count] of sorted.slice(0, 20)) {
console.log(` ${state}: ${count}`);
}
// Check Arizona specifically
const azStores = Array.from(allStores.values()).filter(s =>
s.state === 'Arizona' || s.state === 'AZ'
);
console.log(`\nArizona stores: ${azStores.length}`);
if (azStores.length > 0) {
console.log('AZ stores:');
for (const s of azStores.slice(0, 10)) {
console.log(` - ${s.name} (ID: ${s.id}) - ${s.city}`);
}
}
}
// Note about total
console.log('\n--- Note ---');
console.log('Jane uses server-side rendering. To get full store count,');
console.log('you may need to check their public marketing materials or');
console.log('iterate through known store IDs.');
}
main().catch(console.error);

View File

@@ -0,0 +1,98 @@
/**
* Count Jane stores by state
* Usage: npx ts-node scripts/count-jane-stores.ts
*/
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
puppeteer.use(StealthPlugin());
async function main() {
console.log('Counting Jane stores...\n');
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox'],
});
const page = await browser.newPage();
// Capture store data from API
const stores: any[] = [];
await page.setRequestInterception(true);
page.on('request', (req) => {
const type = req.resourceType();
if (['image', 'font', 'media', 'stylesheet'].includes(type)) {
req.abort();
} else {
req.continue();
}
});
page.on('response', async (response) => {
const url = response.url();
if (url.includes('iheartjane.com') && url.includes('stores')) {
try {
const json = await response.json();
if (json.stores && Array.isArray(json.stores)) {
stores.push(...json.stores);
}
} catch {}
}
});
// Visit the store directory
console.log('Loading Jane store directory...');
await page.goto('https://www.iheartjane.com/stores', {
waitUntil: 'networkidle2',
timeout: 60000,
});
// Wait for stores to load
await new Promise(r => setTimeout(r, 5000));
// Also try to get store count from page content
const pageStoreCount = await page.evaluate(() => {
// Look for store count in page text
const text = document.body.innerText;
const match = text.match(/(\d+)\s*stores?/i);
return match ? parseInt(match[1]) : null;
});
await browser.close();
// Count by state
const byState: Record<string, number> = {};
for (const store of stores) {
const state = store.state || 'Unknown';
byState[state] = (byState[state] || 0) + 1;
}
console.log('\n=== JANE STORE COUNTS ===\n');
console.log(`Total stores captured from API: ${stores.length}`);
if (pageStoreCount) {
console.log(`Page claims: ${pageStoreCount} stores`);
}
console.log('\nBy State:');
const sorted = Object.entries(byState).sort((a, b) => b[1] - a[1]);
for (const [state, count] of sorted) {
console.log(` ${state}: ${count}`);
}
// Check Arizona specifically
const azStores = stores.filter(s =>
s.state === 'Arizona' || s.state === 'AZ'
);
console.log(`\nArizona stores: ${azStores.length}`);
if (azStores.length > 0) {
console.log('Sample AZ stores:');
for (const s of azStores.slice(0, 5)) {
console.log(` - ${s.name} (ID: ${s.id}) - ${s.city}`);
}
}
}
main().catch(console.error);
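
Both Jane scripts above launch headless Chrome with Puppeteer's default fingerprint; the user-agents dependency added in this change could randomize it per run. A hedged sketch (the helper name is invented; this is not wired into these scripts):

// Hypothetical helper: sample a realistic desktop UA with the user-agents package.
import UserAgent from 'user-agents';
import type { Page } from 'puppeteer';

async function applyRandomUserAgent(page: Page): Promise<void> {
  const ua = new UserAgent({ deviceCategory: 'desktop' }); // filter to desktop UAs
  await page.setUserAgent(ua.toString());
}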

View File

@@ -0,0 +1,65 @@
#!/bin/bash
# Download IP2Location LITE DB3 (City-level) database
# Free for commercial use with attribution
# https://lite.ip2location.com/database/db3-ip-country-region-city
set -e
DATA_DIR="${1:-./data/ip2location}"
DB_FILE="IP2LOCATION-LITE-DB3.BIN"
mkdir -p "$DATA_DIR"
cd "$DATA_DIR"
echo "Downloading IP2Location LITE DB3 database..."
# IP2Location LITE DB3 - includes city, region, country, lat/lng
# You need to register at https://lite.ip2location.com/ to get a download token
# Then set IP2LOCATION_TOKEN environment variable
if [ -z "$IP2LOCATION_TOKEN" ]; then
echo ""
echo "ERROR: IP2LOCATION_TOKEN not set"
echo ""
echo "To download the database:"
echo "1. Register free at https://lite.ip2location.com/"
echo "2. Get your download token from the dashboard"
echo "3. Run: IP2LOCATION_TOKEN=your_token ./scripts/download-ip2location.sh"
echo ""
exit 1
fi
# Download DB3.LITE (IPv4 + City)
DOWNLOAD_URL="https://www.ip2location.com/download/?token=${IP2LOCATION_TOKEN}&file=DB3LITEBIN"
echo "Downloading from IP2Location..."
curl -L -o ip2location.zip "$DOWNLOAD_URL"
echo "Extracting..."
unzip -o ip2location.zip
# Rename to standard name
if [ -f "IP2LOCATION-LITE-DB3.BIN" ]; then
echo "Database ready: $DATA_DIR/IP2LOCATION-LITE-DB3.BIN"
elif [ -f "IP-COUNTRY-REGION-CITY.BIN" ]; then
mv "IP-COUNTRY-REGION-CITY.BIN" "$DB_FILE"
echo "Database ready: $DATA_DIR/$DB_FILE"
else
# Find whatever BIN file was extracted
BIN_FILE=$(ls *.BIN 2>/dev/null | head -1)
if [ -n "$BIN_FILE" ]; then
mv "$BIN_FILE" "$DB_FILE"
echo "Database ready: $DATA_DIR/$DB_FILE"
else
echo "ERROR: No BIN file found in archive"
ls -la
exit 1
fi
fi
# Cleanup
rm -f ip2location.zip *.txt LICENSE* README*
echo ""
echo "Done! Database saved to: $DATA_DIR/$DB_FILE"
echo "Update monthly by re-running this script."

View File

@@ -0,0 +1,184 @@
/**
* Explore all Treez page URLs to find the full product catalog
*/
import puppeteer, { Page } from 'puppeteer';
const STORE_ID = 'best';
async function sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
async function bypassAgeGate(page: Page): Promise<void> {
const ageGate = await page.$('[data-testid="age-gate-modal"]');
if (ageGate) {
console.log(' Age gate detected, bypassing...');
const btn = await page.$('[data-testid="age-gate-submit-button"]');
if (btn) await btn.click();
await sleep(2000);
}
}
async function countProducts(page: Page): Promise<number> {
return page.evaluate(() =>
document.querySelectorAll('[class*="product_product__"]').length
);
}
async function scrollAndCount(page: Page, maxScrolls: number = 30): Promise<{ products: number; scrolls: number }> {
let previousHeight = 0;
let scrollCount = 0;
let sameHeightCount = 0;
while (scrollCount < maxScrolls) {
const currentHeight = await page.evaluate(() => document.body.scrollHeight);
if (currentHeight === previousHeight) {
sameHeightCount++;
if (sameHeightCount >= 3) break;
} else {
sameHeightCount = 0;
}
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
await sleep(1500);
previousHeight = currentHeight;
scrollCount++;
}
const products = await countProducts(page);
return { products, scrolls: scrollCount };
}
async function testUrl(page: Page, path: string): Promise<{ products: number; scrolls: number; error?: string }> {
const url = `https://${STORE_ID}.treez.io${path}`;
console.log(`\nTesting: ${url}`);
try {
await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
await sleep(2000);
await bypassAgeGate(page);
await sleep(1000);
const initialCount = await countProducts(page);
console.log(` Initial products: ${initialCount}`);
if (initialCount > 0) {
const result = await scrollAndCount(page);
console.log(` After scroll: ${result.products} products (${result.scrolls} scrolls)`);
return result;
}
// Check for brand/category cards instead
const cardCount = await page.evaluate(() => {
const selectors = [
'[class*="brand"]',
'[class*="Brand"]',
'[class*="category"]',
'[class*="Category"]',
'[class*="card"]',
'a[href*="/brand/"]',
'a[href*="/category/"]',
];
let count = 0;
selectors.forEach(sel => {
count += document.querySelectorAll(sel).length;
});
return count;
});
console.log(` Cards/links found: ${cardCount}`);
return { products: initialCount, scrolls: 0 };
} catch (error: any) {
console.log(` Error: ${error.message}`);
return { products: 0, scrolls: 0, error: error.message };
}
}
async function main() {
console.log('='.repeat(60));
console.log('Exploring Treez Page URLs');
console.log('='.repeat(60));
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox'],
});
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
// Block images to speed up
await page.setRequestInterception(true);
page.on('request', (req) => {
if (['image', 'font', 'media', 'stylesheet'].includes(req.resourceType())) {
req.abort();
} else {
req.continue();
}
});
const urlsToTest = [
'/onlinemenu/?customerType=ADULT', // Homepage
'/onlinemenu/brands?customerType=ADULT', // Brands page
'/onlinemenu/shop?customerType=ADULT', // Shop page?
'/onlinemenu/products?customerType=ADULT', // Products page?
'/onlinemenu/menu?customerType=ADULT', // Menu page?
'/onlinemenu/all?customerType=ADULT', // All products?
'/onlinemenu/flower?customerType=ADULT', // Flower category
'/onlinemenu/vapes?customerType=ADULT', // Vapes category
'/onlinemenu/edibles?customerType=ADULT', // Edibles category
'/onlinemenu/concentrates?customerType=ADULT', // Concentrates category
];
const results: { path: string; products: number; scrolls: number }[] = [];
for (const path of urlsToTest) {
const result = await testUrl(page, path);
results.push({ path, ...result });
}
// Look for navigation links on the main page
console.log('\n' + '='.repeat(60));
console.log('Checking navigation structure on homepage...');
console.log('='.repeat(60));
await page.goto(`https://${STORE_ID}.treez.io/onlinemenu/?customerType=ADULT`, {
waitUntil: 'networkidle2',
timeout: 30000,
});
await sleep(2000);
await bypassAgeGate(page);
await sleep(1000);
const navLinks = await page.evaluate(() => {
const links: { text: string; href: string }[] = [];
document.querySelectorAll('a[href*="/onlinemenu/"]').forEach(el => {
const text = el.textContent?.trim() || '';
const href = el.getAttribute('href') || '';
if (text && !links.some(l => l.href === href)) {
links.push({ text: text.slice(0, 50), href });
}
});
return links;
});
console.log('\nNavigation links found:');
navLinks.forEach(l => console.log(` "${l.text}" → ${l.href}`));
// Summary
console.log('\n' + '='.repeat(60));
console.log('Summary');
console.log('='.repeat(60));
results.sort((a, b) => b.products - a.products);
results.forEach(r => {
console.log(`${r.products.toString().padStart(4)} products | ${r.path}`);
});
await browser.close();
}
main().catch(console.error);

View File

@@ -0,0 +1,247 @@
/**
* Explore Treez site structure to find full product catalog
*
* Usage: npx ts-node scripts/explore-treez-structure.ts
*/
import puppeteer from 'puppeteer';
const STORE_ID = 'best';
async function sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
async function main() {
console.log('='.repeat(60));
console.log('Exploring Treez Site Structure');
console.log('='.repeat(60));
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox'],
});
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
try {
// Navigate to base menu URL
const baseUrl = `https://${STORE_ID}.treez.io/onlinemenu/?customerType=ADULT`;
console.log(`\n[1] Navigating to: ${baseUrl}`);
await page.goto(baseUrl, { waitUntil: 'networkidle2', timeout: 60000 });
await sleep(3000);
// Bypass age gate if present
const ageGate = await page.$('[data-testid="age-gate-modal"]');
if (ageGate) {
console.log('[1] Age gate detected, bypassing...');
const btn = await page.$('[data-testid="age-gate-submit-button"]');
if (btn) await btn.click();
await sleep(2000);
}
// Get all navigation links
console.log('\n[2] Extracting navigation structure...');
const navInfo = await page.evaluate(() => {
const links: { text: string; href: string }[] = [];
// Look for nav links
document.querySelectorAll('nav a, [class*="nav"] a, [class*="menu"] a, header a').forEach(el => {
const text = el.textContent?.trim() || '';
const href = el.getAttribute('href') || '';
if (text && href && !links.some(l => l.href === href)) {
links.push({ text, href });
}
});
// Look for category tabs/buttons
document.querySelectorAll('[class*="category"], [class*="tab"], [role="tab"]').forEach(el => {
const text = el.textContent?.trim() || '';
const href = el.getAttribute('href') || el.getAttribute('data-href') || '';
if (text && !links.some(l => l.text === text)) {
links.push({ text, href: href || `(click: ${el.className})` });
}
});
// Get current URL
const currentUrl = window.location.href;
// Count products on page
const productCount = document.querySelectorAll('[class*="product_product__"]').length;
return { links, currentUrl, productCount };
});
console.log(`Current URL: ${navInfo.currentUrl}`);
console.log(`Products on homepage: ${navInfo.productCount}`);
console.log('\nNavigation links found:');
navInfo.links.forEach(l => {
console.log(` "${l.text}" → ${l.href}`);
});
// Look for category buttons/tabs specifically
console.log('\n[3] Looking for category navigation...');
const categories = await page.evaluate(() => {
const cats: { text: string; className: string; tagName: string }[] = [];
// Find all clickable elements that might be categories
const selectors = [
'[class*="CategoryNav"]',
'[class*="category"]',
'[class*="Category"]',
'[class*="nav"] button',
'[class*="tab"]',
'[role="tablist"] *',
'.MuiTab-root',
'[class*="filter"]',
];
selectors.forEach(sel => {
document.querySelectorAll(sel).forEach(el => {
const text = el.textContent?.trim() || '';
if (text && text.length < 50 && !cats.some(c => c.text === text)) {
cats.push({
text,
className: el.className?.toString().slice(0, 80) || '',
tagName: el.tagName,
});
}
});
});
return cats;
});
console.log('Category-like elements:');
categories.forEach(c => {
console.log(` [${c.tagName}] "${c.text}" (class: ${c.className})`);
});
// Try clicking on "Flower" or "All" if found
console.log('\n[4] Looking for "Flower" or "All Products" link...');
const clickTargets = ['Flower', 'All', 'All Products', 'Shop All', 'View All'];
for (const target of clickTargets) {
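// page.evaluate() can only return serializable values, not element handles,
// so we probe for a match first, then re-run the same query to click it in-page.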
const element = await page.evaluate((targetText) => {
const els = Array.from(document.querySelectorAll('a, button, [role="tab"], [class*="category"]'));
const match = els.find(el =>
el.textContent?.trim().toLowerCase() === targetText.toLowerCase()
);
if (match) {
return {
found: true,
text: match.textContent?.trim(),
tag: match.tagName,
};
}
return { found: false };
}, target);
if (element.found) {
console.log(`Found "${element.text}" (${element.tag}), clicking...`);
await page.evaluate((targetText) => {
const els = Array.from(document.querySelectorAll('a, button, [role="tab"], [class*="category"]'));
const match = els.find(el =>
el.textContent?.trim().toLowerCase() === targetText.toLowerCase()
);
if (match) (match as HTMLElement).click();
}, target);
await sleep(3000);
const newUrl = page.url();
const newCount = await page.evaluate(() =>
document.querySelectorAll('[class*="product_product__"]').length
);
console.log(` New URL: ${newUrl}`);
console.log(` Products after click: ${newCount}`);
if (newCount > navInfo.productCount) {
console.log(` ✓ Found more products! (${navInfo.productCount} → ${newCount})`);
}
break;
}
}
// Check page height and scroll behavior
console.log('\n[5] Checking scroll behavior on current page...');
let previousHeight = 0;
let scrollCount = 0;
let previousProductCount = await page.evaluate(() =>
document.querySelectorAll('[class*="product_product__"]').length
);
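// Scroll to the bottom repeatedly (max 10 times); stop early once the page
// height stops growing or the product count stalls after the first few scrolls.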
while (scrollCount < 10) {
const currentHeight = await page.evaluate(() => document.body.scrollHeight);
if (currentHeight === previousHeight) {
console.log(` Scroll ${scrollCount + 1}: No height change, stopping`);
break;
}
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
await sleep(1500);
const currentProductCount = await page.evaluate(() =>
document.querySelectorAll('[class*="product_product__"]').length
);
console.log(` Scroll ${scrollCount + 1}: height=${currentHeight}, products=${currentProductCount}`);
if (currentProductCount === previousProductCount && scrollCount > 2) {
console.log(' No new products loading, stopping');
break;
}
previousHeight = currentHeight;
previousProductCount = currentProductCount;
scrollCount++;
}
// Try direct URL patterns
console.log('\n[6] Testing URL patterns...');
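// Candidate deep-link routes guessed from common menu URL shapes; each is loaded and its product cards counted.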
const urlPatterns = [
'/onlinemenu/flower?customerType=ADULT',
'/onlinemenu/all?customerType=ADULT',
'/onlinemenu?category=flower&customerType=ADULT',
'/onlinemenu?view=all&customerType=ADULT',
];
for (const pattern of urlPatterns) {
const testUrl = `https://${STORE_ID}.treez.io${pattern}`;
console.log(`\nTrying: ${testUrl}`);
await page.goto(testUrl, { waitUntil: 'networkidle2', timeout: 30000 });
await sleep(2000);
// Bypass age gate again if needed
const gate = await page.$('[data-testid="age-gate-modal"]');
if (gate) {
const btn = await page.$('[data-testid="age-gate-submit-button"]');
if (btn) await btn.click();
await sleep(2000);
}
const productCount = await page.evaluate(() =>
document.querySelectorAll('[class*="product_product__"]').length
);
console.log(` Products found: ${productCount}`);
}
// Screenshot the final state
await page.screenshot({ path: '/tmp/treez-explore.png', fullPage: true });
console.log('\n[7] Screenshot saved to /tmp/treez-explore.png');
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await browser.close();
}
}
main().catch(console.error);
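Both scripts handle the Treez age gate the same way: the first script calls a bypassAgeGate helper that is not shown in this diff, while this script repeats the check inline in steps [1] and [6]. A minimal sketch of such a helper, assuming the same data-testid selectors this script already probes (the helper's name and signature are taken from the first script's call site, not from the actual definition):

import { Page } from 'puppeteer';

// Hedged sketch: the real bypassAgeGate is defined elsewhere in the repo;
// this version only reuses the selectors visible in this diff.
async function bypassAgeGate(page: Page): Promise<void> {
  const gate = await page.$('[data-testid="age-gate-modal"]');
  if (!gate) return;
  const btn = await page.$('[data-testid="age-gate-submit-button"]');
  if (btn) await btn.click();
  // Give the menu behind the gate a moment to render.
  await new Promise<void>(resolve => setTimeout(resolve, 2000));
}

Factoring the bypass into one shared helper would keep the two inline copies in steps [1] and [6] from drifting apart.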

Some files were not shown because too many files have changed in this diff.