390 Commits

Author SHA1 Message Date
Kelly
c7541ec2eb chore: Rename all references from dispensary-scraper to cannaiq 2025-12-15 07:19:33 -07:00
Kelly
8676762d6b chore: trigger CI build 2025-12-15 06:51:58 -07:00
Kelly
3f393ef77f fix: Correct repo name in auto-merge URL 2025-12-15 06:42:16 -07:00
Kelly
a8360c7260 feat: Migrate to spdy.io infrastructure
- Namespace: dispensary-scraper → cannaiq
- Registry: code.cannabrands.app → git.spdy.io
- Database: External PostgreSQL at 10.100.6.50
- MinIO: Internal at 10.100.9.80:9000
- CI: ci.spdy.io

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 06:40:48 -07:00
Kelly
0979c9c37a Revert "feat(scheduler): Support sub-hour interval_minutes in task_schedules"
This reverts commit b607fd7f44.
2025-12-14 18:50:25 -07:00
Kelly
b607fd7f44 feat(scheduler): Support sub-hour interval_minutes in task_schedules
- Add interval_minutes column to TaskSchedule interface
- Prefer interval_minutes over interval_hours when calculating next_run_at
- Add jitter (0-20% of interval) for sub-hour schedules to prevent detection
- Update getSchedules() to include interval_minutes and dispensary_name
- Update updateSchedule() to allow setting interval_minutes
- Add migration 121 for interval_minutes column

Part of Real-Time Inventory Tracking feature.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 18:22:55 -07:00
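
A minimal sketch of the next-run calculation described in b607fd7f44, assuming a hypothetical `computeNextRunAt` helper and a reduced `ScheduleInterval` shape. Only the `interval_minutes` and `interval_hours` column names come from the commit message; the function name, the injectable random source, and the "under 60 minutes" cutoff for sub-hour schedules are illustrative assumptions.

```typescript
// Hypothetical, reduced schedule shape; only the two column names are from the commit.
interface ScheduleInterval {
  interval_minutes: number | null;
  interval_hours: number;
}

const MAX_JITTER_FRACTION = 0.2; // jitter is 0-20% of the interval
const MS_PER_MINUTE = 60000;

function computeNextRunAt(
  schedule: ScheduleInterval,
  now: Date,
  random: () => number = Math.random, // injectable so the result can be tested
): Date {
  // Prefer interval_minutes; fall back to interval_hours when it is not set.
  const baseMinutes = schedule.interval_minutes ?? schedule.interval_hours * 60;

  // Assumption: "sub-hour" means strictly less than 60 minutes.
  const isSubHour = baseMinutes < 60;
  const jitterMinutes = isSubHour
    ? baseMinutes * MAX_JITTER_FRACTION * random()
    : 0;

  return new Date(now.getTime() + (baseMinutes + jitterMinutes) * MS_PER_MINUTE);
}
```

With a 15-minute interval the next run lands between 15 and 18 minutes out, while a 4-hour schedule is left without jitter.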
Kelly
bf988529eb fix(ci): switch from buildx to regular docker plugin
BuildKit container driver has sysctl permission issues in LXC.
Using plugins/docker instead of woodpeckerci/plugin-docker-buildx.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 17:05:06 -07:00
Kelly
04153a2efa chore: retry CI after docker update 2025-12-14 17:01:05 -07:00
Kelly
a1a6876064 chore: retry CI after docker restart 2025-12-14 16:55:33 -07:00
Kelly
83466a03c3 chore: retry CI build 2025-12-14 16:40:52 -07:00
Kelly
35d6a17740 feat: Add daily baseline payload logic (12:01 AM - 3:00 AM window)
- Replace saveRawPayload with saveDailyBaseline in all handlers
- Full payloads only saved once per day per store during window
- Inventory snapshots still saved every crawl (lightweight tracking)
- Add last_baseline_at column to dispensaries table
- Show baseline status in Per-Store Schedules dashboard
- Display baseline window info (12:01 AM - 3:00 AM) in UI

Reduces storage ~95% for high-frequency stores while maintaining
full audit capability via daily baselines.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 16:24:41 -07:00
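
The once-per-day rule in 35d6a17740 can be sketched as follows. The helper names are hypothetical, and two details are assumptions the commit does not state: the window is evaluated in the process's local time, and both window boundaries are inclusive.

```typescript
const WINDOW_START_MINUTES = 1; // 12:01 AM, in minutes after local midnight
const WINDOW_END_MINUTES = 180; // 3:00 AM

function isInBaselineWindow(now: Date): boolean {
  const minutes = now.getHours() * 60 + now.getMinutes();
  return minutes >= WINDOW_START_MINUTES && minutes <= WINDOW_END_MINUTES;
}

function isSameLocalDay(a: Date, b: Date): boolean {
  return (
    a.getFullYear() === b.getFullYear() &&
    a.getMonth() === b.getMonth() &&
    a.getDate() === b.getDate()
  );
}

// lastBaselineAt mirrors the last_baseline_at column; null means "never saved".
function shouldSaveDailyBaseline(now: Date, lastBaselineAt: Date | null): boolean {
  if (!isInBaselineWindow(now)) return false; // outside the window: snapshot only
  return lastBaselineAt === null || !isSameLocalDay(now, lastBaselineAt);
}
```

A crawl at 12:30 AM saves the full payload; a second crawl at 2:00 AM the same night, or any crawl at noon, would skip it under these assumptions.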
Kelly
294d3db7a2 fix: Remove NOW() from partial indexes (not immutable)
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 15:58:05 -07:00
Kelly
bbbd21ba94 chore: Ignore test scripts and .claude directory 2025-12-14 15:57:27 -07:00
Kelly
3496be3064 feat(treez): Fetch all products with match_all query (+19% more)
- Update buildProductQuery() to use match_all by default
- Captures hidden, below-threshold, and out-of-stock products
- Add extractPrimaryImage() and extractImages() to normalizer
- Add product_refresh_treez handler for platform-specific refresh
- Add product_refresh_treez to TaskRole type

Best Dispensary: 228 → 271 products (+43)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 15:56:06 -07:00
Kelly
af859a85f9 feat: Add Real-Time Inventory Tracking infrastructure
Implements per-store high-frequency crawl scheduling and inventory
snapshot tracking for sales velocity estimation (Hoodie Analytics parity).

Database migrations:
- 117: Per-store crawl_interval_minutes and next_crawl_at columns
- 118: inventory_snapshots table (30-day retention)
- 119: product_visibility_events table for OOS/brand alerts (90-day)

Backend changes:
- inventory-snapshots.ts: Shared utility normalizing Dutchie/Jane/Treez
- visibility-events.ts: Detects OOS, price changes, brand drops
- task-scheduler.ts: checkHighFrequencyStores() runs every 60s
- Handler updates: 2-line additions to save snapshots/events

API endpoints:
- GET /api/tasks/schedules/high-frequency
- PUT /api/tasks/schedules/high-frequency/:id
- DELETE /api/tasks/schedules/high-frequency/:id

Frontend:
- TasksDashboard: Per-Store Schedules section with stats

Features:
- Per-store intervals (15/30/60 min configurable)
- Jitter (0-20%) to avoid detection patterns
- Cross-platform support (Dutchie, Jane, Treez)
- No crawler core changes - scheduling/post-crawl only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 15:53:04 -07:00
Kelly
d3f5e4ef4b feat(nav): Add Payloads menu item to admin sidebar
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 14:26:37 -07:00
Kelly
abef265ae9 feat(workers): Add platform badge (D/J/T) to active tasks display
- Add PlatformBadge component showing D=Dutchie, J=Jane, T=Treez
- Include platform field in worker-registry API response
- Fix null running_seconds displaying as "nulls"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 12:21:23 -07:00
Kelly
b28a91fca5 fix: Task completion result and null duration display bugs
1. task-worker.ts: Pass full result object to completeTask instead of
   non-existent result.data property (was causing {} to be stored)

2. WorkersDashboard.tsx: Handle null running_seconds in formatSecondsToTime
   (was displaying "nulls" due to JS type coercion)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 12:02:05 -07:00
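
The "nulls" display bug from b28a91fca5 is easy to reproduce: string interpolation turns `null` into the text "null" before the unit suffix is appended. The sketch below is illustrative only. `formatBuggy` is a hypothetical stand-in for the old behavior, and the "-" placeholder and "1m 15s" output format are assumptions, not the actual `formatSecondsToTime` in WorkersDashboard.tsx.

```typescript
// Reproduces the bug: null is coerced to the string "null".
function formatBuggy(seconds: number | null): string {
  return `${seconds}s`;
}

// Guarded version: handle null (and non-finite values) before formatting.
function formatSecondsToTime(seconds: number | null): string {
  if (seconds === null || !isFinite(seconds)) return "-"; // assumed placeholder
  const total = Math.max(0, Math.floor(seconds));
  const minutes = Math.floor(total / 60);
  const rest = total % 60;
  return minutes > 0 ? `${minutes}m ${rest}s` : `${rest}s`;
}
```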
Kelly
60b221e7fb feat: Add payloads dashboard, disable snapshots, fix scheduler
Frontend:
- Add PayloadsDashboard page with search, filter, view, and diff
- Update TasksDashboard default sort: pending → claimed → completed
- Add payload API methods to api.ts

Backend:
- Disable snapshot creation in product-refresh handler
- Remove product_refresh from schedule role options
- Disable compression in payload-storage (plain JSON for debugging)
- Fix task-scheduler: map 'embedded' menu_type to 'dutchie' platform
- Fix task-scheduler: use schedule.interval_hours as skipRecentHours

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 11:54:25 -07:00
Kelly
15cb657f13 fix(docker): Revert to libasound2 for Debian bookworm
- libasound2t64 is for Debian trixie (13), not bookworm (12)
- Keep build tools fix for native modules

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 11:07:03 -07:00
Kelly
f15920e508 fix(docker): Add build tools for native modules and fix Debian package name
- Add python3 and build-essential to builder stage for bcrypt/sharp compilation
- Change libasound2 to libasound2t64 for Debian bookworm compatibility
- Copy pre-built node_modules from builder instead of re-running npm install
- Prune dev dependencies in builder for smaller production image

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 11:01:22 -07:00
Kelly
9518ca48a5 feat(tasks): Task tracking, IP-per-store, and schedule edit fixes
- Add task completion verification with DB and output layers
- Add reconciliation loop to sync worker memory with DB state
- Implement IP-per-store-per-platform conflict detection
- Add task ID hash to MinIO payload filenames for traceability
- Fix schedule edit modal with dispensary info in API responses
- Add task ID display after dispensary name in worker dashboard
- Add migrations for proxy_ip and source tracking columns

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 10:49:21 -07:00
Kelly
3e9667571f fix(ui): Restore round worker slot circles with hover tooltips 2025-12-14 09:58:24 -07:00
Kelly
8f6efd377b fix(ui): Remove 'test' from fingerprint tooltip 2025-12-14 03:40:17 -07:00
Kelly
83e9718d78 fix(ui): Worker slot preflight checklist and fingerprint hover
- Fix fingerprint tooltip to use actual API field names (browserName, deviceCategory, detectedTimezone)
- Show real preflight steps: HTTP Preflight, Geo Session, Pool Ready
- Checkmarks appear as each step completes, spinners while in progress

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 03:38:46 -07:00
Kelly
f5cb17e1d4 feat(dutchie): Full payload with specials and all product statuses
- Set includeEnterpriseSpecials: true to get BOGO/sale deal names
- Set Status: 'All' to capture both Active and Inactive (sold out) products
- Make schedules query backward-compatible for missing pool_id column

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 03:35:25 -07:00
Kelly
f48a503e82 fix(tasks): Filter out disabled dispensaries in createStaggeredTasks
Tasks were being created for dispensaries with crawl_enabled=false
(duplicates, deprecated stores). Added EXISTS check to filter only
crawl_enabled=true stores before creating tasks.

This prevents errors like:
"Dispensary 207 not found or not crawl_enabled"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 03:07:03 -07:00
Kelly
e7b392141a feat(ui): Task pool toggle, sortable columns, worker slot visualization
Tasks Dashboard:
- Add clickable Pool Open/Paused toggle button in header
- Add sortable columns (ID, Role, Store, Status, Worker, Duration, Created)
- Show menu_type and pool badges under Store column
- Add Pool column to Schedules table
- Filter stores by platform in Create Task modal

Workers Dashboard:
- Redesign pod visualization to show 3 worker slots per pod
- Each slot shows preflight checklist (Overload? Terminating? Pool Query?)
- Once qualified, shows City/State, Proxy IP, Antidetect status
- Hover shows full fingerprint data (browser, platform, bot detection)

Backend:
- Add menu_type to listTasks query
- Add pool_id/pool_name to schedules query with task_pools JOIN
- Migration 114: Add pool_id column to task_schedules table

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 03:00:19 -07:00
Kelly
15a5a4239e fix(tasks): Make pool JOIN defensive when table doesn't exist
Auto-migrate fails early, so task_pools may not exist yet.
Check table existence before including pool columns/joins.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 02:29:07 -07:00
Kelly
20d7534b93 fix(ci): prefix docker tags with sha- to prevent scientific notation parsing
Git SHAs like 1861e183 or 698995e4 get parsed as scientific notation
by JSON parsers, breaking Docker tag creation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 02:10:17 -07:00
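
To illustrate the failure mode in 20d7534b93: a short SHA made of digits, a single `e`, and more digits is a valid JSON number, so an unquoted value is read as a number rather than a string. The sketch below uses a hypothetical `parseLooseScalar` to mimic a loader that tries numeric parsing first; the real CI parsing path is not shown in the commit, so treat this as an approximation of the behavior rather than the actual pipeline.

```typescript
// Mimics a loader that interprets unquoted scalars as numbers when possible.
function parseLooseScalar(raw: string): string | number {
  try {
    const value: unknown = JSON.parse(raw);
    return typeof value === "number" ? value : raw;
  } catch (_err) {
    return raw; // not valid JSON, so keep it as a string
  }
}

// True for SHAs such as "698995e4" that match the JSON exponent syntax.
function looksLikeScientificNotation(shortSha: string): boolean {
  return /^\d+e\d+$/i.test(shortSha);
}

// The fix: a non-numeric prefix forces string interpretation.
function toDockerTag(shortSha: string): string {
  return `sha-${shortSha}`;
}
```

Here `parseLooseScalar("698995e4")` yields the number 6989950000, while `parseLooseScalar(toDockerTag("698995e4"))` stays the string "sha-698995e4".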
Kelly
698995e46f chore: bump task worker version comment
Force new git SHA to avoid CI scientific notation bug.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 02:02:30 -07:00
Kelly
1861e18396 feat(workers): Implement geo-based task pools
Workers now follow the correct flow:
1. Check what pools have pending tasks
2. Claim a pool (e.g., Phoenix AZ)
3. Get Evomi proxy for that geo
4. Run preflight with geo proxy
5. Pull tasks from pool (up to 6 stores)
6. Execute tasks
7. Release pool when exhausted (6 stores visited)

Task pools group dispensaries by metro area (100mi radius):
- Phoenix AZ, Tucson AZ
- Los Angeles CA, San Francisco CA, San Diego CA, Sacramento CA
- Denver CO, Chicago IL, Boston MA, Detroit MI
- Las Vegas NV, Reno NV, Newark NJ, New York NY
- Oklahoma City OK, Tulsa OK, Portland OR, Seattle WA

Benefits:
- Workers know geo BEFORE getting proxy (no more "No geo assigned")
- IP diversity within metro area (Phoenix worker can use Tempe IP)
- Simpler worker logic - just match pool geo
- Pre-organized tasks, not grouped at claim time

New files:
- migrations/113_task_pools.sql - schema, seed data, functions
- src/services/task-pool.ts - TypeScript service

Env vars:
- USE_TASK_POOLS=true (new system)
- USE_IDENTITY_POOL=false (disabled)

🤖 Generated with [Claude Code](https://claude.ai/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-14 01:41:52 -07:00
Kelly
eedc027ff6 fix(workers): Report geo to worker_registry when identity claimed
Workers were showing "No geo assigned" on dashboard because geo info
was set internally but never reported to worker_registry after
identity pool claim.

Now updates current_state and current_city columns when identity
is claimed, so dashboard shows correct geo assignment.

Also documents CI/CD batching rule to minimize build time.

🤖 Generated with [Claude Code](https://claude.ai/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-14 01:14:31 -07:00
Kelly
ec5fcd9bc4 fix(proxy): Use rotating IPs instead of sticky sessions
Removes session parameter from Evomi proxy URL so each request
gets a different IP. Prevents all workers from sharing same IP.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 00:48:04 -07:00
Kelly
58150dafa6 docs: Add CI/CD workflow rule - commit and wait
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 00:46:43 -07:00
Kelly
06adab7225 fix(preflight): Add state fallback when IP lookup fails
- Try ip-api.com first, then ipapi.co as fallback
- If both fail, use state coords from targetState param
- Prevents workers from getting stuck in preflight loop

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 00:24:31 -07:00
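
The fallback chain in 06adab7225 can be sketched as below. To keep the example offline, the two IP lookup services are passed in as functions rather than called over HTTP. `resolveGeo`, `GeoLookup`, and the `STATE_COORDS` table are hypothetical names, and the Arizona coordinates are rough placeholder values.

```typescript
interface GeoPoint {
  lat: number;
  lon: number;
  source: string;
}

type GeoLookup = (ip: string) => Promise<GeoPoint>;

// Hypothetical per-state fallback coordinates (approximate, for illustration).
const STATE_COORDS: Record<string, { lat: number; lon: number }> = {
  AZ: { lat: 34.3, lon: -111.7 },
};

async function resolveGeo(
  ip: string,
  targetState: string,
  providers: GeoLookup[], // tried in order, e.g. primary then secondary
): Promise<GeoPoint | null> {
  for (const lookup of providers) {
    try {
      return await lookup(ip);
    } catch (_err) {
      // Provider failed; fall through to the next one.
    }
  }
  // Every provider failed: use the target state's coordinates instead of retrying forever.
  const fallback = STATE_COORDS[targetState];
  return fallback ? { ...fallback, source: "state-fallback" } : null;
}
```

Returning a value in every case, even a coarse one, is what keeps a worker from looping in preflight when both lookups are down.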
Kelly
38d7678a2e feat(antidetect): Use actual proxy IP location for browser fingerprint
- Replace hardcoded state coords with IP geolocation lookup via ip-api.com
- Browser timezone and geolocation now match actual proxy IP location
- City-level proxy targeting already in place via Evomi _city- parameter
- Add browser-factory.ts shared utility for antidetect setup

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 23:49:25 -07:00
Kelly
aac1181f3d perf(analytics): Fix 7.5s national summary endpoint
- Use denormalized d.product_count instead of JOIN to store_products
- Remove expensive per-product aggregations (avg_price, brand counts, stock)
- Query now runs in <100ms instead of 7.5s

The massive JOIN between dispensaries and store_products was causing
the slow load. State metrics now use pre-computed product_count column.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 23:31:10 -07:00
Kelly
4eaf7e50d7 feat(ui): Add dropdown for Add User/Origin button
- Single dropdown button shows both options
- Selecting an option switches to that tab and opens modal
- Cleaner UX than separate buttons per tab

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 23:06:42 -07:00
Kelly
4cb4e1c502 feat(workers): Session pool system - claim tasks first, then get IP
New worker flow (enabled via USE_SESSION_POOL=true):
1. Worker claims up to 6 tasks for same geo (atomically marked claimed)
2. Gets Evomi proxy for that geo
3. Checks IP availability (not in use, not in 8hr cooldown)
4. Locks IP exclusively to this worker
5. Runs preflight with locked IP
6. Executes tasks (3 concurrent)
7. After 6 tasks, retires session (8hr IP cooldown)
8. Repeats with new IP

Key files:
- migrations/112_worker_session_pool.sql: Session table + atomic claiming
- services/worker-session.ts: Session lifecycle management
- tasks/task-worker.ts: sessionPoolMainLoop() with new flow
- services/crawl-rotator.ts: setFixedProxy() for session locking

Failed tasks return to pending for retry by another worker.
No two workers can share same IP simultaneously.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 22:54:45 -07:00
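
A simplified in-memory sketch of two rules from 4cb4e1c502: claiming up to 6 pending tasks for a single geo, and rejecting an IP that is in use or still inside its 8-hour cooldown. The real implementation lives in a SQL migration and service files that are not shown here, so every name below is hypothetical and the array mutation only stands in for an atomic database update.

```typescript
interface Task {
  id: number;
  geo: string;
  status: "pending" | "claimed";
  workerId?: string;
}

const MAX_TASKS_PER_SESSION = 6;
const IP_COOLDOWN_MS = 8 * 60 * 60 * 1000; // 8 hours

// Claims up to 6 pending tasks that share the geo of the first pending task.
function claimTasksForGeo(tasks: Task[], workerId: string): Task[] {
  const pending = tasks.filter((t) => t.status === "pending");
  const first = pending[0];
  if (first === undefined) return [];
  const batch = pending
    .filter((t) => t.geo === first.geo)
    .slice(0, MAX_TASKS_PER_SESSION);
  for (const task of batch) {
    task.status = "claimed"; // in a database this would be one atomic UPDATE
    task.workerId = workerId;
  }
  return batch;
}

// An IP is usable only if no worker holds it and its cooldown has elapsed.
function isIpAvailable(
  ip: string,
  inUse: string[],
  retiredAt: Record<string, number>, // ip -> epoch ms when its session retired
  now: number,
): boolean {
  if (inUse.indexOf(ip) !== -1) return false;
  const retired = retiredAt[ip];
  return retired === undefined || now - retired >= IP_COOLDOWN_MS;
}
```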
Kelly
f0bb454ca2 fix(workers): Require geo for worker qualification status
- Workers without geo now show orange "NO GEO" badge instead of gold qualified
- Orange ring + X badge on avatar when preflight OK but no geo
- Gold ring + checkmark only when fully qualified (preflight + geo)
- Add VenetianMask icon for antidetect status indicator
- Lock K8s replica count at exactly 8 pods in CLAUDE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 21:58:47 -07:00
Kelly
b8bdc48c1e fix: Remove non-existent progress columns from worker tasks query 2025-12-13 21:24:55 -07:00
Kelly
8173fd2845 fix: Rename duplicate formatDuration function 2025-12-13 21:03:15 -07:00
Kelly
3921e66933 perf: Use denormalized product_count in pipeline and favorites routes
- pipeline.ts: Replace correlated subquery with d.product_count
- consumer-favorites.ts: Replace correlated subquery with d.product_count

Correlated subqueries were causing N+1 query patterns. Using the
denormalized column is O(1) lookup per row.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 20:45:14 -07:00
Kelly
ad79605961 perf(dashboard): Fix slow activity endpoint with denormalized column + cache
- Use dispensaries.product_count instead of correlated subquery
- Add 1 minute in-memory cache for /dashboard/activity
- Reduces query time from ~30s to <100ms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 20:43:12 -07:00
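
A one-minute in-memory cache like the one mentioned in ad79605961 could look roughly like this. `TtlCache` is a hypothetical single-entry helper, not the project's code, and the injectable clock exists only so that expiry can be tested deterministically.

```typescript
class TtlCache<T> {
  private entry: { value: T; expiresAt: number } | null = null;

  constructor(
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  // Returns the cached value while it is fresh; otherwise recomputes and stores it.
  get(compute: () => T): T {
    const t = this.now();
    if (this.entry !== null && t < this.entry.expiresAt) {
      return this.entry.value;
    }
    const value = compute();
    this.entry = { value, expiresAt: t + this.ttlMs };
    return value;
  }
}
```

For an endpoint backed by an async query, `T` would be a Promise, so concurrent requests inside the same minute share one in-flight query. A failed query would need to clear the entry, which this sketch does not handle.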
Kelly
6439de5cd4 feat(ui): Nested task slots in worker dashboard
Backend:
- Add active_tasks array to GET /worker-registry/workers response
- Include task details: role, dispensary, running_seconds, progress

Frontend:
- Show nested task list under each worker with duration
- Display empty slots when worker has capacity
- Update pod visualization to show 3 task slot nodes
- Active slots pulse blue, empty slots gray
- Hover for task details (dispensary, duration, progress)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 20:40:15 -07:00
Kelly
b51ba17d32 fix(ui): Fallback to fingerprint region for worker geo display
geoState was only using current_state column which is often null.
Now falls back to fingerprint.detectedLocation.region like geoCity does.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 20:34:50 -07:00
Kelly
2d631dfad0 feat(ui): Auto-detect trusted origin type from URL/pattern
- Remove Type dropdown from trusted origins form
- Auto-detect domain, IP, or regex from input
- Convert *.domain.com wildcards to proper regex
- Simplify form to just Name, URL/Pattern, Description

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 20:23:03 -07:00
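
One way the auto-detection and wildcard conversion in 2d631dfad0 might work is sketched below. The function names and detection heuristics are assumptions. In particular, this version matches hostnames only and lets `*` stand for exactly one subdomain label, which may differ from the shipped behavior.

```typescript
type OriginType = "domain" | "ip" | "regex";

function detectOriginType(input: string): OriginType {
  const value = input.trim();
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(value)) return "ip"; // dotted IPv4
  if (/[*^$()[\]|\\+?]/.test(value)) return "regex"; // wildcard or regex syntax
  return "domain";
}

function wildcardToRegex(pattern: string): RegExp {
  // Escape every regex metacharacter except "*", so dots match literally.
  const escaped = pattern.trim().replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  // Assumption: "*" matches a single label and never crosses a dot.
  const body = escaped.replace(/\*/g, "[^.]+");
  return new RegExp(`^${body}$`);
}
```

Escaping the dots matters: a naive conversion of `*.domain.com` that leaves `.` unescaped would also accept hosts such as `appXdomainYcom`.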
Kelly
072388ffb2 fix(identity): Use unique session IDs for proxy rotation + add task pool gate
- Fix buildEvomiProxyUrl to use passed session ID from identity pool
  instead of truncating to worker+region (causing same IP for all workers)
- Add task pool gate feature with database-backed state
- Add /tasks/pool/toggle endpoint and UI toggle button
- Fix isTaskPoolPaused() missing await in claimTask

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 20:17:52 -07:00
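
The missing-`await` fix in 072388ffb2 addresses a common pitfall: an un-awaited async call returns a Promise object, which is always truthy regardless of the value it resolves to. The sketch below uses a hypothetical stand-in for `isTaskPoolPaused()`. The original `claimTask` code is not shown in the commit, so the exact shape of the bug is assumed.

```typescript
// Hypothetical stand-in for the database-backed pause flag; the pool is open here.
async function isTaskPoolPaused(): Promise<boolean> {
  return false;
}

// Buggy: the Promise itself is tested, so the gate always reads as "paused".
function claimBlockedWithoutAwait(): boolean {
  const paused = isTaskPoolPaused(); // Promise<boolean>, not boolean
  return Boolean(paused);
}

// Fixed: await resolves the flag before it is used.
async function claimBlockedWithAwait(): Promise<boolean> {
  const paused = await isTaskPoolPaused();
  return paused;
}
```

Recent TypeScript versions with strict null checks can flag a bare `if (isTaskPoolPaused())`, but the mistake passes silently once the value flows through an untyped or loosely typed path.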
Kelly
b456fe5097 feat: Add trusted origins management UI at /users
- Create trusted_origins table for DB-backed origin management
- Add API routes for CRUD operations on trusted origins
- Add tabbed interface on /users page with Users and Trusted Origins tabs
- Seeds default trusted origins (cannaiq.co, findadispo.com, findagram.co, etc.)
- Fix TypeScript error in WorkersDashboard fingerprint type

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 19:54:26 -07:00