feat: Stealth worker system with mandatory proxy rotation

## Worker System
- Role-agnostic workers that can handle any task type
- Pod-based architecture with StatefulSet (5-15 pods, 5 workers each)
- Custom pod names (Aethelgard, Xylos, Kryll, etc.)
- Worker registry with friendly names and resource monitoring
- Hub-and-spoke visualization on JobQueue page

## Stealth & Anti-Detection (REQUIRED)
- Proxies are MANDATORY - workers fail to start without active proxies
- CrawlRotator initializes on worker startup
- Loads proxies from `proxies` table
- Auto-rotates proxy + fingerprint on 403 errors
- 12 browser fingerprints (Chrome, Firefox, Safari, Edge)
- Locale/timezone matching for geographic consistency
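
The rotate-on-403 behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `Rotator` is a hypothetical stand-in for `CrawlRotator`, and `Identity`, `fetch_with_rotation`, and the `fetch` callback signature are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Identity:
    proxy: str       # proxy address (placeholder values in the usage below)
    user_agent: str  # one browser fingerprint, reduced here to a user agent


class Rotator:
    """Hypothetical stand-in for CrawlRotator, not the real class."""

    def __init__(self, proxies: Iterable[str], fingerprints: Iterable[str]):
        proxies, fingerprints = list(proxies), list(fingerprints)
        if not proxies:
            # Mirrors the "proxies are mandatory" rule: refuse to start.
            raise RuntimeError("no active proxies; refusing to start")
        if not fingerprints:
            raise RuntimeError("no fingerprints configured")
        self._proxies = cycle(proxies)
        self._fingerprints = cycle(fingerprints)
        self.current = self._next()

    def _next(self) -> Identity:
        return Identity(next(self._proxies), next(self._fingerprints))

    def rotate(self) -> Identity:
        # Proxy and fingerprint change together.
        self.current = self._next()
        return self.current


def fetch_with_rotation(
    fetch: Callable[[str, Identity], Tuple[int, str]],
    rotator: Rotator,
    url: str,
    max_attempts: int = 3,
) -> Tuple[int, str]:
    """Call fetch(url, identity); on HTTP 403, rotate identity and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for _ in range(max_attempts):
        status, body = fetch(url, rotator.current)
        if status != 403:
            return status, body
        rotator.rotate()
    return status, body  # still 403 after exhausting attempts
```

In this sketch the caller supplies `fetch`, so the rotation logic can be exercised without a network; a real worker would load the proxy list from the `proxies` table instead of passing literals.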

## Task System
- Renamed product_resync → product_refresh
- Task chaining: store_discovery → entry_point → product_discovery
- Priority-based claiming with FOR UPDATE SKIP LOCKED
- Heartbeat and stale task recovery
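
For readers unfamiliar with `FOR UPDATE SKIP LOCKED`, the claiming step might look something like the sketch below. The `tasks` table, its columns, and the `claim_next_task` helper are hypothetical names chosen for illustration; the commit does not show the actual query. The cursor is assumed to be a PostgreSQL DB-API cursor that accepts `%s` placeholders.

```python
# Hypothetical table and column names; only the locking pattern is the point.
CLAIM_SQL = """
UPDATE tasks
SET status = 'claimed', worker_id = %s, claimed_at = now()
WHERE id = (
    SELECT id
    FROM tasks
    WHERE status = 'pending'
    ORDER BY priority DESC, created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, task_type
"""


def claim_next_task(cursor, worker_id):
    """Claim the highest-priority pending task, or return None if none is free.

    SKIP LOCKED makes concurrent workers pass over rows that another
    transaction has already locked instead of waiting on them.
    """
    cursor.execute(CLAIM_SQL, (worker_id,))
    return cursor.fetchone()
```

Because the row lock is taken inside the subquery, two workers running this statement at the same time should receive different tasks rather than blocking each other.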

## UI Updates
- JobQueue: Pod visualization, resource monitoring on hover
- WorkersDashboard: Simplified worker list
- Removed unused filters from task list

## Other
- IP2Location service for visitor analytics
- Findagram consumer features scaffolding
- Documentation updates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Author: Kelly
Date: 2025-12-10 00:44:59 -07:00
Parent: 0295637ed6
Commit: 56cc171287
61 changed files with 8591 additions and 2076 deletions

@@ -33,8 +33,8 @@ or overwrites of existing data.
| Table | Purpose | Key Columns |
|-------|---------|-------------|
| `dispensaries` | Store locations | id, name, slug, city, state, platform_dispensary_id |
-| `dutchie_products` | Canonical products | id, dispensary_id, external_product_id, name, brand_name, stock_status |
-| `dutchie_product_snapshots` | Historical snapshots | dutchie_product_id, crawled_at, rec_min_price_cents |
+| `store_products` | Canonical products | id, dispensary_id, external_product_id, name, brand_name, stock_status |
+| `store_product_snapshots` | Historical snapshots | store_product_id, crawled_at, rec_min_price_cents |
| `brands` (view: v_brands) | Derived from products | brand_name, brand_id, product_count |
| `categories` (view: v_categories) | Derived from products | type, subcategory, product_count |
@@ -147,12 +147,10 @@ CREATE TABLE IF NOT EXISTS products_from_legacy (
---
-### 3. Dutchie Products
+### 3. Products (Legacy dutchie_products)
**Source:** `dutchie_legacy.dutchie_products`
-**Target:** `cannaiq.dutchie_products`
-These tables have nearly identical schemas. The mapping is direct:
+**Target:** `cannaiq.store_products`
| Legacy Column | Canonical Column | Notes |
|---------------|------------------|-------|
@@ -180,15 +178,15 @@ ON CONFLICT (dispensary_id, external_product_id) DO NOTHING
---
-### 4. Dutchie Product Snapshots
+### 4. Product Snapshots (Legacy dutchie_product_snapshots)
**Source:** `dutchie_legacy.dutchie_product_snapshots`
-**Target:** `cannaiq.dutchie_product_snapshots`
+**Target:** `cannaiq.store_product_snapshots`
| Legacy Column | Canonical Column | Notes |
|---------------|------------------|-------|
| id | - | Generate new |
-| dutchie_product_id | dutchie_product_id | Map via product lookup |
+| dutchie_product_id | store_product_id | Map via product lookup |
| dispensary_id | dispensary_id | Map via dispensary lookup |
| crawled_at | crawled_at | Direct |
| rec_min_price_cents | rec_min_price_cents | Direct |
@@ -201,7 +199,7 @@ ON CONFLICT (dispensary_id, external_product_id) DO NOTHING
```sql
-- No unique constraint on snapshots - all are historical records
-- Just INSERT, no conflict handling needed
-INSERT INTO dutchie_product_snapshots (...) VALUES (...)
+INSERT INTO store_product_snapshots (...) VALUES (...)
```
---
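The two insert strategies in this migration (conflict-skipping product inserts, then plain snapshot inserts keyed through a product lookup) can be tried end to end with the sketch below. It uses an in-memory SQLite database purely so the example is runnable; the trimmed schemas, the sample rows, and the `make_db` and `migrate` helpers are assumptions for illustration. It also assumes the lookup resolves a legacy product id by `(dispensary_id, external_product_id)` and that dispensary ids are already mapped.

```python
import sqlite3

# Trimmed, hypothetical versions of the target tables.
SCHEMA = """
CREATE TABLE store_products (
    id INTEGER PRIMARY KEY,
    dispensary_id INTEGER NOT NULL,
    external_product_id TEXT NOT NULL,
    name TEXT,
    UNIQUE (dispensary_id, external_product_id)
);
CREATE TABLE store_product_snapshots (
    id INTEGER PRIMARY KEY,
    store_product_id INTEGER NOT NULL REFERENCES store_products(id),
    dispensary_id INTEGER NOT NULL,
    crawled_at TEXT NOT NULL,
    rec_min_price_cents INTEGER
);
"""

# Made-up legacy rows: (legacy_id, dispensary_id, external_product_id, name)
LEGACY_PRODUCTS = [(101, 1, "ext-a", "Gummies"), (102, 1, "ext-b", "Vape")]
# (legacy dutchie_product_id, dispensary_id, crawled_at, rec_min_price_cents)
LEGACY_SNAPSHOTS = [
    (101, 1, "2025-01-01T00:00:00Z", 1500),
    (101, 1, "2025-01-02T00:00:00Z", 1400),
    (102, 1, "2025-01-01T00:00:00Z", 3000),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def migrate(conn, products, snapshots):
    id_map = {}  # legacy product id -> store_products.id
    for legacy_id, dispensary_id, external_id, name in products:
        # Re-running is safe: an existing product is left untouched.
        conn.execute(
            "INSERT INTO store_products (dispensary_id, external_product_id, name) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT (dispensary_id, external_product_id) DO NOTHING",
            (dispensary_id, external_id, name),
        )
        row = conn.execute(
            "SELECT id FROM store_products "
            "WHERE dispensary_id = ? AND external_product_id = ?",
            (dispensary_id, external_id),
        ).fetchone()
        id_map[legacy_id] = row[0]
    for legacy_product_id, dispensary_id, crawled_at, price in snapshots:
        # Snapshots are historical records: plain INSERT, new ids generated.
        conn.execute(
            "INSERT INTO store_product_snapshots "
            "(store_product_id, dispensary_id, crawled_at, rec_min_price_cents) "
            "VALUES (?, ?, ?, ?)",
            (id_map[legacy_product_id], dispensary_id, crawled_at, price),
        )
    conn.commit()
```

One consequence worth noting: because snapshots have no unique constraint, running the snapshot step twice over the same legacy rows would insert them twice, so a real migration would likely need its own guard against re-runs.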