feat: Stealth worker system with mandatory proxy rotation

## Worker System

- Role-agnostic workers that can handle any task type
- Pod-based architecture with StatefulSet (5-15 pods, 5 workers each)
- Custom pod names (Aethelgard, Xylos, Kryll, etc.)
- Worker registry with friendly names and resource monitoring
- Hub-and-spoke visualization on JobQueue page

## Stealth & Anti-Detection (REQUIRED)

- Proxies are MANDATORY - workers fail to start without active proxies
- CrawlRotator initializes on worker startup
- Loads proxies from `proxies` table
- Auto-rotates proxy + fingerprint on 403 errors
- 12 browser fingerprints (Chrome, Firefox, Safari, Edge)
- Locale/timezone matching for geographic consistency

## Task System

- Renamed product_resync → product_refresh
- Task chaining: store_discovery → entry_point → product_discovery
- Priority-based claiming with FOR UPDATE SKIP LOCKED
- Heartbeat and stale task recovery

## UI Updates

- JobQueue: Pod visualization, resource monitoring on hover
- WorkersDashboard: Simplified worker list
- Removed unused filters from task list

## Other

- IP2Location service for visitor analytics
- Findagram consumer features scaffolding
- Documentation updates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
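
For readers unfamiliar with the rotate-on-403 behaviour listed under "Stealth & Anti-Detection", here is a minimal Python sketch of the idea. It is an illustration only: `RotatorSketch`, `current`, and `on_response` are hypothetical names, not the project's CrawlRotator API, and the round-robin selection is an assumption (the commit message does not say how the next proxy or fingerprint is chosen).

```python
from dataclasses import dataclass


@dataclass
class RotatorSketch:
    """Hypothetical stand-in for a proxy/fingerprint rotator (not the real CrawlRotator)."""

    proxies: list[str]
    fingerprints: list[str]

    def __post_init__(self) -> None:
        # Proxies are mandatory: refuse to start when none are available.
        if not self.proxies:
            raise RuntimeError("no active proxies; worker cannot start")
        if not self.fingerprints:
            raise RuntimeError("no fingerprints configured")
        self._proxy_idx = 0
        self._fp_idx = 0

    @property
    def current(self) -> tuple[str, str]:
        """Return the (proxy, fingerprint) pair currently in use."""
        return self.proxies[self._proxy_idx], self.fingerprints[self._fp_idx]

    def on_response(self, status: int) -> bool:
        """Rotate proxy and fingerprint together on HTTP 403; return True if rotated."""
        if status != 403:
            return False
        # Round-robin is an assumption made for this sketch.
        self._proxy_idx = (self._proxy_idx + 1) % len(self.proxies)
        self._fp_idx = (self._fp_idx + 1) % len(self.fingerprints)
        return True
```

A worker would call `on_response(status)` after each request and read `current` before the next one. Constructing the rotator with an empty proxy list raises immediately, which mirrors the "workers fail to start without active proxies" rule.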
@@ -33,8 +33,8 @@ or overwrites of existing data.
 | Table | Purpose | Key Columns |
 |-------|---------|-------------|
 | `dispensaries` | Store locations | id, name, slug, city, state, platform_dispensary_id |
-| `dutchie_products` | Canonical products | id, dispensary_id, external_product_id, name, brand_name, stock_status |
-| `dutchie_product_snapshots` | Historical snapshots | dutchie_product_id, crawled_at, rec_min_price_cents |
+| `store_products` | Canonical products | id, dispensary_id, external_product_id, name, brand_name, stock_status |
+| `store_product_snapshots` | Historical snapshots | store_product_id, crawled_at, rec_min_price_cents |
 | `brands` (view: v_brands) | Derived from products | brand_name, brand_id, product_count |
 | `categories` (view: v_categories) | Derived from products | type, subcategory, product_count |
 
@@ -147,12 +147,10 @@ CREATE TABLE IF NOT EXISTS products_from_legacy (
 
 ---
 
-### 3. Dutchie Products
+### 3. Products (Legacy dutchie_products)
 
 **Source:** `dutchie_legacy.dutchie_products`
-**Target:** `cannaiq.dutchie_products`
-
-These tables have nearly identical schemas. The mapping is direct:
+**Target:** `cannaiq.store_products`
 
 | Legacy Column | Canonical Column | Notes |
 |---------------|------------------|-------|
@@ -180,15 +178,15 @@ ON CONFLICT (dispensary_id, external_product_id) DO NOTHING
 
 ---
 
-### 4. Dutchie Product Snapshots
+### 4. Product Snapshots (Legacy dutchie_product_snapshots)
 
 **Source:** `dutchie_legacy.dutchie_product_snapshots`
-**Target:** `cannaiq.dutchie_product_snapshots`
+**Target:** `cannaiq.store_product_snapshots`
 
 | Legacy Column | Canonical Column | Notes |
 |---------------|------------------|-------|
 | id | - | Generate new |
-| dutchie_product_id | dutchie_product_id | Map via product lookup |
+| dutchie_product_id | store_product_id | Map via product lookup |
 | dispensary_id | dispensary_id | Map via dispensary lookup |
 | crawled_at | crawled_at | Direct |
 | rec_min_price_cents | rec_min_price_cents | Direct |
@@ -201,7 +199,7 @@ ON CONFLICT (dispensary_id, external_product_id) DO NOTHING
 ```sql
 -- No unique constraint on snapshots - all are historical records
 -- Just INSERT, no conflict handling needed
-INSERT INTO dutchie_product_snapshots (...) VALUES (...)
+INSERT INTO store_product_snapshots (...) VALUES (...)
 ```
 
 ---
 
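
The mapping rules in the hunks above (products deduplicated with `ON CONFLICT ... DO NOTHING`, snapshots always appended, `dutchie_product_id` mapped to `store_product_id` via a product lookup) can be sketched end to end. The following is a rough Python illustration against an in-memory SQLite database, not the project's migration script. The trimmed-down schema, the `migrate` function, the dict-shaped legacy rows, and the choice to skip orphaned snapshots are all assumptions made for the example. Dispensary ids are assumed to be already canonical, so the dispensary lookup is omitted.

```python
import sqlite3

# Trimmed-down, assumed schema: only the columns needed for the illustration.
SCHEMA = """
CREATE TABLE store_products (
    id INTEGER PRIMARY KEY,
    dispensary_id INTEGER NOT NULL,
    external_product_id TEXT NOT NULL,
    name TEXT,
    UNIQUE (dispensary_id, external_product_id)
);
CREATE TABLE store_product_snapshots (
    id INTEGER PRIMARY KEY,  -- generated fresh; the legacy id is dropped
    store_product_id INTEGER NOT NULL,
    dispensary_id INTEGER NOT NULL,
    crawled_at TEXT,
    rec_min_price_cents INTEGER
);
"""


def migrate(conn, legacy_products, legacy_snapshots):
    """Copy legacy rows into the canonical tables; return the number of snapshots inserted."""
    product_map = {}  # legacy product id -> store_products.id
    for p in legacy_products:
        key = (p["dispensary_id"], p["external_product_id"])
        # Idempotent product insert: re-running never duplicates or overwrites a row.
        conn.execute(
            "INSERT INTO store_products (dispensary_id, external_product_id, name) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT (dispensary_id, external_product_id) DO NOTHING",
            (*key, p["name"]),
        )
        row = conn.execute(
            "SELECT id FROM store_products "
            "WHERE dispensary_id = ? AND external_product_id = ?",
            key,
        ).fetchone()
        product_map[p["id"]] = row[0]

    inserted = 0
    for s in legacy_snapshots:
        store_product_id = product_map.get(s["dutchie_product_id"])
        if store_product_id is None:
            continue  # orphaned snapshot; skipping it is an assumption of this sketch
        # Snapshots are historical records: plain INSERT, no conflict handling.
        conn.execute(
            "INSERT INTO store_product_snapshots "
            "(store_product_id, dispensary_id, crawled_at, rec_min_price_cents) "
            "VALUES (?, ?, ?, ?)",
            (store_product_id, s["dispensary_id"], s["crawled_at"], s["rec_min_price_cents"]),
        )
        inserted += 1
    conn.commit()
    return inserted
```

Running `migrate` twice with the same input leaves `store_products` unchanged but appends the snapshot rows again, because the snapshot table has no unique constraint. A real migration would need its own guard against double-importing snapshots.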