feat(k8s): Add StatefulSet for persistent workers

- Add scraper-worker-statefulset.yaml with 8 persistent pods
- updateStrategy: OnDelete prevents automatic restarts
- Workers maintain stable identity across restarts
- Document worker architecture in CLAUDE.md
- Add worker registry API endpoint documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
52
CLAUDE.md
@@ -205,6 +205,58 @@ These binaries mimic real browser TLS fingerprints to avoid detection.
---

## Worker Architecture (Kubernetes)

### Persistent Workers (StatefulSet)
Workers run as a **StatefulSet** with 8 persistent pods. Each pod keeps the same name across restarts, so worker identity is stable.

**Pod Names**: `scraper-worker-0` through `scraper-worker-7`

**Key Properties**:
- `updateStrategy: OnDelete` - Pods pick up a changed spec only after they are manually deleted (no automatic restarts)
- `podManagementPolicy: Parallel` - Pods are created and terminated in parallel rather than one ordinal at a time
- Workers register with their pod name as identity

**K8s Manifest**: `backend/k8s/scraper-worker-statefulset.yaml`
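
The properties listed above correspond to StatefulSet spec fields. Note that `updateStrategy: OnDelete` is shorthand: in the StatefulSet API the value sits at `spec.updateStrategy.type`. As a rough illustration (not the project's actual tooling), the Python sketch below checks a parsed manifest, represented as a plain dict, against those properties. The StatefulSet name `scraper-worker` is inferred from the pod names, and `check_manifest`, `expected_pod_names`, and the `sample` dict are hypothetical.

```python
EXPECTED_REPLICAS = 8


def expected_pod_names(statefulset_name: str, replicas: int) -> list[str]:
    """StatefulSet pods are named <statefulset-name>-<ordinal>, starting at 0."""
    return [f"{statefulset_name}-{i}" for i in range(replicas)]


def check_manifest(manifest: dict) -> list[str]:
    """Return human-readable problems; an empty list means the manifest matches."""
    problems = []
    spec = manifest.get("spec", {})
    if manifest.get("kind") != "StatefulSet":
        problems.append("kind is not StatefulSet")
    if spec.get("replicas") != EXPECTED_REPLICAS:
        problems.append(f"replicas is {spec.get('replicas')!r}, expected {EXPECTED_REPLICAS}")
    if spec.get("updateStrategy", {}).get("type") != "OnDelete":
        problems.append("updateStrategy.type is not OnDelete")
    if spec.get("podManagementPolicy") != "Parallel":
        problems.append("podManagementPolicy is not Parallel")
    return problems


# Hypothetical manifest contents, shaped the way a YAML parser would return them.
sample = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "scraper-worker", "namespace": "dispensary-scraper"},
    "spec": {
        "replicas": 8,
        "podManagementPolicy": "Parallel",
        "updateStrategy": {"type": "OnDelete"},
    },
}

print(check_manifest(sample))
print(expected_pod_names("scraper-worker", 8))
```

A check like this could run before applying the manifest, to catch an accidental switch back to a rolling update strategy.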
### Worker Lifecycle

1. **Startup**: Worker registers in the `worker_registry` table under its pod name
2. **Preflight**: Runs dual-transport preflights (curl + http) and reports the resulting IPs and browser fingerprint
3. **Task Loop**: Polls for tasks, executes them, and reports status
4. **Shutdown**: Shuts down gracefully within a 60-second termination period
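
The four lifecycle steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the worker's real implementation: the `Worker` class, the in-memory `registry` dict standing in for the `worker_registry` table, the `next_task` callable, and the sample IPs are all hypothetical, and fingerprint collection is omitted.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Worker:
    pod_name: str          # identity, e.g. "scraper-worker-0"
    registry: dict         # stand-in for the worker_registry table
    next_task: Callable[[], Optional[Callable[[], None]]]  # hypothetical task source
    stop: threading.Event = field(default_factory=threading.Event)

    def startup(self) -> None:
        # 1. Startup: register under the pod name.
        self.registry[self.pod_name] = {"status": "idle", "preflight_status": "pending"}

    def preflight(self, transports: dict[str, Callable[[], str]]) -> None:
        # 2. Preflight: run each transport check and record the IP it reports.
        row = self.registry[self.pod_name]
        try:
            for name, check in transports.items():
                row[f"{name}_ip"] = check()
            row["preflight_status"] = "passed"
        except Exception:
            row["preflight_status"] = "failed"

    def task_loop(self, poll_seconds: float = 1.0) -> None:
        # 3. Task loop: poll, execute, report status, until asked to stop.
        row = self.registry[self.pod_name]
        while not self.stop.is_set():
            task = self.next_task()
            if task is None:
                row["status"] = "idle"
                self.stop.wait(poll_seconds)
                continue
            row["status"] = "active"
            task()
        # 4. Shutdown: the in-flight task has finished, so mark the worker offline.
        row["status"] = "offline"


done = []
queue = [lambda: done.append("task-1"), lambda: done.append("task-2")]
registry = {}


def next_task():
    if queue:
        return queue.pop(0)
    worker.stop.set()  # demo only: stop once the queue drains
    return None


worker = Worker("scraper-worker-0", registry, next_task)
worker.startup()
worker.preflight({"curl": lambda: "203.0.113.10", "http": lambda: "203.0.113.10"})
worker.task_loop(poll_seconds=0.01)
print(registry["scraper-worker-0"], done)
```

In a real pod, a `SIGTERM` handler would presumably call `worker.stop.set()`, so the loop exits after the current task and within the termination period.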
### NEVER Restart Workers Unnecessarily

**Claude must NOT**:
- Restart workers unless explicitly requested
- Use `kubectl rollout restart` on workers
- Use `kubectl set image` on workers (this triggers a restart)

**To update worker code** (only when the user authorizes):
1. Build and push a new image with a version tag
2. Update the StatefulSet image reference
3. Manually delete pods one at a time when ready: `kubectl delete pod scraper-worker-0 -n dispensary-scraper`
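
For step 3, it can help to list the per-pod commands for review before running anything. The Python sketch below only builds and prints them; it executes nothing. The StatefulSet name `scraper-worker` is inferred from the pod names, and `delete_commands` is a hypothetical helper.

```python
import shlex

NAMESPACE = "dispensary-scraper"
STATEFULSET = "scraper-worker"  # assumed from the pod names
REPLICAS = 8


def delete_commands(replicas: int = REPLICAS) -> list[list[str]]:
    """Build one `kubectl delete pod` argv per worker, in ordinal order."""
    return [
        ["kubectl", "delete", "pod", f"{STATEFULSET}-{i}", "-n", NAMESPACE]
        for i in range(replicas)
    ]


for argv in delete_commands():
    # Printed for review only; nothing is executed here.
    print(shlex.join(argv))
```

Each printed command would then be run by hand, one pod at a time, after confirming that the previously deleted worker has come back and re-registered.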
### Worker Registry API

**Endpoint**: `GET /api/worker-registry/workers`

**Response Fields**:

| Field | Description |
|-------|-------------|
| `pod_name` | Kubernetes pod name |
| `worker_id` | Internal worker UUID |
| `status` | active, idle, offline |
| `curl_ip` | IP from curl preflight |
| `http_ip` | IP from Puppeteer preflight |
| `preflight_status` | pending, passed, failed |
| `preflight_at` | Timestamp of last preflight |
| `fingerprint_data` | Browser fingerprint JSON |
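
A client consuming this endpoint might model one worker entry as a typed record. The Python sketch below is an assumption-laden illustration: the `WorkerRecord` class and `parse_worker` helper are hypothetical, the response envelope is not specified in this document (so only a single entry is parsed), and the sample values, timestamp format, and nullable fields are invented.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Optional

STATUSES = {"active", "idle", "offline"}
PREFLIGHT_STATUSES = {"pending", "passed", "failed"}


@dataclass
class WorkerRecord:
    pod_name: str
    worker_id: str
    status: str
    curl_ip: Optional[str]       # assumed nullable until a preflight has run
    http_ip: Optional[str]       # assumed nullable until a preflight has run
    preflight_status: str
    preflight_at: Optional[str]  # timestamp format is an assumption
    fingerprint_data: Any


def parse_worker(raw: dict) -> WorkerRecord:
    """Validate the enumerated fields and build a typed record."""
    if raw["status"] not in STATUSES:
        raise ValueError(f"unexpected status: {raw['status']!r}")
    if raw["preflight_status"] not in PREFLIGHT_STATUSES:
        raise ValueError(f"unexpected preflight_status: {raw['preflight_status']!r}")
    return WorkerRecord(**{f.name: raw.get(f.name) for f in fields(WorkerRecord)})


# Invented sample entry; real responses may differ in shape and values.
sample = json.loads("""
{
  "pod_name": "scraper-worker-3",
  "worker_id": "00000000-0000-0000-0000-000000000003",
  "status": "idle",
  "curl_ip": "203.0.113.10",
  "http_ip": "203.0.113.10",
  "preflight_status": "passed",
  "preflight_at": "2025-01-01T00:00:00Z",
  "fingerprint_data": {"userAgent": "example"}
}
""")

record = parse_worker(sample)
print(record.pod_name, record.status, record.preflight_status)
```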
---

## Documentation

| Doc | Purpose |