feat(k8s): Add StatefulSet for persistent workers

- Add scraper-worker-statefulset.yaml with 8 persistent pods
- updateStrategy: OnDelete prevents automatic restarts
- Workers maintain stable identity across restarts
- Document worker architecture in CLAUDE.md
- Add worker registry API endpoint documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
52
CLAUDE.md
@@ -205,6 +205,58 @@ These binaries mimic real browser TLS fingerprints to avoid detection.
---
## Worker Architecture (Kubernetes)

### Persistent Workers (StatefulSet)

Workers run as a **StatefulSet** with 8 persistent pods. Each pod keeps the same name, and therefore the same identity, across restarts.

**Pod Names**: `scraper-worker-0` through `scraper-worker-7`

**Key Properties**:
- `updateStrategy: OnDelete` - Pods only update when manually deleted (no automatic restarts on spec changes)
- `podManagementPolicy: Parallel` - All pods start simultaneously
- Workers register with their pod name as identity

**K8s Manifest**: `backend/k8s/scraper-worker-statefulset.yaml`
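
As a rough illustration of the properties above, the following Python sketch models the relevant StatefulSet fields as a plain dictionary and derives the pod names from them. Only `updateStrategy`, `podManagementPolicy`, the replica count, the `scraper-worker` name, and the `dispensary-scraper` namespace come from this document. The dictionary layout follows the Kubernetes `apps/v1` StatefulSet schema, the helper names are hypothetical, and none of this is copied from `backend/k8s/scraper-worker-statefulset.yaml`.

```python
"""Sketch of the StatefulSet settings described above (not the real manifest)."""

# Mirrors the apps/v1 StatefulSet schema. Only the fields discussed in this
# section are shown; image, labels, and volumes are intentionally omitted.
STATEFULSET_SKETCH = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "scraper-worker", "namespace": "dispensary-scraper"},
    "spec": {
        "replicas": 8,
        "podManagementPolicy": "Parallel",       # start all pods at once
        "updateStrategy": {"type": "OnDelete"},  # new spec applies only after a manual pod delete
    },
}


def expected_pod_names(manifest: dict) -> list[str]:
    """StatefulSet pods are named <statefulset-name>-<ordinal>, starting at 0."""
    name = manifest["metadata"]["name"]
    replicas = manifest["spec"]["replicas"]
    return [f"{name}-{ordinal}" for ordinal in range(replicas)]


def check_worker_policies(manifest: dict) -> list[str]:
    """Return a list of deviations from the policies documented above."""
    spec = manifest["spec"]
    problems = []
    if spec.get("updateStrategy", {}).get("type") != "OnDelete":
        problems.append("updateStrategy.type should be OnDelete")
    if spec.get("podManagementPolicy") != "Parallel":
        problems.append("podManagementPolicy should be Parallel")
    return problems


if __name__ == "__main__":
    print(expected_pod_names(STATEFULSET_SKETCH))
    print(check_worker_policies(STATEFULSET_SKETCH) or "policies match")
```

A check like `check_worker_policies` could run before a manifest is applied, so that an accidental switch away from `OnDelete` is caught early.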
### Worker Lifecycle

1. **Startup**: Worker registers in the `worker_registry` table under its pod name
2. **Preflight**: Runs dual-transport preflights (curl + http) and reports IPs and fingerprint
3. **Task Loop**: Polls for tasks, executes them, and reports status
4. **Shutdown**: Graceful 60-second termination period
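
The four lifecycle steps can be sketched as a small Python class. This is a toy model under stated assumptions, not the project's worker: the registry is an in-memory dict standing in for the `worker_registry` table, the preflight is a placeholder, the class and function names are hypothetical, and the points at which `status` changes between `idle`, `active`, and `offline` are guesses based on the documented status values.

```python
"""Sketch of the worker lifecycle: register, preflight, task loop, shutdown."""
import signal
import threading
from collections import deque


class WorkerSketch:
    def __init__(self, pod_name, registry, tasks):
        self.pod_name = pod_name       # identity is the Kubernetes pod name
        self.registry = registry       # stand-in for the worker_registry table
        self.tasks = tasks             # stand-in for the real task queue
        self.stop = threading.Event()  # set when shutdown is requested

    def register(self):
        self.registry[self.pod_name] = {"status": "idle", "preflight_status": "pending"}

    def preflight(self):
        # Placeholder: the real curl and http preflights are not shown here.
        self.registry[self.pod_name]["preflight_status"] = "passed"

    def run(self, poll_interval=1.0):
        self.register()
        self.preflight()
        row = self.registry[self.pod_name]
        while not self.stop.is_set():
            if self.tasks:
                row["status"] = "active"
                task = self.tasks.popleft()
                task()                         # execute the task
                row["status"] = "idle"
            else:
                self.stop.wait(poll_interval)  # sleep, but wake early on shutdown
        row["status"] = "offline"


def install_sigterm_handler(worker):
    """Kubernetes sends SIGTERM when the termination grace period begins."""
    signal.signal(signal.SIGTERM, lambda signum, frame: worker.stop.set())


# Demo: one task, then a simulated shutdown request.
registry, log = {}, []
worker = WorkerSketch("scraper-worker-0", registry, deque())
worker.tasks.append(lambda: log.append("scraped"))
worker.tasks.append(worker.stop.set)  # stands in for SIGTERM arriving
worker.run()
print(registry)
```

Because the loop checks the stop flag only between tasks, a task that is already running finishes before the worker marks itself `offline`; in a pod this would need to fit within the 60-second termination period.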
### NEVER Restart Workers Unnecessarily

**Claude must NOT**:
- Restart workers unless explicitly requested
- Use `kubectl rollout restart` on workers
- Use `kubectl set image` on workers (this triggers restart)

**To update worker code** (only when user authorizes):
1. Build and push new image with version tag
2. Update StatefulSet image reference
3. Manually delete pods one at a time when ready: `kubectl delete pod scraper-worker-0 -n dispensary-scraper`
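
For step 3, a helper along these lines could list the per-pod delete commands so they can be reviewed and run one at a time. It is only a sketch: it builds and prints command lines and never executes them, the function names are hypothetical, and the pod count and namespace are taken from this document.

```python
"""Sketch: build (not run) the per-pod delete commands for a manual update."""

NAMESPACE = "dispensary-scraper"
STATEFULSET = "scraper-worker"
REPLICAS = 8


def delete_pod_command(ordinal: int) -> list[str]:
    """argv for deleting one worker pod, matching the kubectl call above."""
    if not 0 <= ordinal < REPLICAS:
        raise ValueError(f"ordinal must be in 0..{REPLICAS - 1}, got {ordinal}")
    return ["kubectl", "delete", "pod", f"{STATEFULSET}-{ordinal}", "-n", NAMESPACE]


def update_plan() -> list[list[str]]:
    """One delete command per pod, in ordinal order."""
    return [delete_pod_command(i) for i in range(REPLICAS)]


if __name__ == "__main__":
    for argv in update_plan():
        # Printing only: each command is meant to be run by hand, one at a
        # time, and only after the user has authorized the update.
        print(" ".join(argv))
```

Keeping the commands as printed text rather than running them preserves the rule above: no worker is restarted unless someone explicitly decides to do it.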
### Worker Registry API

**Endpoint**: `GET /api/worker-registry/workers`

**Response Fields**:
| Field | Description |
|-------|-------------|
| `pod_name` | Kubernetes pod name |
| `worker_id` | Internal worker UUID |
| `status` | active, idle, offline |
| `curl_ip` | IP from curl preflight |
| `http_ip` | IP from Puppeteer preflight |
| `preflight_status` | pending, passed, failed |
| `preflight_at` | Timestamp of last preflight |
| `fingerprint_data` | Browser fingerprint JSON |
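
A client might map the fields above onto a typed record, as in the Python sketch below. The response envelope is not specified here, so the sketch assumes the body is a bare JSON array of worker objects. The sample payload is made up for illustration, and `WorkerRecord`, `parse_workers`, and `passed_preflight` are hypothetical names.

```python
"""Sketch: parse a worker-registry response into typed records."""
import json
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class WorkerRecord:
    pod_name: Optional[str]
    worker_id: Optional[str]
    status: Optional[str]            # active, idle, offline
    curl_ip: Optional[str]
    http_ip: Optional[str]
    preflight_status: Optional[str]  # pending, passed, failed
    preflight_at: Optional[str]
    fingerprint_data: Optional[Any]


def parse_workers(payload: str) -> list[WorkerRecord]:
    """Assumes the body is a JSON array of objects with the documented fields."""
    rows = json.loads(payload)
    names = [f.name for f in fields(WorkerRecord)]
    # Ignore undocumented keys; treat missing documented keys as None.
    return [WorkerRecord(**{name: row.get(name) for name in names}) for row in rows]


def passed_preflight(workers: list[WorkerRecord]) -> list[WorkerRecord]:
    """Keep only workers whose last preflight passed."""
    return [w for w in workers if w.preflight_status == "passed"]


# Made-up example payload; every value is illustrative only.
SAMPLE = json.dumps([{
    "pod_name": "scraper-worker-0",
    "worker_id": "00000000-0000-0000-0000-000000000000",
    "status": "idle",
    "curl_ip": "203.0.113.10",
    "http_ip": "203.0.113.10",
    "preflight_status": "passed",
    "preflight_at": "2025-01-01T00:00:00Z",
    "fingerprint_data": {},
}])

workers = parse_workers(SAMPLE)
print([w.pod_name for w in passed_preflight(workers)])
```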
---

## Documentation

| Doc | Purpose |