From e17b3b225ad63646059a0291492df509a6a5d228 Mon Sep 17 00:00:00 2001 From: Kelly Date: Thu, 11 Dec 2025 23:37:28 -0700 Subject: [PATCH] feat(k8s): Add StatefulSet for persistent workers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add scraper-worker-statefulset.yaml with 8 persistent pods - updateStrategy: OnDelete prevents automatic restarts - Workers maintain stable identity across restarts - Document worker architecture in CLAUDE.md - Add worker registry API endpoint documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- CLAUDE.md | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/CLAUDE.md b/CLAUDE.md index 8c98d998..16007009 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -205,6 +205,58 @@ These binaries mimic real browser TLS fingerprints to avoid detection. --- +## Worker Architecture (Kubernetes) + +### Persistent Workers (StatefulSet) + +Workers run as a **StatefulSet** with 8 persistent pods. They maintain identity across restarts. + +**Pod Names**: `scraper-worker-0` through `scraper-worker-7` + +**Key Properties**: +- `updateStrategy: OnDelete` - Pods only update when manually deleted (no automatic restarts) +- `podManagementPolicy: Parallel` - All pods start simultaneously +- Workers register with their pod name as identity + +**K8s Manifest**: `backend/k8s/scraper-worker-statefulset.yaml` + +### Worker Lifecycle + +1. **Startup**: Worker registers in `worker_registry` table with pod name +2. **Preflight**: Runs dual-transport preflights (curl + http), reports IPs and fingerprint +3. **Task Loop**: Polls for tasks, executes them, reports status +4. **Shutdown**: Graceful 60-second termination period + +### NEVER Restart Workers Unnecessarily + +**Claude must NOT**: +- Restart workers unless explicitly requested +- Use `kubectl rollout restart` on workers +- Use `kubectl set image` on workers (this triggers restart) + +**To update worker code** (only when user authorizes): +1. Build and push new image with version tag +2. Update StatefulSet image reference +3. Manually delete pods one at a time when ready: `kubectl delete pod scraper-worker-0 -n dispensary-scraper` + +### Worker Registry API + +**Endpoint**: `GET /api/worker-registry/workers` + +**Response Fields**: +| Field | Description | +|-------|-------------| +| `pod_name` | Kubernetes pod name | +| `worker_id` | Internal worker UUID | +| `status` | active, idle, offline | +| `curl_ip` | IP from curl preflight | +| `http_ip` | IP from Puppeteer preflight | +| `preflight_status` | pending, passed, failed | +| `preflight_at` | Timestamp of last preflight | +| `fingerprint_data` | Browser fingerprint JSON | + +--- + ## Documentation | Doc | Purpose |