Compare commits
1 Commits
production
...
feat/persi
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
80f048ad57 |
52
CLAUDE.md
52
CLAUDE.md
@@ -205,6 +205,58 @@ These binaries mimic real browser TLS fingerprints to avoid detection.
|
||||
|
||||
---
|
||||
|
||||
## Worker Architecture (Kubernetes)
|
||||
|
||||
### Persistent Workers (StatefulSet)
|
||||
|
||||
Workers run as a **StatefulSet** with 8 persistent pods. They maintain identity across restarts.
|
||||
|
||||
**Pod Names**: `scraper-worker-0` through `scraper-worker-7`
|
||||
|
||||
**Key Properties**:
|
||||
- `updateStrategy: OnDelete` - Pods only update when manually deleted (no automatic restarts)
|
||||
- `podManagementPolicy: Parallel` - All pods start simultaneously
|
||||
- Workers register with their pod name as identity
|
||||
|
||||
**K8s Manifest**: `backend/k8s/scraper-worker-statefulset.yaml`
|
||||
|
||||
### Worker Lifecycle
|
||||
|
||||
1. **Startup**: Worker registers in `worker_registry` table with pod name
|
||||
2. **Preflight**: Runs dual-transport preflights (curl + http), reports IPs and fingerprint
|
||||
3. **Task Loop**: Polls for tasks, executes them, reports status
|
||||
4. **Shutdown**: Graceful 60-second termination period
|
||||
|
||||
### NEVER Restart Workers Unnecessarily
|
||||
|
||||
**Claude must NOT**:
|
||||
- Restart workers unless explicitly requested
|
||||
- Use `kubectl rollout restart` on workers
|
||||
- Use `kubectl set image` on workers (this triggers restart)
|
||||
|
||||
**To update worker code** (only when user authorizes):
|
||||
1. Build and push new image with version tag
|
||||
2. Update StatefulSet image reference
|
||||
3. Manually delete pods one at a time when ready: `kubectl delete pod scraper-worker-0 -n dispensary-scraper`
|
||||
|
||||
### Worker Registry API
|
||||
|
||||
**Endpoint**: `GET /api/worker-registry/workers`
|
||||
|
||||
**Response Fields**:
|
||||
| Field | Description |
|
||||
|-------|-------------|
|
||||
| `pod_name` | Kubernetes pod name |
|
||||
| `worker_id` | Internal worker UUID |
|
||||
| `status` | active, idle, offline |
|
||||
| `curl_ip` | IP from curl preflight |
|
||||
| `http_ip` | IP from Puppeteer preflight |
|
||||
| `preflight_status` | pending, passed, failed |
|
||||
| `preflight_at` | Timestamp of last preflight |
|
||||
| `fingerprint_data` | Browser fingerprint JSON |
|
||||
|
||||
---
|
||||
|
||||
## Documentation
|
||||
|
||||
| Doc | Purpose |
|
||||
|
||||
77
backend/k8s/scraper-worker-statefulset.yaml
Normal file
77
backend/k8s/scraper-worker-statefulset.yaml
Normal file
@@ -0,0 +1,77 @@
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: scraper-worker
|
||||
namespace: dispensary-scraper
|
||||
labels:
|
||||
app: scraper-worker
|
||||
spec:
|
||||
clusterIP: None # Headless service required for StatefulSet
|
||||
selector:
|
||||
app: scraper-worker
|
||||
ports:
|
||||
- port: 3010
|
||||
name: http
|
||||
---
|
||||
apiVersion: apps/v1
|
||||
kind: StatefulSet
|
||||
metadata:
|
||||
name: scraper-worker
|
||||
namespace: dispensary-scraper
|
||||
spec:
|
||||
serviceName: scraper-worker
|
||||
replicas: 8
|
||||
podManagementPolicy: Parallel # Start all pods at once
|
||||
updateStrategy:
|
||||
type: OnDelete # Pods only update when manually deleted - no automatic restarts
|
||||
selector:
|
||||
matchLabels:
|
||||
app: scraper-worker
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: scraper-worker
|
||||
spec:
|
||||
terminationGracePeriodSeconds: 60
|
||||
imagePullSecrets:
|
||||
- name: regcred
|
||||
containers:
|
||||
- name: worker
|
||||
image: code.cannabrands.app/creationshop/dispensary-scraper:2ed088b4
|
||||
imagePullPolicy: Always
|
||||
command: ["node"]
|
||||
args: ["dist/tasks/task-worker.js"]
|
||||
env:
|
||||
- name: WORKER_MODE
|
||||
value: "true"
|
||||
- name: POD_NAME
|
||||
valueFrom:
|
||||
fieldRef:
|
||||
fieldPath: metadata.name
|
||||
- name: MAX_CONCURRENT_TASKS
|
||||
value: "50"
|
||||
- name: API_BASE_URL
|
||||
value: http://scraper
|
||||
- name: NODE_OPTIONS
|
||||
value: --max-old-space-size=1500
|
||||
envFrom:
|
||||
- configMapRef:
|
||||
name: scraper-config
|
||||
- secretRef:
|
||||
name: scraper-secrets
|
||||
resources:
|
||||
requests:
|
||||
cpu: 100m
|
||||
memory: 1Gi
|
||||
limits:
|
||||
cpu: 500m
|
||||
memory: 2Gi
|
||||
livenessProbe:
|
||||
exec:
|
||||
command:
|
||||
- /bin/sh
|
||||
- -c
|
||||
- pgrep -f 'task-worker' > /dev/null
|
||||
initialDelaySeconds: 10
|
||||
periodSeconds: 30
|
||||
failureThreshold: 3
|
||||
Reference in New Issue
Block a user