Compare commits
1 commit
master...feat/persi

| Author | SHA1 | Date |
|---|---|---|
| | 80f048ad57 | |

52 CLAUDE.md
@@ -205,6 +205,58 @@ These binaries mimic real browser TLS fingerprints to avoid detection.
---
## Worker Architecture (Kubernetes)

### Persistent Workers (StatefulSet)

Workers run as a **StatefulSet** with 8 persistent pods. Each pod keeps the same name, and therefore the same worker identity, across restarts.

**Pod Names**: `scraper-worker-0` through `scraper-worker-7`

**Key Properties**:
- `updateStrategy: OnDelete` - A pod picks up a changed pod template only after it is manually deleted (no automatic rollout)
- `podManagementPolicy: Parallel` - All pods start simultaneously instead of one ordinal at a time
- Workers register with their pod name as identity

**K8s Manifest**: `backend/k8s/scraper-worker-statefulset.yaml`
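The pod names above follow the StatefulSet convention of `<statefulset-name>-<ordinal>`, with ordinals starting at 0 by default. A minimal sketch of that mapping, where `statefulSetPodNames` and `podOrdinal` are hypothetical helpers and not part of the repository:

```typescript
// Hypothetical helper: list the pod names a StatefulSet creates by default.
function statefulSetPodNames(name: string, replicas: number): string[] {
  if (!Number.isInteger(replicas) || replicas < 0) {
    throw new RangeError(`replicas must be a non-negative integer, got ${replicas}`);
  }
  return Array.from({ length: replicas }, (_, ordinal) => `${name}-${ordinal}`);
}

// Hypothetical helper: recover the ordinal from a pod name, or null if it does not match.
function podOrdinal(podName: string, name: string): number | null {
  const prefix = `${name}-`;
  if (!podName.startsWith(prefix)) return null;
  const suffix = podName.slice(prefix.length);
  return /^\d+$/.test(suffix) ? Number(suffix) : null;
}

// Values taken from this section: StatefulSet "scraper-worker" with 8 replicas.
const workerPods = statefulSetPodNames("scraper-worker", 8);
// workerPods[0] === "scraper-worker-0", workerPods[7] === "scraper-worker-7"
```

Because the name is stable, it can serve as the registry key without any extra persistent ID.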

### Worker Lifecycle

1. **Startup**: Worker registers in `worker_registry` table with pod name
2. **Preflight**: Runs dual-transport preflights (curl + http), reports IPs and fingerprint
3. **Task Loop**: Polls for tasks, executes them, reports status
4. **Shutdown**: Graceful 60-second termination period
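The four steps above can be read as a small state machine. The sketch below is illustrative only: the phase names mirror the list, but the event names and the rule that a failed preflight keeps the worker in the preflight phase are assumptions, not behavior confirmed by the worker code.

```typescript
// Phases mirror the four lifecycle steps; event names are hypothetical.
type Phase = "startup" | "preflight" | "task_loop" | "shutdown";
type LifecycleEvent = "registered" | "preflight_passed" | "preflight_failed" | "sigterm";

function nextPhase(phase: Phase, event: LifecycleEvent): Phase {
  // A termination signal moves the worker to shutdown from any phase.
  if (event === "sigterm") return "shutdown";
  if (phase === "startup" && event === "registered") return "preflight";
  if (phase === "preflight" && event === "preflight_passed") return "task_loop";
  // Assumption: every other event, including a failed preflight, leaves the phase unchanged.
  return phase;
}

// Walk through a normal run: register, pass preflight, then receive SIGTERM.
const events: LifecycleEvent[] = ["registered", "preflight_passed", "sigterm"];
let phase: Phase = "startup";
const visited: Phase[] = [phase];
for (const event of events) {
  phase = nextPhase(phase, event);
  visited.push(phase);
}
// visited is: startup, preflight, task_loop, shutdown
```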

### NEVER Restart Workers Unnecessarily

**Claude must NOT**:
- Restart workers unless explicitly requested
- Use `kubectl rollout restart` on workers
- Use `kubectl set image` on workers (this triggers restart)
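One way to make the rules above checkable is a small guard that flags the two forbidden commands before they run. This is a hedged sketch: `wouldRestartWorkers` is a hypothetical helper, and its string matching is deliberately simple and would not catch every phrasing (for example, a label selector instead of a resource name).

```typescript
// Worker resource name, taken from the pod names in this document.
const WORKER_NAME = "scraper-worker";

// Hypothetical guard: true if the command is one of the two forbidden kubectl
// invocations and it mentions the worker StatefulSet by name.
function wouldRestartWorkers(command: string): boolean {
  const words = command.trim().split(/\s+/);
  if (words[0] !== "kubectl") return false;
  const normalized = words.join(" ");
  const forbiddenVerb = /\b(rollout restart|set image)\b/.test(normalized);
  const targetsWorkers = words.some((word) => word.includes(WORKER_NAME));
  return forbiddenVerb && targetsWorkers;
}

const blocked = wouldRestartWorkers(
  "kubectl rollout restart statefulset/scraper-worker -n dispensary-scraper"
);
const allowed = wouldRestartWorkers(
  "kubectl delete pod scraper-worker-0 -n dispensary-scraper"
);
// blocked === true, allowed === false
```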

**To update worker code** (only when user authorizes):
1. Build and push new image with version tag
2. Update StatefulSet image reference
3. Manually delete pods one at a time when ready: `kubectl delete pod scraper-worker-0 -n dispensary-scraper`
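For step 3, the per-pod commands can be generated instead of typed by hand. The sketch below only builds the command strings and does not execute anything; `podDeleteCommands` is a hypothetical helper, and the replica count and namespace are the values used elsewhere in this document.

```typescript
// Hypothetical helper: one `kubectl delete pod` command per worker pod, in ordinal order.
// The commands are meant to be run one at a time, not as a batch.
function podDeleteCommands(name: string, replicas: number, namespace: string): string[] {
  return Array.from(
    { length: replicas },
    (_, ordinal) => `kubectl delete pod ${name}-${ordinal} -n ${namespace}`
  );
}

const deleteCommands = podDeleteCommands("scraper-worker", 8, "dispensary-scraper");
// deleteCommands[0] === "kubectl delete pod scraper-worker-0 -n dispensary-scraper"
// deleteCommands[7] === "kubectl delete pod scraper-worker-7 -n dispensary-scraper"
```

Before running the next command, it seems sensible to confirm that the replacement pod has re-registered and passed preflight, for example via the Worker Registry API.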

### Worker Registry API

**Endpoint**: `GET /api/worker-registry/workers`

**Response Fields**:
| Field | Description |
|-------|-------------|
| `pod_name` | Kubernetes pod name |
| `worker_id` | Internal worker UUID |
| `status` | active, idle, offline |
| `curl_ip` | IP from curl preflight |
| `http_ip` | IP from Puppeteer preflight |
| `preflight_status` | pending, passed, failed |
| `preflight_at` | Timestamp of last preflight |
| `fingerprint_data` | Browser fingerprint JSON |
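A client consuming this endpoint might type and validate the rows roughly as follows. The field names and enum values come from the table above; the nullability of the IP and timestamp fields, the string timestamp format, and the helpers `isWorkerRecord` and `readyPodNames` are assumptions. How the rows are wrapped in the HTTP response is not specified here, so the sketch starts from an array of already-parsed rows.

```typescript
type WorkerStatus = "active" | "idle" | "offline";
type PreflightStatus = "pending" | "passed" | "failed";

interface WorkerRecord {
  pod_name: string;
  worker_id: string;
  status: WorkerStatus;
  curl_ip: string | null; // assumed nullable until a preflight has run
  http_ip: string | null; // assumed nullable until a preflight has run
  preflight_status: PreflightStatus;
  preflight_at: string | null; // assumed to be a timestamp string
  fingerprint_data: unknown;
}

const WORKER_STATUSES: readonly string[] = ["active", "idle", "offline"];
const PREFLIGHT_STATUSES: readonly string[] = ["pending", "passed", "failed"];

function isOneOf(value: unknown, allowed: readonly string[]): boolean {
  return typeof value === "string" && allowed.indexOf(value) !== -1;
}

// Partial runtime check: validates only the identity and status fields.
function isWorkerRecord(value: unknown): value is WorkerRecord {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["pod_name"] === "string" &&
    typeof record["worker_id"] === "string" &&
    isOneOf(record["status"], WORKER_STATUSES) &&
    isOneOf(record["preflight_status"], PREFLIGHT_STATUSES)
  );
}

// Pod names of workers that are not offline and have passed preflight.
function readyPodNames(rows: unknown[]): string[] {
  return rows
    .filter(isWorkerRecord)
    .filter((w) => w.status !== "offline" && w.preflight_status === "passed")
    .map((w) => w.pod_name);
}

// Sample rows with placeholder IDs (real worker_id values are UUIDs).
const sampleRows: unknown[] = [
  { pod_name: "scraper-worker-0", worker_id: "id-0", status: "active", preflight_status: "passed" },
  { pod_name: "scraper-worker-1", worker_id: "id-1", status: "offline", preflight_status: "passed" },
  { pod_name: "scraper-worker-2", worker_id: "id-2", status: "idle", preflight_status: "failed" },
];
const readyPods = readyPodNames(sampleRows);
// readyPods contains only "scraper-worker-0"
```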

---

## Documentation

| Doc | Purpose |

77
backend/k8s/scraper-worker-statefulset.yaml
Normal file
@@ -0,0 +1,77 @@
apiVersion: v1
kind: Service
metadata:
  name: scraper-worker
  namespace: dispensary-scraper
  labels:
    app: scraper-worker
spec:
  clusterIP: None  # Headless service required for StatefulSet
  selector:
    app: scraper-worker
  ports:
    - port: 3010
      name: http
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: scraper-worker
  namespace: dispensary-scraper
spec:
  serviceName: scraper-worker
  replicas: 8
  podManagementPolicy: Parallel  # Start all pods at once
  updateStrategy:
    type: OnDelete  # Pods only update when manually deleted - no automatic restarts
  selector:
    matchLabels:
      app: scraper-worker
  template:
    metadata:
      labels:
        app: scraper-worker
    spec:
      terminationGracePeriodSeconds: 60
      imagePullSecrets:
        - name: regcred
      containers:
        - name: worker
          image: code.cannabrands.app/creationshop/dispensary-scraper:2ed088b4
          imagePullPolicy: Always
          command: ["node"]
          args: ["dist/tasks/task-worker.js"]
          env:
            - name: WORKER_MODE
              value: "true"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: MAX_CONCURRENT_TASKS
              value: "50"
            - name: API_BASE_URL
              value: http://scraper
            - name: NODE_OPTIONS
              value: --max-old-space-size=1500
          envFrom:
            - configMapRef:
                name: scraper-config
            - secretRef:
                name: scraper-secrets
          resources:
            requests:
              cpu: 100m
              memory: 1Gi
            limits:
              cpu: 500m
              memory: 2Gi
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - pgrep -f 'task-worker' > /dev/null
            initialDelaySeconds: 10
            periodSeconds: 30
            failureThreshold: 3
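The `env` entries in the manifest above reach the worker as plain strings, with `POD_NAME` injected from `metadata.name` through the Downward API. A minimal sketch of how a worker could read them is shown below. `readWorkerConfig` and `WorkerConfig` are hypothetical names; the actual parsing in `task-worker.js` is not shown in this diff and may differ.

```typescript
interface WorkerConfig {
  podName: string;
  maxConcurrentTasks: number;
  apiBaseUrl: string;
  workerMode: boolean;
}

// Same shape as Node's process.env: every value is a string or missing.
type Env = Record<string, string | undefined>;

// Hypothetical loader: validates the variables set in the StatefulSet manifest.
function readWorkerConfig(env: Env): WorkerConfig {
  const podName = env["POD_NAME"];
  if (!podName) throw new Error("POD_NAME is not set");

  const apiBaseUrl = env["API_BASE_URL"];
  if (!apiBaseUrl) throw new Error("API_BASE_URL is not set");

  const rawMax = env["MAX_CONCURRENT_TASKS"] ?? "";
  if (!/^\d+$/.test(rawMax) || Number(rawMax) < 1) {
    throw new Error(`MAX_CONCURRENT_TASKS must be a positive integer, got "${rawMax}"`);
  }

  return {
    podName,
    maxConcurrentTasks: Number(rawMax),
    apiBaseUrl,
    workerMode: env["WORKER_MODE"] === "true",
  };
}

// Values as they would appear inside pod scraper-worker-3.
const exampleConfig = readWorkerConfig({
  WORKER_MODE: "true",
  POD_NAME: "scraper-worker-3",
  MAX_CONCURRENT_TASKS: "50",
  API_BASE_URL: "http://scraper",
});
// exampleConfig.podName === "scraper-worker-3", exampleConfig.maxConcurrentTasks === 50
```

In a real worker the argument would be `process.env`. Taking the environment as a parameter keeps the function testable without a running pod.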