Compare commits
2 Commits
feat/prefl
...
fix/api-se
| Author | SHA1 | Date | |
|---|---|---|---|
| | 2513e22171 | | |
| | e17b3b225a | | |
52
CLAUDE.md
@@ -205,6 +205,58 @@ These binaries mimic real browser TLS fingerprints to avoid detection.

---

## Worker Architecture (Kubernetes)

### Persistent Workers (StatefulSet)

Workers run as a **StatefulSet** with 8 persistent pods. They maintain identity across restarts.

**Pod Names**: `scraper-worker-0` through `scraper-worker-7`

**Key Properties**:
- `updateStrategy: OnDelete` - Pods only update when manually deleted (no automatic restarts)
- `podManagementPolicy: Parallel` - All pods start simultaneously
- Workers register with their pod name as identity

**K8s Manifest**: `backend/k8s/scraper-worker-statefulset.yaml`

### Worker Lifecycle

1. **Startup**: Worker registers in `worker_registry` table with pod name
2. **Preflight**: Runs dual-transport preflights (curl + http), reports IPs and fingerprint
3. **Task Loop**: Polls for tasks, executes them, reports status
4. **Shutdown**: Graceful 60-second termination period

### NEVER Restart Workers Unnecessarily

**Claude must NOT**:
- Restart workers unless explicitly requested
- Use `kubectl rollout restart` on workers
- Use `kubectl set image` on workers (this triggers restart)

**To update worker code** (only when user authorizes):
1. Build and push new image with version tag
2. Update StatefulSet image reference
3. Manually delete pods one at a time when ready: `kubectl delete pod scraper-worker-0 -n dispensary-scraper`

### Worker Registry API

**Endpoint**: `GET /api/worker-registry/workers`

**Response Fields**:
| Field | Description |
|-------|-------------|
| `pod_name` | Kubernetes pod name |
| `worker_id` | Internal worker UUID |
| `status` | active, idle, offline |
| `curl_ip` | IP from curl preflight |
| `http_ip` | IP from Puppeteer preflight |
| `preflight_status` | pending, passed, failed |
| `preflight_at` | Timestamp of last preflight |
| `fingerprint_data` | Browser fingerprint JSON |
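
The response fields above could be modeled in TypeScript roughly as follows. This is only a sketch: the field names come from the table, but the types, the nullability, and the `isReadyWorker` helper are assumptions for illustration and are not taken from the API implementation.

```typescript
// Hypothetical shape of one worker entry from GET /api/worker-registry/workers.
// Field names follow the table above; types and nullability are assumptions.
type WorkerStatus = 'active' | 'idle' | 'offline';
type PreflightStatus = 'pending' | 'passed' | 'failed';

interface WorkerRegistryEntry {
  pod_name: string;
  worker_id: string;
  status: WorkerStatus;
  curl_ip: string | null;
  http_ip: string | null;
  preflight_status: PreflightStatus;
  preflight_at: string | null; // assumed to be an ISO-8601 timestamp
  fingerprint_data: Record<string, unknown> | null;
}

// Illustrative helper (not part of the API): a worker counts as ready
// when it is not offline and its last preflight passed.
function isReadyWorker(worker: WorkerRegistryEntry): boolean {
  return worker.status !== 'offline' && worker.preflight_status === 'passed';
}

const sampleWorker: WorkerRegistryEntry = {
  pod_name: 'scraper-worker-0',
  worker_id: '00000000-0000-0000-0000-000000000000', // placeholder UUID
  status: 'idle',
  curl_ip: null,
  http_ip: null,
  preflight_status: 'passed',
  preflight_at: null,
  fingerprint_data: null,
};
```

A dashboard could filter the returned list with `isReadyWorker` before showing which pods are able to take tasks.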

---

## Documentation

| Doc | Purpose |
175
backend/docs/API_SECURITY.md
Normal file
@@ -0,0 +1,175 @@
# API Security Documentation

This document describes the authentication and authorization configuration for all CannaiQ API endpoints.

## Authentication Methods

### 1. Trusted Origins (No Token Required)

Requests from trusted sources are automatically authenticated with `internal` role:

**Trusted IPs:**
- `127.0.0.1` (localhost IPv4)
- `::1` (localhost IPv6)
- `::ffff:127.0.0.1` (IPv4-mapped IPv6)

**Trusted Domains:**
- `https://cannaiq.co`
- `https://www.cannaiq.co`
- `https://findadispo.com`
- `https://www.findadispo.com`
- `https://findagram.co`
- `https://www.findagram.co`
- `http://localhost:3010`
- `http://localhost:8080`
- `http://localhost:5173`

**Trusted Patterns:**
- `*.cannabrands.app`
- `*.cannaiq.co`

**Internal Header:**
- `X-Internal-Request` header matching `INTERNAL_REQUEST_SECRET` env var
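
One way the domain and pattern lists above might be checked against a request's `Origin` value is sketched below. It is not the code in `src/auth/middleware.ts`; the names `TRUSTED_DOMAINS`, `TRUSTED_PATTERNS`, `patternToRegExp`, and `isTrustedOrigin` are hypothetical, and treating wildcard patterns as HTTPS-only with any subdomain depth is an assumption.

```typescript
// Exact origins copied from the "Trusted Domains" list above.
const TRUSTED_DOMAINS: string[] = [
  'https://cannaiq.co',
  'https://www.cannaiq.co',
  'https://findadispo.com',
  'https://www.findadispo.com',
  'https://findagram.co',
  'https://www.findagram.co',
  'http://localhost:3010',
  'http://localhost:8080',
  'http://localhost:5173',
];

// Wildcard patterns copied from the "Trusted Patterns" list above.
const TRUSTED_PATTERNS: string[] = ['*.cannabrands.app', '*.cannaiq.co'];

// Convert "*.example.com" into an anchored regular expression.
// Assumption: wildcards match one or more subdomain labels over HTTPS only.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters except "*"
    .replace(/^\*/, '[a-z0-9-]+(?:\\.[a-z0-9-]+)*');
  return new RegExp('^https://' + escaped + '$', 'i');
}

function isTrustedOrigin(origin: string): boolean {
  if (TRUSTED_DOMAINS.indexOf(origin) !== -1) return true;
  return TRUSTED_PATTERNS.some((pattern) => patternToRegExp(pattern).test(origin));
}
```

Anchoring the expression at both ends matters here: without it, an origin such as `https://cannaiq.co.evil.example` could slip past a naive substring check.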

### 2. Bearer Token Authentication

External requests must include a valid token:

```
Authorization: Bearer <token>
```

**Token Types:**
- **JWT Token**: User session tokens (7-day expiry)
- **API Token**: Long-lived tokens for integrations (stored in `api_tokens` table)
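
As a minimal sketch of how the `Authorization` header described above could be parsed on the server side, the hypothetical helper below extracts the token from a header value. The function name and the decision to match the `Bearer` scheme case-insensitively are assumptions, not details confirmed by the source.

```typescript
// Pull the token out of an "Authorization: Bearer <token>" header value.
// Returns null when the header is missing or uses a different scheme.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? match[1] : null;
}

// Client side: build the header for an outgoing request.
function bearerHeader(token: string): { Authorization: string } {
  return { Authorization: 'Bearer ' + token };
}
```

Whether the extracted value is then treated as a JWT or looked up in `api_tokens` is decided by the middleware, which this sketch does not cover.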

## Authorization Levels

### Public (No Auth)
Routes accessible without authentication:
- `GET /health` - Health check
- `GET /api/health/*` - Comprehensive health endpoints
- `GET /outbound-ip` - Server's outbound IP
- `GET /api/v1/deals` - Public deals endpoint

### Authenticated (Trusted Origin or Token)
Routes requiring authentication but no specific role:

| Route | Description |
|-------|-------------|
| `/api/payloads/*` | Raw crawl payload access |
| `/api/workers/*` | Worker monitoring |
| `/api/worker-registry/*` | Worker registration and heartbeats |
| `/api/stores/*` | Store CRUD |
| `/api/products/*` | Product listing |
| `/api/dispensaries/*` | Dispensary data |

### Admin Only (Requires `admin` or `superadmin` role)
Routes restricted to administrators:

| Route | Description |
|-------|-------------|
| `/api/job-queue/*` | Job queue management |
| `/api/k8s/*` | Kubernetes control (scaling) |
| `/api/pipeline/*` | Pipeline stage transitions |
| `/api/tasks/*` | Task queue management |
| `/api/admin/orchestrator/*` | Orchestrator dashboard |
| `/api/admin/trusted-origins/*` | Manage trusted origins |
| `/api/admin/debug/*` | Debug endpoints |

**Note:** The `internal` role (localhost/trusted origins) bypasses role checks, granting automatic admin access for local development and internal services.

## Endpoint Security Matrix

| Endpoint Group | Auth Required | Role Required | Notes |
|----------------|---------------|---------------|-------|
| `/api/payloads/*` | Yes | None | Query API for raw crawl data |
| `/api/job-queue/*` | Yes | admin | Legacy job queue (deprecated) |
| `/api/workers/*` | Yes | None | Worker status monitoring |
| `/api/worker-registry/*` | Yes | None | Workers register via trusted IPs |
| `/api/k8s/*` | Yes | admin | K8s scaling controls |
| `/api/pipeline/*` | Yes | admin | Store pipeline transitions |
| `/api/tasks/*` | Yes | admin | Task queue CRUD |
| `/api/admin/orchestrator/*` | Yes | admin | Orchestrator metrics/alerts |
| `/api/admin/trusted-origins/*` | Yes | admin | Auth bypass management |
| `/api/v1/*` | Varies | Varies | Public API (per-endpoint) |
| `/api/consumer/*` | Varies | Varies | Consumer features |
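
For testing or auditing, the matrix above could be expressed as data and queried by path. The sketch below is illustrative only: `AccessRule`, `ACCESS_RULES`, and `findAccessRule` are hypothetical names, and the real routes enforce these rules through middleware in each route file rather than through a central table. Groups marked "Varies" are deliberately left out.

```typescript
// One row of the security matrix: every listed group requires auth;
// `role` is 'admin' where the matrix says so, otherwise null.
interface AccessRule {
  prefix: string;
  role: 'admin' | null;
}

// Transcribed from the matrix above (groups with "Varies" omitted).
const ACCESS_RULES: AccessRule[] = [
  { prefix: '/api/payloads/', role: null },
  { prefix: '/api/job-queue/', role: 'admin' },
  { prefix: '/api/workers/', role: null },
  { prefix: '/api/worker-registry/', role: null },
  { prefix: '/api/k8s/', role: 'admin' },
  { prefix: '/api/pipeline/', role: 'admin' },
  { prefix: '/api/tasks/', role: 'admin' },
  { prefix: '/api/admin/orchestrator/', role: 'admin' },
  { prefix: '/api/admin/trusted-origins/', role: 'admin' },
];

// Returns the matching rule, or undefined for paths the matrix
// marks as "Varies" or does not list at all.
function findAccessRule(path: string): AccessRule | undefined {
  for (const rule of ACCESS_RULES) {
    if (path.indexOf(rule.prefix) === 0) return rule;
  }
  return undefined;
}
```

A table like this is mainly useful as a checklist in tests, for example asserting that each admin-only prefix rejects a non-admin token.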

## Implementation Details

### Middleware Stack

```typescript
// Authentication middleware - validates token or trusted origin
import { authMiddleware } from '../auth/middleware';

// Role requirement middleware - checks user role
import { requireRole } from '../auth/middleware';

// Usage in route files:
router.use(authMiddleware);  // All routes need auth
router.use(requireRole('admin', 'superadmin'));  // Admin-only routes
```

### Auth Middleware Flow

```
Request → Check Bearer Token
    ├─ Valid JWT → Set user from token → Continue
    ├─ Valid API Token → Set user as api_token role → Continue
    └─ No Token → Check Trusted Origin
        ├─ Trusted → Set user as internal role → Continue
        └─ Not Trusted → 401 Unauthorized
```
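
The flow above can be sketched as a pure decision function, separate from any Express wiring. This is not the project's `authMiddleware`; the types and the two callback names (`verifyJwt`, `isValidApiToken`) are hypothetical stand-ins. The diagram does not say what happens when a token is present but invalid, so rejecting it with 401 is an assumption.

```typescript
interface AuthDecision {
  status: 200 | 401;
  role: string | null;
}

interface AuthContext {
  bearerToken: string | null;
  fromTrustedOrigin: boolean;
  // Hypothetical stand-ins for JWT verification and the api_tokens lookup.
  verifyJwt: (token: string) => string | null; // returns the user's role, or null
  isValidApiToken: (token: string) => boolean;
}

function decideAuth(ctx: AuthContext): AuthDecision {
  if (ctx.bearerToken !== null) {
    // Valid JWT: the user (and role) comes from the token.
    const jwtRole = ctx.verifyJwt(ctx.bearerToken);
    if (jwtRole !== null) return { status: 200, role: jwtRole };

    // Valid API token: the user gets the api_token role.
    if (ctx.isValidApiToken(ctx.bearerToken)) return { status: 200, role: 'api_token' };

    // Assumption: a token that is present but invalid is rejected outright.
    return { status: 401, role: null };
  }

  // No token: fall back to the trusted-origin check.
  if (ctx.fromTrustedOrigin) return { status: 200, role: 'internal' };
  return { status: 401, role: null };
}
```

Keeping the decision free of request and response objects makes each branch of the diagram easy to unit test.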

### Role Check Flow

```
Request → authMiddleware → requireRole('admin')
    ├─ role === 'internal' → Continue (bypass)
    ├─ role in ['admin', 'superadmin'] → Continue
    └─ else → 403 Forbidden
```
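
The role check above reduces to a small predicate. The sketch below assumes nothing about how `requireRole` is actually written in `src/auth/middleware.ts`; `isRoleAllowed` and `roleCheckStatus` are hypothetical names used only to illustrate the three branches.

```typescript
// True when the caller may proceed: the internal role bypasses the check,
// otherwise the role must appear in the allowed list.
function isRoleAllowed(userRole: string, allowedRoles: string[]): boolean {
  if (userRole === 'internal') return true; // bypass for trusted origins
  return allowedRoles.indexOf(userRole) !== -1;
}

// Maps the predicate onto the HTTP outcome shown in the flow.
function roleCheckStatus(userRole: string, allowedRoles: string[]): 200 | 403 {
  return isRoleAllowed(userRole, allowedRoles) ? 200 : 403;
}
```

Note that under this flow an `api_token` user is refused on admin-only routes unless that role is explicitly listed.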

## Worker Pod Authentication

Worker pods (in Kubernetes) authenticate via:

1. **Internal IP**: Pods communicate via cluster IPs, which are trusted
2. **Internal Header**: Optional `X-Internal-Request` header for explicit trust

Endpoints used by workers:
- `POST /api/worker-registry/register` - Report for duty
- `POST /api/worker-registry/heartbeat` - Stay alive
- `POST /api/worker-registry/deregister` - Graceful shutdown
- `POST /api/worker-registry/task-completed` - Report task completion

## API Token Management

API tokens are managed via:
- `GET /api/api-tokens` - List tokens
- `POST /api/api-tokens` - Create token
- `DELETE /api/api-tokens/:id` - Revoke token

Token properties:
- `token`: The bearer token value
- `name`: Human-readable identifier
- `rate_limit`: Requests per minute
- `expires_at`: Optional expiration
- `active`: Enable/disable toggle
- `allowed_endpoints`: Optional endpoint restrictions
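
The token properties above suggest a validity check along the following lines. This is a sketch under stated assumptions: the `ApiToken` field types, the ISO-8601 format of `expires_at`, and the interpretation of `allowed_endpoints` entries as path prefixes with an optional trailing `/*` are all guesses, and `isTokenUsable` is a hypothetical helper rather than project code.

```typescript
// Assumed shape of a row in the api_tokens table.
interface ApiToken {
  token: string;
  name: string;
  rate_limit: number; // requests per minute
  expires_at: string | null; // assumed ISO-8601; null means no expiry
  active: boolean;
  allowed_endpoints: string[] | null; // null means no restriction
}

function isTokenUsable(token: ApiToken, path: string, now: Date): boolean {
  if (!token.active) return false;

  if (token.expires_at !== null && new Date(token.expires_at).getTime() <= now.getTime()) {
    return false; // expired
  }

  if (token.allowed_endpoints === null) return true;

  // Assumption: each entry is a path prefix, optionally written with a trailing "/*".
  return token.allowed_endpoints.some((entry) => {
    const prefix = entry.replace(/\/\*$/, '');
    return path === prefix || path.indexOf(prefix + '/') === 0;
  });
}
```

Rate limiting is left out on purpose, since enforcing `rate_limit` needs per-token request counters that the source does not describe.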

## Security Best Practices

1. **Never expose tokens in URLs** - Use Authorization header
2. **Use HTTPS in production** - All traffic encrypted
3. **Rotate API tokens periodically** - Set expiration dates
4. **Monitor rate limits** - Prevent abuse
5. **Audit access logs** - Track API usage via `api_usage_logs` table

## Related Files

- `src/auth/middleware.ts` - Auth middleware implementation
- `src/routes/api-tokens.ts` - Token management endpoints
- `src/middleware/apiTokenTracker.ts` - Usage tracking
- `src/middleware/trustedDomains.ts` - Domain trust markers

@@ -15,9 +15,14 @@
|
|||||||
|
|
||||||
import { Router, Request, Response } from 'express';
|
import { Router, Request, Response } from 'express';
|
||||||
import { pool } from '../db/pool';
|
import { pool } from '../db/pool';
|
||||||
|
import { authMiddleware, requireRole } from '../auth/middleware';
|
||||||
|
|
||||||
const router = Router();
|
const router = Router();
|
||||||
|
|
||||||
|
// All job-queue routes require authentication and admin role
|
||||||
|
router.use(authMiddleware);
|
||||||
|
router.use(requireRole('admin', 'superadmin'));
|
||||||
|
|
||||||
// In-memory queue state (would be in Redis in production)
|
// In-memory queue state (would be in Redis in production)
|
||||||
let queuePaused = false;
|
let queuePaused = false;
|
||||||
|
|
||||||
|
|||||||
@@ -7,9 +7,14 @@
|
|||||||
|
|
||||||
import { Router, Request, Response } from 'express';
|
import { Router, Request, Response } from 'express';
|
||||||
import * as k8s from '@kubernetes/client-node';
|
import * as k8s from '@kubernetes/client-node';
|
||||||
|
import { authMiddleware, requireRole } from '../auth/middleware';
|
||||||
|
|
||||||
const router = Router();
|
const router = Router();
|
||||||
|
|
||||||
|
// K8s control routes require authentication and admin role
|
||||||
|
router.use(authMiddleware);
|
||||||
|
router.use(requireRole('admin', 'superadmin'));
|
||||||
|
|
||||||
// K8s client setup - lazy initialization
|
// K8s client setup - lazy initialization
|
||||||
let appsApi: k8s.AppsV1Api | null = null;
|
let appsApi: k8s.AppsV1Api | null = null;
|
||||||
let k8sError: string | null = null;
|
let k8sError: string | null = null;
|
||||||
|
|||||||
@@ -11,9 +11,14 @@ import { getLatestTrace, getTracesForDispensary, getTraceById } from '../service
|
|||||||
import { getProviderDisplayName } from '../utils/provider-display';
|
import { getProviderDisplayName } from '../utils/provider-display';
|
||||||
import * as fs from 'fs';
|
import * as fs from 'fs';
|
||||||
import * as path from 'path';
|
import * as path from 'path';
|
||||||
|
import { authMiddleware, requireRole } from '../auth/middleware';
|
||||||
|
|
||||||
const router = Router();
|
const router = Router();
|
||||||
|
|
||||||
|
// Orchestrator admin routes require authentication and admin role
|
||||||
|
router.use(authMiddleware);
|
||||||
|
router.use(requireRole('admin', 'superadmin'));
|
||||||
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
// ORCHESTRATOR METRICS
|
// ORCHESTRATOR METRICS
|
||||||
// ============================================================
|
// ============================================================
|
||||||
|
|||||||
@@ -21,9 +21,13 @@ import {
|
|||||||
listPayloadMetadata,
|
listPayloadMetadata,
|
||||||
} from '../utils/payload-storage';
|
} from '../utils/payload-storage';
|
||||||
import { Pool } from 'pg';
|
import { Pool } from 'pg';
|
||||||
|
import { authMiddleware } from '../auth/middleware';
|
||||||
|
|
||||||
const router = Router();
|
const router = Router();
|
||||||
|
|
||||||
|
// All payload routes require authentication (trusted origins or API token)
|
||||||
|
router.use(authMiddleware);
|
||||||
|
|
||||||
// Get pool instance for queries
|
// Get pool instance for queries
|
||||||
const getDbPool = (): Pool => getPool() as unknown as Pool;
|
const getDbPool = (): Pool => getPool() as unknown as Pool;
|
||||||
|
|
||||||
|
|||||||
@@ -18,9 +18,14 @@
|
|||||||
|
|
||||||
import { Router, Request, Response } from 'express';
|
import { Router, Request, Response } from 'express';
|
||||||
import { pool } from '../db/pool';
|
import { pool } from '../db/pool';
|
||||||
|
import { authMiddleware, requireRole } from '../auth/middleware';
|
||||||
|
|
||||||
const router = Router();
|
const router = Router();
|
||||||
|
|
||||||
|
// Pipeline routes require authentication and admin role
|
||||||
|
router.use(authMiddleware);
|
||||||
|
router.use(requireRole('admin', 'superadmin'));
|
||||||
|
|
||||||
// Valid stages
|
// Valid stages
|
||||||
const STAGES = ['discovered', 'validated', 'promoted', 'sandbox', 'production', 'failing'] as const;
|
const STAGES = ['discovered', 'validated', 'promoted', 'sandbox', 'production', 'failing'] as const;
|
||||||
type Stage = typeof STAGES[number];
|
type Stage = typeof STAGES[number];
|
||||||
|
|||||||
@@ -19,9 +19,14 @@ import {
|
|||||||
resumeTaskPool,
|
resumeTaskPool,
|
||||||
getTaskPoolStatus,
|
getTaskPoolStatus,
|
||||||
} from '../tasks/task-pool-state';
|
} from '../tasks/task-pool-state';
|
||||||
|
import { authMiddleware, requireRole } from '../auth/middleware';
|
||||||
|
|
||||||
const router = Router();
|
const router = Router();
|
||||||
|
|
||||||
|
// Task routes require authentication and admin role
|
||||||
|
router.use(authMiddleware);
|
||||||
|
router.use(requireRole('admin', 'superadmin'));
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* GET /api/tasks
|
* GET /api/tasks
|
||||||
* List tasks with optional filters
|
* List tasks with optional filters
|
||||||
|
|||||||
@@ -23,11 +23,14 @@
|
|||||||
import { Router, Request, Response } from 'express';
|
import { Router, Request, Response } from 'express';
|
||||||
import { pool } from '../db/pool';
|
import { pool } from '../db/pool';
|
||||||
import os from 'os';
|
import os from 'os';
|
||||||
import { runPuppeteerPreflightWithRetry } from '../services/puppeteer-preflight';
|
import { authMiddleware } from '../auth/middleware';
|
||||||
import { CrawlRotator } from '../services/crawl-rotator';
|
|
||||||
|
|
||||||
const router = Router();
|
const router = Router();
|
||||||
|
|
||||||
|
// Worker registry routes require authentication
|
||||||
|
// Note: Internal workers (pods) can access via trusted IP (localhost, in-cluster)
|
||||||
|
router.use(authMiddleware);
|
||||||
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
// WORKER REGISTRATION
|
// WORKER REGISTRATION
|
||||||
// ============================================================
|
// ============================================================
|
||||||
@@ -866,58 +869,4 @@ router.get('/pods', async (_req: Request, res: Response) => {
|
|||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
// ============================================================
|
|
||||||
// PREFLIGHT SMOKE TEST
|
|
||||||
// ============================================================
|
|
||||||
|
|
||||||
/**
|
|
||||||
* POST /api/worker-registry/preflight-test
|
|
||||||
* Run an HTTP (Puppeteer) preflight test and return results
|
|
||||||
*
|
|
||||||
* This is a smoke test endpoint to verify the preflight system works.
|
|
||||||
* Returns IP, fingerprint data, bot detection results, and products fetched.
|
|
||||||
*/
|
|
||||||
router.post('/preflight-test', async (_req: Request, res: Response) => {
|
|
||||||
try {
|
|
||||||
console.log('[PreflightTest] Starting HTTP preflight smoke test...');
|
|
||||||
|
|
||||||
// Create a temporary CrawlRotator for the test
|
|
||||||
const crawlRotator = new CrawlRotator();
|
|
||||||
|
|
||||||
// Run the Puppeteer preflight (with 1 retry)
|
|
||||||
const startTime = Date.now();
|
|
||||||
const result = await runPuppeteerPreflightWithRetry(crawlRotator, 1);
|
|
||||||
const duration = Date.now() - startTime;
|
|
||||||
|
|
||||||
console.log(`[PreflightTest] Completed in ${duration}ms - passed: ${result.passed}`);
|
|
||||||
|
|
||||||
res.json({
|
|
||||||
success: true,
|
|
||||||
test: 'http_preflight',
|
|
||||||
duration_ms: duration,
|
|
||||||
result: {
|
|
||||||
passed: result.passed,
|
|
||||||
proxy_ip: result.proxyIp,
|
|
||||||
fingerprint: result.fingerprint,
|
|
||||||
bot_detection: result.botDetection,
|
|
||||||
products_returned: result.productsReturned,
|
|
||||||
browser_user_agent: result.browserUserAgent,
|
|
||||||
ip_verified: result.ipVerified,
|
|
||||||
proxy_available: result.proxyAvailable,
|
|
||||||
proxy_connected: result.proxyConnected,
|
|
||||||
antidetect_ready: result.antidetectReady,
|
|
||||||
response_time_ms: result.responseTimeMs,
|
|
||||||
error: result.error
|
|
||||||
}
|
|
||||||
});
|
|
||||||
} catch (error: any) {
|
|
||||||
console.error('[PreflightTest] Error:', error.message);
|
|
||||||
res.status(500).json({
|
|
||||||
success: false,
|
|
||||||
test: 'http_preflight',
|
|
||||||
error: error.message
|
|
||||||
});
|
|
||||||
}
|
|
||||||
});
|
|
||||||
|
|
||||||
export default router;
|
export default router;
|
||||||
|
|||||||
@@ -26,9 +26,13 @@
|
|||||||
import { Router, Request, Response } from 'express';
|
import { Router, Request, Response } from 'express';
|
||||||
import { pool } from '../db/pool';
|
import { pool } from '../db/pool';
|
||||||
import * as k8s from '@kubernetes/client-node';
|
import * as k8s from '@kubernetes/client-node';
|
||||||
|
import { authMiddleware } from '../auth/middleware';
|
||||||
|
|
||||||
const router = Router();
|
const router = Router();
|
||||||
|
|
||||||
|
// All worker routes require authentication (trusted origins or API token)
|
||||||
|
router.use(authMiddleware);
|
||||||
|
|
||||||
// ============================================================
|
// ============================================================
|
||||||
// K8S SCALING CONFIGURATION (added 2024-12-10)
|
// K8S SCALING CONFIGURATION (added 2024-12-10)
|
||||||
// Per TASK_WORKFLOW_2024-12-10.md: Admin can scale workers from UI
|
// Per TASK_WORKFLOW_2024-12-10.md: Admin can scale workers from UI
|
||||||
|
|||||||
@@ -26,34 +26,6 @@ const TEST_PLATFORM_ID = '6405ef617056e8014d79101b';
|
|||||||
const FINGERPRINT_DEMO_URL = 'https://demo.fingerprint.com/';
|
const FINGERPRINT_DEMO_URL = 'https://demo.fingerprint.com/';
|
||||||
const AMIUNIQUE_URL = 'https://amiunique.org/fingerprint';
|
const AMIUNIQUE_URL = 'https://amiunique.org/fingerprint';
|
||||||
|
|
||||||
// IP geolocation API for timezone lookup (free, no key required)
|
|
||||||
const IP_API_URL = 'http://ip-api.com/json';
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Look up timezone from IP address using ip-api.com
|
|
||||||
* Returns IANA timezone (e.g., 'America/New_York') or null on failure
|
|
||||||
*/
|
|
||||||
async function getTimezoneFromIp(ip: string): Promise<{ timezone: string; city?: string; region?: string } | null> {
|
|
||||||
try {
|
|
||||||
const axios = require('axios');
|
|
||||||
const response = await axios.get(`${IP_API_URL}/${ip}?fields=status,timezone,city,regionName`, {
|
|
||||||
timeout: 5000,
|
|
||||||
});
|
|
||||||
|
|
||||||
if (response.data?.status === 'success' && response.data?.timezone) {
|
|
||||||
return {
|
|
||||||
timezone: response.data.timezone,
|
|
||||||
city: response.data.city,
|
|
||||||
region: response.data.regionName,
|
|
||||||
};
|
|
||||||
}
|
|
||||||
return null;
|
|
||||||
} catch (err: any) {
|
|
||||||
console.log(`[PuppeteerPreflight] IP geolocation lookup failed: ${err.message}`);
|
|
||||||
return null;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
export interface PuppeteerPreflightResult extends PreflightResult {
|
export interface PuppeteerPreflightResult extends PreflightResult {
|
||||||
method: 'http';
|
method: 'http';
|
||||||
/** Number of products returned (proves API access) */
|
/** Number of products returned (proves API access) */
|
||||||
@@ -70,13 +42,6 @@ export interface PuppeteerPreflightResult extends PreflightResult {
|
|||||||
expectedProxyIp?: string;
|
expectedProxyIp?: string;
|
||||||
/** Whether IP verification passed (detected IP matches proxy) */
|
/** Whether IP verification passed (detected IP matches proxy) */
|
||||||
ipVerified?: boolean;
|
ipVerified?: boolean;
|
||||||
/** Detected timezone from IP geolocation */
|
|
||||||
detectedTimezone?: string;
|
|
||||||
/** Detected location from IP geolocation */
|
|
||||||
detectedLocation?: {
|
|
||||||
city?: string;
|
|
||||||
region?: string;
|
|
||||||
};
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -171,52 +136,7 @@ export async function runPuppeteerPreflight(
|
|||||||
};
|
};
|
||||||
|
|
||||||
// =========================================================================
|
// =========================================================================
|
||||||
// STEP 1a: Get IP address directly via simple API (more reliable than scraping)
|
// STEP 1: Visit fingerprint.com demo to verify anti-detect and get IP
|
||||||
// =========================================================================
|
|
||||||
console.log(`[PuppeteerPreflight] Getting proxy IP address...`);
|
|
||||||
try {
|
|
||||||
const ipApiResponse = await page.evaluate(async () => {
|
|
||||||
try {
|
|
||||||
const response = await fetch('https://api.ipify.org?format=json');
|
|
||||||
const data = await response.json();
|
|
||||||
return { ip: data.ip, error: null };
|
|
||||||
} catch (err: any) {
|
|
||||||
return { ip: null, error: err.message };
|
|
||||||
}
|
|
||||||
});
|
|
||||||
|
|
||||||
if (ipApiResponse.ip) {
|
|
||||||
result.proxyIp = ipApiResponse.ip;
|
|
||||||
result.proxyConnected = true;
|
|
||||||
console.log(`[PuppeteerPreflight] Detected proxy IP: ${ipApiResponse.ip}`);
|
|
||||||
|
|
||||||
// Look up timezone from IP
|
|
||||||
const geoData = await getTimezoneFromIp(ipApiResponse.ip);
|
|
||||||
if (geoData) {
|
|
||||||
result.detectedTimezone = geoData.timezone;
|
|
||||||
result.detectedLocation = { city: geoData.city, region: geoData.region };
|
|
||||||
console.log(`[PuppeteerPreflight] IP Geolocation: ${geoData.city}, ${geoData.region} (${geoData.timezone})`);
|
|
||||||
|
|
||||||
// Set browser timezone to match proxy location via CDP
|
|
||||||
try {
|
|
||||||
const client = await page.target().createCDPSession();
|
|
||||||
await client.send('Emulation.setTimezoneOverride', { timezoneId: geoData.timezone });
|
|
||||||
console.log(`[PuppeteerPreflight] Browser timezone set to: ${geoData.timezone}`);
|
|
||||||
} catch (tzErr: any) {
|
|
||||||
console.log(`[PuppeteerPreflight] Failed to set browser timezone: ${tzErr.message}`);
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
console.log(`[PuppeteerPreflight] WARNING: Could not determine timezone from IP - timezone mismatch possible`);
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
console.log(`[PuppeteerPreflight] IP lookup failed: ${ipApiResponse.error || 'unknown error'}`);
|
|
||||||
}
|
|
||||||
} catch (ipErr: any) {
|
|
||||||
console.log(`[PuppeteerPreflight] IP API error: ${ipErr.message}`);
|
|
||||||
}
|
|
||||||
|
|
||||||
// =========================================================================
|
|
||||||
// STEP 1b: Visit fingerprint.com demo to verify anti-detect
|
|
||||||
// =========================================================================
|
// =========================================================================
|
||||||
console.log(`[PuppeteerPreflight] Testing anti-detect at ${FINGERPRINT_DEMO_URL}...`);
|
console.log(`[PuppeteerPreflight] Testing anti-detect at ${FINGERPRINT_DEMO_URL}...`);
|
||||||
|
|
||||||
@@ -279,8 +199,6 @@ export async function runPuppeteerPreflight(
|
|||||||
// Don't fail - residential proxies often show different egress IPs
|
// Don't fail - residential proxies often show different egress IPs
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Note: Timezone already set earlier via ipify.org IP lookup
|
|
||||||
}
|
}
|
||||||
|
|
||||||
if (fingerprintData.visitorId) {
|
if (fingerprintData.visitorId) {
|
||||||
|
|||||||
@@ -435,47 +435,29 @@ export class TaskWorker {
|
|||||||
|
|
||||||
/**
|
/**
|
||||||
* Report preflight status to worker_registry
|
* Report preflight status to worker_registry
|
||||||
* Function signature: update_worker_preflight(worker_id, transport, status, ip, response_ms, error, fingerprint)
|
|
||||||
*/
|
*/
|
||||||
private async reportPreflightStatus(): Promise<void> {
|
private async reportPreflightStatus(): Promise<void> {
|
||||||
try {
|
try {
|
||||||
// Update worker_registry directly via SQL (more reliable than API)
|
// Update worker_registry directly via SQL (more reliable than API)
|
||||||
// CURL preflight - includes IP address
|
|
||||||
await this.pool.query(`
|
await this.pool.query(`
|
||||||
SELECT update_worker_preflight($1, 'curl', $2, $3, $4, $5, $6)
|
SELECT update_worker_preflight($1, 'curl', $2, $3, $4)
|
||||||
`, [
|
`, [
|
||||||
this.workerId,
|
this.workerId,
|
||||||
this.preflightCurlPassed ? 'passed' : 'failed',
|
this.preflightCurlPassed ? 'passed' : 'failed',
|
||||||
this.preflightCurlResult?.proxyIp || null,
|
|
||||||
this.preflightCurlResult?.responseTimeMs || null,
|
this.preflightCurlResult?.responseTimeMs || null,
|
||||||
this.preflightCurlResult?.error || null,
|
this.preflightCurlResult?.error || null,
|
||||||
null, // No fingerprint for curl
|
|
||||||
]);
|
]);
|
||||||
|
|
||||||
// HTTP preflight - includes IP, fingerprint, and timezone data
|
|
||||||
const httpFingerprint = this.preflightHttpResult ? {
|
|
||||||
...this.preflightHttpResult.fingerprint,
|
|
||||||
detectedTimezone: (this.preflightHttpResult as any).detectedTimezone,
|
|
||||||
detectedLocation: (this.preflightHttpResult as any).detectedLocation,
|
|
||||||
productsReturned: this.preflightHttpResult.productsReturned,
|
|
||||||
botDetection: (this.preflightHttpResult as any).botDetection,
|
|
||||||
} : null;
|
|
||||||
|
|
||||||
await this.pool.query(`
|
await this.pool.query(`
|
||||||
SELECT update_worker_preflight($1, 'http', $2, $3, $4, $5, $6)
|
SELECT update_worker_preflight($1, 'http', $2, $3, $4)
|
||||||
`, [
|
`, [
|
||||||
this.workerId,
|
this.workerId,
|
||||||
this.preflightHttpPassed ? 'passed' : 'failed',
|
this.preflightHttpPassed ? 'passed' : 'failed',
|
||||||
this.preflightHttpResult?.proxyIp || null,
|
|
||||||
this.preflightHttpResult?.responseTimeMs || null,
|
this.preflightHttpResult?.responseTimeMs || null,
|
||||||
this.preflightHttpResult?.error || null,
|
this.preflightHttpResult?.error || null,
|
||||||
httpFingerprint ? JSON.stringify(httpFingerprint) : null,
|
|
||||||
]);
|
]);
|
||||||
|
|
||||||
console.log(`[TaskWorker] Preflight status reported to worker_registry`);
|
console.log(`[TaskWorker] Preflight status reported to worker_registry`);
|
||||||
if (this.preflightHttpResult?.proxyIp) {
|
|
||||||
console.log(`[TaskWorker] HTTP IP: ${this.preflightHttpResult.proxyIp}, Timezone: ${(this.preflightHttpResult as any).detectedTimezone || 'unknown'}`);
|
|
||||||
}
|
|
||||||
} catch (err: any) {
|
} catch (err: any) {
|
||||||
// Non-fatal - worker can still function
|
// Non-fatal - worker can still function
|
||||||
console.warn(`[TaskWorker] Could not report preflight status: ${err.message}`);
|
console.warn(`[TaskWorker] Could not report preflight status: ${err.message}`);
|
||||||
|
|||||||
@@ -14,11 +14,27 @@ export function Settings() {
|
|||||||
loadSettings();
|
loadSettings();
|
||||||
}, []);
|
}, []);
|
||||||
|
|
||||||
|
// AI-related settings are managed in /ai-settings, filter them out here
|
||||||
|
const AI_SETTING_KEYS = [
|
||||||
|
'ai_model',
|
||||||
|
'ai_provider',
|
||||||
|
'anthropic_api_key',
|
||||||
|
'openai_api_key',
|
||||||
|
'anthropic_model',
|
||||||
|
'openai_model',
|
||||||
|
'anthropic_enabled',
|
||||||
|
'openai_enabled',
|
||||||
|
];
|
||||||
|
|
||||||
const loadSettings = async () => {
|
const loadSettings = async () => {
|
||||||
setLoading(true);
|
setLoading(true);
|
||||||
try {
|
try {
|
||||||
const data = await api.getSettings();
|
const data = await api.getSettings();
|
||||||
setSettings(data.settings);
|
// Filter out AI settings - those are managed in /ai-settings
|
||||||
|
const filteredSettings = (data.settings || []).filter(
|
||||||
|
(s: any) => !AI_SETTING_KEYS.includes(s.key)
|
||||||
|
);
|
||||||
|
setSettings(filteredSettings);
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
console.error('Failed to load settings:', error);
|
console.error('Failed to load settings:', error);
|
||||||
} finally {
|
} finally {
|
||||||
|
|||||||