Compare commits

...

10 Commits

Author SHA1 Message Date
Kelly
3488905ccc fix: Delete completed tasks from pool instead of marking complete
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Completed tasks are now deleted from worker_tasks table.
Only failed tasks remain in the pool for retry/review.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 21:19:20 -07:00
Kelly
3ee09fbe84 feat: Treez SSR support, task improvements, worker geo display
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Add SSR config extraction for Treez sites (BEST Dispensary)
- Increase MAX_RETRIES from 3 to 5 for task failures
- Update task list ordering: active > pending > failed > completed
- Show detected proxy location in worker dashboard (from fingerprint)
- Hardcode 'dutchie' menu_type in promotion.ts (remove deriveMenuType)
- Update provider display to show actual provider names

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 19:22:04 -07:00
Kelly
7d65e0ae59 fix: Use cannaiq namespace for deployments
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Revert namespace from dispensary-scraper to cannaiq
- Keep registry.spdy.io for image URLs (k8s nodes need HTTPS)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 16:37:50 -07:00
Kelly
25f9118662 fix: Use registry.spdy.io for k8s deployments
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Update kubectl set image commands to use HTTPS registry URL
- Fix namespace from cannaiq to dispensary-scraper
- Add guidance on when to use which registry URL

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 12:37:11 -07:00
Kelly
5c0de752af fix: Check inventory_snapshots for product_discovery output verification
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
raw_crawl_payloads are only saved during the baseline window (12:01–3:00 AM),
but inventory_snapshots are always saved. This caused product_discovery
tasks to fail verification outside the baseline window.
2025-12-16 10:20:48 -07:00
Kelly
a90b10a1f7 feat(k8s): Add daily registry sync cronjob for base images
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
2025-12-16 09:49:36 -07:00
Kelly
75822ab67d docs: Add Docker registry cache instructions
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
2025-12-16 09:34:55 -07:00
Kelly
df4d599478 chore: test CI after fixes
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
2025-12-16 09:22:53 -07:00
Kelly
4544718cad chore: trigger CI after DNS fix
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-16 09:21:13 -07:00
Kelly
47da61ed71 chore: trigger CI rebuild
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
2025-12-16 09:19:58 -07:00
9 changed files with 360 additions and 102 deletions

View File

@@ -76,15 +76,13 @@ steps:
  - /kaniko/executor
  --context=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/backend
  --dockerfile=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/backend/Dockerfile
- --destination=10.100.9.70:5000/cannaiq/backend:latest
- --destination=10.100.9.70:5000/cannaiq/backend:sha-${CI_COMMIT_SHA:0:8}
+ --destination=registry.spdy.io/cannaiq/backend:latest
+ --destination=registry.spdy.io/cannaiq/backend:sha-${CI_COMMIT_SHA:0:8}
  --build-arg=APP_BUILD_VERSION=sha-${CI_COMMIT_SHA:0:8}
  --build-arg=APP_GIT_SHA=${CI_COMMIT_SHA}
  --build-arg=APP_BUILD_TIME=${CI_PIPELINE_CREATED}
- --registry-mirror=10.100.9.70:5000
- --insecure-registry=10.100.9.70:5000
  --cache=true
- --cache-repo=10.100.9.70:5000/cannaiq/cache-backend
+ --cache-repo=registry.spdy.io/cannaiq/cache-backend
  --cache-ttl=168h
  depends_on: []
  when:
@@ -97,12 +95,10 @@ steps:
  - /kaniko/executor
  --context=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/cannaiq
  --dockerfile=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/cannaiq/Dockerfile
- --destination=10.100.9.70:5000/cannaiq/frontend:latest
- --destination=10.100.9.70:5000/cannaiq/frontend:sha-${CI_COMMIT_SHA:0:8}
- --registry-mirror=10.100.9.70:5000
- --insecure-registry=10.100.9.70:5000
+ --destination=registry.spdy.io/cannaiq/frontend:latest
+ --destination=registry.spdy.io/cannaiq/frontend:sha-${CI_COMMIT_SHA:0:8}
  --cache=true
- --cache-repo=10.100.9.70:5000/cannaiq/cache-cannaiq
+ --cache-repo=registry.spdy.io/cannaiq/cache-cannaiq
  --cache-ttl=168h
  depends_on: []
  when:
@@ -115,12 +111,10 @@ steps:
  - /kaniko/executor
  --context=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/findadispo/frontend
  --dockerfile=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/findadispo/frontend/Dockerfile
- --destination=10.100.9.70:5000/cannaiq/findadispo:latest
- --destination=10.100.9.70:5000/cannaiq/findadispo:sha-${CI_COMMIT_SHA:0:8}
- --registry-mirror=10.100.9.70:5000
- --insecure-registry=10.100.9.70:5000
+ --destination=registry.spdy.io/cannaiq/findadispo:latest
+ --destination=registry.spdy.io/cannaiq/findadispo:sha-${CI_COMMIT_SHA:0:8}
  --cache=true
- --cache-repo=10.100.9.70:5000/cannaiq/cache-findadispo
+ --cache-repo=registry.spdy.io/cannaiq/cache-findadispo
  --cache-ttl=168h
  depends_on: []
  when:
@@ -133,12 +127,10 @@ steps:
  - /kaniko/executor
  --context=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/findagram/frontend
  --dockerfile=/woodpecker/src/git.spdy.io/Creationshop/cannaiq/findagram/frontend/Dockerfile
- --destination=10.100.9.70:5000/cannaiq/findagram:latest
- --destination=10.100.9.70:5000/cannaiq/findagram:sha-${CI_COMMIT_SHA:0:8}
- --registry-mirror=10.100.9.70:5000
- --insecure-registry=10.100.9.70:5000
+ --destination=registry.spdy.io/cannaiq/findagram:latest
+ --destination=registry.spdy.io/cannaiq/findagram:sha-${CI_COMMIT_SHA:0:8}
  --cache=true
- --cache-repo=10.100.9.70:5000/cannaiq/cache-findagram
+ --cache-repo=registry.spdy.io/cannaiq/cache-findagram
  --cache-ttl=168h
  depends_on: []
  when:
@@ -177,13 +169,13 @@ steps:
  token: $K8S_TOKEN
  KUBEEOF
  - chmod 600 ~/.kube/config
- - kubectl set image deployment/scraper scraper=10.100.9.70:5000/cannaiq/backend:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
+ - kubectl set image deployment/scraper scraper=registry.spdy.io/cannaiq/backend:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
  - kubectl rollout status deployment/scraper -n cannaiq --timeout=300s
  - REPLICAS=$(kubectl get deployment scraper-worker -n cannaiq -o jsonpath='{.spec.replicas}'); if [ "$REPLICAS" = "0" ]; then kubectl scale deployment/scraper-worker --replicas=5 -n cannaiq; fi
- - kubectl set image deployment/scraper-worker worker=10.100.9.70:5000/cannaiq/backend:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
- - kubectl set image deployment/cannaiq-frontend cannaiq-frontend=10.100.9.70:5000/cannaiq/frontend:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
- - kubectl set image deployment/findadispo-frontend findadispo-frontend=10.100.9.70:5000/cannaiq/findadispo:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
- - kubectl set image deployment/findagram-frontend findagram-frontend=10.100.9.70:5000/cannaiq/findagram:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
+ - kubectl set image deployment/scraper-worker worker=registry.spdy.io/cannaiq/backend:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
+ - kubectl set image deployment/cannaiq-frontend cannaiq-frontend=registry.spdy.io/cannaiq/frontend:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
+ - kubectl set image deployment/findadispo-frontend findadispo-frontend=registry.spdy.io/cannaiq/findadispo:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
+ - kubectl set image deployment/findagram-frontend findagram-frontend=registry.spdy.io/cannaiq/findagram:sha-${CI_COMMIT_SHA:0:8} -n cannaiq
  - kubectl rollout status deployment/cannaiq-frontend -n cannaiq --timeout=120s
  depends_on:
  - docker-backend

View File

@@ -151,18 +151,6 @@ function generateSlug(name: string, city: string, state: string): string {
  return base;
  }
- /**
- * Derive menu_type from platform_menu_url pattern
- */
- function deriveMenuType(url: string | null): string {
-   if (!url) return 'unknown';
-   if (url.includes('/dispensary/')) return 'standalone';
-   if (url.includes('/embedded-menu/')) return 'embedded';
-   if (url.includes('/stores/')) return 'standalone';
-   // Custom domain = embedded widget on store's site
-   if (!url.includes('dutchie.com')) return 'embedded';
-   return 'unknown';
- }
  /**
  * Log a promotion action to dutchie_promotion_log
@@ -415,7 +403,7 @@ async function promoteLocation(
  loc.timezone, // $15 timezone
  loc.platform_location_id, // $16 platform_dispensary_id
  loc.platform_menu_url, // $17 menu_url
- deriveMenuType(loc.platform_menu_url), // $18 menu_type
+ 'dutchie', // $18 menu_type
  loc.description, // $19 description
  loc.logo_image, // $20 logo_image
  loc.banner_image, // $21 banner_image

View File

@@ -289,6 +289,102 @@ export function getStoreConfig(): TreezStoreConfig | null {
return currentStoreConfig;
}
/**
* Extract store config from page HTML for SSR sites.
*
* SSR sites (like BEST Dispensary) pre-render data and don't make client-side
* API requests. The config is embedded in __NEXT_DATA__ or window variables.
*
* Looks for:
* - __NEXT_DATA__.props.pageProps.msoStoreConfig.orgId / entityId
* - window.__SETTINGS__.msoOrgId / msoStoreEntityId
* - treezStores config in page data
*/
async function extractConfigFromPage(page: Page): Promise<TreezStoreConfig | null> {
console.log('[Treez Client] Attempting to extract config from page HTML (SSR fallback)...');
const config = await page.evaluate(() => {
// Try __NEXT_DATA__ first (Next.js SSR)
const nextDataEl = document.getElementById('__NEXT_DATA__');
if (nextDataEl) {
try {
const nextData = JSON.parse(nextDataEl.textContent || '{}');
const pageProps = nextData?.props?.pageProps;
// Look for MSO config in various locations
const msoConfig = pageProps?.msoStoreConfig || pageProps?.storeConfig || {};
const settings = pageProps?.settings || {};
// Extract org-id and entity-id
let orgId = msoConfig.orgId || msoConfig.msoOrgId || settings.msoOrgId;
let entityId = msoConfig.entityId || msoConfig.msoStoreEntityId || settings.msoStoreEntityId;
// Also check treezStores array
if (!orgId || !entityId) {
const treezStores = pageProps?.treezStores || nextData?.props?.treezStores;
if (treezStores && Array.isArray(treezStores) && treezStores.length > 0) {
const store = treezStores[0];
orgId = orgId || store.orgId || store.organization_id;
entityId = entityId || store.entityId || store.entity_id || store.storeId;
}
}
// Check for API settings
const apiSettings = pageProps?.apiSettings || settings.api || {};
if (orgId && entityId) {
return {
orgId,
entityId,
esUrl: apiSettings.esUrl || null,
apiKey: apiSettings.apiKey || null,
};
}
} catch (e) {
console.error('Error parsing __NEXT_DATA__:', e);
}
}
// Try window variables
const win = window as any;
if (win.__SETTINGS__) {
const s = win.__SETTINGS__;
if (s.msoOrgId && s.msoStoreEntityId) {
return {
orgId: s.msoOrgId,
entityId: s.msoStoreEntityId,
esUrl: s.esUrl || null,
apiKey: s.apiKey || null,
};
}
}
return null;
});
if (!config || !config.orgId || !config.entityId) {
console.log('[Treez Client] Could not extract config from page');
return null;
}
// Build full config with defaults for missing values
const fullConfig: TreezStoreConfig = {
orgId: config.orgId,
entityId: config.entityId,
// Default ES URL pattern - gapcommerce is the common tenant
esUrl: config.esUrl || 'https://search-gapcommerce.gapcommerceapi.com/product/search',
// Use default API key from config
apiKey: config.apiKey || TREEZ_CONFIG.esApiKey,
};
console.log('[Treez Client] Extracted config from page (SSR):');
console.log(` ES URL: ${fullConfig.esUrl}`);
console.log(` Org ID: ${fullConfig.orgId}`);
console.log(` Entity ID: ${fullConfig.entityId}`);
return fullConfig;
}
// ============================================================
// PRODUCT FETCHING (Direct API Approach)
// ============================================================
@@ -343,9 +439,15 @@ export async function fetchAllProducts(
  // Wait for initial page load to trigger first API request
  await sleep(3000);
- // Check if we captured the store config
+ // Check if we captured the store config from network requests
  if (!currentStoreConfig) {
- console.error('[Treez Client] Failed to capture store config from browser requests');
+ console.log('[Treez Client] No API requests captured - trying SSR fallback...');
+ // For SSR sites, extract config from page HTML
+ currentStoreConfig = await extractConfigFromPage(page);
+ }
+ if (!currentStoreConfig) {
+ console.error('[Treez Client] Failed to capture store config from browser requests or page HTML');
  throw new Error('Failed to capture Treez store config');
  }
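For orientation, the SSR payload probed above might look roughly like the sketch below. It is hand-written for illustration, not captured from a real site; every value is invented, and only the property paths mirror the extractor's lookups.

```typescript
// Illustrative __NEXT_DATA__ shape only; all values are invented.
// extractConfigFromPage() probes props.pageProps.msoStoreConfig first,
// then treezStores[0], then window.__SETTINGS__.
const exampleNextData = {
  props: {
    pageProps: {
      msoStoreConfig: { orgId: 'org-abc123', entityId: 'entity-xyz789' },
      apiSettings: {
        esUrl: 'https://search-gapcommerce.gapcommerceapi.com/product/search',
        apiKey: 'redacted-api-key',
      },
    },
  },
};
```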

View File

@@ -261,28 +261,24 @@ class TaskService {
  }
  /**
- * Mark a task as completed with verification
- * Returns true if completion was verified in DB, false otherwise
+ * Mark a task as completed and remove from pool
+ * Completed tasks are deleted - only failed tasks stay in the pool for retry/review
+ * Returns true if task was successfully deleted
  */
  async completeTask(taskId: number, result?: Record<string, unknown>): Promise<boolean> {
- await pool.query(
- `UPDATE worker_tasks
- SET status = 'completed', completed_at = NOW(), result = $2
- WHERE id = $1`,
- [taskId, result ? JSON.stringify(result) : null]
- );
- // Verify completion was recorded
- const verify = await pool.query(
- `SELECT status FROM worker_tasks WHERE id = $1`,
+ // Delete the completed task from the pool
+ // Only failed tasks stay in the table for retry/review
+ const deleteResult = await pool.query(
+ `DELETE FROM worker_tasks WHERE id = $1 RETURNING id`,
  [taskId]
  );
- if (verify.rows[0]?.status !== 'completed') {
- console.error(`[TaskService] Task ${taskId} completion NOT VERIFIED - DB shows status: ${verify.rows[0]?.status}`);
+ if (deleteResult.rowCount === 0) {
+ console.error(`[TaskService] Task ${taskId} completion FAILED - task not found or already deleted`);
  return false;
  }
+ console.log(`[TaskService] Task ${taskId} completed and removed from pool`);
  return true;
  }
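For context, a sketch of how a call site might honor the changed contract; `taskService` and the surrounding worker-loop names are illustrative, not from this diff:

```typescript
// Hypothetical call site: completeTask() now returns false when the row
// is missing, so the caller logs instead of retrying completion.
async function finishTask(taskId: number, result: Record<string, unknown>): Promise<void> {
  const removed = await taskService.completeTask(taskId, result);
  if (!removed) {
    // Either way the task is no longer in worker_tasks; retrying is pointless.
    console.warn(`[Worker] Task ${taskId} was already removed from the pool`);
  }
}
```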
@@ -351,7 +347,7 @@ class TaskService {
  * Hard failures: Auto-retry up to MAX_RETRIES with exponential backoff
  */
  async failTask(taskId: number, errorMessage: string): Promise<boolean> {
- const MAX_RETRIES = 3;
+ const MAX_RETRIES = 5;
  const isSoft = this.isSoftFailure(errorMessage);
  // Get current retry count
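The hunk above only shows the cap moving from 3 to 5; the backoff arithmetic itself is outside the diff. A minimal sketch of what exponential-backoff scheduling typically looks like, assuming a per-task retry count and a next-attempt timestamp (the constants and semantics are assumptions, not confirmed by this diff):

```typescript
// Assumed backoff: 1 minute base, doubling per retry, capped at 30 minutes.
const BASE_DELAY_MS = 60_000;
const MAX_DELAY_MS = 30 * 60_000;

function nextRetryDelayMs(retryCount: number): number {
  // retryCount 0..4 yields waits of 1, 2, 4, 8, then 16 minutes
  return Math.min(BASE_DELAY_MS * 2 ** retryCount, MAX_DELAY_MS);
}
```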
@@ -490,7 +486,15 @@ class TaskService {
  ${poolJoin}
  LEFT JOIN worker_registry w ON w.worker_id = t.worker_id
  ${whereClause}
- ORDER BY t.created_at DESC
+ ORDER BY
+ CASE t.status
+ WHEN 'active' THEN 1
+ WHEN 'pending' THEN 2
+ WHEN 'failed' THEN 3
+ WHEN 'completed' THEN 4
+ ELSE 5
+ END,
+ t.created_at DESC
  LIMIT ${limit} OFFSET ${offset}`,
  params
  );
@@ -1001,9 +1005,31 @@ class TaskService {
  const claimedAt = task.claimed_at || task.created_at;
  switch (task.role) {
- case 'product_refresh':
  case 'product_discovery': {
- // Verify payload was saved to raw_crawl_payloads after task was claimed
+ // For product_discovery, verify inventory snapshots were saved (always happens)
+ // Note: raw_crawl_payloads only saved during baseline window, so check snapshots instead
+ const snapshotResult = await pool.query(
+ `SELECT COUNT(*)::int as count
+ FROM inventory_snapshots
+ WHERE dispensary_id = $1
+ AND captured_at > $2`,
+ [task.dispensary_id, claimedAt]
+ );
+ const snapshotCount = snapshotResult.rows[0]?.count || 0;
+ if (snapshotCount === 0) {
+ return {
+ verified: false,
+ reason: `No inventory snapshots found for dispensary ${task.dispensary_id} after ${claimedAt}`
+ };
+ }
+ return { verified: true };
+ }
+ case 'product_refresh': {
+ // For product_refresh, verify payload was saved to raw_crawl_payloads
  const payloadResult = await pool.query(
  `SELECT id, product_count, fetched_at
  FROM raw_crawl_payloads

View File

@@ -1,29 +1,36 @@
  /**
  * Provider Display Names
  *
- * Maps internal provider identifiers to safe display labels.
- * Internal identifiers (menu_type, product_provider, crawler_type) remain unchanged.
- * Only the display label shown to users is transformed.
+ * Maps internal menu_type values to display labels.
+ * - standalone/embedded → dutchie (both are Dutchie platform)
+ * - treez → treez
+ * - jane/iheartjane → jane
  */
  export const ProviderDisplayNames: Record<string, string> = {
- // All menu providers map to anonymous "Menu Feed" label
- dutchie: 'Menu Feed',
- treez: 'Menu Feed',
- jane: 'Menu Feed',
- iheartjane: 'Menu Feed',
- blaze: 'Menu Feed',
- flowhub: 'Menu Feed',
- weedmaps: 'Menu Feed',
- leafly: 'Menu Feed',
- leaflogix: 'Menu Feed',
- tymber: 'Menu Feed',
- dispense: 'Menu Feed',
+ // Dutchie (standalone and embedded are both Dutchie)
+ dutchie: 'dutchie',
+ standalone: 'dutchie',
+ embedded: 'dutchie',
+ // Other platforms
+ treez: 'treez',
+ jane: 'jane',
+ iheartjane: 'jane',
+ // Future platforms
+ blaze: 'blaze',
+ flowhub: 'flowhub',
+ weedmaps: 'weedmaps',
+ leafly: 'leafly',
+ leaflogix: 'leaflogix',
+ tymber: 'tymber',
+ dispense: 'dispense',
  // Catch-all
- unknown: 'Menu Feed',
- default: 'Menu Feed',
- '': 'Menu Feed',
+ unknown: 'unknown',
+ default: 'unknown',
+ '': 'unknown',
  };
  /**

View File

@@ -1,32 +1,36 @@
  /**
  * Provider Display Names
  *
- * Maps internal provider identifiers to safe display labels.
- * Internal identifiers (menu_type, product_provider, crawler_type) remain unchanged.
- * Only the display label shown to users is transformed.
- *
- * IMPORTANT: Raw provider names (dutchie, treez, jane, etc.) must NEVER
- * be displayed directly in the UI. Always use this utility.
+ * Maps internal menu_type values to display labels.
+ * - standalone/embedded → Dutchie (both are Dutchie platform)
+ * - treez → Treez
+ * - jane/iheartjane → Jane
  */
  export const ProviderDisplayNames: Record<string, string> = {
- // All menu providers map to anonymous "Menu Feed" label
- dutchie: 'Menu Feed',
- treez: 'Menu Feed',
- jane: 'Menu Feed',
- iheartjane: 'Menu Feed',
- blaze: 'Menu Feed',
- flowhub: 'Menu Feed',
- weedmaps: 'Menu Feed',
- leafly: 'Menu Feed',
- leaflogix: 'Menu Feed',
- tymber: 'Menu Feed',
- dispense: 'Menu Feed',
+ // Dutchie (standalone and embedded are both Dutchie)
+ dutchie: 'dutchie',
+ standalone: 'dutchie',
+ embedded: 'dutchie',
+ // Other platforms
+ treez: 'treez',
+ jane: 'jane',
+ iheartjane: 'jane',
+ // Future platforms
+ blaze: 'blaze',
+ flowhub: 'flowhub',
+ weedmaps: 'weedmaps',
+ leafly: 'leafly',
+ leaflogix: 'leaflogix',
+ tymber: 'tymber',
+ dispense: 'dispense',
  // Catch-all
- unknown: 'Menu Feed',
- default: 'Menu Feed',
- '': 'Menu Feed',
+ unknown: 'unknown',
+ default: 'unknown',
+ '': 'unknown',
  };
  /**

View File

@@ -383,9 +383,10 @@ function PreflightSummary({ worker, poolOpen = true }: { worker: Worker; poolOpe
  const fingerprint = worker.fingerprint_data;
  const httpError = worker.preflight_http_error;
  const httpMs = worker.preflight_http_ms;
- // Geo from current_city/state columns, or fallback to fingerprint detected location
- const geoState = worker.current_state || fingerprint?.detectedLocation?.region;
- const geoCity = worker.current_city || fingerprint?.detectedLocation?.city;
+ // Show DETECTED proxy location (from fingerprint), not assigned state
+ // This lets us verify the proxy is geo-targeted correctly
+ const geoState = fingerprint?.detectedLocation?.region || worker.current_state;
+ const geoCity = fingerprint?.detectedLocation?.city || worker.current_city;
  // Worker is ONLY qualified if http preflight passed AND has geo assigned
  const hasGeo = Boolean(geoState);
  const isQualified = (worker.is_qualified || httpStatus === 'passed') && hasGeo;
@@ -702,8 +703,9 @@ function WorkerSlot({
  const httpIp = worker?.http_ip;
  const fingerprint = worker?.fingerprint_data;
- const geoState = worker?.current_state || (fingerprint as any)?.detectedLocation?.region;
- const geoCity = worker?.current_city || (fingerprint as any)?.detectedLocation?.city;
+ // Show DETECTED proxy location (from fingerprint), not assigned state
+ const geoState = (fingerprint as any)?.detectedLocation?.region || worker?.current_state;
+ const geoCity = (fingerprint as any)?.detectedLocation?.city || worker?.current_city;
  const isQualified = worker?.is_qualified;
  // Build fingerprint tooltip
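For reference, the shape these lookups imply for `fingerprint_data` (a reconstruction from the property accesses above, so other fields may well exist):

```typescript
// Partial type inferred from usage; not the repo's actual declaration.
interface FingerprintData {
  detectedLocation?: {
    region?: string; // state/region reported by the proxy's IP geolocation
    city?: string;
  };
}
```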

docs/DOCKER_REGISTRY.md (new file, 84 lines)
View File

@@ -0,0 +1,84 @@
# Using the Docker Registry Cache
To avoid Docker Hub rate limits, use our registry at `registry.spdy.io` (HTTPS) or `10.100.9.70:5000` (HTTP internal).
## For Woodpecker CI (Kaniko builds)
In your `.woodpecker.yml`, use these Kaniko flags:
```yaml
docker-build:
  image: gcr.io/kaniko-project/executor:debug
  commands:
    - /kaniko/executor
      --context=/woodpecker/src/...
      --dockerfile=Dockerfile
      --destination=10.100.9.70:5000/your-image:tag
      --registry-mirror=10.100.9.70:5000
      --insecure-registry=10.100.9.70:5000
      --cache=true
      --cache-repo=10.100.9.70:5000/your-image/cache
      --cache-ttl=168h
```
**Key points:**
- `--registry-mirror=10.100.9.70:5000` - Pulls base images from local cache
- `--insecure-registry=10.100.9.70:5000` - Allows HTTP (not HTTPS)
- `--cache=true` + `--cache-repo=...` - Caches build layers locally
## Available Base Images
The local registry has these cached:
| Image | Tags |
|-------|------|
| `node` | `20-slim`, `22-slim`, `22-alpine`, `20-alpine` |
| `alpine` | `latest` |
| `nginx` | `alpine` |
| `bitnami/kubectl` | `latest` |
| `gcr.io/kaniko-project/executor` | `debug` |
Need a different image? Add it to the cache using crane:
```bash
kubectl run cache-image --rm -it --restart=Never \
--image=gcr.io/go-containerregistry/crane:latest \
-- copy docker.io/library/IMAGE:TAG 10.100.9.70:5000/library/IMAGE:TAG --insecure
```
## Which Registry URL to Use
| Context | URL | Why |
|---------|-----|-----|
| Kaniko builds (CI) | `10.100.9.70:5000` | Internal HTTP, faster |
| kubectl set image | `registry.spdy.io` | HTTPS, k8s nodes can pull |
| Checking images | Either works | Same backend |
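As a concrete sketch of that split (the tag `sha-abc12345` and the deployment name are placeholders):

```bash
# CI build: Kaniko pushes over internal HTTP
/kaniko/executor --destination=10.100.9.70:5000/cannaiq/backend:sha-abc12345 ...

# Deploy: reference the HTTPS name so k8s nodes can pull the image
kubectl set image deployment/scraper \
  scraper=registry.spdy.io/cannaiq/backend:sha-abc12345 -n cannaiq
```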
## DO NOT USE
- ~~`--registry-mirror=mirror.gcr.io`~~ - Rate limited by Docker Hub
- ~~Direct pulls from `docker.io`~~ - Rate limited (100 pulls/6hr anonymous)
- ~~`10.100.9.70:5000` in kubectl commands~~ - k8s nodes require HTTPS
## Checking Cached Images
List all cached images:
```bash
curl -s http://10.100.9.70:5000/v2/_catalog | jq
```
List tags for a specific image:
```bash
curl -s http://10.100.9.70:5000/v2/library/node/tags/list | jq
```
## Troubleshooting
### "no such host" or DNS errors
The CI runner can't reach the registry mirror. Make sure you're using `10.100.9.70:5000`, not `mirror.gcr.io`.
### "manifest unknown"
The image/tag isn't cached. Add it using the crane command above.
### HTTP vs HTTPS errors
Always use `--insecure-registry=10.100.9.70:5000` - the local registry uses HTTP.

View File

@@ -0,0 +1,53 @@
# Daily job to sync base images from Docker Hub to local registry
# Runs at 3 AM daily to refresh the cache before rate limits reset
apiVersion: batch/v1
kind: CronJob
metadata:
  name: registry-sync
  namespace: woodpecker
spec:
  schedule: "0 3 * * *" # 3 AM daily
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: sync
              image: gcr.io/go-containerregistry/crane:latest
              command:
                - /bin/sh
                - -c
                - |
                  set -e
                  echo "=== Registry Sync: $(date) ==="
                  REGISTRY="10.100.9.70:5000"
                  # Base images to cache
                  IMAGES="
                  library/node:20-slim
                  library/node:22-slim
                  library/node:22
                  library/node:22-alpine
                  library/node:20-alpine
                  library/alpine:latest
                  library/nginx:alpine
                  bitnami/kubectl:latest
                  "
                  for img in $IMAGES; do
                    echo "Syncing docker.io/$img -> $REGISTRY/$img"
                    crane copy "docker.io/$img" "$REGISTRY/$img" --insecure || echo "WARN: Failed $img"
                  done
                  echo "=== Sync complete ==="
              resources:
                limits:
                  memory: "256Mi"
                  cpu: "200m"
                requests:
                  memory: "128Mi"
                  cpu: "100m"
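The manifest doesn't cover verification, but standard kubectl can confirm the job is firing, or run it ahead of schedule (the job name matches the metadata above):

```bash
# When did the cronjob last run, and did its jobs succeed?
kubectl get cronjob registry-sync -n woodpecker
kubectl get jobs -n woodpecker

# Trigger a one-off sync instead of waiting for 3 AM
kubectl create job --from=cronjob/registry-sync registry-sync-manual -n woodpecker
kubectl logs -n woodpecker job/registry-sync-manual --follow
```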