Commit Graph

183 Commits

Author SHA1 Message Date
Kelly
d813874b3a feat: Add Claude and OpenAI support for SEO content generation
Configure via env vars:
- AI_PROVIDER=claude|openai (default: claude)
- ANTHROPIC_API_KEY=sk-ant-...
- OPENAI_API_KEY=sk-...

Falls back to template generation if no API key is configured.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 22:57:59 -07:00
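The provider selection described in commit d813874b3a could look roughly like the sketch below. Only the three environment variable names and the template fallback come from the commit message. The `resolveProvider` function, the `Provider` type, and the handling of unrecognized `AI_PROVIDER` values are assumptions made for illustration.

```typescript
// Hypothetical sketch: only AI_PROVIDER, ANTHROPIC_API_KEY and OPENAI_API_KEY
// are named in the commit message. Everything else is illustrative.
type Provider = "claude" | "openai" | "template";
type Env = Record<string, string | undefined>;

function resolveProvider(env: Env): Provider {
  // AI_PROVIDER defaults to "claude" when unset.
  const requested = (env.AI_PROVIDER ?? "claude").toLowerCase();

  if (requested === "openai") {
    // No key configured: fall back to template generation.
    return env.OPENAI_API_KEY ? "openai" : "template";
  }

  // Assumption: any value other than "openai" is treated as "claude".
  return env.ANTHROPIC_API_KEY ? "claude" : "template";
}
```

In a Node.js backend this would typically be called as `resolveProvider(process.env)`.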
Kelly
a3b7ae9802 fix: Fix SEO generator - lazy pool init, fallback queries
- Move pool initialization inside functions (lazy loading)
- Fix page_key parsing for state-XX format
- Add fallback query if mv_state_metrics doesn't exist

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 22:56:45 -07:00
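Two of the fixes in commit a3b7ae9802 can be illustrated with a minimal sketch. The `lazy` helper and `parseStatePageKey` are hypothetical names, not the repository's code. The sketch also assumes that the `XX` in `state-XX` is a two-letter state code.

```typescript
// Hypothetical helper: defer creation of an expensive resource (such as a
// database pool) until the first call, then reuse the same instance.
function lazy<T>(create: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: create() };
    }
    return cached.value;
  };
}

// Hypothetical parser for page keys in the state-XX format.
// Assumption: XX is a two-letter state code; anything else yields null.
function parseStatePageKey(pageKey: string): string | null {
  const match = /^state-([A-Za-z]{2})$/.exec(pageKey);
  return match?.[1]?.toUpperCase() ?? null;
}
```

Wrapping pool construction in `lazy` means that importing the module has no side effects, so a missing database configuration only surfaces when a query is actually attempted.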
Kelly
7a1835778b fix: Add missing SEO pages list and sync endpoints
- GET /api/seo/pages - List all SEO pages with filters
- POST /api/seo/sync-state-pages - Create pages for states with dispensaries

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 22:51:35 -07:00
Kelly
3bc0effa33 feat: Responsive admin UI, SEO pages, and click analytics
## Responsive Admin UI
- Layout.tsx: Mobile sidebar drawer with hamburger menu
- Dashboard.tsx: 2-col grid on mobile, responsive stats cards
- OrchestratorDashboard.tsx: Responsive table with hidden columns
- PagesTab.tsx: Responsive filters and table

## SEO Pages
- New /admin/seo section with state landing pages
- SEO page generation and management
- State page content with dispensary/product counts

## Click Analytics
- Product click tracking infrastructure
- Click analytics dashboard

## Other Changes
- Consumer features scaffolding (alerts, deals, favorites)
- Health panel component
- Workers dashboard improvements
- Legacy DutchieAZ pages removed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 22:48:21 -07:00
Kelly
38d3ea1408 chore: Remove build artifacts from tracking
These files are now in .gitignore and should not be committed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 22:45:23 -07:00
kelly
414b97b3c0 Merge pull request 'feature/workers-dashboard' (#1) from feature/workers-dashboard into master
Reviewed-on: https://code.cannabrands.app/Creationshop/dispensary-scraper/pulls/1
2025-12-08 02:54:26 +00:00
Kelly
05c55809b6 fix: Regenerate findagram package-lock.json for npm ci compatibility
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 19:16:27 -07:00
Kelly
705bb57a23 fix: Remove unused imports and fix ESLint errors in findadispo
- Remove unused MapPin import from Dashboard.jsx
- Remove unused Clock import from DashboardHome.jsx
- Remove unused Mail import from Profile.jsx
- Remove unused CardHeader, CardTitle imports from SavedSearches.jsx
- Add eslint-disable for heading-has-content in card.jsx (shadcn pattern)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 19:02:40 -07:00
Kelly
17ca0bd3ee fix: Sync findadispo package-lock.json with package.json
2025-12-07 15:31:37 -07:00
Kelly
112e127b5d ci: trigger build
2025-12-07 15:20:22 -07:00
Kelly
c5a8ef84bf chore: Add package-lock.json for findadispo
2025-12-07 14:11:42 -07:00
Kelly
dd299e0d4c fix: Remove broken llm-scraper submodule reference
2025-12-07 13:53:21 -07:00
Kelly
a91565ca5a ci: Use woodpeckerci/plugin-docker-buildx and fix secrets syntax
2025-12-07 13:47:36 -07:00
Kelly
9e30e806f9 ci: Rename to .ci.yml to match woodpecker convention
2025-12-07 13:38:17 -07:00
Kelly
c779e6919f ci: Fix woodpecker syntax - use steps instead of pipeline
2025-12-07 13:21:00 -07:00
Kelly
566872eae8 ci: trigger pipeline test
2025-12-07 13:19:14 -07:00
Kelly
2d82cf9323 ci: Move pipeline to .woodpecker/ci.yml
2025-12-07 13:06:23 -07:00
Kelly
861201290a ci: Switch to Woodpecker CI pipeline
Replaces Gitea Actions with Woodpecker CI config.

Pipeline:
- CI: typecheck backend, build all 3 frontends (all branches)
- CD: build 4 Docker images, deploy to k8s (master only)

Required secrets in Woodpecker:
- registry_username
- registry_password
- kubeconfig_data (base64 encoded)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 12:18:13 -07:00
Kelly
84cdc1c12c chore: trigger CI test
2025-12-07 12:14:26 -07:00
Kelly
c6ab066d25 ci: Add Gitea Actions CI/CD pipeline
- ci.yml: Runs on all branches - typecheck backend, build all 3 frontends
- deploy.yml: Runs on master only after CI passes
  - Builds and pushes 4 Docker images to Gitea registry
  - Deploys to Kubernetes (scraper, scraper-worker, 3 frontends)

Required secrets:
- REGISTRY_USERNAME: Gitea username
- REGISTRY_PASSWORD: Gitea password/token
- KUBECONFIG: Base64-encoded kubeconfig

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 12:10:19 -07:00
Kelly
b4a2fb7d03 feat: Add v2 architecture with multi-state support and orchestrator services
Major additions:
- Multi-state expansion: states table, StateSelector, NationalDashboard, StateHeatmap, CrossStateCompare
- Orchestrator services: trace service, error taxonomy, retry manager, proxy rotator
- Discovery system: dutchie discovery service, geo validation, city seeding scripts
- Analytics infrastructure: analytics v2 routes, brand/pricing/stores intelligence pages
- Local development: setup-local.sh starts all 5 services (postgres, backend, cannaiq, findadispo, findagram)
- Migrations 037-056: crawler profiles, states, analytics indexes, worker metadata

Frontend pages added:
- Discovery, ChainsDashboard, IntelligenceBrands, IntelligencePricing, IntelligenceStores
- StateHeatmap, CrossStateCompare, SyncInfoPanel

Components added:
- StateSelector, OrchestratorTraceModal, WorkflowStepper

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 11:30:57 -07:00
Kelly
8ac64ba077 feat(cannaiq): Add Workers Dashboard and visibility tracking
Workers Dashboard:
- New /workers route with two-pane layout
- Workers table showing Alice, Henry, Bella, Oscar with role badges
- Run history with visibility stats (lost/restored counts)
- "Run Now" action to trigger workers immediately

Migrations:
- 057: Add visibility tracking columns (visibility_lost, visibility_lost_at, visibility_restored_at)
- 058: Add ID resolution columns for Henry worker
- 059: Add job queue columns (max_retries, retry_count, worker_id, locked_at, locked_by)

Backend fixes:
- Add httpStatus to CrawlResult interface for error classification
- Fix pool.ts typing for event listener
- Update completeJob to accept visibility stats in metadata

Frontend fixes:
- Fix NationalDashboard crash with safe formatMoney helper
- Fix OrchestratorDashboard/Stores StoreInfo type mismatches
- Add workerName/workerRole to getDutchieAZSchedules API type

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 11:04:12 -07:00
Kelly
1d1263afc6 docs: Add mandatory local mode checklist for crawls and tests
CLAUDE.md now requires explicit local mode confirmation before
running any crawler, orchestrator, sandbox test, or image scrape.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 12:37:48 -07:00
Kelly
e63329457c feat: Add local storage adapter and update CLAUDE.md with permanent rules
- Add local-storage.ts with smart folder structure:
  /storage/products/{brand}/{state}/{product_id}/
- Add storage-adapter.ts unified abstraction
- Add docker-compose.local.yml (NO MinIO)
- Add start-local.sh convenience script
- Update CLAUDE.md with:
  - PERMANENT RULES section (no data deletion)
  - DEPLOYMENT AUTHORIZATION requirements
  - LOCAL DEVELOPMENT defaults
  - STORAGE BEHAVIOR documentation
  - FORBIDDEN ACTIONS list
  - UI ANONYMIZATION rules

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 12:36:10 -07:00
Kelly
a0f8d3911c feat: Add Findagram and FindADispo consumer frontends
- Add findagram.co React frontend with product search, brands, categories
- Add findadispo.com React frontend with dispensary locator
- Wire findagram to backend /api/az/* endpoints
- Update category/brand links to route to /products with filters
- Add k8s manifests for both frontends
- Add multi-domain user support migrations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 16:10:15 -07:00
Kelly
d120a07ed7 fix(cannaiq): Fix hardcoded localhost URLs in product image paths
- Add .dockerignore to exclude .env.local from Docker builds
- Replace http://localhost:9020/dutchie/ with /api/images/dutchie/ in:
  - StoreDetail.tsx
  - ProductDetail.tsx
  - StoreView.tsx

This fixes production builds connecting to localhost for images.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 15:32:27 -07:00
Kelly
85e69ef6ad feat(api): Add API key scoping for /api/v1 endpoints
- Add key_type column to wp_dutchie_api_permissions (internal/wordpress)
- Create apiScope middleware with scope types and helpers
- Internal keys: full access to ALL dispensaries
- WordPress keys: restricted to single dispensary
- Update all /api/v1 handlers to honor scope
- Add /dispensaries and /search endpoints to public API

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 06:13:20 -07:00
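The scoping rule from commit 85e69ef6ad (internal keys see every dispensary, WordPress keys see exactly one) can be sketched as a small access check. The `ApiScope` shape, the `dispensaryId` field, and `canAccessDispensary` are hypothetical; the commit message names only the `key_type` column and its two values.

```typescript
// Hypothetical sketch of the scope check. Only the two key types come from
// the commit message; the interface and function names are illustrative.
type KeyType = "internal" | "wordpress";

interface ApiScope {
  keyType: KeyType;
  // Assumed field: the single dispensary a wordpress key is bound to.
  dispensaryId?: number;
}

function canAccessDispensary(scope: ApiScope, dispensaryId: number): boolean {
  // Internal keys: full access to all dispensaries.
  if (scope.keyType === "internal") {
    return true;
  }
  // WordPress keys: restricted to their own dispensary.
  return scope.dispensaryId === dispensaryId;
}
```

A WordPress key with no bound dispensary is denied everywhere in this sketch, which is the conservative default.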
Kelly
d91c55a344 feat: Add stale process monitor, users route, landing page, archive old scripts
- Add backend stale process monitoring API (/api/stale-processes)
- Add users management route
- Add frontend landing page and stale process monitor UI on /scraper-tools
- Move old development scripts to backend/archive/
- Update frontend build with new features

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 04:07:31 -07:00
Kelly
d2d44d2aeb Improve ScraperMonitor tab loading efficiency and add New/Updated columns
- Tab-specific data loading: Only fetch APIs needed for the active tab
- AZ Live tab fetches only AZ monitor APIs
- Dispensary Jobs tab fetches only legacy job APIs
- Crawl History tab fetches only scraper history APIs
- Auto-refresh now respects active tab
- Added New and Updated columns to Crawl History table with color coding

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 18:26:56 -07:00
Kelly
b082b2cf05 Add curaleaf/sol dutchie detection, update batch crawl script with all 57 store IDs
- Add curaleaf.com and livewithsol.com to dutchie detection patterns
- Update crawl-five-sequential.ts with all 57 dutchie store IDs for batch crawling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 01:00:56 -07:00
Kelly
ff1510f475 feat(api): add bulk crawl endpoints for all Dutchie stores
- GET /api/az/admin/dutchie-stores - Lists all Dutchie stores with crawl status
- POST /api/az/admin/crawl-all - Enqueues product crawl jobs for all ready stores

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 00:07:20 -07:00
Kelly
1083b51e6d feat(detection): extract reactEnv dispensaryId and prefer Dutchie on page
2025-12-03 22:31:40 -07:00
Kelly
129d318314 fix(detection): crawl websites to find Dutchie menus and retry missing platform IDs
2025-12-03 22:16:31 -07:00
Kelly
33e12de3f2 Remove domain-based shortcuts for Curaleaf/Sol detection
Menu detection now always crawls websites to find actual embedded menu
providers instead of marking stores as proprietary based on domain alone.
This fixes detection for stores like Curaleaf that may use Dutchie embeds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 21:55:31 -07:00
Kelly
95fc8bb4cc Improve menu detection to extract platform ID from URL and crawl proprietary domains
- Add extractFromMenuUrl() to discovery.ts that extracts either cName or platformId directly
  from Dutchie URLs (handles /api/v2/embedded-menu/<id>.js pattern)
- Add isObjectId() helper to identify MongoDB ObjectIds in URLs
- Update menu-detection.ts to skip GraphQL resolution when URL contains platformId directly
- For proprietary domains (curaleaf, sol), crawl website to find actual menu provider
  instead of blindly marking as not_crawlable
- If website crawl finds Dutchie embedded menu, set menu_type='dutchie' and resolve platform ID
- Tested successfully with consumeaz.com which discovers Dutchie embedded menu JS URL

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 21:41:45 -07:00
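As a rough illustration of the helpers named in commit 95fc8bb4cc, the sketch below classifies the last segment of an embedded-menu URL as either a platform ID or a cName. The function bodies, the return shape, and the example host are assumptions. What can be relied on is that a MongoDB ObjectId is written as 24 hexadecimal characters.

```typescript
// Hypothetical re-implementation for illustration; not the code in discovery.ts.
type MenuUrlInfo = { platformId: string } | { cName: string } | null;

// A MongoDB ObjectId is written as exactly 24 hexadecimal characters.
function isObjectId(value: string): boolean {
  return /^[0-9a-f]{24}$/i.test(value);
}

function extractFromMenuUrl(menuUrl: string): MenuUrlInfo {
  // Drop any query string or fragment before matching.
  const path = menuUrl.split(/[?#]/)[0] ?? "";
  // Matches .../embedded-menu/<id> and .../embedded-menu/<id>.js
  const match = /\/embedded-menu\/([^/]+?)(?:\.js)?\/?$/.exec(path);
  const id = match?.[1];
  if (!id) {
    return null;
  }
  // An ObjectId is already the platform ID, so no GraphQL lookup is needed.
  return isObjectId(id) ? { platformId: id } : { cName: id };
}
```

For example, `extractFromMenuUrl("https://example.com/api/v2/embedded-menu/5f1a2b3c4d5e6f7a8b9c0d1e.js")` yields a `platformId`, while a URL ending in `/embedded-menu/AZ-Deeply-Rooted` yields a `cName`.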
Kelly
202a3b92bf Add dbaName to Dispensary type to fix TypeScript build error
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 21:12:05 -07:00
Kelly
05f4de86d3 Fix detection query to include dutchie stores missing platform_id
When onlyUnknown=true and onlyMissingPlatformId=true, the query now
uses OR logic instead of AND to include:
- Stores with unknown menu_type (new stores needing detection)
- Stores with menu_type='dutchie' but no platform_dispensary_id

This allows re-detection to:
1. Resolve platform IDs for actual Dutchie stores
2. Reclassify stores that migrated away from Dutchie (e.g. to Curaleaf)

Also changed default for onlyMissingPlatformId to true so scheduled
detection jobs always attempt to resolve missing platform IDs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 21:09:37 -07:00
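The OR logic introduced in commit 05f4de86d3 can be expressed as a predicate over a store record. This is a sketch under stated assumptions: the `StoreRow` shape, the `needsDetection` name, the representation of an unknown `menu_type` as `null` or `'unknown'`, and the behaviour when only one flag is set are all illustrative rather than taken from the repository.

```typescript
// Hypothetical in-memory version of the detection filter.
interface StoreRow {
  menuType: string | null;
  platformDispensaryId: string | null;
}

interface DetectionOptions {
  onlyUnknown: boolean;
  onlyMissingPlatformId: boolean;
}

function needsDetection(store: StoreRow, opts: DetectionOptions): boolean {
  // Assumption: "unknown" is stored either as NULL or as the string 'unknown'.
  const isUnknown = store.menuType === null || store.menuType === "unknown";
  const isDutchieMissingId =
    store.menuType === "dutchie" && store.platformDispensaryId === null;

  if (opts.onlyUnknown && opts.onlyMissingPlatformId) {
    // OR, not AND: a store cannot be both unknown and dutchie, so AND
    // would never match a dutchie store that lacks a platform ID.
    return isUnknown || isDutchieMissingId;
  }
  if (opts.onlyUnknown) return isUnknown;
  if (opts.onlyMissingPlatformId) return isDutchieMissingId;
  // Assumption: with neither flag set, every store is eligible.
  return true;
}
```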
Kelly
7e97b8ce81 build: update frontend dist for CI/CD deployment
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 20:59:27 -07:00
Kelly
98f8e5e28d feat(api): include dba_name in AZ stores API response
Add dba_name to DISPENSARY_COLUMNS for the /api/az/stores list endpoint
and getDispensaryById for the single store endpoint. This allows the
frontend to display DBA names (trade names) when available, falling
back to the legal entity name.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 20:41:29 -07:00
Kelly
bd65674f3a fix(menu-detection): remove non-existent platform_dispensary_id_resolved_at column
The UPDATE query was trying to set a column that doesn't exist in the database
schema, causing platform ID resolution to fail silently. Now stores the
resolved_at timestamp in provider_detection_data JSONB instead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 19:40:22 -07:00
Kelly
66e07b2009 fix(monitor): remove non-existent worker columns from job_run_logs query
The job_run_logs table tracks scheduled job orchestration, not individual
worker jobs. Worker info (worker_id, worker_hostname) belongs on
dispensary_crawl_jobs, not job_run_logs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 18:45:05 -07:00
Kelly
54f40d26bb refactor(scheduler): separate detection from product crawl jobs
Product crawl job now only targets ready dispensaries (menu_type=dutchie
AND platform_dispensary_id IS NOT NULL). Detection is handled by the
separate menu_detection schedule.

This ensures:
- Single path for menu discovery (menu_detection job)
- Product crawl only processes ready dutchie stores
- All 5 workers claim jobs via queue/locking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 18:16:06 -07:00
Kelly
e234dc2947 feat(frontend): rewire dashboard to use AZ data endpoint
Main dashboard now uses /api/az/dashboard for dispensary, product, and
brand counts instead of the legacy /api/dashboard/stats endpoint. This
ensures the dashboard displays consistent data with the /az pages.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 17:28:22 -07:00
Kelly
cac414dafd Fix snapshot creation order - run before image downloads
Reorder processProducts() to create snapshots BEFORE attempting image downloads.
Previously, if image downloads hung or failed, the process would be killed before
snapshots were created, resulting in 0 snapshots despite successful product upserts.

Changes:
- Move Step 3 (snapshot creation) before Step 4 (image downloads)
- Ensures core crawl data (products + snapshots) is persisted even if images fail
- Adds chunked batch processing for improved memory management

Tested locally: 771 snapshots created for dispensary 112 with quantity data populated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 17:27:44 -07:00
Kelly
e67849bb3a Fix SQL queries to use correct column names in store summary endpoints
- Changed missing_from_feed=true to stock_status='missing_from_feed'
- Changed products_inserted/snapshots_created to products_new/products_updated
- Changed crawl_jobs table reference to dispensary_crawl_jobs
- Fixed product query to use actual snapshot columns (price in cents, etc.)
- Added explicit column list for dispensaries to avoid SELECT * issues

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 15:23:50 -07:00
Kelly
45209c3518 Add Curaleaf provider detection to menu detection service
- Add 'curaleaf' to MenuProvider type enum
- Add Curaleaf URL patterns BEFORE Dutchie in PROVIDER_URL_PATTERNS for proper precedence
- Add isCuraleafUrl() and extractCuraleafStoreUrl() helper functions
- Check website field for Curaleaf pattern before any Dutchie resolution
- Clear stale Dutchie menu_url when store is identified as Curaleaf
- Mark Curaleaf stores as not_crawlable with reason until crawler is built
- This prevents 60s Dutchie timeouts for stores that migrated to Curaleaf

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 15:12:22 -07:00
Kelly
c10710d6a7 Fix cName bug: extract cName from menuUrl per dispensary
- Add extractCName() helper to parse cName from dispensary.menuUrl
- Handles /embedded-menu/<cName> and /dispensary/<cName> URL patterns
- Falls back to dispensary.slug if menuUrl extraction fails
- Pass cName to fetchAllProductsBothModes and fetchAllProducts
- Make cName required parameter (no hardcoded defaults)
- Add normBool and normDate helpers for API data normalization
- Refactor graphql-client to use server-side fetch with Puppeteer session cookies

Previously, all stores used the AZ-Deeply-Rooted cName, which returned 0 products
for other dispensaries such as Sol Flower.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 20:53:28 -07:00
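The per-dispensary cName extraction described in commit c10710d6a7 might look like the following sketch. The two URL patterns and the slug fallback come from the commit message. The `DispensaryRef` interface and the function body are assumptions and may differ from the actual helper.

```typescript
// Hypothetical sketch of extractCName; the real helper may differ.
interface DispensaryRef {
  menuUrl?: string | null;
  slug: string;
}

function extractCName(dispensary: DispensaryRef): string {
  // Drop any query string or fragment before matching.
  const path = (dispensary.menuUrl ?? "").split(/[?#]/)[0] ?? "";
  // Handles /embedded-menu/<cName> and /dispensary/<cName>.
  const match = /\/(?:embedded-menu|dispensary)\/([^/]+)/.exec(path);
  // Fall back to the slug when the URL is missing or has no match.
  return match?.[1] ?? dispensary.slug;
}
```

Because the value is always derived from the dispensary itself, no hardcoded default can leak from one store to another.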
Kelly
9caa52fd5b Update CLAUDE guidelines: local image storage, no MinIO
2025-12-02 13:32:57 -07:00
Kelly
04b5c3bd09 Add CLAUDE guidelines for consolidated pipeline
2025-12-02 13:28:23 -07:00
Kelly
9219d8a77a Add brand history view to dutchie GraphQL migration
2025-12-02 11:34:01 -07:00