Initial commit - Dutchie dispensary scraper

This commit is contained in:
Kelly
2025-11-28 19:45:44 -07:00
commit 5757a8e9bd
23375 changed files with 3788799 additions and 0 deletions

DATABASE-GUIDE.md Normal file

@@ -0,0 +1,62 @@
# Database Usage Guide
## ⚠️ CRITICAL: Which Database To Use
### Dutchie Database (PORT 54320) - **USE THIS FOR EVERYTHING**
```bash
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus"
```
**Use for:**
- ✅ Running the backend application (`npm run dev`)
- ✅ Database migrations
- ✅ API development
- ✅ Frontend development
- ✅ All production features
- ✅ Testing the complete application
### Sail Database (PORT 5432) - **SCRAPER TESTING ONLY**
```bash
DATABASE_URL="postgresql://sail:password@localhost:5432/dutchie_menus"
```
**Use ONLY for:**
- ⚠️ Testing individual scrapers in isolation
- ⚠️ And ONLY when explicitly needed
## Default Rule
**When in doubt, use Dutchie (port 54320).**
## Quick Reference Commands
### Start Backend (Production)
```bash
cd /home/kelly/dutchie-menus/backend
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npm run dev
```
### Run Migrations
```bash
cd /home/kelly/dutchie-menus/backend
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx src/db/migrate.ts
```
### Database Console Access
```bash
# Dutchie (production)
PGPASSWORD=dutchie_local_pass psql -h localhost -p 54320 -U dutchie -d dutchie_menus
# Sail (scraper testing only)
PGPASSWORD=password psql -h localhost -p 5432 -U sail -d dutchie_menus
```
## Why Two Databases?
- **Dutchie**: Main application database, contains real data
- **Sail**: Isolated environment for testing scrapers without affecting production data
## Common Mistake
**Don't:** use the sail database for application features or migrations
**Do:** use the dutchie database for all application work
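As a programmatic guard (a hypothetical sketch, not part of the backend), the port can be checked before the app boots:

```typescript
// Hypothetical startup guard: refuse to run against the wrong database.
// Uses the WHATWG URL parser, which handles postgresql:// URLs in Node.
function isDutchieDb(databaseUrl: string): boolean {
  try {
    return new URL(databaseUrl).port === '54320';
  } catch {
    return false; // unset or unparseable URL
  }
}

const url = process.env.DATABASE_URL ?? '';
if (!isDutchieDb(url)) {
  console.warn(`DATABASE_URL does not point at the dutchie database (port 54320): ${url}`);
}
```

Dropping this near the top of the backend entry point would catch the "wrong database" mistake at startup instead of at query time.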

SESSION_NOTES.md Normal file

@@ -0,0 +1,56 @@
# Dutchie Menus Project - Session Notes
## Important Rules & Context
### Container Management
- **ONLY manage containers that start with `dutchie-`**
- User has multiple projects running - be careful not to affect other containers
### Scraping Setup
- Uses fingerprints and antidetect for web scraping
- Bypasses age gates using real Puppeteer clicks (not page.evaluate)
- Mobile Chrome user agent: `Mozilla/5.0 (Linux; Android 10; SM-G973F) AppleWebKit/537.36`
### Database Configuration
- **Correct DATABASE_URL**: `postgresql://sail:password@localhost:5432/dutchie_menus`
- ⚠️ **Common Issue**: Backend sometimes connects to wrong DB: `postgresql://kelly:kelly@localhost:5432/hub`
- Always verify backend is using correct .env file from `/home/kelly/dutchie-menus/backend/.env`
### Project Structure
- Backend: `/home/kelly/dutchie-menus/backend` (Port 3010)
- Frontend: `/home/kelly/dutchie-menus/frontend` (Port 5174)
- Database: PostgreSQL on localhost:5432
### Key Technical Decisions
- **Brands Table**: Stores brand names separately from products
- Unique constraint on `(store_id, name)`
- Index on `(store_id)` for fast queries
- **Specials Table**: Daily tracking with date-based filtering
- Composite index on `(store_id, valid_date DESC)` for performance
- Designed for "specials change every day" requirement
### Scraping Breakthroughs
- **Age Gate Solution**: Use `await option.click()` instead of `page.evaluate(() => element.click())`
- React synthetic events require real browser interactions
- Two-gate system: State selector dropdown → "I'm over 21" button
### Current Implementation Status
- ✅ StoreBrands and StoreSpecials pages created
- ✅ API endpoints for /brands and /specials
- ✅ Routes added to App.tsx
- ✅ Navigation working from StoreDetail page
- ✅ 127 brands scraped and saved for Curaleaf Phoenix Airport
## Common Issues & Solutions
### Issue: "0 brands" showing on page
**Cause**: Backend connected to wrong database
**Solution**: Restart backend from correct directory with .env file:
```bash
cd /home/kelly/dutchie-menus/backend
npm run dev
```
### Issue: Scraper not bypassing age gate
**Cause**: Using page.evaluate for clicks
**Solution**: Use real Puppeteer clicks: `await element.click()`

SESSION_RESUME.md Normal file

@@ -0,0 +1,85 @@
# Session Resume - Dutchie Menus Project
## Current State (2025-11-17)
### Services Running
- **Frontend**: http://localhost:5174 (Vite + React)
- **Backend**: http://localhost:3012 (Express + TypeScript)
- **Database**: PostgreSQL in Docker (port 54320)
- DB Name: dutchie_menus
- User: dutchie
- Password: dutchie_local_pass
### Files Modified This Session
1. `/home/kelly/dutchie-menus/backend/.env`
- Fixed DATABASE_URL to use correct Docker database
2. `/home/kelly/dutchie-menus/frontend/src/lib/api.ts`
- Updated API_URL to port 3012
3. `/home/kelly/dutchie-menus/frontend/src/pages/StoreDetail.tsx`
- Fixed navigation buttons (Brands/Specials) to use `setViewMode()` instead of routing
- Added "Discover Categories" and "Scrape Store" buttons in navbar
- Navigation bar is sticky at top
4. `/home/kelly/dutchie-menus/backend/create-brands-table.ts`
- Fixed database connection string to use Docker database
- **ALREADY RAN**: Created `brands` and `specials` tables successfully
### Database Tables Created
- ✅ brands (with indexes on store_id and unique constraint on store_id+name)
- ✅ specials (with indexes on store_id+valid_date and product_id)
### Current Issue
**Category Discovery Failing** - When clicking "Scrape Store", the scraper finds 0 categories:
- The scraper cannot find navigation links on the Curaleaf website
- Without categories, no products can be scraped
- Without products, no brands can be extracted
### What Works
- ✅ Frontend loads correctly
- ✅ Backend connects to database
- ✅ Login works
- ✅ Navigation buttons switch between Products/Brands/Specials views
- ✅ Database tables exist
### What Doesn't Work
- ❌ Category discovery returns 0 categories
- ❌ Product scraping cannot proceed without categories
- ❌ Brands don't populate (because no products are scraped)
## To Restart This Session
### Say to Claude:
```
Continue the dutchie-menus session. The category discovery is failing (finding 0 categories) when scraping the Curaleaf store. We need to fix the scraper so it can find categories on https://curaleaf.com/stores/curaleaf-az-48th-street and then scrape products/brands.
Current URLs:
- Frontend: http://localhost:5174/stores/az/curaleaf/curaleaf-az-48th-street
- Backend: http://localhost:3012
The scraper logs show:
- "Found 0 navigation links"
- "Custom menu detected - extracting from navigation"
- "Created 0 custom categories"
Need to investigate why category discovery is failing and fix it.
```
### Quick Start Commands
```bash
# Terminal 1 - Frontend
cd /home/kelly/dutchie-menus/frontend && npm run dev
# Terminal 2 - Backend
cd /home/kelly/dutchie-menus/backend && PORT=3012 DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npm run dev
# Check backend logs
tail -f <backend_output>
```
### Next Steps
1. Investigate category discovery code in backend scraper
2. Debug why navigation links aren't being found
3. Either fix the scraper or manually add categories
4. Test scraping to populate products and brands

backend/.dockerignore Normal file

@@ -0,0 +1,12 @@
node_modules
dist
npm-debug.log
.env
.env.local
.git
.gitignore
README.md
*.log
.DS_Store
coverage
.nyc_output

backend/.env Normal file

@@ -0,0 +1,17 @@
PORT=3010
NODE_ENV=development
# Database
DATABASE_URL=postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus
# MinIO (connecting to Docker from host)
MINIO_ENDPOINT=localhost
MINIO_PORT=9020
MINIO_USE_SSL=false
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_BUCKET=dutchie
MINIO_PUBLIC_ENDPOINT=http://localhost:9020
# JWT
JWT_SECRET=your-secret-key-change-in-production

backend/API_TOKENS_GUIDE.md Normal file

@@ -0,0 +1,434 @@
# API Token Management System
## Overview
Complete API token management with usage tracking, rate limiting, and analytics.
## Features
- **Create Named API Tokens** - Give each token a descriptive name (e.g., "WordPress Plugin - Main Site")
- **Usage Tracking** - Track every API request (endpoint, response time, size, IP, user agent)
- **Rate Limiting** - Configurable per-token rate limits
- **Analytics** - View usage stats, popular endpoints, response times
- **Endpoint Restrictions** - Limit tokens to specific endpoints
- **Expiration Dates** - Set token expiry dates
- **Enable/Disable** - Activate or deactivate tokens without deleting
- **Security** - Tokens are hashed, one-way generation
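The "hashed, one-way" point can be sketched with Node's crypto module (an assumed scheme for illustration; the service's actual hashing may differ):

```typescript
import { createHash, randomBytes } from 'node:crypto';

// Sketch: generate a token, show it once, store only its hash.
// The plaintext can never be recovered from the stored hash.
function generateToken(): { plaintext: string; hash: string } {
  const plaintext = randomBytes(26).toString('hex'); // returned to the caller once
  const hash = createHash('sha256').update(plaintext).digest('hex'); // persisted
  return { plaintext, hash };
}
```

Verification re-hashes the presented token and compares it to the stored hash, which is why a lost token cannot be retrieved, only replaced.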
## Creating an API Token
### Via API
**Endpoint:** `POST /api/api-tokens`
```bash
curl -X POST \
  -H "Authorization: Bearer YOUR_ADMIN_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "WordPress Plugin - Main Site",
    "description": "Token for the main WordPress website",
    "rate_limit": 200,
    "expires_at": "2026-12-31",
    "allowed_endpoints": ["/products*", "/stores*", "/categories*"]
  }' \
  http://localhost:3010/api/api-tokens
```
**Response:**
```json
{
  "token": {
    "id": 1,
    "name": "WordPress Plugin - Main Site",
    "token": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0u1v2w3x4y5z6",
    "description": "Token for the main WordPress website",
    "active": true,
    "rate_limit": 200,
    "allowed_endpoints": ["/products*", "/stores*", "/categories*"],
    "expires_at": "2026-12-31T00:00:00.000Z",
    "created_at": "2025-01-15T10:30:00.000Z"
  },
  "message": "API token created successfully. Save this token securely - it cannot be retrieved later."
}
```
**IMPORTANT:** Save the token immediately! It cannot be retrieved after creation.
## Using an API Token
Use the API token in place of a JWT in the Authorization header:
```bash
curl -H "Authorization: Bearer a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0u1v2w3x4y5z6" \
http://localhost:3010/api/products?store_id=1
```
The API will:
1. ✅ Validate the token
2. ✅ Check if it's active
3. ✅ Check if it's expired
4. ✅ Verify endpoint is allowed
5. ✅ Check rate limit
6. ✅ Track the request
7. ✅ Return data with rate limit headers
## Token Configuration
### Rate Limiting
Set requests per minute for each token:
```json
{
"name": "Mobile App",
"rate_limit": 500
}
```
**Rate Limit Headers** in responses:
```
X-RateLimit-Limit: 500
X-RateLimit-Remaining: 487
X-RateLimit-Reset: 2025-01-15T10:31:00.000Z
```
When the limit is exceeded:
```json
{
"error": "Rate limit exceeded",
"limit": 500,
"current": 501,
"retry_after": 60
}
```
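A minimal fixed-window sketch of per-token limiting (hypothetical; 60-second windows assumed, the production algorithm may differ):

```typescript
// Hypothetical fixed-window rate limiter keyed by token id.
type RateWindow = { start: number; count: number };
const windows = new Map<number, RateWindow>();

function allowRequest(tokenId: number, rateLimit: number, nowMs: number): boolean {
  const w = windows.get(tokenId);
  if (!w || nowMs - w.start >= 60_000) {
    // start a fresh 60s window for this token
    windows.set(tokenId, { start: nowMs, count: 1 });
    return true;
  }
  w.count += 1;
  return w.count <= rateLimit; // over the limit -> caller responds with 429
}
```

The `X-RateLimit-Remaining` header would be derived from `rateLimit - count`, and `X-RateLimit-Reset` from `start + 60_000`.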
### Endpoint Restrictions
Limit tokens to specific endpoints using wildcards:
```json
{
  "name": "Public Website",
  "allowed_endpoints": [
    "/products*",    // All product endpoints
    "/stores*",      // All store endpoints
    "/categories*"   // All category endpoints
  ]
}
```
Examples:
- `/products*` - matches `/products`, `/products/123`, `/products/meta/brands`
- `/stores/*/brands` - matches `/stores/1/brands`, `/stores/2/brands`
- `null` or `[]` - allows all endpoints
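One way the wildcard check could be implemented (a sketch assuming `*` means "any run of characters", matching the examples above):

```typescript
// Hypothetical matcher: '*' in a pattern matches any run of characters.
function escapeRegex(s: string): string {
  return s.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
}

function endpointAllowed(endpoint: string, allowed: string[] | null): boolean {
  if (!allowed || allowed.length === 0) return true; // null or [] => all endpoints
  return allowed.some((pattern) => {
    // escape regex metacharacters, then turn each '*' into '.*'
    const re = new RegExp('^' + pattern.split('*').map(escapeRegex).join('.*') + '$');
    return re.test(endpoint);
  });
}
```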
### Expiration
Set token expiry date:
```json
{
"name": "Temporary Integration",
"expires_at": "2025-12-31"
}
```
After expiration:
```json
{
"error": "Token has expired"
}
```
## Managing Tokens
### List All Tokens
```bash
GET /api/api-tokens
```
Returns tokens with usage stats:
```json
{
  "tokens": [
    {
      "id": 1,
      "name": "WordPress Plugin - Main Site",
      "description": "Token for the main WordPress website",
      "active": true,
      "rate_limit": 200,
      "last_used_at": "2025-01-15T12:45:00.000Z",
      "created_at": "2025-01-15T10:30:00.000Z",
      "requests_24h": 1247,
      "requests_7d": 8932,
      "total_requests": 45123
    }
  ]
}
```
### Get Single Token
```bash
GET /api/api-tokens/:id
```
### Update Token
```bash
PUT /api/api-tokens/:id
```
Update token properties:
```json
{
"name": "WordPress Plugin - Main Site (Updated)",
"active": false,
"rate_limit": 300
}
```
### Delete Token
```bash
DELETE /api/api-tokens/:id
```
## Usage Analytics
### Token-Specific Usage
**Endpoint:** `GET /api/api-tokens/:id/usage?days=7`
```json
{
  "hourly_usage": [
    {
      "hour": "2025-01-15T12:00:00.000Z",
      "requests": 245,
      "avg_response_time": 127,
      "successful_requests": 243,
      "failed_requests": 2
    }
  ],
  "endpoint_usage": [
    {
      "endpoint": "/products",
      "method": "GET",
      "requests": 1840,
      "avg_response_time": 132
    },
    {
      "endpoint": "/stores",
      "method": "GET",
      "requests": 423,
      "avg_response_time": 89
    }
  ],
  "recent_requests": [
    {
      "endpoint": "/products",
      "method": "GET",
      "status_code": 200,
      "response_time_ms": 145,
      "ip_address": "192.168.1.100",
      "created_at": "2025-01-15T12:45:30.000Z"
    }
  ]
}
```
### Overall API Statistics
**Endpoint:** `GET /api/api-tokens/stats/overview?days=7`
```json
{
  "overview": {
    "active_tokens": 5,
    "total_requests": 45892,
    "avg_response_time": 142,
    "successful_requests": 45234,
    "failed_requests": 658
  },
  "top_tokens": [
    {
      "id": 1,
      "name": "WordPress Plugin - Main Site",
      "requests": 18234,
      "avg_response_time": 134
    }
  ],
  "top_endpoints": [
    {
      "endpoint": "/products",
      "method": "GET",
      "requests": 28934,
      "avg_response_time": 145
    }
  ]
}
```
## Usage Tracking
Every API request tracks:
- **Token ID** - Which token made the request
- **Endpoint** - Which API endpoint was called
- **Method** - GET, POST, PUT, DELETE
- **Status Code** - 200, 404, 500, etc.
- **Response Time** - Milliseconds
- **Request Size** - Bytes
- **Response Size** - Bytes
- **IP Address** - Client IP
- **User Agent** - Client identifier
- **Timestamp** - When request was made
This data powers:
- Usage analytics dashboards
- Performance monitoring
- Rate limiting
- Billing/quota tracking
- Security auditing
## Use Cases
### WordPress Plugin Token
```json
{
"name": "WordPress - example.com",
"description": "Main WordPress website integration",
"rate_limit": 200,
"allowed_endpoints": ["/products*", "/stores*", "/categories*"]
}
```
### Mobile App Token
```json
{
"name": "iOS App - Production",
"description": "Production iOS application",
"rate_limit": 500,
"allowed_endpoints": null
}
```
### Partner Integration Token
```json
{
"name": "Partner ABC - Integration",
"description": "Third-party partner integration",
"rate_limit": 100,
"allowed_endpoints": ["/products*"],
"expires_at": "2025-12-31"
}
```
### Testing/Development Token
```json
{
"name": "Development - Local Testing",
"description": "For local development and testing",
"rate_limit": 1000,
"expires_at": "2025-06-30"
}
```
## Security Best Practices
1. **Name Tokens Descriptively** - Know where requests come from
2. **Use Minimum Rate Limits** - Start low, increase if needed
3. **Restrict Endpoints** - Only allow necessary endpoints
4. **Set Expiration Dates** - Especially for temporary integrations
5. **Monitor Usage** - Watch for unusual patterns
6. **Rotate Tokens** - Periodically create new tokens
7. **Disable Before Deleting** - Test impact before removal
8. **Store Securely** - Treat like passwords
9. **Use HTTPS** - Always use secure connections
10. **Track IP Addresses** - Monitor for unauthorized access
## Database Schema
### api_tokens Table
```sql
CREATE TABLE api_tokens (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  token VARCHAR(255) UNIQUE NOT NULL,
  description TEXT,
  user_id INTEGER REFERENCES users(id),
  active BOOLEAN DEFAULT true,
  rate_limit INTEGER DEFAULT 100,
  allowed_endpoints TEXT[],
  expires_at TIMESTAMP,
  last_used_at TIMESTAMP,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
### api_token_usage Table
```sql
CREATE TABLE api_token_usage (
  id SERIAL PRIMARY KEY,
  token_id INTEGER REFERENCES api_tokens(id) ON DELETE CASCADE,
  endpoint VARCHAR(255) NOT NULL,
  method VARCHAR(10) NOT NULL,
  status_code INTEGER,
  response_time_ms INTEGER,
  request_size INTEGER,
  response_size INTEGER,
  ip_address INET,
  user_agent TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
## Admin UI
Access token management at:
```
http://localhost:5173/api-tokens
```
Features:
- ✅ Create new tokens
- ✅ View all tokens with usage stats
- ✅ Enable/disable tokens
- ✅ View detailed usage analytics
- ✅ Monitor rate limits
- ✅ Delete tokens
## Monitoring & Alerts
Set up monitoring for:
1. **Rate Limit Violations** - Track 429 responses
2. **Failed Requests** - Monitor 4xx/5xx errors
3. **Unusual Patterns** - Spike in requests
4. **Token Expiration** - Alert before expiry
5. **Slow Responses** - Performance degradation
## Migration from JWT to API Tokens
If currently using JWT tokens:
1. Create API token for each integration
2. Update clients to use new API tokens
3. Monitor usage to ensure migration is successful
4. Deactivate old JWT tokens
5. Clean up old authentication
API tokens are better for:
- ✅ Non-user integrations (plugins, apps, partners)
- ✅ Usage tracking
- ✅ Per-client rate limiting
- ✅ Endpoint restrictions
Keep JWT for:
- ✅ User authentication
- ✅ Admin panel access
- ✅ Short-lived sessions

backend/API_USAGE.md Normal file

@@ -0,0 +1,414 @@
# Dutchie Analytics API - Usage Guide
## Base URL
```
http://localhost:3010/api
```
## Authentication
All endpoints require JWT authentication via Bearer token:
```
Authorization: Bearer YOUR_JWT_TOKEN
```
## Products API
### Get Products with Filtering, Sorting & Field Selection
**Endpoint:** `GET /products`
#### Basic Usage
```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&limit=10"
```
#### Field Selection (Reduce Payload Size)
Only return specific fields to reduce bandwidth and improve performance:
```bash
# Only get essential fields
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&fields=id,name,price,brand,in_stock"
# Get fields needed for product cards
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&fields=id,name,price,brand,thc_percentage,image_url_full,in_stock"
```
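Server-side, a `fields` parameter like this has to be whitelisted before it touches SQL. A hypothetical sketch (column list abbreviated):

```typescript
// Hypothetical whitelist: only known columns may appear in the SELECT list.
const ALLOWED_FIELDS = new Set([
  'id', 'name', 'price', 'brand', 'thc_percentage', 'image_url_full', 'in_stock',
]);

function selectFields(fieldsParam?: string): string[] {
  if (!fieldsParam) return Array.from(ALLOWED_FIELDS); // no ?fields= -> all columns
  const requested = fieldsParam.split(',').map((f) => f.trim());
  const safe = requested.filter((f) => ALLOWED_FIELDS.has(f));
  return safe.length > 0 ? safe : ['id']; // never emit an empty SELECT list
}
```

Anything not in the whitelist is silently dropped, so a malicious `fields` value can never be interpolated into the query.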
#### Advanced Filtering
**Filter by Category:**
```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&category_id=5"
```
**Filter by Stock Status:**
```bash
# Only in-stock products
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&in_stock=true"
```
**Search Products:**
```bash
# Search in name, brand, and description
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?search=blue+dream"
```
**Filter by Brand:**
```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&brand=Cresco"
```
**Filter by Price Range:**
```bash
# Products between $20 and $50
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&min_price=20&max_price=50"
```
**Filter by THC Percentage:**
```bash
# High THC products (>20%)
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&min_thc=20"
```
**Filter by Strain Type:**
```bash
# Get only indica strains
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&strain_type=indica"
```
#### Sorting
**Sort by Price:**
```bash
# Lowest price first
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&sort_by=price&sort_order=asc"
# Highest price first
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&sort_by=price&sort_order=desc"
```
**Sort by THC:**
```bash
# Highest THC first
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&sort_by=thc_percentage&sort_order=desc"
```
**Sort by Name:**
```bash
# Alphabetical
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&sort_by=name&sort_order=asc"
```
**Available Sort Fields:**
- `id`
- `name`
- `brand`
- `price`
- `thc_percentage`
- `cbd_percentage`
- `last_seen_at`
- `created_at`
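Like field selection, `sort_by`/`sort_order` must be validated against a whitelist before being interpolated into SQL. A hypothetical sketch using the fields listed above:

```typescript
// Hypothetical ORDER BY builder; unknown fields fall back to the default sort.
const SORT_FIELDS = [
  'id', 'name', 'brand', 'price', 'thc_percentage',
  'cbd_percentage', 'last_seen_at', 'created_at',
];

function buildOrderBy(sortBy?: string, sortOrder?: string): string {
  const field = sortBy && SORT_FIELDS.includes(sortBy) ? sortBy : 'last_seen_at';
  const order = (sortOrder ?? '').toLowerCase() === 'asc' ? 'ASC' : 'DESC';
  return `ORDER BY ${field} ${order}`;
}
```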
#### Pagination
```bash
# Get first page (50 items)
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&limit=50&offset=0"
# Get second page
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?store_id=1&limit=50&offset=50"
```
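The page arithmetic is just `offset = (page - 1) * limit`; a one-line helper avoids off-by-one mistakes:

```typescript
// Convert a 1-based page number to a SQL OFFSET.
function pageToOffset(page: number, limit: number): number {
  return Math.max(0, page - 1) * limit;
}
```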
#### Complex Query Example
Get affordable indica products sorted by THC:
```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products?\
store_id=1&\
strain_type=indica&\
max_price=40&\
min_thc=15&\
in_stock=true&\
sort_by=thc_percentage&\
sort_order=desc&\
limit=20&\
fields=id,name,brand,price,thc_percentage,strain_type,image_url_full"
```
#### Response Format
```json
{
  "products": [
    {
      "id": 123,
      "name": "Blue Dream 3.5g",
      "brand": "Cresco Labs",
      "price": 45.00,
      "thc_percentage": 24.5,
      "cbd_percentage": 0.5,
      "strain_type": "hybrid",
      "in_stock": true,
      "image_url_full": "http://localhost:9020/dutchie/products/123/full.jpg",
      "thumbnail_url": "http://localhost:9020/dutchie/products/123/thumb.jpg",
      "medium_url": "http://localhost:9020/dutchie/products/123/medium.jpg",
      "store_name": "Curaleaf - Phoenix",
      "category_name": "Flower"
    }
  ],
  "total": 145,
  "limit": 50,
  "offset": 0,
  "filters": {
    "store_id": "1",
    "category_id": null,
    "in_stock": "true",
    "search": null,
    "brand": null,
    "min_price": null,
    "max_price": null,
    "min_thc": null,
    "max_thc": null,
    "strain_type": null,
    "sort_by": "last_seen_at",
    "sort_order": "DESC"
  }
}
```
### Get Single Product
**Endpoint:** `GET /products/:id`
```bash
# Full product data
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products/123"
# With field selection
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products/123?fields=id,name,price,description,image_url_full"
```
### Get Available Brands (Meta Endpoint)
**Endpoint:** `GET /products/meta/brands`
Get list of all brands for filter dropdowns:
```bash
# All brands
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products/meta/brands"
# Brands for specific store
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products/meta/brands?store_id=1"
```
**Response:**
```json
{
"brands": [
"Cresco Labs",
"Select",
"Timeless",
"Canamo",
"Item 9"
]
}
```
### Get Price Range (Meta Endpoint)
**Endpoint:** `GET /products/meta/price-range`
Get min/max/avg prices for filter sliders:
```bash
# All products
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products/meta/price-range"
# For specific store
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/products/meta/price-range?store_id=1"
```
**Response:**
```json
{
"min_price": 15.00,
"max_price": 120.00,
"avg_price": 42.50
}
```
## Stores API
### Get All Stores
```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/stores"
```
### Get Single Store
```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/stores/1"
```
### Get Store Brands
```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/stores/1/brands"
```
### Get Store Specials
```bash
# Today's specials
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/stores/1/specials"
# Specials for specific date
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/stores/1/specials?date=2025-01-15"
```
### Trigger Store Scrape (Admin)
```bash
curl -X POST -H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"parallel": 3}' \
"http://localhost:3010/api/stores/1/scrape"
```
## Categories API
### Get Categories (Flat List)
```bash
# All categories
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/categories"
# For specific store
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/categories?store_id=1"
```
### Get Category Tree (Hierarchical)
```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
"http://localhost:3010/api/categories/tree?store_id=1"
```
## Real-World Usage Examples
### Build a Product Filter UI
1. **Get filter options:**
```javascript
// Get brands for dropdown
const brands = await fetch('/api/products/meta/brands?store_id=1');
// Get price range for slider
const priceRange = await fetch('/api/products/meta/price-range?store_id=1');
// Get categories for checkboxes
const categories = await fetch('/api/categories?store_id=1');
```
2. **Fetch filtered products:**
```javascript
const params = new URLSearchParams({
  store_id: '1',
  brand: selectedBrand,
  min_price: minPrice,
  max_price: maxPrice,
  category_id: selectedCategory,
  in_stock: 'true',
  sort_by: 'price',
  sort_order: 'asc',
  fields: 'id,name,price,brand,thc_percentage,image_url_full,in_stock'
});
const products = await fetch(`/api/products?${params}`);
```
### Build a WordPress Product Grid
```php
// Efficient field selection for grid display
$fields = 'id,name,price,brand,thc_percentage,image_url_full,in_stock';
$response = wp_remote_get(
    "http://localhost:3010/api/products?store_id=1&in_stock=true&limit=12&fields=$fields",
    ['headers' => ['Authorization' => 'Bearer ' . $token]]
);
$data = json_decode(wp_remote_retrieve_body($response));
foreach ($data->products as $product) {
    // Only contains requested fields = smaller payload
    echo render_product_card($product);
}
```
### Build Price Comparison Tool
```javascript
// Get cheapest products across all stores
const cheapProducts = await fetch(
  '/api/products?in_stock=true&sort_by=price&sort_order=asc&limit=50&fields=id,name,price,store_name'
);
// Get highest THC products
const strongProducts = await fetch(
  '/api/products?in_stock=true&min_thc=25&sort_by=thc_percentage&sort_order=desc&limit=50'
);
```
## Performance Tips
1. **Use Field Selection:** Only request fields you need to reduce bandwidth
2. **Pagination:** Use reasonable `limit` values (50-100)
3. **Caching:** Cache brand lists, price ranges, and category data
4. **Combine Filters:** Use multiple filters to reduce result set size
5. **Index Optimization:** The API uses indexed fields for filtering
## Rate Limits
- Standard users: 100 requests/minute
- Admin users: 500 requests/minute
- Exceeded limits return `429 Too Many Requests`
## Error Responses
```json
{
"error": "Failed to fetch products"
}
```
HTTP Status Codes:
- `200` - Success
- `400` - Bad Request
- `401` - Unauthorized
- `404` - Not Found
- `429` - Too Many Requests
- `500` - Server Error

backend/Dockerfile Normal file

@@ -0,0 +1,36 @@
# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install dependencies
RUN npm ci
# Copy source code
COPY . .
# Build TypeScript
RUN npm run build
# Production stage
FROM node:20-alpine
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install production dependencies only
RUN npm ci --omit=dev
# Copy built code from builder
COPY --from=builder /app/dist ./dist
# Expose port
EXPOSE 3010
# Start the application
CMD ["node", "dist/index.js"]

backend/Dockerfile.dev Executable file

@@ -0,0 +1,40 @@
FROM node:20-slim
# Install Chromium and dependencies for Puppeteer
RUN apt-get update && apt-get install -y \
chromium \
fonts-liberation \
libasound2 \
libatk-bridge2.0-0 \
libatk1.0-0 \
libatspi2.0-0 \
libcups2 \
libdbus-1-3 \
libdrm2 \
libgbm1 \
libgtk-3-0 \
libnspr4 \
libnss3 \
libwayland-client0 \
libxcomposite1 \
libxdamage1 \
libxfixes3 \
libxkbcommon0 \
libxrandr2 \
xdg-utils \
curl \
&& rm -rf /var/lib/apt/lists/*
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["npm", "run", "dev"]

backend/SESSION_SUMMARY.md Normal file

@@ -0,0 +1,180 @@
# Session Summary - Age Gate Bypass Implementation
## Problem
Brands weren't populating when clicking "Scrape Store" for Curaleaf dispensaries. Root cause: category discovery was finding 0 categories because age verification gates were blocking access to the Dutchie menu.
## What Was Accomplished
### 1. Created Universal Age Gate Bypass System
**File:** `/home/kelly/dutchie-menus/backend/src/utils/age-gate.ts`
Three main functions:
- `setAgeGateCookies(page, url, state)` - Sets age gate bypass cookies BEFORE navigation
- `bypassAgeGate(page, state)` - Attempts to bypass age gate AFTER page load using multiple methods
- `detectStateFromUrl(url)` - Auto-detects state from URL patterns
**Key Features:**
- Multiple bypass strategies: custom dropdowns, standard selects, buttons, state cards
- Enhanced React event dispatching (mousedown, mouseup, click, change, input)
- Proper failure detection - checks final URL to verify success
- Cookie-based approach as primary method
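For illustration, `detectStateFromUrl` could look like this (a hypothetical sketch with a trimmed state list; the real implementation lives in `age-gate.ts`):

```typescript
// Hypothetical sketch of detectStateFromUrl: look for a state code embedded
// in the URL path, e.g. "curaleaf-az-48th-street" -> "AZ".
const STATE_CODES = ['az', 'ca', 'fl', 'ma', 'nj', 'ny', 'pa']; // trimmed list
function detectStateFromUrl(url: string): string | null {
  const lower = url.toLowerCase();
  for (const code of STATE_CODES) {
    // match "-az-" or "/az/" style segments, not incidental substrings
    if (new RegExp(`[-/]${code}[-/]`).test(lower)) return code.toUpperCase();
  }
  return null;
}
```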
### 2. Updated Category Discovery
**File:** `/home/kelly/dutchie-menus/backend/src/services/category-discovery.ts`
Changes at lines 120-129:
```typescript
// Set age gate bypass cookies BEFORE navigation
const state = detectStateFromUrl(baseUrl);
await setAgeGateCookies(page, baseUrl, state);
await page.goto(baseUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
// If age gate still appears, try to bypass it
await bypassAgeGate(page, state);
```
### 3. Updated Scraper
**File:** `/home/kelly/dutchie-menus/backend/src/services/scraper.ts`
Changes at lines 379-392:
- Imports setAgeGateCookies (line 10)
- Sets cookies before navigation (line 381)
- Attempts bypass if age gate still appears (line 392)
### 4. Test Scripts Created
**`test-improved-age-gate.ts`** - Tests cookie-based bypass approach
**`capture-age-gate-cookies.ts`** - Opens visible browser to manually capture real cookies (requires X11)
**`debug-age-gate-detailed.ts`** - Visible browser debugging
**`debug-after-state-select.ts`** - Checks state after selecting Arizona
## Current Status
### ✅ Working
1. Age gate detection properly identifies gates
2. Cookie setting function works (cookies are set correctly)
3. Multiple bypass methods attempt to click through gates
4. Failure detection now accurately reports when bypass fails
5. Category discovery successfully created 10 categories for Curaleaf store (ID 18)
### ❌ Not Working
**Curaleaf's Specific Age Gate:**
- URL: `https://curaleaf.com/age-gate?returnurl=...`
- Uses React-based custom dropdown (shadcn/radix UI)
- Ignores cookies we set
- Doesn't respond to automated clicks
- Doesn't trigger navigation after state selection
**Why It Fails:**
- Curaleaf's React implementation likely checks for:
- Real user interaction patterns
- Additional signals beyond cookies (session storage, local storage)
- Specific event sequences that automation doesn't replicate
- Automated clicks don't trigger the React state changes needed
## Test Results
```bash
# Latest test - Cookie approach
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx test-improved-age-gate.ts
```
**Result:** ❌ FAILED
- Cookies set successfully
- Page still redirects to `/age-gate`
- Click automation finds elements but navigation doesn't occur
## Database State
**Categories created:** 10 categories for store_id 18 (Curaleaf - 48th Street)
- Categories are in the database and ready
- Dutchie menu detection works
- Category URLs are properly formatted
**Brands:** Not tested yet (can't scrape without bypassing age gate)
## Files Modified
1. `/home/kelly/dutchie-menus/backend/src/utils/age-gate.ts` - Created
2. `/home/kelly/dutchie-menus/backend/src/services/category-discovery.ts` - Updated
3. `/home/kelly/dutchie-menus/backend/src/services/scraper.ts` - Updated
4. `/home/kelly/dutchie-menus/backend/curaleaf-cookies.json` - Test cookies (don't work)
## Next Steps / Options
### Option 1: Test with Different Store
Find a Dutchie dispensary with a simpler age gate that responds to automation.
### Option 2: Get Real Cookies
Run `capture-age-gate-cookies.ts` on a machine with display/X11:
```bash
# On machine with display
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx capture-age-gate-cookies.ts
# Manually complete age gate in browser
# Press ENTER to capture real cookies
# Copy cookies to production server
```
### Option 3: Switch to Playwright
Playwright may handle React apps better than Puppeteer. Consider migrating the age gate bypass to use Playwright.
### Option 4: Use Proxy with Session Persistence
Use a rotating proxy service that maintains session state between requests.
### Option 5: Focus on Other Stores First
Skip Curaleaf for now and implement scraping for dispensaries without complex age gates, then circle back.
## How to Continue
1. **Check if categories exist:**
```sql
SELECT * FROM categories WHERE store_id = 18;
```
2. **Test scraping (will fail at age gate):**
```bash
# Via UI: Click "Scrape Store" button for Curaleaf
# OR via test script
```
3. **Try different store:**
- Find another Dutchie dispensary
- Add it to the stores table
- Run category discovery
- Test scraping
## Important Notes
- All cannabis dispensary sites will have age gates (as you correctly noted)
- The bypass infrastructure is solid and should work for simpler gates
- Curaleaf specifically uses advanced React patterns that resist automation
- Categories ARE created successfully despite age gate (using fallback detection)
- The scraper WILL work once we can bypass the age gate
## Key Insight
Category discovery succeeded because it checks the page source to detect Dutchie, then uses predefined categories. Product scraping requires actually viewing the product listings, which can't happen while stuck at the age gate.
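That fallback path can be sketched in a few lines: detect a Dutchie-powered menu from raw page source (which is available even while the age gate blocks rendering), then emit URLs for a predefined category list. The marker strings and category slugs below are illustrative assumptions, not the actual values in `category-discovery.ts`:

```typescript
// Sketch of the fallback described above: detect Dutchie from page source,
// then fall back to predefined categories. Markers and slugs are
// illustrative assumptions.

const DUTCHIE_MARKERS = ["dutchie.com", "dutchie-plus", "__DUTCHIE__"];

const DEFAULT_CATEGORIES = [
  "flower", "pre-rolls", "vape-pens", "concentrates",
  "edibles", "topicals", "tinctures", "accessories",
];

function isDutchieMenu(pageSource: string): boolean {
  const src = pageSource.toLowerCase();
  return DUTCHIE_MARKERS.some((m) => src.includes(m.toLowerCase()));
}

function categoryUrls(storeUrl: string, pageSource: string): string[] {
  if (!isDutchieMenu(pageSource)) return [];
  const base = storeUrl.replace(/\/+$/, ""); // strip trailing slashes
  return DEFAULT_CATEGORIES.map((slug) => `${base}/products/${slug}`);
}
```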
## Commands to Resume
```bash
# Start backend (if not running)
cd /home/kelly/dutchie-menus/backend
PORT=3012 DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npm run dev
# Check categories
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx -e "
import { pool } from './src/db/migrate';
const result = await pool.query('SELECT * FROM categories WHERE store_id = 18');
console.log(result.rows);
process.exit(0);
"
# Test age gate bypass
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx test-improved-age-gate.ts
```
## What to Tell Claude When Resuming
"Continue working on the age gate bypass for Curaleaf. We have categories created but can't scrape products because the age gate blocks us. The cookie approach didn't work. Options: try a different store, get real cookies, or switch to Playwright."


@@ -0,0 +1,71 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
const azStores = [
{ slug: 'curaleaf-az-48th-street-med', name: 'Curaleaf - 48th Street (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-83rd-dispensary-med', name: 'Curaleaf - 83rd Ave (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-bell-med', name: 'Curaleaf - Bell (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-camelback-med', name: 'Curaleaf - Camelback (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-central-med', name: 'Curaleaf - Central (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-gilbert-med', name: 'Curaleaf - Gilbert (Medical)', city: 'Gilbert' },
{ slug: 'curaleaf-az-glendale-east', name: 'Curaleaf - Glendale East', city: 'Glendale' },
{ slug: 'curaleaf-az-glendale-east-the-kind-relief-med', name: 'Curaleaf - Glendale East Kind Relief (Medical)', city: 'Glendale' },
{ slug: 'curaleaf-az-glendale-med', name: 'Curaleaf - Glendale (Medical)', city: 'Glendale' },
{ slug: 'curaleaf-az-midtown-med', name: 'Curaleaf - Midtown (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-peoria-med', name: 'Curaleaf - Peoria (Medical)', city: 'Peoria' },
{ slug: 'curaleaf-az-phoenix-med', name: 'Curaleaf - Phoenix Airport (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-queen-creek', name: 'Curaleaf - Queen Creek', city: 'Queen Creek' },
{ slug: 'curaleaf-az-queen-creek-whoa-qc-inc-med', name: 'Curaleaf - Queen Creek WHOA (Medical)', city: 'Queen Creek' },
{ slug: 'curaleaf-az-scottsdale-natural-remedy-patient-center-med', name: 'Curaleaf - Scottsdale Natural Remedy (Medical)', city: 'Scottsdale' },
{ slug: 'curaleaf-az-sedona-med', name: 'Curaleaf - Sedona (Medical)', city: 'Sedona' },
{ slug: 'curaleaf-az-tucson-med', name: 'Curaleaf - Tucson (Medical)', city: 'Tucson' },
{ slug: 'curaleaf-az-youngtown-med', name: 'Curaleaf - Youngtown (Medical)', city: 'Youngtown' }
];
async function addStores() {
const client = await pool.connect();
try {
for (const store of azStores) {
const dutchieUrl = `https://curaleaf.com/stores/${store.slug}`;
// Skip sandbox stores
if (store.slug.includes('sandbox')) continue;
const result = await client.query(`
INSERT INTO stores (
name,
slug,
dutchie_url,
active,
scrape_enabled,
logo_url
)
VALUES ($1, $2, $3, $4, $5, $6)
ON CONFLICT (slug) DO UPDATE
SET name = $1, dutchie_url = $3
RETURNING id, name
`, [
store.name,
store.slug,
dutchieUrl,
true,
true,
'https://curaleaf.com/favicon.ico' // Using favicon as logo for now
]);
console.log(`✅ Added: ${result.rows[0].name} (ID: ${result.rows[0].id})`);
}
    console.log(`\n🎉 Successfully added ${azStores.length} Curaleaf Arizona stores!`);
} catch (error) {
console.error('❌ Error adding stores:', error);
} finally {
client.release();
pool.end();
}
}
addStores();


@@ -0,0 +1,71 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'
});
const azStores = [
{ slug: 'curaleaf-az-48th-street-med', name: 'Curaleaf - 48th Street (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-83rd-dispensary-med', name: 'Curaleaf - 83rd Ave (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-bell-med', name: 'Curaleaf - Bell (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-camelback-med', name: 'Curaleaf - Camelback (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-central-med', name: 'Curaleaf - Central (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-gilbert-med', name: 'Curaleaf - Gilbert (Medical)', city: 'Gilbert' },
{ slug: 'curaleaf-az-glendale-east', name: 'Curaleaf - Glendale East', city: 'Glendale' },
{ slug: 'curaleaf-az-glendale-east-the-kind-relief-med', name: 'Curaleaf - Glendale East Kind Relief (Medical)', city: 'Glendale' },
{ slug: 'curaleaf-az-glendale-med', name: 'Curaleaf - Glendale (Medical)', city: 'Glendale' },
{ slug: 'curaleaf-az-midtown-med', name: 'Curaleaf - Midtown (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-peoria-med', name: 'Curaleaf - Peoria (Medical)', city: 'Peoria' },
{ slug: 'curaleaf-az-phoenix-med', name: 'Curaleaf - Phoenix Airport (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-queen-creek', name: 'Curaleaf - Queen Creek', city: 'Queen Creek' },
{ slug: 'curaleaf-az-queen-creek-whoa-qc-inc-med', name: 'Curaleaf - Queen Creek WHOA (Medical)', city: 'Queen Creek' },
{ slug: 'curaleaf-az-scottsdale-natural-remedy-patient-center-med', name: 'Curaleaf - Scottsdale Natural Remedy (Medical)', city: 'Scottsdale' },
{ slug: 'curaleaf-az-sedona-med', name: 'Curaleaf - Sedona (Medical)', city: 'Sedona' },
{ slug: 'curaleaf-az-tucson-med', name: 'Curaleaf - Tucson (Medical)', city: 'Tucson' },
{ slug: 'curaleaf-az-youngtown-med', name: 'Curaleaf - Youngtown (Medical)', city: 'Youngtown' }
];
async function addStores() {
const client = await pool.connect();
try {
for (const store of azStores) {
const dutchieUrl = `https://curaleaf.com/stores/${store.slug}`;
// Skip sandbox stores
if (store.slug.includes('sandbox')) continue;
const result = await client.query(`
INSERT INTO stores (
name,
slug,
dutchie_url,
active,
scrape_enabled,
logo_url
)
VALUES ($1, $2, $3, $4, $5, $6)
ON CONFLICT (slug) DO UPDATE
SET name = $1, dutchie_url = $3, logo_url = $6
RETURNING id, name
`, [
store.name,
store.slug,
dutchieUrl,
true,
true,
'https://curaleaf.com/favicon.ico' // Using favicon as logo for now
]);
console.log(`✅ Added: ${result.rows[0].name} (ID: ${result.rows[0].id})`);
}
console.log(`\n🎉 Successfully added ${azStores.length} Curaleaf Arizona stores!`);
} catch (error) {
console.error('❌ Error adding stores:', error);
} finally {
client.release();
pool.end();
}
}
addStores();


@@ -0,0 +1,22 @@
import { pool } from './src/db/migrate.js';
async function main() {
console.log('Adding discount columns to products table...');
try {
await pool.query(`
ALTER TABLE products
ADD COLUMN IF NOT EXISTS discount_type VARCHAR(50),
ADD COLUMN IF NOT EXISTS discount_value VARCHAR(100);
`);
console.log('✅ Successfully added discount_type and discount_value columns');
} catch (error: any) {
console.error('❌ Error adding columns:', error.message);
throw error;
} finally {
await pool.end();
}
}
main().catch(console.error);

backend/add-geo-fields.ts Normal file

@@ -0,0 +1,82 @@
import { pool } from './src/db/migrate';
async function addGeoFields() {
console.log('🗺️ Adding geo-location fields...\n');
try {
await pool.query(`
ALTER TABLE stores
ADD COLUMN IF NOT EXISTS latitude DECIMAL(10, 8),
ADD COLUMN IF NOT EXISTS longitude DECIMAL(11, 8),
ADD COLUMN IF NOT EXISTS region VARCHAR(100),
ADD COLUMN IF NOT EXISTS market_area VARCHAR(255),
ADD COLUMN IF NOT EXISTS timezone VARCHAR(50)
`);
console.log('✅ Added geo fields to stores table');
// Create indexes for geo queries
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_stores_location ON stores(latitude, longitude) WHERE latitude IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_stores_city_state ON stores(city, state);
CREATE INDEX IF NOT EXISTS idx_stores_region ON stores(region);
CREATE INDEX IF NOT EXISTS idx_stores_market ON stores(market_area);
`);
console.log('✅ Created geo indexes');
// Create location-based views
await pool.query(`
CREATE OR REPLACE VIEW stores_by_region AS
SELECT
region,
state,
COUNT(*) as store_count,
COUNT(DISTINCT city) as cities,
array_agg(DISTINCT name ORDER BY name) as store_names
FROM stores
WHERE active = true
GROUP BY region, state
ORDER BY store_count DESC
`);
await pool.query(`
CREATE OR REPLACE VIEW market_coverage AS
SELECT
city,
state,
zip,
COUNT(*) as dispensaries,
array_agg(name ORDER BY name) as store_names,
COUNT(DISTINCT id) as unique_stores
FROM stores
WHERE active = true
GROUP BY city, state, zip
ORDER BY dispensaries DESC
`);
console.log('✅ Created location views');
console.log('\n✅ Geo-location setup complete!');
console.log('\n📊 Available location views:');
console.log(' - stores_by_region: Stores grouped by region/state');
console.log(' - market_coverage: Dispensary density by city');
console.log('\n💡 Your database now supports:');
console.log(' ✅ Lead Generation (contact info + locations)');
console.log(' ✅ Market Research (pricing + inventory data)');
console.log(' ✅ Investment Planning (market coverage + trends)');
console.log(' ✅ Retail Partner Discovery (store directory)');
console.log(' ✅ Geo-targeted Campaigns (lat/long + regions)');
console.log(' ✅ Trend Analysis (price history + timestamps)');
console.log(' ✅ Directory/App Creation (full store catalog)');
console.log(' ✅ Delivery Optimization (locations + addresses)');
} catch (error) {
console.error('❌ Error:', error);
} finally {
await pool.end();
}
}
addGeoFields();


@@ -0,0 +1,215 @@
import { pool } from './src/db/migrate';
async function addPriceHistory() {
console.log('💰 Adding price history tracking...\n');
const client = await pool.connect();
try {
await client.query('BEGIN');
// Step 1: Create price_history table
console.log('1. Creating price_history table...');
await client.query(`
CREATE TABLE IF NOT EXISTS price_history (
id SERIAL PRIMARY KEY,
product_id INTEGER REFERENCES products(id) ON DELETE CASCADE,
store_id INTEGER REFERENCES stores(id) ON DELETE CASCADE,
brand_id INTEGER REFERENCES brands(id) ON DELETE SET NULL,
category_id INTEGER REFERENCES categories(id) ON DELETE SET NULL,
product_name VARCHAR(500),
price DECIMAL(10, 2),
sale_price DECIMAL(10, 2),
original_price DECIMAL(10, 2),
discount_percentage DECIMAL(5, 2),
discount_amount DECIMAL(10, 2),
in_stock BOOLEAN DEFAULT true,
is_special BOOLEAN DEFAULT false,
recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
`);
// Step 2: Add indexes for fast price queries
console.log('2. Creating indexes for price queries...');
await client.query(`
CREATE INDEX IF NOT EXISTS idx_price_history_product ON price_history(product_id, recorded_at DESC);
CREATE INDEX IF NOT EXISTS idx_price_history_store ON price_history(store_id, recorded_at DESC);
CREATE INDEX IF NOT EXISTS idx_price_history_brand ON price_history(brand_id, recorded_at DESC);
CREATE INDEX IF NOT EXISTS idx_price_history_date ON price_history(recorded_at DESC);
CREATE INDEX IF NOT EXISTS idx_price_history_price_change ON price_history(product_id, price, recorded_at DESC);
`);
// Step 3: Create function to log price changes
console.log('3. Creating price change trigger function...');
await client.query(`
CREATE OR REPLACE FUNCTION log_price_change()
RETURNS TRIGGER AS $$
BEGIN
-- Only log if price actually changed or product just appeared
IF (TG_OP = 'INSERT') OR
(OLD.price IS DISTINCT FROM NEW.price) OR
(OLD.sale_price IS DISTINCT FROM NEW.sale_price) OR
(OLD.discount_percentage IS DISTINCT FROM NEW.discount_percentage) OR
(OLD.in_stock IS DISTINCT FROM NEW.in_stock) THEN
INSERT INTO price_history (
product_id, store_id, brand_id, category_id, product_name,
price, sale_price, original_price, discount_percentage, discount_amount,
in_stock, is_special, recorded_at
) VALUES (
NEW.id, NEW.store_id, NEW.brand_id, NEW.category_id, NEW.name,
NEW.price, NEW.sale_price, NEW.original_price,
NEW.discount_percentage, NEW.discount_amount,
NEW.in_stock, NEW.is_special, CURRENT_TIMESTAMP
);
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
`);
// Step 4: Create trigger on products table
console.log('4. Creating trigger on products table...');
await client.query(`
DROP TRIGGER IF EXISTS price_change_trigger ON products;
CREATE TRIGGER price_change_trigger
AFTER INSERT OR UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION log_price_change();
`);
// Step 5: Populate initial price history from existing products
console.log('5. Populating initial price history...');
await client.query(`
INSERT INTO price_history (
product_id, store_id, brand_id, category_id, product_name,
price, sale_price, original_price, discount_percentage, discount_amount,
in_stock, is_special, recorded_at
)
SELECT
id, store_id, brand_id, category_id, name,
price, sale_price, original_price, discount_percentage, discount_amount,
in_stock, is_special, COALESCE(first_seen_at, created_at)
FROM products
WHERE price IS NOT NULL
ON CONFLICT DO NOTHING
`);
const countResult = await client.query('SELECT COUNT(*) FROM price_history');
console.log(`${countResult.rows[0].count} price records created`);
// Step 6: Create helpful views for price monitoring
console.log('6. Creating price monitoring views...');
// Price changes view
await client.query(`
CREATE OR REPLACE VIEW price_changes AS
WITH price_with_previous AS (
SELECT
ph.*,
LAG(ph.price) OVER (PARTITION BY ph.product_id ORDER BY ph.recorded_at) as previous_price,
LAG(ph.recorded_at) OVER (PARTITION BY ph.product_id ORDER BY ph.recorded_at) as previous_date
FROM price_history ph
)
SELECT
pwp.product_id,
pwp.product_name,
s.name as store_name,
b.name as brand_name,
pwp.previous_price,
pwp.price as current_price,
pwp.price - pwp.previous_price as price_change,
ROUND(((pwp.price - pwp.previous_price) / NULLIF(pwp.previous_price, 0) * 100)::numeric, 2) as price_change_percent,
pwp.previous_date,
        pwp.recorded_at as changed_at
FROM price_with_previous pwp
JOIN stores s ON pwp.store_id = s.id
LEFT JOIN brands b ON pwp.brand_id = b.id
WHERE pwp.previous_price IS NOT NULL
AND pwp.price IS DISTINCT FROM pwp.previous_price
ORDER BY pwp.recorded_at DESC
`);
// Current prices view
await client.query(`
CREATE OR REPLACE VIEW current_prices AS
SELECT DISTINCT ON (product_id)
ph.product_id,
ph.product_name,
s.name as store_name,
b.name as brand_name,
c.name as category_name,
ph.price,
ph.sale_price,
ph.discount_percentage,
ph.in_stock,
ph.is_special,
ph.recorded_at as last_updated
FROM price_history ph
JOIN stores s ON ph.store_id = s.id
LEFT JOIN brands b ON ph.brand_id = b.id
LEFT JOIN categories c ON ph.category_id = c.id
ORDER BY ph.product_id, ph.recorded_at DESC
`);
// Price trends view (last 30 days)
await client.query(`
CREATE OR REPLACE VIEW price_trends_30d AS
WITH price_data AS (
SELECT
ph.product_id,
ph.product_name,
s.name as store_name,
ph.price,
ph.recorded_at,
ROW_NUMBER() OVER (PARTITION BY ph.product_id ORDER BY ph.recorded_at ASC) as first_row,
ROW_NUMBER() OVER (PARTITION BY ph.product_id ORDER BY ph.recorded_at DESC) as last_row
FROM price_history ph
JOIN stores s ON ph.store_id = s.id
WHERE ph.recorded_at >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
product_id,
product_name,
store_name,
COUNT(*) as price_points,
MIN(price) as min_price,
MAX(price) as max_price,
ROUND(AVG(price)::numeric, 2) as avg_price,
MAX(CASE WHEN first_row = 1 THEN price END) as starting_price,
MAX(CASE WHEN last_row = 1 THEN price END) as current_price
FROM price_data
GROUP BY product_id, product_name, store_name
`);
await client.query('COMMIT');
console.log('\n✅ Price history tracking enabled!');
console.log('\n📊 Available price monitoring views:');
console.log(' - price_changes: See all price increases/decreases');
console.log(' - current_prices: Latest price for each product');
console.log(' - price_trends_30d: Price trends over last 30 days');
console.log('\n💡 Example queries:');
console.log(' -- Recent price increases:');
    console.log('   SELECT * FROM price_changes WHERE price_change > 0 ORDER BY changed_at DESC LIMIT 10;');
console.log(' ');
console.log(' -- Products on sale:');
console.log(' SELECT * FROM current_prices WHERE sale_price IS NOT NULL;');
console.log(' ');
console.log(' -- Biggest price drops:');
console.log(' SELECT * FROM price_changes WHERE price_change < 0 ORDER BY price_change ASC LIMIT 10;');
} catch (error) {
await client.query('ROLLBACK');
console.error('❌ Error:', error);
throw error;
} finally {
client.release();
await pool.end();
}
}
addPriceHistory();


@@ -0,0 +1,45 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function addLocationColumns() {
try {
console.log('Adding location columns to proxies table...\n');
await pool.query(`
ALTER TABLE proxies
ADD COLUMN IF NOT EXISTS city VARCHAR(100),
ADD COLUMN IF NOT EXISTS state VARCHAR(100),
ADD COLUMN IF NOT EXISTS country VARCHAR(100),
ADD COLUMN IF NOT EXISTS country_code VARCHAR(10),
ADD COLUMN IF NOT EXISTS latitude DECIMAL(10, 8),
ADD COLUMN IF NOT EXISTS longitude DECIMAL(11, 8)
`);
console.log('✅ Location columns added successfully\n');
// Show updated schema
const result = await pool.query(`
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'proxies'
ORDER BY ordinal_position
`);
console.log('Updated proxies table schema:');
console.log('─'.repeat(60));
result.rows.forEach(row => {
console.log(` ${row.column_name}: ${row.data_type}`);
});
console.log('─'.repeat(60));
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
addLocationColumns();


@@ -0,0 +1,74 @@
import { pool } from './src/db/migrate';
const solFlowerStores = [
{
name: 'Sol Flower - Sun City',
slug: 'sol-flower-sun-city',
dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary',
},
{
name: 'Sol Flower - South Tucson',
slug: 'sol-flower-south-tucson',
dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary-south-tucson',
},
{
name: 'Sol Flower - North Tucson',
slug: 'sol-flower-north-tucson',
dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary-north-tucson',
},
{
name: 'Sol Flower - McClintock (Tempe)',
slug: 'sol-flower-mcclintock',
dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary-mcclintock',
},
{
name: 'Sol Flower - Deer Valley (Phoenix)',
slug: 'sol-flower-deer-valley',
dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary-deer-valley',
},
];
async function addSolFlowerStores() {
console.log('🌻 Adding Sol Flower stores to database...\n');
try {
for (const store of solFlowerStores) {
// Check if store already exists
const existing = await pool.query(
'SELECT id FROM stores WHERE slug = $1',
[store.slug]
);
if (existing.rows.length > 0) {
console.log(`⏭️ Skipping ${store.name} - already exists (ID: ${existing.rows[0].id})`);
continue;
}
// Insert store
const result = await pool.query(
`INSERT INTO stores (name, slug, dutchie_url, active, scrape_enabled, logo_url)
VALUES ($1, $2, $3, true, true, $4)
RETURNING id`,
[store.name, store.slug, store.dutchie_url, 'https://dutchie.com/favicon.ico']
);
console.log(`✅ Added ${store.name} (ID: ${result.rows[0].id})`);
}
console.log('\n✅ All Sol Flower stores added successfully!');
// Show all stores
console.log('\n📊 All stores in database:');
const allStores = await pool.query(
'SELECT id, name, dutchie_url FROM stores ORDER BY id'
);
console.table(allStores.rows);
} catch (error) {
console.error('❌ Error:', error);
} finally {
await pool.end();
}
}
addSolFlowerStores();


@@ -0,0 +1,90 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function addTestBrands() {
try {
// Update the store slug to match the local URL
console.log('Updating store slug...');
await pool.query(
`UPDATE stores
SET slug = $1,
dutchie_url = $2,
updated_at = NOW()
WHERE slug = $3`,
[
'curaleaf-az-48th-street',
'https://curaleaf.com/stores/curaleaf-dispensary-48th-street',
'curaleaf-az-48th-street-med'
]
);
console.log('✓ Store slug updated\n');
// Get the store ID
const storeResult = await pool.query(
'SELECT id FROM stores WHERE slug = $1',
['curaleaf-az-48th-street']
);
if (storeResult.rows.length === 0) {
console.log('Store not found!');
return;
}
const storeId = storeResult.rows[0].id;
// Sample products with brands commonly found at dispensaries
const testProducts = [
{ name: 'Select Elite Live Resin Cartridge - Clementine', brand: 'Select', price: 45.00, category: 'vape-pens', thc: 82.5 },
{ name: 'Curaleaf Flower - Blue Dream', brand: 'Curaleaf', price: 35.00, category: 'flower', thc: 22.0 },
{ name: 'Grassroots RSO Syringe', brand: 'Grassroots', price: 40.00, category: 'concentrates', thc: 75.0 },
{ name: 'Stiiizy Pod - Skywalker OG', brand: 'Stiiizy', price: 50.00, category: 'vape-pens', thc: 85.0 },
{ name: 'Cookies Flower - Gary Payton', brand: 'Cookies', price: 55.00, category: 'flower', thc: 28.0 },
{ name: 'Raw Garden Live Resin - Wedding Cake', brand: 'Raw Garden', price: 48.00, category: 'concentrates', thc: 80.5 },
{ name: 'Jeeter Pre-Roll - Zkittlez', brand: 'Jeeter', price: 12.00, category: 'pre-rolls', thc: 24.0 },
{ name: 'Kiva Camino Gummies - Wild Cherry', brand: 'Kiva', price: 20.00, category: 'edibles', thc: 5.0 },
{ name: 'Wyld Gummies - Raspberry', brand: 'Wyld', price: 18.00, category: 'edibles', thc: 10.0 },
{ name: 'Papa & Barkley Releaf Balm', brand: 'Papa & Barkley', price: 45.00, category: 'topicals', thc: 3.0 },
{ name: 'Brass Knuckles Cartridge - Gorilla Glue', brand: 'Brass Knuckles', price: 42.00, category: 'vape-pens', thc: 83.0 },
{ name: 'Heavy Hitters Ultra Extract - Sour Diesel', brand: 'Heavy Hitters', price: 55.00, category: 'concentrates', thc: 90.0 },
{ name: 'Cresco Liquid Live Resin - Pineapple Express', brand: 'Cresco', price: 50.00, category: 'vape-pens', thc: 87.0 },
{ name: 'Verano Pre-Roll - Mag Landrace', brand: 'Verano', price: 15.00, category: 'pre-rolls', thc: 26.0 },
{ name: 'Select Nano Gummies - Watermelon', brand: 'Select', price: 22.00, category: 'edibles', thc: 10.0 }
];
console.log(`Inserting ${testProducts.length} test products with brands...`);
console.log('─'.repeat(80));
for (const product of testProducts) {
await pool.query(
`INSERT INTO products (
store_id, name, brand, price, thc_percentage,
dutchie_url, in_stock
)
VALUES ($1, $2, $3, $4, $5, $6, true)`,
[
storeId,
product.name,
product.brand,
product.price,
product.thc,
`https://curaleaf.com/stores/curaleaf-dispensary-48th-street/product/${product.name.toLowerCase().replace(/\s+/g, '-')}`
]
);
console.log(`${product.brand} - ${product.name}`);
}
console.log('─'.repeat(80));
console.log(`\n✅ Added ${testProducts.length} test products with brands to the store\n`);
console.log(`View at: http://localhost:5174/stores/az/curaleaf/curaleaf-az-48th-street\n`);
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
addTestBrands();


@@ -0,0 +1,23 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});
(async () => {
try {
await pool.query(`
UPDATE proxy_test_jobs
SET status = 'cancelled',
completed_at = CURRENT_TIMESTAMP,
updated_at = CURRENT_TIMESTAMP
WHERE id = 2
`);
console.log('✅ Cancelled job ID 2');
process.exit(0);
} catch (error) {
console.error('Error:', error);
process.exit(1);
}
})();


@@ -0,0 +1,53 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { writeFileSync } from 'fs';
puppeteer.use(StealthPlugin());
async function captureAgeGateCookies() {
const browser = await puppeteer.launch({
headless: false, // Visible browser so you can complete age gate manually
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled'
]
});
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
console.log('\n===========================================');
console.log('INSTRUCTIONS:');
console.log('1. A browser window will open');
console.log('2. Complete the age gate manually');
console.log('3. Wait until you see the store page load');
console.log('4. Press ENTER in this terminal when done');
console.log('===========================================\n');
await page.goto('https://curaleaf.com/stores/curaleaf-az-48th-street');
// Wait for user to complete age gate manually
await new Promise((resolve) => {
process.stdin.once('data', () => resolve(null));
});
// Get cookies after age gate
const cookies = await page.cookies();
console.log('\nCaptured cookies:', JSON.stringify(cookies, null, 2));
// Save cookies to file
writeFileSync(
'/home/kelly/dutchie-menus/backend/curaleaf-cookies.json',
JSON.stringify(cookies, null, 2)
);
console.log('\n✅ Cookies saved to curaleaf-cookies.json');
console.log('Current URL:', page.url());
await browser.close();
process.exit(0);
}
captureAgeGateCookies();


@@ -0,0 +1,27 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function check() {
try {
const result = await pool.query("SELECT id, name, slug, dutchie_url FROM stores WHERE slug LIKE '%48th%'");
console.log('Stores with "48th" in slug:');
console.log('─'.repeat(80));
result.rows.forEach(store => {
console.log(`ID: ${store.id}`);
console.log(`Name: ${store.name}`);
console.log(`Slug: ${store.slug}`);
console.log(`URL: ${store.dutchie_url}`);
console.log('─'.repeat(80));
});
console.log(`Total: ${result.rowCount}`);
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
check();


@@ -0,0 +1,23 @@
import { pool } from './src/db/migrate.js';
async function main() {
const result = await pool.query(`
SELECT id, brand_name, status, products_scraped
FROM brand_jobs
WHERE dispensary_id = 112
AND status = 'completed'
ORDER BY products_scraped DESC
LIMIT 10
`);
console.log('\nCompleted Brand Jobs:');
console.log('='.repeat(80));
result.rows.forEach((row: any) => {
console.log(`${row.id}: ${row.brand_name} - ${row.products_scraped} products`);
});
console.log('='.repeat(80) + '\n');
await pool.end();
}
main().catch(console.error);


@@ -0,0 +1,22 @@
import { pool } from './src/db/migrate.js';
async function checkBrandNames() {
const result = await pool.query(`
SELECT brand_slug, brand_name
FROM brand_scrape_jobs
WHERE dispensary_id = 112
ORDER BY id
LIMIT 20
`);
console.log('\nBrand Names in Database:');
console.log('='.repeat(60));
result.rows.forEach((row, idx) => {
console.log(`${idx + 1}. slug: ${row.brand_slug}`);
console.log(` name: ${row.brand_name}`);
});
await pool.end();
}
checkBrandNames().catch(console.error);


@@ -0,0 +1,17 @@
import { pool } from './src/db/migrate';
async function checkTable() {
const result = await pool.query(`
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'brands'
ORDER BY ordinal_position
`);
console.log('Brands table structure:');
console.table(result.rows);
await pool.end();
}
checkTable();

backend/check-brands.ts Normal file

@@ -0,0 +1,55 @@
import pg from 'pg';
const { Pool } = pg;
const pool = new Pool({
connectionString: process.env.DATABASE_URL
});
async function checkBrands() {
try {
// Get dispensary info
const dispensaryResult = await pool.query(
"SELECT id, name FROM dispensaries WHERE dutchie_slug = 'AZ-Deeply-Rooted'"
);
if (dispensaryResult.rows.length === 0) {
console.log('Dispensary not found');
return;
}
const dispensary = dispensaryResult.rows[0];
console.log(`Dispensary: ${dispensary.name} (ID: ${dispensary.id})`);
// Get brand count
const brandCountResult = await pool.query(
`SELECT COUNT(DISTINCT brand) as brand_count
FROM products
WHERE dispensary_id = $1 AND brand IS NOT NULL AND brand != ''`,
[dispensary.id]
);
console.log(`\nTotal distinct brands: ${brandCountResult.rows[0].brand_count}`);
// List all brands
const brandsResult = await pool.query(
`SELECT DISTINCT brand, COUNT(*) as product_count
FROM products
WHERE dispensary_id = $1 AND brand IS NOT NULL AND brand != ''
GROUP BY brand
ORDER BY brand`,
[dispensary.id]
);
console.log(`\nBrands found:`);
brandsResult.rows.forEach(row => {
console.log(` - ${row.brand} (${row.product_count} products)`);
});
} catch (error) {
console.error('Error:', error);
} finally {
await pool.end();
}
}
checkBrands();

backend/check-db.ts Normal file

@@ -0,0 +1,27 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function checkDB() {
try {
const result = await pool.query(`
SELECT COUNT(*) as total,
COUNT(*) FILTER (WHERE active = true) as active
FROM proxies
`);
console.log('Proxies:', result.rows[0]);
const stores = await pool.query('SELECT slug FROM stores LIMIT 5');
console.log('Sample stores:', stores.rows.map(r => r.slug));
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
checkDB();


@@ -0,0 +1,74 @@
import pg from 'pg';
const { Pool } = pg;
const pool = new Pool({
connectionString: process.env.DATABASE_URL
});
async function check() {
try {
// Get dispensary info - try different column names
const dispensaryResult = await pool.query(
"SELECT * FROM dispensaries WHERE name ILIKE '%Deeply Rooted%' LIMIT 1"
);
if (dispensaryResult.rows.length === 0) {
console.log('Dispensary not found. Listing all dispensaries:');
const all = await pool.query("SELECT id, name FROM dispensaries LIMIT 10");
all.rows.forEach(d => console.log(` ID ${d.id}: ${d.name}`));
return;
}
const dispensary = dispensaryResult.rows[0];
console.log(`Dispensary: ${dispensary.name} (ID: ${dispensary.id})`);
console.log(`Columns:`, Object.keys(dispensary));
// Get product count
const productCountResult = await pool.query(
`SELECT COUNT(*) as total_products FROM products WHERE dispensary_id = $1`,
[dispensary.id]
);
console.log(`\nTotal products: ${productCountResult.rows[0].total_products}`);
// Get brand count and list
const brandCountResult = await pool.query(
`SELECT COUNT(DISTINCT brand) as brand_count
FROM products
WHERE dispensary_id = $1 AND brand IS NOT NULL AND brand != ''`,
[dispensary.id]
);
console.log(`Total distinct brands: ${brandCountResult.rows[0].brand_count}`);
// List all brands
const brandsResult = await pool.query(
`SELECT DISTINCT brand, COUNT(*) as product_count
FROM products
WHERE dispensary_id = $1 AND brand IS NOT NULL AND brand != ''
GROUP BY brand
ORDER BY product_count DESC`,
[dispensary.id]
);
console.log(`\nBrands with products:`);
brandsResult.rows.forEach(row => {
console.log(` - ${row.brand} (${row.product_count} products)`);
});
// Count products without brands
const noBrandResult = await pool.query(
`SELECT COUNT(*) as no_brand_count
FROM products
WHERE dispensary_id = $1 AND (brand IS NULL OR brand = '')`,
[dispensary.id]
);
console.log(`\nProducts without brand: ${noBrandResult.rows[0].no_brand_count}`);
} catch (error) {
console.error('Error:', error);
} finally {
await pool.end();
}
}
check();

@@ -0,0 +1,33 @@
import pkg from 'pg';
const { Pool } = pkg;
const pool = new Pool({
connectionString: process.env.DATABASE_URL
});
async function main() {
const result = await pool.query(`
SELECT
COUNT(*) as total_products,
COUNT(CASE WHEN discount_type IS NOT NULL AND discount_value IS NOT NULL THEN 1 END) as products_with_discounts
FROM products
`);
console.log('Product Count:');
console.log(result.rows[0]);
// Get a sample of products with discounts
const sample = await pool.query(`
SELECT name, brand, regular_price, sale_price, discount_type, discount_value
FROM products
WHERE discount_type IS NOT NULL AND discount_value IS NOT NULL
LIMIT 5
`);
console.log('\nSample Products with Discounts:');
console.log(sample.rows);
await pool.end();
}
main().catch(console.error);
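The discount query above pairs a `discount_type` with a `discount_value`. A minimal sketch of how such a pair might be applied, assuming `'percent'`/`'fixed'` semantics — the schema shown here does not confirm those value names, so treat this as illustrative only:

```typescript
type DiscountType = 'percent' | 'fixed';

// Assumed semantics: 'percent' takes N% off, 'fixed' subtracts N dollars.
// Result is rounded to cents and floored at zero.
function applyDiscount(regular: number, type: DiscountType, value: number): number {
  const sale = type === 'percent' ? regular * (1 - value / 100) : regular - value;
  return Math.max(0, Math.round(sale * 100) / 100);
}

console.log(applyDiscount(50, 'percent', 20)); // 40
console.log(applyDiscount(10, 'fixed', 12));   // 0 (floored at zero)
```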

@@ -0,0 +1,25 @@
import { pool } from './src/db/migrate.js';
async function main() {
const result = await pool.query(`
SELECT
COUNT(*) as total,
COUNT(regular_price) as with_prices,
COUNT(*) - COUNT(regular_price) as without_prices
FROM products
WHERE dispensary_id = 112
`);
const stats = result.rows[0];
const pct = ((parseInt(stats.with_prices) / parseInt(stats.total)) * 100).toFixed(1);
console.log('\nENRICHMENT PROGRESS:');
console.log(` Total products: ${stats.total}`);
console.log(` With prices: ${stats.with_prices} (${pct}%)`);
console.log(` Without prices: ${stats.without_prices}`);
console.log('');
await pool.end();
}
main().catch(console.error);

@@ -0,0 +1,89 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';
async function checkForPrices() {
const proxy = await getRandomProxy();
if (!proxy) {
console.log('No proxy available');
process.exit(1);
}
const browser = await firefox.launch({ headless: true });
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
const brandUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/brands/alien-labs';
console.log(`Loading: ${brandUrl}`);
await page.goto(brandUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(5000);
// Check for any dollar signs on the entire page
const pageText = await page.evaluate(() => document.body.textContent);
const hasDollarSigns = pageText?.includes('$');
console.log('\n' + '='.repeat(80));
console.log('PRICE AVAILABILITY CHECK:');
console.log('='.repeat(80));
console.log(`\nPage contains '$' symbol: ${hasDollarSigns ? 'YES' : 'NO'}`);
if (hasDollarSigns) {
// Find all text containing dollar signs
const priceElements = await page.evaluate(() => {
const walker = document.createTreeWalker(
document.body,
NodeFilter.SHOW_TEXT,
null
);
const results: string[] = [];
let node;
while (node = walker.nextNode()) {
const text = node.textContent?.trim();
if (text && text.includes('$')) {
results.push(text);
}
}
return results.slice(0, 10); // First 10 instances
});
console.log('\nText containing "$":');
priceElements.forEach((text, idx) => {
console.log(` ${idx + 1}. ${text.substring(0, 100)}`);
});
}
// Check specifically within product cards
const productCardPrices = await page.evaluate(() => {
const cards = Array.from(document.querySelectorAll('a[href*="/product/"]'));
return cards.slice(0, 5).map(card => ({
text: card.textContent?.substring(0, 200),
hasDollar: card.textContent?.includes('$') || false
}));
});
console.log('\nFirst 5 Product Cards:');
productCardPrices.forEach((card, idx) => {
console.log(`\n Card ${idx + 1}:`);
console.log(` Has $: ${card.hasDollar}`);
console.log(` Text: ${card.text}`);
});
console.log('\n' + '='.repeat(80));
await browser.close();
process.exit(0);
}
checkForPrices().catch(console.error);
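The TreeWalker pass above only collects raw text nodes containing `$`; turning those strings into numbers is a one-regex step. A sketch of that normalization (the sample input is invented for illustration, not taken from a real menu):

```typescript
// Extract dollar amounts from free-form menu text, e.g. "$45.00" or "$8".
// Returns numeric values in document order.
function extractPrices(text: string): number[] {
  const matches = text.match(/\$(\d+(?:\.\d{1,2})?)/g) ?? [];
  return matches.map((m) => parseFloat(m.slice(1)));
}

// Illustrative input resembling a product card's textContent
const card = 'Alien Labs Cured Resin Cart $45.00 $38.25 1g';
console.log(extractPrices(card)); // [ 45, 38.25 ]
```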

backend/check-jobs.js Normal file
@@ -0,0 +1,33 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});
(async () => {
try {
const result = await pool.query(`
SELECT id, status, total_proxies, tested_proxies, passed_proxies, failed_proxies,
created_at, started_at, completed_at
FROM proxy_test_jobs
ORDER BY created_at DESC
LIMIT 5
`);
console.log('\n📊 Recent Proxy Test Jobs:');
console.log('='.repeat(80));
result.rows.forEach(job => {
console.log(`\nJob ID: ${job.id}`);
console.log(`Status: ${job.status}`);
console.log(`Progress: ${job.tested_proxies}/${job.total_proxies} (${job.passed_proxies} passed, ${job.failed_proxies} failed)`);
console.log(`Created: ${job.created_at}`);
console.log(`Started: ${job.started_at || 'N/A'}`);
console.log(`Completed: ${job.completed_at || 'N/A'}`);
});
process.exit(0);
} catch (error) {
console.error('Error:', error);
process.exit(1);
}
})();

backend/check-jobs.ts Normal file
@@ -0,0 +1,40 @@
import pg from 'pg';
const client = new pg.Client({
connectionString: process.env.DATABASE_URL,
});
async function checkJobs() {
await client.connect();
const statusRes = await client.query(`
SELECT status, COUNT(*) as count
FROM brand_scrape_jobs
WHERE dispensary_id = 112
GROUP BY status
ORDER BY status
`);
console.log('\n📊 Job Status Summary:');
console.log('====================');
statusRes.rows.forEach(row => {
console.log(`${row.status}: ${row.count}`);
});
const activeRes = await client.query(`
SELECT worker_id, COUNT(*) as count
FROM brand_scrape_jobs
WHERE dispensary_id = 112 AND status = 'in_progress'
GROUP BY worker_id
`);
console.log('\n👷 Active Workers:');
console.log('==================');
activeRes.rows.forEach(row => {
console.log(`${row.worker_id}: ${row.count} jobs`);
});
await client.end();
}
checkJobs();

backend/check-leaks.ts Normal file
@@ -0,0 +1,110 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Pool } from 'pg';
puppeteer.use(StealthPlugin());
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function check() {
let browser;
try {
const proxyResult = await pool.query(`SELECT host, port, protocol FROM proxies ORDER BY RANDOM() LIMIT 1`);
const proxy = proxyResult.rows[0];
browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox', `--proxy-server=${proxy.protocol}://${proxy.host}:${proxy.port}`]
});
const page = await browser.newPage();
await page.setUserAgent('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
console.log('🔍 CHECKING FOR DATA LEAKS\n');
await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-phoenix-airport/brands', {
waitUntil: 'networkidle2',
timeout: 60000
});
await page.waitForTimeout(5000);
// Check what browser exposes
const browserData = await page.evaluate(() => ({
// Automation detection
webdriver: navigator.webdriver,
hasHeadlessUA: /headless/i.test(navigator.userAgent),
// User agent
userAgent: navigator.userAgent,
// Chrome detection
hasChrome: typeof (window as any).chrome !== 'undefined',
chromeKeys: (window as any).chrome ? Object.keys((window as any).chrome) : [],
// Permissions
permissions: navigator.permissions ? 'exists' : 'missing',
// Languages
languages: navigator.languages,
language: navigator.language,
// Plugins
pluginCount: navigator.plugins.length,
// Platform
platform: navigator.platform,
// Screen
screenWidth: screen.width,
screenHeight: screen.height,
// JavaScript working?
jsWorking: true,
// Page content
title: document.title,
bodyLength: document.body.innerHTML.length,
hasReactRoot: document.getElementById('__next') !== null,
scriptTags: document.querySelectorAll('script').length
}));
console.log('📋 BROWSER FINGERPRINT:');
console.log('─'.repeat(60));
console.log('navigator.webdriver:', browserData.webdriver, browserData.webdriver ? '❌ LEAKED!' : '✅');
console.log('navigator.userAgent:', browserData.userAgent);
console.log('Has "headless" in UA:', browserData.hasHeadlessUA, browserData.hasHeadlessUA ? '❌' : '✅');
console.log('window.chrome exists:', browserData.hasChrome, browserData.hasChrome ? '✅' : '❌ SUSPICIOUS');
console.log('Chrome keys:', browserData.chromeKeys.join(', '));
console.log('Languages:', browserData.languages);
console.log('Platform:', browserData.platform);
console.log('Plugins:', browserData.pluginCount);
console.log('\n📄 PAGE STATE:');
console.log('─'.repeat(60));
console.log('JavaScript executing:', browserData.jsWorking ? '✅ YES' : '❌ NO');
console.log('Page title:', `"${browserData.title}"`);
console.log('Body HTML size:', browserData.bodyLength, 'chars');
console.log('React root exists:', browserData.hasReactRoot ? '✅' : '❌');
console.log('Script tags:', browserData.scriptTags);
if (browserData.bodyLength < 1000) {
console.log('\n⚠️ PROBLEM: Body too small! JS likely failed to load/execute');
}
if (!browserData.title) {
console.log('⚠️ PROBLEM: No page title! Page didn\'t render');
}
} catch (error: any) {
console.error('❌', error.message);
} finally {
if (browser) await browser.close();
await pool.end();
}
}
check();

@@ -0,0 +1,72 @@
import { pool } from './src/db/migrate.js';
async function checkProductData() {
// Get a few recently saved products
const result = await pool.query(`
SELECT
slug,
name,
brand,
variant,
regular_price,
sale_price,
thc_percentage,
cbd_percentage,
strain_type,
in_stock,
stock_status,
image_url
FROM products
WHERE dispensary_id = 112
AND brand IN ('(the) Essence', 'Abundant Organics', 'AAchieve', 'Alien Labs')
ORDER BY updated_at DESC
LIMIT 10
`);
console.log('\n📊 Recently Saved Products:');
console.log('='.repeat(100));
result.rows.forEach((row, idx) => {
console.log(`\n${idx + 1}. ${row.name} (${row.brand})`);
console.log(` Variant: ${row.variant || 'N/A'}`);
console.log(` Regular Price: $${row.regular_price || 'N/A'}`);
console.log(` Sale Price: $${row.sale_price || 'N/A'}`);
console.log(` THC %: ${row.thc_percentage || 'N/A'}%`);
console.log(` CBD %: ${row.cbd_percentage || 'N/A'}%`);
console.log(` Strain: ${row.strain_type || 'N/A'}`);
console.log(` Stock: ${row.stock_status || (row.in_stock ? 'In stock' : 'Out of stock')}`);
console.log(` Image: ${row.image_url ? '✓' : 'N/A'}`);
});
console.log('\n' + '='.repeat(100));
// Count how many products have complete data
const stats = await pool.query(`
SELECT
COUNT(*) as total,
COUNT(regular_price) as has_price,
COUNT(thc_percentage) as has_thc,
COUNT(cbd_percentage) as has_cbd,
COUNT(variant) as has_variant,
COUNT(strain_type) as has_strain,
COUNT(image_url) as has_image
FROM products
WHERE dispensary_id = 112
AND brand IN ('(the) Essence', 'Abundant Organics', 'AAchieve', 'Alien Labs')
`);
const stat = stats.rows[0];
console.log('\n📈 Data Completeness for Recently Scraped Brands:');
console.log(` Total products: ${stat.total}`);
console.log(` Has price: ${stat.has_price} (${Math.round(stat.has_price / stat.total * 100)}%)`);
console.log(` Has THC%: ${stat.has_thc} (${Math.round(stat.has_thc / stat.total * 100)}%)`);
console.log(` Has CBD%: ${stat.has_cbd} (${Math.round(stat.has_cbd / stat.total * 100)}%)`);
console.log(` Has variant: ${stat.has_variant} (${Math.round(stat.has_variant / stat.total * 100)}%)`);
console.log(` Has strain type: ${stat.has_strain} (${Math.round(stat.has_strain / stat.total * 100)}%)`);
console.log(` Has image: ${stat.has_image} (${Math.round(stat.has_image / stat.total * 100)}%)`);
console.log('');
await pool.end();
}
checkProductData().catch(console.error);

@@ -0,0 +1,75 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';
async function checkProductDetailPage() {
const proxy = await getRandomProxy();
if (!proxy) {
console.log('No proxy available');
process.exit(1);
}
const browser = await firefox.launch({ headless: true });
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
// Load a product detail page
const productUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/product/alien-labs-cured-resin-cart-dark-web';
console.log(`Loading: ${productUrl}`);
await page.goto(productUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(5000);
// Extract all data from the product detail page
const productData = await page.evaluate(() => {
const pageText = document.body.textContent || '';
// Check for prices
const priceElements = Array.from(document.querySelectorAll('*')).filter(el => {
const text = el.textContent?.trim() || '';
return text.match(/\$\d+/) && el.children.length === 0; // Leaf nodes only
});
// Check for stock information
const stockElements = Array.from(document.querySelectorAll('*')).filter(el => {
const text = el.textContent?.toLowerCase() || '';
return (text.includes('stock') || text.includes('available') || text.includes('in stock') || text.includes('out of stock')) && el.children.length === 0;
});
return {
hasPrice: pageText.includes('$'),
priceText: priceElements.slice(0, 5).map(el => el.textContent?.trim()),
stockText: stockElements.slice(0, 5).map(el => el.textContent?.trim()),
pageTextSample: pageText.substring(0, 500)
};
});
console.log('\n' + '='.repeat(80));
console.log('PRODUCT DETAIL PAGE DATA:');
console.log('='.repeat(80));
console.log('\nHas "$" symbol:', productData.hasPrice);
console.log('\nPrice elements found:');
productData.priceText.forEach((text, idx) => {
console.log(` ${idx + 1}. ${text}`);
});
console.log('\nStock elements found:');
productData.stockText.forEach((text, idx) => {
console.log(` ${idx + 1}. ${text}`);
});
console.log('\nPage text sample:');
console.log(productData.pageTextSample);
console.log('\n' + '='.repeat(80));
await browser.close();
process.exit(0);
}
checkProductDetailPage().catch(console.error);

@@ -0,0 +1,56 @@
import { pool } from './src/db/migrate.js';
async function checkProductPrices() {
const result = await pool.query(`
SELECT
id,
name,
brand,
regular_price,
sale_price,
in_stock,
stock_status
FROM products
WHERE dispensary_id = 112
ORDER BY brand, name
LIMIT 50
`);
console.log('\n' + '='.repeat(100));
console.log('PRODUCTS WITH PRICES');
console.log('='.repeat(100) + '\n');
result.rows.forEach((row, idx) => {
// pg returns NUMERIC columns as strings, so parse before calling toFixed
const regularPrice = row.regular_price ? `$${parseFloat(row.regular_price).toFixed(2)}` : 'N/A';
const salePrice = row.sale_price ? `$${parseFloat(row.sale_price).toFixed(2)}` : 'N/A';
const stock = row.in_stock ? (row.stock_status || 'In Stock') : 'Out of Stock';
console.log(`${idx + 1}. ${row.brand} - ${row.name.substring(0, 50)}`);
console.log(` Price: ${regularPrice} | Sale: ${salePrice} | Stock: ${stock}`);
console.log('');
});
console.log('='.repeat(100) + '\n');
// Summary stats
const stats = await pool.query(`
SELECT
COUNT(*) as total_products,
COUNT(regular_price) as products_with_price,
COUNT(sale_price) as products_with_sale,
COUNT(CASE WHEN in_stock THEN 1 END) as in_stock_count
FROM products
WHERE dispensary_id = 112
`);
console.log('SUMMARY:');
console.log(` Total products: ${stats.rows[0].total_products}`);
console.log(` Products with regular price: ${stats.rows[0].products_with_price}`);
console.log(` Products with sale price: ${stats.rows[0].products_with_sale}`);
console.log(` Products in stock: ${stats.rows[0].in_stock_count}`);
console.log('\n');
await pool.end();
}
checkProductPrices().catch(console.error);
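Several of these scripts format NUMERIC columns from node-postgres, which returns them as strings by default to preserve precision. A defensive formatter sketch that accepts either form (the helper name is ours, not from the repo):

```typescript
// node-postgres returns NUMERIC columns as strings; format defensively so
// both string and number inputs work, and fall back to 'N/A' on bad data.
function formatPrice(value: string | number | null): string {
  if (value == null) return 'N/A';
  const n = typeof value === 'number' ? value : parseFloat(value);
  return Number.isNaN(n) ? 'N/A' : `$${n.toFixed(2)}`;
}

console.log(formatPrice('19.99')); // $19.99
console.log(formatPrice(null));    // N/A
```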

@@ -0,0 +1,47 @@
import { pool } from './src/db/migrate.js';
async function main() {
const result = await pool.query(`
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'products'
ORDER BY ordinal_position;
`);
console.log('Products table columns:');
result.rows.forEach(row => {
console.log(` ${row.column_name}: ${row.data_type} (${row.is_nullable === 'YES' ? 'nullable' : 'NOT NULL'})`);
});
const constraints = await pool.query(`
SELECT constraint_name, constraint_type
FROM information_schema.table_constraints
WHERE table_name = 'products';
`);
console.log('\nProducts table constraints:');
constraints.rows.forEach(row => {
console.log(` ${row.constraint_name}: ${row.constraint_type}`);
});
// Get unique constraints details
const uniqueConstraints = await pool.query(`
SELECT
tc.constraint_name,
kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
WHERE tc.table_name = 'products'
AND tc.constraint_type IN ('PRIMARY KEY', 'UNIQUE');
`);
console.log('\nUnique/Primary key constraints:');
uniqueConstraints.rows.forEach(row => {
console.log(` ${row.constraint_name}: ${row.column_name}`);
});
await pool.end();
}
main().catch(console.error);

backend/check-products.ts Normal file
@@ -0,0 +1,29 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function checkProducts() {
try {
console.log('Products table columns:');
const productsColumns = await pool.query(`
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'products'
ORDER BY ordinal_position
`);
productsColumns.rows.forEach(r => console.log(` - ${r.column_name}: ${r.data_type}`));
console.log('\nSample products:');
const products = await pool.query('SELECT * FROM products LIMIT 3');
console.log(JSON.stringify(products.rows, null, 2));
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
checkProducts();

backend/check-proxies.ts Normal file
@@ -0,0 +1,31 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function checkProxies() {
try {
const result = await pool.query(`
SELECT
COUNT(*) as total,
COUNT(*) FILTER (WHERE active = true) as active,
COUNT(*) FILTER (WHERE state = 'Arizona') as arizona
FROM proxies
`);
console.log('Proxy Stats:');
console.log('─'.repeat(40));
console.log(`Total: ${result.rows[0].total}`);
console.log(`Active: ${result.rows[0].active}`);
console.log(`Arizona: ${result.rows[0].arizona}`);
console.log('─'.repeat(40));
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
checkProxies();

@@ -0,0 +1,36 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});
(async () => {
try {
const stats = await pool.query(`
SELECT
COUNT(*) as total,
COUNT(*) FILTER (WHERE active = true) as active,
COUNT(*) FILTER (WHERE active = false) as inactive,
COUNT(*) FILTER (WHERE test_result = 'success') as passed,
COUNT(*) FILTER (WHERE test_result = 'failed') as failed,
COUNT(*) FILTER (WHERE test_result IS NULL) as untested
FROM proxies
`);
const s = stats.rows[0];
console.log('\n📊 Proxy Statistics:');
console.log('='.repeat(60));
console.log(`Total Proxies: ${s.total}`);
console.log(`Active: ${s.active} (passing tests)`);
console.log(`Inactive: ${s.inactive} (failed tests)`);
console.log(`Test Results:`);
console.log(` ✅ Passed: ${s.passed}`);
console.log(` ❌ Failed: ${s.failed}`);
console.log(` ⚪ Untested: ${s.untested}`);
process.exit(0);
} catch (error) {
console.error('Error:', error);
process.exit(1);
}
})();

@@ -0,0 +1,60 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});
(async () => {
try {
// Check for categories that have been scraped
const historyResult = await pool.query(`
SELECT
s.id as store_id,
s.name as store_name,
c.id as category_id,
c.name as category_name,
c.last_scraped_at,
(
SELECT COUNT(*)
FROM products p
WHERE p.store_id = s.id
AND p.category_id = c.id
) as product_count
FROM stores s
LEFT JOIN categories c ON c.store_id = s.id
WHERE c.last_scraped_at IS NOT NULL
ORDER BY c.last_scraped_at DESC
LIMIT 10
`);
console.log('\n📊 Scraper History:');
console.log('='.repeat(80));
if (historyResult.rows.length === 0) {
console.log('No scraper history found. No categories have been scraped yet.');
} else {
historyResult.rows.forEach(row => {
console.log(`\nStore: ${row.store_name} (ID: ${row.store_id})`);
console.log(`Category: ${row.category_name} (ID: ${row.category_id})`);
console.log(`Last Scraped: ${row.last_scraped_at}`);
console.log(`Products: ${row.product_count}`);
});
}
// Check total categories
const totalCategoriesResult = await pool.query(`
SELECT COUNT(*) as total FROM categories
`);
console.log(`\n\nTotal Categories: ${totalCategoriesResult.rows[0].total}`);
// Check categories with last_scraped_at
const scrapedCategoriesResult = await pool.query(`
SELECT COUNT(*) as scraped FROM categories WHERE last_scraped_at IS NOT NULL
`);
console.log(`Categories Scraped: ${scrapedCategoriesResult.rows[0].scraped}`);
process.exit(0);
} catch (error) {
console.error('Error:', error);
process.exit(1);
}
})();

@@ -0,0 +1,32 @@
import { pool } from './src/db/migrate.js';
async function main() {
const result = await pool.query(`
SELECT name, brand, regular_price, sale_price, in_stock, stock_status
FROM products
WHERE dispensary_id = 112
AND brand = 'Select'
ORDER BY name
LIMIT 10
`);
console.log('\n' + '='.repeat(100));
console.log('SELECT BRAND PRODUCTS WITH PRICES (NEW ONE-PASS APPROACH)');
console.log('='.repeat(100) + '\n');
result.rows.forEach((row, idx) => {
const regularPrice = row.regular_price ? `$${parseFloat(row.regular_price).toFixed(2)}` : 'N/A';
const salePrice = row.sale_price ? `$${parseFloat(row.sale_price).toFixed(2)}` : 'N/A';
const stock = row.in_stock ? (row.stock_status || 'In Stock') : 'Out of Stock';
console.log(`${idx + 1}. ${row.name}`);
console.log(` Price: ${regularPrice} | Sale: ${salePrice} | Stock: ${stock}`);
console.log('');
});
console.log('='.repeat(100) + '\n');
await pool.end();
}
main().catch(console.error);

@@ -0,0 +1,161 @@
import { firefox } from 'playwright';
import { pool } from './src/db/migrate.js';
import { getRandomProxy } from './src/utils/proxyManager.js';
async function checkProduct() {
const proxy = await getRandomProxy();
if (!proxy) {
console.log('No proxy available');
process.exit(1);
}
console.log(`Using proxy: ${proxy.server}`);
const browser = await firefox.launch({
headless: true,
firefoxUserPrefs: {
'geo.enabled': true,
}
});
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
geolocation: { latitude: 33.4484, longitude: -112.0740 },
permissions: ['geolocation'],
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
try {
console.log('Loading product page...');
const url = process.argv[2] || 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/product/abundant-organics-flower-mylar-abundant-horizon';
await page.goto(url, {
waitUntil: 'domcontentloaded',
timeout: 30000
});
await page.waitForTimeout(5000);
const productData = await page.evaluate(() => {
const data: any = { fields: {} };
const allText = document.body.textContent || '';
// 1. BASIC INFO
const nameEl = document.querySelector('h1');
data.fields.name = nameEl?.textContent?.trim() || null;
// 2. CATEGORY - look for breadcrumbs or category links
const breadcrumbs = Array.from(document.querySelectorAll('[class*="breadcrumb"] a, nav a'));
data.fields.category = breadcrumbs.map(b => b.textContent?.trim()).filter(Boolean);
// 3. BRAND
// ':has-text()' is Playwright-only and throws inside document.querySelector, so stick to standard CSS selectors here
const brandSelectors = ['[class*="brand"]', '[data-testid*="brand"]'];
for (const sel of brandSelectors) {
try {
const el = document.querySelector(sel);
if (el && el.textContent && !el.textContent.includes('Brand:')) {
data.fields.brand = el.textContent.trim();
break;
}
} catch {}
}
// 4. PRICES
const priceMatches = allText.match(/\$(\d+\.?\d*)/g);
data.fields.prices = priceMatches || [];
// 5. THC/CBD CONTENT
const thcMatch = allText.match(/THC[:\s]*(\d+\.?\d*)\s*%/i);
const cbdMatch = allText.match(/CBD[:\s]*(\d+\.?\d*)\s*%/i);
data.fields.thc = thcMatch ? parseFloat(thcMatch[1]) : null;
data.fields.cbd = cbdMatch ? parseFloat(cbdMatch[1]) : null;
// 6. STRAIN TYPE
if (allText.match(/\bindica\b/i)) data.fields.strainType = 'Indica';
else if (allText.match(/\bsativa\b/i)) data.fields.strainType = 'Sativa';
else if (allText.match(/\bhybrid\b/i)) data.fields.strainType = 'Hybrid';
// 7. WEIGHT/SIZE OPTIONS
const weights = allText.matchAll(/(\d+\.?\d*\s*(?:g|oz|mg|ml|gram|ounce))/gi);
data.fields.weights = Array.from(weights).map(m => m[1].trim());
// 8. DESCRIPTION
const descSelectors = ['[class*="description"]', '[class*="Description"]', 'p[class*="product"]'];
for (const sel of descSelectors) {
const el = document.querySelector(sel);
if (el?.textContent && el.textContent.length > 20) {
data.fields.description = el.textContent.trim().substring(0, 500);
break;
}
}
// 9. EFFECTS
const effectNames = ['Relaxed', 'Happy', 'Euphoric', 'Uplifted', 'Creative', 'Energetic', 'Focused', 'Calm', 'Sleepy', 'Hungry'];
data.fields.effects = effectNames.filter(e => allText.match(new RegExp(`\\b${e}\\b`, 'i')));
// 10. TERPENES
const terpeneNames = ['Myrcene', 'Limonene', 'Caryophyllene', 'Pinene', 'Linalool', 'Humulene'];
data.fields.terpenes = terpeneNames.filter(t => allText.match(new RegExp(`\\b${t}\\b`, 'i')));
// 11. FLAVORS
const flavorNames = ['Sweet', 'Citrus', 'Earthy', 'Pine', 'Berry', 'Diesel', 'Sour', 'Floral', 'Spicy'];
data.fields.flavors = flavorNames.filter(f => allText.match(new RegExp(`\\b${f}\\b`, 'i')));
// 12. SPECIAL INFO
data.fields.hasSpecialText = allText.includes('Special') || allText.includes('Sale') || allText.includes('Deal');
const endsMatch = allText.match(/(?:ends?|expires?)\s+(?:in\s+)?(\d+)\s+(min|hour|day)/i);
data.fields.specialEndsIn = endsMatch ? `${endsMatch[1]} ${endsMatch[2]}` : null;
// 13. IMAGE URLS
const images = Array.from(document.querySelectorAll('img[src*="dutchie"]'));
data.fields.imageUrls = images.map(img => (img as HTMLImageElement).src).filter(Boolean);
// 14. ALL VISIBLE TEXT (for debugging)
data.allVisibleText = allText.substring(0, 1000);
// 15. STRUCTURED DATA FROM SCRIPTS
const scripts = Array.from(document.querySelectorAll('script'));
data.structuredData = {};
for (const script of scripts) {
const content = script.textContent || '';
const idMatch = content.match(/"id":"([a-f0-9-]+)"/);
if (idMatch && idMatch[1].length > 10) {
data.structuredData.productId = idMatch[1];
}
const variantMatch = content.match(/"variantId":"([^"]+)"/);
if (variantMatch) {
data.structuredData.variantId = variantMatch[1];
}
const categoryMatch = content.match(/"category":"([^"]+)"/);
if (categoryMatch) {
data.structuredData.category = categoryMatch[1];
}
}
return data;
});
console.log('\n=== PRODUCT DATA (Time: ' + new Date().toISOString() + ') ===');
console.log(JSON.stringify(productData, null, 2));
await browser.close();
await pool.end();
} catch (error) {
console.error('Error:', error);
await browser.close();
await pool.end();
process.exit(1);
}
}
checkProduct();
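The potency regexes inside `page.evaluate` above are easy to unit-test outside the browser. A sketch mirroring the same patterns (the helper name is ours):

```typescript
// Mirrors the THC/CBD patterns used in the page.evaluate block above,
// e.g. "THC: 24.5%" or "CBD 0.1 %".
function parsePotency(text: string): { thc: number | null; cbd: number | null } {
  const thcMatch = text.match(/THC[:\s]*(\d+\.?\d*)\s*%/i);
  const cbdMatch = text.match(/CBD[:\s]*(\d+\.?\d*)\s*%/i);
  return {
    thc: thcMatch ? parseFloat(thcMatch[1]) : null,
    cbd: cbdMatch ? parseFloat(cbdMatch[1]) : null,
  };
}

console.log(parsePotency('THC: 24.5% CBD: 0.1%')); // { thc: 24.5, cbd: 0.1 }
```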

backend/check-store.ts Normal file
@@ -0,0 +1,49 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function checkStore() {
try {
const store = await pool.query(`
SELECT id, name, slug FROM stores WHERE slug = 'curaleaf-az-48th-street'
`);
if (store.rows.length > 0) {
console.log('Store found:', store.rows[0]);
// Check if it has products
const products = await pool.query(`
SELECT COUNT(*) as total, COUNT(DISTINCT brand) as brands
FROM products WHERE store_id = $1
`, [store.rows[0].id]);
console.log('Store products:', products.rows[0]);
// Check distinct brands
const brands = await pool.query(`
SELECT DISTINCT brand FROM products
WHERE store_id = $1 AND brand IS NOT NULL
ORDER BY brand
`, [store.rows[0].id]);
console.log('\nCurrent brands:', brands.rows.map(r => r.brand));
} else {
console.log('Store not found');
// Show available stores
const stores = await pool.query(`
SELECT slug FROM stores WHERE slug LIKE '%48th%'
`);
console.log('Stores with 48th:', stores.rows.map(r => r.slug));
}
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
checkStore();

backend/check-stores.ts Normal file
@@ -0,0 +1,24 @@
import { pool } from './src/db/migrate.js';
async function checkStores() {
try {
const result = await pool.query(`
SELECT id, name, slug, dutchie_url
FROM stores
WHERE name ILIKE '%sol flower%'
ORDER BY name
`);
console.log(`Found ${result.rows.length} Sol Flower stores:\n`);
result.rows.forEach(store => {
console.log(`ID ${store.id}: ${store.name}`);
console.log(` URL: ${store.dutchie_url}\n`);
});
} catch (error) {
console.error('Error:', error);
} finally {
await pool.end();
}
}
checkStores();

backend/check-tables.ts Normal file
@@ -0,0 +1,35 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function checkTables() {
try {
const result = await pool.query(`
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
ORDER BY table_name
`);
console.log('Tables in database:');
result.rows.forEach(r => console.log(' -', r.table_name));
// Check stores table structure
console.log('\nStores table columns:');
const storesColumns = await pool.query(`
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'stores'
`);
storesColumns.rows.forEach(r => console.log(` - ${r.column_name}: ${r.data_type}`));
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
checkTables();

@@ -0,0 +1,105 @@
import { chromium } from 'playwright';
const GOOGLE_UA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
async function main() {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext({
userAgent: GOOGLE_UA
});
const page = await context.newPage();
try {
console.log('Loading menu page...');
await page.goto('https://best.treez.io/onlinemenu/?customerType=ADULT', {
waitUntil: 'networkidle',
timeout: 30000
});
await page.waitForTimeout(3000);
// Look for category navigation elements
console.log('\n=== Checking for category filters/tabs ===\n');
// Check for common category selectors
const categorySelectors = [
'nav a',
'nav button',
'[role="tab"]',
'[class*="category"]',
'[class*="filter"]',
'[class*="nav"]',
'.menu-category',
'.category-filter',
'.product-category'
];
for (const selector of categorySelectors) {
const elements = await page.locator(selector).all();
if (elements.length > 0) {
console.log(`\nFound ${elements.length} elements matching "${selector}":`);
for (let i = 0; i < Math.min(10, elements.length); i++) {
const text = await elements[i].textContent();
const href = await elements[i].getAttribute('href');
const className = await elements[i].getAttribute('class');
console.log(` ${i + 1}. Text: "${text?.trim()}" | Class: "${className}" | Href: "${href}"`);
}
}
}
// Check the main navigation
console.log('\n=== Main Navigation Structure ===\n');
const navElements = await page.locator('nav, [role="navigation"]').all();
console.log(`Found ${navElements.length} navigation elements`);
for (let i = 0; i < navElements.length; i++) {
const navHtml = await navElements[i].innerHTML();
console.log(`\nNavigation ${i + 1}:`);
console.log(navHtml.substring(0, 500)); // First 500 chars
console.log('...');
}
// Check for dropdowns or select elements
console.log('\n=== Checking for dropdowns ===\n');
const selects = await page.locator('select').all();
console.log(`Found ${selects.length} select elements`);
for (let i = 0; i < selects.length; i++) {
const options = await selects[i].locator('option').all();
console.log(`\nSelect ${i + 1} has ${options.length} options:`);
for (let j = 0; j < Math.min(10, options.length); j++) {
const text = await options[j].textContent();
const value = await options[j].getAttribute('value');
console.log(` - "${text}" (value: ${value})`);
}
}
// Look for any clickable category buttons
console.log('\n=== Checking for category buttons ===\n');
const buttons = await page.locator('button').all();
console.log(`Found ${buttons.length} total buttons`);
const categoryButtons = [];
for (const button of buttons) {
const text = await button.textContent();
const className = await button.getAttribute('class');
if (text && (text.includes('Flower') || text.includes('Edible') || text.includes('Vape') ||
text.includes('Concentrate') || text.includes('Pre-Roll') || text.includes('All'))) {
categoryButtons.push({ text: text.trim(), class: className });
}
}
console.log(`Found ${categoryButtons.length} potential category buttons:`);
categoryButtons.forEach((btn, i) => {
console.log(` ${i + 1}. "${btn.text}" (class: ${btn.class})`);
});
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await browser.close();
}
}
main().catch(console.error);
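The category detection above keys on hard-coded substrings like `Flower` and `Vape`. The same check can be factored into a pure helper so it is testable outside the browser — a sketch; `CATEGORY_KEYWORDS` and `matchCategoryKeyword` are hypothetical names, not part of the scraper:

```typescript
// Hypothetical helper mirroring the substring checks above; the keyword
// list is an assumption, not an exhaustive Treez category set.
const CATEGORY_KEYWORDS = ['Flower', 'Edible', 'Vape', 'Concentrate', 'Pre-Roll', 'All'];

function matchCategoryKeyword(text: string | null): string | null {
  if (!text) return null;
  const trimmed = text.trim();
  // Return the first keyword the button text contains, or null.
  for (const keyword of CATEGORY_KEYWORDS) {
    if (trimmed.includes(keyword)) return keyword;
  }
  return null;
}

console.log(matchCategoryKeyword('Shop Flower')); // Flower
console.log(matchCategoryKeyword('Accessories')); // null
```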


@@ -0,0 +1,52 @@
import { pool } from './src/db/migrate.js';
async function main() {
try {
// Count total products and unique brands
const stats = await pool.query(`
SELECT
COUNT(*) as total_products,
COUNT(DISTINCT brand) as unique_brands
FROM products
WHERE dispensary_id = 149
`);
console.log('Stats:', stats.rows[0]);
// Get sample products to verify brand extraction
const samples = await pool.query(`
SELECT brand, name, variant, dutchie_url
FROM products
WHERE dispensary_id = 149
ORDER BY RANDOM()
LIMIT 10
`);
console.log('\nSample products:');
samples.rows.forEach(row => {
console.log(`Brand: "${row.brand}" | Name: "${row.name}" | Variant: "${row.variant}"`);
});
// Get brand distribution
const brands = await pool.query(`
SELECT brand, COUNT(*) as count
FROM products
WHERE dispensary_id = 149
GROUP BY brand
ORDER BY count DESC
LIMIT 15
`);
console.log('\nTop brands:');
brands.rows.forEach(row => {
console.log(`${row.brand}: ${row.count} products`);
});
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
main().catch(console.error);
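The top-brands query above (`GROUP BY brand ... ORDER BY count DESC`) can also be mirrored in memory when sanity-checking scraped rows before they hit the database. A sketch, with a hypothetical `Product` shape:

```typescript
// In-memory version of the brand-distribution query above; the Product
// shape is an assumption covering only the fields the query touches.
interface Product { brand: string | null; name: string; }

function brandCounts(products: Product[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const p of products) {
    const brand = p.brand?.trim();
    if (!brand) continue; // skip null/empty brands, like the WHERE clause would
    counts.set(brand, (counts.get(brand) ?? 0) + 1);
  }
  // Sort descending by count, mirroring ORDER BY count DESC.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```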


@@ -0,0 +1,70 @@
import { chromium } from 'playwright';
const GOOGLE_UA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
async function main() {
console.log('Checking BEST Dispensary Treez menu for pagination...');
const browser = await chromium.launch({ headless: false });
const context = await browser.newContext({ userAgent: GOOGLE_UA });
const page = await context.newPage();
try {
console.log('Loading menu page...');
await page.goto('https://best.treez.io/onlinemenu/?customerType=ADULT', {
waitUntil: 'networkidle',
timeout: 30000
});
await page.waitForTimeout(3000);
// Check initial count
const initialItems = await page.locator('.menu-item').all();
console.log(`Initial menu items found: ${initialItems.length}`);
// Check for pagination controls
const paginationButtons = await page.locator('button:has-text("Next"), button:has-text("Load More"), .pagination, [class*="page"], [class*="Pagination"]').all();
console.log(`Pagination controls found: ${paginationButtons.length}`);
// Check page height and scroll
const scrollHeight = await page.evaluate(() => document.body.scrollHeight);
console.log(`Page scroll height: ${scrollHeight}px`);
// Try scrolling to bottom
console.log('Scrolling to bottom...');
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
await page.waitForTimeout(2000);
// Check if more items loaded after scroll
const afterScrollItems = await page.locator('.menu-item').all();
console.log(`Menu items after scroll: ${afterScrollItems.length}`);
// Check for categories/filters
const categories = await page.locator('[class*="category"], [class*="filter"], nav a, .nav-link').all();
console.log(`Category/filter links found: ${categories.length}`);
if (categories.length > 0) {
console.log('\nCategory links:');
for (let i = 0; i < Math.min(categories.length, 10); i++) {
const text = await categories[i].textContent();
const href = await categories[i].getAttribute('href');
console.log(` - ${text?.trim()} (${href})`);
}
}
// Take a screenshot
await page.screenshot({ path: '/tmp/treez-menu-check.png', fullPage: true });
console.log('\nScreenshot saved to /tmp/treez-menu-check.png');
// Get page HTML to analyze structure
const html = await page.content();
console.log(`\nPage HTML length: ${html.length} characters`);
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await browser.close();
}
}
main().catch(console.error);


@@ -0,0 +1,215 @@
import { pool } from './src/db/migrate';
async function migrateCompleteSchema() {
console.log('🔧 Migrating to complete normalized schema...\n');
const client = await pool.connect();
try {
await client.query('BEGIN');
// Step 1: Add brand_id to products table
console.log('1. Adding brand_id column to products...');
await client.query(`
ALTER TABLE products
ADD COLUMN IF NOT EXISTS brand_id INTEGER REFERENCES brands(id) ON DELETE SET NULL
`);
// Step 2: Ensure brands.name has UNIQUE constraint
console.log('2. Ensuring brands.name has UNIQUE constraint...');
await client.query(`
ALTER TABLE brands DROP CONSTRAINT IF EXISTS brands_name_key;
ALTER TABLE brands ADD CONSTRAINT brands_name_key UNIQUE (name);
`);
// Step 3: Migrate existing brand text to brands table and update FKs
console.log('3. Migrating existing brand data...');
await client.query(`
-- Insert unique brands from products into brands table
INSERT INTO brands (name)
SELECT DISTINCT brand
FROM products
WHERE brand IS NOT NULL AND brand != ''
ON CONFLICT (name) DO NOTHING
`);
// Update products to use brand_id
await client.query(`
UPDATE products p
SET brand_id = b.id
FROM brands b
WHERE p.brand = b.name
AND p.brand IS NOT NULL
AND p.brand != ''
`);
// Step 4: Create product_brands junction table for historical tracking
console.log('4. Creating product_brands tracking table...');
await client.query(`
CREATE TABLE IF NOT EXISTS product_brands (
id SERIAL PRIMARY KEY,
product_id INTEGER REFERENCES products(id) ON DELETE CASCADE,
brand_id INTEGER REFERENCES brands(id) ON DELETE CASCADE,
first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(product_id, brand_id)
)
`);
// Populate product_brands from current data
await client.query(`
INSERT INTO product_brands (product_id, brand_id)
SELECT id, brand_id
FROM products
WHERE brand_id IS NOT NULL
ON CONFLICT (product_id, brand_id) DO NOTHING
`);
// Step 5: Add store contact information and address fields
console.log('5. Adding store contact info and address fields...');
await client.query(`
ALTER TABLE stores
ADD COLUMN IF NOT EXISTS address TEXT,
ADD COLUMN IF NOT EXISTS city VARCHAR(255),
ADD COLUMN IF NOT EXISTS state VARCHAR(50),
ADD COLUMN IF NOT EXISTS zip VARCHAR(20),
ADD COLUMN IF NOT EXISTS phone VARCHAR(50),
ADD COLUMN IF NOT EXISTS website TEXT,
ADD COLUMN IF NOT EXISTS email VARCHAR(255)
`);
// Step 6: Add product discount tracking
console.log('6. Adding product discount fields...');
await client.query(`
ALTER TABLE products
ADD COLUMN IF NOT EXISTS discount_percentage DECIMAL(5, 2),
ADD COLUMN IF NOT EXISTS discount_amount DECIMAL(10, 2),
ADD COLUMN IF NOT EXISTS sale_price DECIMAL(10, 2)
`);
// Step 7: Add missing timestamp columns
console.log('7. Ensuring all timestamp columns exist...');
// Products timestamps
await client.query(`
ALTER TABLE products
ADD COLUMN IF NOT EXISTS first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
ADD COLUMN IF NOT EXISTS last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
`);
// Categories timestamps
await client.query(`
ALTER TABLE categories
ADD COLUMN IF NOT EXISTS first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
ADD COLUMN IF NOT EXISTS last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
`);
// Step 8: Add indexes for reporting queries
console.log('8. Creating indexes for fast reporting...');
await client.query(`
-- Brand exposure queries (which stores carry which brands)
CREATE INDEX IF NOT EXISTS idx_store_brands_brand_active ON store_brands(brand_id, active);
CREATE INDEX IF NOT EXISTS idx_store_brands_store_active ON store_brands(store_id, active);
CREATE INDEX IF NOT EXISTS idx_store_brands_dates ON store_brands(first_seen_at, last_seen_at);
-- Product queries by store and brand
CREATE INDEX IF NOT EXISTS idx_products_store_brand ON products(store_id, brand_id);
CREATE INDEX IF NOT EXISTS idx_products_brand_stock ON products(brand_id, in_stock);
CREATE INDEX IF NOT EXISTS idx_products_dates ON products(first_seen_at, last_seen_at);
-- Category queries
CREATE INDEX IF NOT EXISTS idx_categories_store ON categories(store_id, scrape_enabled);
-- Specials queries
CREATE INDEX IF NOT EXISTS idx_products_specials ON products(store_id, is_special) WHERE is_special = true;
`);
// Step 9: Create helper views for common queries
console.log('9. Creating reporting views...');
// Brand exposure view
await client.query(`
CREATE OR REPLACE VIEW brand_exposure AS
SELECT
b.id as brand_id,
b.name as brand_name,
COUNT(DISTINCT sb.store_id) as store_count,
COUNT(DISTINCT CASE WHEN sb.active THEN sb.store_id END) as active_store_count,
MIN(sb.first_seen_at) as first_seen,
MAX(sb.last_seen_at) as last_seen
FROM brands b
LEFT JOIN store_brands sb ON b.id = sb.brand_id
GROUP BY b.id, b.name
ORDER BY active_store_count DESC, brand_name
`);
// Brand timeline view (track adds/drops)
await client.query(`
CREATE OR REPLACE VIEW brand_timeline AS
SELECT
sb.id,
b.name as brand_name,
s.name as store_name,
sb.first_seen_at as added_on,
CASE
WHEN sb.active THEN NULL
ELSE sb.last_seen_at
END as dropped_on,
sb.active as currently_active
FROM store_brands sb
JOIN brands b ON sb.brand_id = b.id
JOIN stores s ON sb.store_id = s.id
ORDER BY sb.first_seen_at DESC
`);
// Product inventory view
await client.query(`
CREATE OR REPLACE VIEW product_inventory AS
SELECT
p.id,
p.name as product_name,
b.name as brand_name,
s.name as store_name,
c.name as category_name,
p.price,
p.in_stock,
p.is_special,
p.first_seen_at,
p.last_seen_at
FROM products p
JOIN stores s ON p.store_id = s.id
LEFT JOIN brands b ON p.brand_id = b.id
LEFT JOIN categories c ON p.category_id = c.id
ORDER BY p.last_seen_at DESC
`);
await client.query('COMMIT');
console.log('\n✅ Schema migration complete!');
console.log('\n📊 Available reporting views:');
console.log(' - brand_exposure: See how many stores carry each brand');
console.log(' - brand_timeline: Track when brands were added/dropped');
console.log(' - product_inventory: Full product catalog with store/brand info');
console.log('\n💡 Example queries:');
console.log(' -- Brands by exposure:');
console.log(' SELECT * FROM brand_exposure ORDER BY active_store_count DESC;');
console.log(' ');
console.log(' -- Recently dropped brands:');
console.log(' SELECT * FROM brand_timeline WHERE dropped_on IS NOT NULL ORDER BY dropped_on DESC;');
console.log(' ');
console.log(' -- Products by brand:');
console.log(' SELECT * FROM product_inventory WHERE brand_name = \'Sol Flower\';');
} catch (error) {
await client.query('ROLLBACK');
console.error('❌ Migration failed:', error);
throw error;
} finally {
client.release();
await pool.end();
}
}
migrateCompleteSchema();
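The migration adds `discount_percentage`, `discount_amount`, and `sale_price` to `products`. One way to keep those three columns consistent is to derive the first two from the observed prices — a hedged sketch, assuming the percentage is taken against the original price and values are rounded to cents (both assumptions, not schema requirements):

```typescript
// Sketch of deriving the discount columns from an original price and an
// observed sale price. Rounding to cents is a design choice here.
function deriveDiscount(price: number, salePrice: number) {
  const amount = Math.round((price - salePrice) * 100) / 100;
  const percentage = Math.round((amount / price) * 10000) / 100;
  return { discount_amount: amount, discount_percentage: percentage, sale_price: salePrice };
}

console.log(deriveDiscount(40, 30)); // 25% off: amount 10, percentage 25
```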

backend/count-products.ts Normal file

@@ -0,0 +1,13 @@
import { pool } from './src/db/migrate.js';
async function countProducts() {
const result = await pool.query(
`SELECT COUNT(*) as total FROM products WHERE dispensary_id = 112`
);
console.log(`Total products for Deeply Rooted: ${result.rows[0].total}`);
await pool.end();
}
countProducts();


@@ -0,0 +1,32 @@
import { pool } from './src/db/migrate';
async function createAZDHSTable() {
console.log('🗄️ Creating azdhs_list table...\n');
await pool.query(`
CREATE TABLE IF NOT EXISTS azdhs_list (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
company_name VARCHAR(255),
slug VARCHAR(255),
address VARCHAR(500),
city VARCHAR(100),
state VARCHAR(2) DEFAULT 'AZ',
zip VARCHAR(10),
phone VARCHAR(20),
email VARCHAR(255),
status_line TEXT,
azdhs_url TEXT,
latitude DECIMAL(10, 8),
longitude DECIMAL(11, 8),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
`);
console.log('✅ Table created successfully!');
await pool.end();
}
createAZDHSTable();
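The `azdhs_list` columns cap `zip` at VARCHAR(10) and `phone` at VARCHAR(20), so scraped values need light normalization before insert. A sketch with hypothetical helpers — the formatting choices are assumptions, not AZDHS requirements:

```typescript
// Hypothetical cleanup for scraped AZDHS rows before insert; the column
// widths (zip VARCHAR(10), phone VARCHAR(20)) come from the table above.
function normalizeZip(raw: string): string {
  const digits = raw.replace(/\D/g, '');
  // Keep ZIP+4 as 12345-6789, else the first five digits.
  if (digits.length === 9) return `${digits.slice(0, 5)}-${digits.slice(5)}`;
  return digits.slice(0, 5);
}

function normalizePhone(raw: string): string {
  const digits = raw.replace(/\D/g, '').slice(-10); // drop a leading country code
  if (digits.length !== 10) return raw.trim().slice(0, 20); // fall back, truncated to fit
  return `(${digits.slice(0, 3)}) ${digits.slice(3, 6)}-${digits.slice(6)}`;
}
```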


@@ -0,0 +1,81 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'
});
async function createBrandsTable() {
try {
console.log('Creating brands table...');
await pool.query(`
CREATE TABLE IF NOT EXISTS brands (
id SERIAL PRIMARY KEY,
store_id INTEGER NOT NULL REFERENCES stores(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
`);
console.log('✅ Brands table created');
// Create index for faster queries
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_brands_store_id ON brands(store_id)
`);
console.log('✅ Index created on store_id');
// Create unique constraint
await pool.query(`
CREATE UNIQUE INDEX IF NOT EXISTS idx_brands_store_name ON brands(store_id, name)
`);
console.log('✅ Unique constraint created on (store_id, name)');
console.log('\nCreating specials table...');
await pool.query(`
CREATE TABLE IF NOT EXISTS specials (
id SERIAL PRIMARY KEY,
store_id INTEGER NOT NULL REFERENCES stores(id) ON DELETE CASCADE,
product_id INTEGER REFERENCES products(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
description TEXT,
discount_amount NUMERIC(10, 2),
discount_percentage NUMERIC(5, 2),
special_price NUMERIC(10, 2),
original_price NUMERIC(10, 2),
valid_date DATE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
`);
console.log('✅ Specials table created');
// Create composite index for fast date-based queries
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_specials_store_date ON specials(store_id, valid_date DESC)
`);
console.log('✅ Index created on (store_id, valid_date)');
// Create index on product_id for joins
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_specials_product_id ON specials(product_id)
`);
console.log('✅ Index created on product_id');
console.log('\n🎉 All tables and indexes created successfully!');
} catch (error: any) {
console.error('❌ Error:', error.message);
} finally {
await pool.end();
}
}
createBrandsTable();


@@ -0,0 +1,61 @@
import { pool } from './src/db/migrate';
async function createBrandsTables() {
console.log('📦 Creating brands tracking tables...\n');
try {
// Brands table - stores unique brands across all stores
await pool.query(`
CREATE TABLE IF NOT EXISTS brands (
id SERIAL PRIMARY KEY,
name VARCHAR(255) UNIQUE NOT NULL,
logo_url TEXT,
first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
`);
console.log('✅ Created brands table');
// Store-Brand relationship - tracks which brands are at which stores
await pool.query(`
CREATE TABLE IF NOT EXISTS store_brands (
id SERIAL PRIMARY KEY,
store_id INTEGER REFERENCES stores(id) ON DELETE CASCADE,
brand_id INTEGER REFERENCES brands(id) ON DELETE CASCADE,
first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
active BOOLEAN DEFAULT true,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(store_id, brand_id)
)
`);
console.log('✅ Created store_brands table');
// Add indexes for performance
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_store_brands_store_id ON store_brands(store_id);
CREATE INDEX IF NOT EXISTS idx_store_brands_brand_id ON store_brands(brand_id);
CREATE INDEX IF NOT EXISTS idx_store_brands_active ON store_brands(active);
CREATE INDEX IF NOT EXISTS idx_brands_name ON brands(name);
`);
console.log('✅ Created indexes');
console.log('\n✅ Brands tables created successfully!');
console.log('\nTable structure:');
console.log(' brands: Stores unique brand names and logos');
console.log(' store_brands: Tracks which brands are at which stores with timestamps');
console.log('\nReports you can run:');
console.log(' - Brand exposure: How many stores carry each brand');
console.log(' - Brand timeline: When brands were added/removed from stores');
console.log(' - Store changes: Which brands were added/dropped at a store');
} catch (error) {
console.error('❌ Error:', error);
} finally {
await pool.end();
}
}
createBrandsTables();
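Because `store_brands` has `UNIQUE(store_id, brand_id)`, a scraper can record each brand sighting with a single upsert. A sketch that returns the SQL as text so it can be inspected without a live database; the update policy (refresh `last_seen_at`, re-activate) is an assumption about how sightings should be tracked:

```typescript
// Sketch of the per-sighting upsert a scraper could run, relying on the
// UNIQUE(store_id, brand_id) constraint created above.
function storeBrandUpsertSql(): string {
  return `
    INSERT INTO store_brands (store_id, brand_id, last_seen_at, active)
    VALUES ($1, $2, CURRENT_TIMESTAMP, true)
    ON CONFLICT (store_id, brand_id)
    DO UPDATE SET last_seen_at = CURRENT_TIMESTAMP, active = true
  `;
}
```

A caller would run it as `await pool.query(storeBrandUpsertSql(), [storeId, brandId])`.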

backend/create-table.js Normal file

@@ -0,0 +1,38 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: process.env.DATABASE_URL
});
(async () => {
try {
await pool.query(`
CREATE TABLE IF NOT EXISTS proxy_test_jobs (
id SERIAL PRIMARY KEY,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
total_proxies INTEGER NOT NULL DEFAULT 0,
tested_proxies INTEGER NOT NULL DEFAULT 0,
passed_proxies INTEGER NOT NULL DEFAULT 0,
failed_proxies INTEGER NOT NULL DEFAULT 0,
started_at TIMESTAMP,
completed_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
`);
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_proxy_test_jobs_status ON proxy_test_jobs(status);
`);
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_proxy_test_jobs_created_at ON proxy_test_jobs(created_at DESC);
`);
console.log('✅ Table created successfully');
process.exit(0);
} catch (error) {
console.error('❌ Error:', error);
process.exit(1);
}
})();
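The counters on `proxy_test_jobs` are enough to drive a simple progress readout. A sketch over a row shape mirroring the table's columns:

```typescript
// Progress summary over the proxy_test_jobs counters above; the row
// shape mirrors the table's integer columns.
interface ProxyTestJob {
  total_proxies: number;
  tested_proxies: number;
  passed_proxies: number;
  failed_proxies: number;
}

function jobProgress(job: ProxyTestJob): { percent: number; passRate: number } {
  const percent = job.total_proxies === 0
    ? 0
    : Math.round((job.tested_proxies / job.total_proxies) * 100);
  const passRate = job.tested_proxies === 0
    ? 0
    : Math.round((job.passed_proxies / job.tested_proxies) * 100);
  return { percent, passRate };
}
```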


@@ -0,0 +1,22 @@
[
{
"name": "age_gate_passed",
"value": "true",
"domain": ".curaleaf.com",
"path": "/",
"expires": 9999999999,
"httpOnly": false,
"secure": false,
"sameSite": "Lax"
},
{
"name": "selected_state",
"value": "Arizona",
"domain": ".curaleaf.com",
"path": "/",
"expires": 9999999999,
"httpOnly": false,
"secure": false,
"sameSite": "Lax"
}
]
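Playwright's `context.addCookies` expects each cookie to carry `name`, `value`, and either `url` or `domain` plus `path`, so it can help to shape-check seed JSON like the above before loading it. A minimal sketch; the `SeedCookie` type is an assumption covering only the checked fields:

```typescript
// Minimal shape check before handing seed cookies to a browser context;
// only the fields required for domain-scoped cookies are validated.
interface SeedCookie {
  name?: string; value?: string; domain?: string; path?: string;
}

function isUsableCookie(c: SeedCookie): boolean {
  return Boolean(c.name && c.value !== undefined && c.domain && c.path);
}

const seed: SeedCookie[] = [
  { name: 'age_gate_passed', value: 'true', domain: '.curaleaf.com', path: '/' },
  { name: 'broken' } // missing value/domain/path, would be rejected
];
console.log(seed.filter(isUsableCookie).length); // 1
```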


@@ -0,0 +1,83 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Browser, Page } from 'puppeteer';
puppeteer.use(StealthPlugin());
async function debugAfterStateSelect() {
let browser: Browser | null = null;
try {
const url = 'https://curaleaf.com/stores/curaleaf-az-48th-street';
browser = await puppeteer.launch({
headless: 'new',
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled'
]
});
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
console.log('Loading page...');
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(5000);
// Click dropdown and select Arizona
const stateButton = await page.$('button#state');
if (stateButton) {
console.log('Clicking state button...');
await stateButton.click();
await page.waitForTimeout(800);
console.log('Clicking Arizona...');
await page.evaluate(() => {
const options = Array.from(document.querySelectorAll('[role="option"]'));
const arizona = options.find(el => el.textContent?.toLowerCase() === 'arizona');
if (arizona instanceof HTMLElement) {
arizona.click();
}
});
await page.waitForTimeout(1000);
console.log('\n=== AFTER selecting Arizona ===');
// Check what buttons are now visible
const elementsAfter = await page.evaluate(() => {
return {
buttons: Array.from(document.querySelectorAll('button')).map(b => ({
text: b.textContent?.trim(),
classes: b.className,
id: b.id,
visible: b.offsetParent !== null
})),
links: Array.from(document.querySelectorAll('a')).filter(a => a.offsetParent !== null).map(a => ({
text: a.textContent?.trim(),
href: a.href
})),
hasAgeQuestion: document.body.textContent?.includes('21') || document.body.textContent?.includes('age')
};
});
console.log('\nVisible buttons:', JSON.stringify(elementsAfter.buttons.filter(b => b.visible), null, 2));
console.log('\nVisible links:', JSON.stringify(elementsAfter.links, null, 2));
console.log('\nHas age question:', elementsAfter.hasAgeQuestion);
}
await browser.close();
process.exit(0);
} catch (error) {
console.error('Error:', error);
if (browser) await browser.close();
process.exit(1);
}
}
debugAfterStateSelect();
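The Arizona click above relies on an exact, case-insensitive text match over `[role="option"]` elements. That matching can be pulled out as a pure function so it is testable without a browser; the substring fallback is an addition here, not part of the script:

```typescript
// Pure version of the option-matching used above: exact, case-insensitive
// match first (as in the script), with a substring fallback added as an
// assumption for looser option labels.
function findStateOption(optionTexts: Array<string | null>, state: string): number {
  const target = state.toLowerCase();
  const exact = optionTexts.findIndex(t => t?.trim().toLowerCase() === target);
  if (exact !== -1) return exact;
  return optionTexts.findIndex(t => t?.toLowerCase().includes(target) ?? false);
}
```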


@@ -0,0 +1,96 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Browser, Page } from 'puppeteer';
puppeteer.use(StealthPlugin());
async function debugDetailedAgeGate() {
let browser: Browser | null = null;
try {
const url = 'https://curaleaf.com/stores/curaleaf-az-48th-street';
browser = await puppeteer.launch({
headless: false, // Run with visible browser
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled'
]
});
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
console.log('Loading page...');
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(5000);
console.log(`\nCurrent URL: ${page.url()}`);
// Check for dropdown button
console.log('\nLooking for dropdown button #state...');
const stateButton = await page.$('button#state');
console.log('State button found:', !!stateButton);
if (stateButton) {
console.log('Clicking state button...');
await stateButton.click();
await page.waitForTimeout(1000);
console.log('\nLooking for dropdown options after clicking...');
const options = await page.evaluate(() => {
// Look for any elements that appeared after clicking
const allElements = Array.from(document.querySelectorAll('[role="option"], [class*="option"], [class*="Option"], li'));
return allElements.slice(0, 20).map(el => ({
text: el.textContent?.trim(),
tag: el.tagName,
role: el.getAttribute('role'),
classes: el.className
}));
});
console.log('Found options:', JSON.stringify(options, null, 2));
// Try to click Arizona
console.log('\nTrying to click Arizona option...');
const clicked = await page.evaluate(() => {
const allElements = Array.from(document.querySelectorAll('[role="option"], [class*="option"], [class*="Option"], li, div, span'));
const arizonaEl = allElements.find(el => el.textContent?.toLowerCase().includes('arizona'));
if (arizonaEl instanceof HTMLElement) {
console.log('Found Arizona element:', arizonaEl.textContent);
arizonaEl.click();
return true;
}
return false;
});
console.log('Arizona clicked:', clicked);
if (clicked) {
console.log('Waiting for navigation...');
try {
await page.waitForNavigation({ timeout: 10000 });
console.log('Navigation successful!');
} catch (e) {
console.log('Navigation timeout');
}
}
}
console.log(`\nFinal URL: ${page.url()}`);
console.log('\nPress Ctrl+C to close browser...');
// Keep browser open for inspection
await new Promise(() => {});
} catch (error) {
console.error('Error:', error);
if (browser) await browser.close();
process.exit(1);
}
}
debugDetailedAgeGate();


@@ -0,0 +1,78 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Browser, Page } from 'puppeteer';
puppeteer.use(StealthPlugin());
async function debugAgeGateElements() {
let browser: Browser | null = null;
try {
const url = 'https://curaleaf.com/stores/curaleaf-az-48th-street';
browser = await puppeteer.launch({
headless: 'new',
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled'
]
});
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
console.log('Loading page...');
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(5000); // Wait for React to render
const elements = await page.evaluate(() => {
const allClickable = Array.from(document.querySelectorAll('button, a, div[role="button"], [onclick]'));
return {
buttons: Array.from(document.querySelectorAll('button')).map(b => ({
text: b.textContent?.trim(),
classes: b.className,
id: b.id
})),
links: Array.from(document.querySelectorAll('a')).map(a => ({
text: a.textContent?.trim(),
href: a.href,
classes: a.className
})),
divs: Array.from(document.querySelectorAll('div[role="button"], div[onclick], [class*="card"], [class*="Card"], [class*="state"], [class*="State"]')).slice(0, 20).map(d => ({
text: d.textContent?.trim().substring(0, 100),
classes: d.className,
role: d.getAttribute('role')
}))
};
});
console.log('\n=== BUTTONS ===');
elements.buttons.forEach((b, i) => {
console.log(`${i + 1}. "${b.text}" [${b.classes}] #${b.id}`);
});
console.log('\n=== LINKS ===');
elements.links.slice(0, 10).forEach((a, i) => {
console.log(`${i + 1}. "${a.text}" -> ${a.href}`);
});
console.log('\n=== DIVS/CARDS ===');
elements.divs.forEach((d, i) => {
console.log(`${i + 1}. "${d.text}" [${d.classes}] role=${d.role}`);
});
await browser.close();
process.exit(0);
} catch (error) {
console.error('Error:', error);
if (browser) await browser.close();
process.exit(1);
}
}
debugAgeGateElements();


@@ -0,0 +1,88 @@
import { chromium } from 'playwright-extra';
import stealth from 'puppeteer-extra-plugin-stealth';
import { pool } from './src/db/migrate';
chromium.use(stealth());
async function debugAZDHSPage() {
console.log('🔍 Debugging AZDHS page structure...\n');
const browser = await chromium.launch({
headless: false,
});
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
});
const page = await context.newPage();
try {
console.log('📄 Loading page...');
await page.goto('https://azcarecheck.azdhs.gov/s/?facilityId=001t000000L0TApAAN', {
waitUntil: 'domcontentloaded',
timeout: 60000
});
console.log('⏳ Waiting 30 seconds for you to scroll and load all dispensaries...\n');
await page.waitForTimeout(30000);
console.log('🔍 Analyzing page structure...\n');
const debug = await page.evaluate(() => {
// Get all unique tag names
const allElements = document.querySelectorAll('*');
const tagCounts: any = {};
const classSamples: string[] = [];
allElements.forEach(el => {
const tag = el.tagName.toLowerCase();
tagCounts[tag] = (tagCounts[tag] || 0) + 1;
// Sample some classes
if (el.className && typeof el.className === 'string' && el.className.length > 0 && classSamples.length < 50) {
classSamples.push(el.className.substring(0, 80));
}
});
// Look for elements with text that might be dispensary names
const textElements: any[] = [];
allElements.forEach(el => {
const text = el.textContent?.trim() || '';
if (text.length > 10 && text.length < 200 && el.children.length < 5) {
textElements.push({
tag: el.tagName.toLowerCase(),
class: el.className ? el.className.substring(0, 50) : '',
text: text.substring(0, 100)
});
}
});
return {
totalElements: allElements.length,
tagCounts: Object.entries(tagCounts).sort((a: any, b: any) => b[1] - a[1]).slice(0, 20),
classSamples: classSamples.slice(0, 20),
textElementsSample: textElements.slice(0, 10)
};
});
console.log('📊 Page Structure Analysis:');
console.log(`\nTotal elements: ${debug.totalElements}`);
console.log('\nTop 20 element types:');
console.table(debug.tagCounts);
console.log('\nSample classes:');
debug.classSamples.forEach((c: string, i: number) => console.log(` ${i + 1}. ${c}`));
console.log('\nSample text elements (potential dispensary names):');
console.table(debug.textElementsSample);
} catch (error) {
console.error(`❌ Error: ${error}`);
} finally {
console.log('\n👉 Browser will stay open for 30 seconds so you can inspect...');
await page.waitForTimeout(30000);
await browser.close();
await pool.end();
}
}
debugAZDHSPage();


@@ -0,0 +1,79 @@
import { firefox } from 'playwright';
import { pool } from './src/db/migrate.js';
import { getRandomProxy } from './src/utils/proxyManager.js';
const dispensaryId = 112;
async function main() {
const dispensaryResult = await pool.query(
"SELECT id, name, menu_url FROM dispensaries WHERE id = $1",
[dispensaryId]
);
const menuUrl = dispensaryResult.rows[0].menu_url;
const proxy = await getRandomProxy();
const browser = await firefox.launch({ headless: true });
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
const brandsUrl = `${menuUrl}/brands`;
console.log(`Loading: ${brandsUrl}`);
await page.goto(brandsUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForSelector('a[href*="/brands/"]', { timeout: 45000 });
await page.waitForTimeout(3000);
// Get the HTML structure of the first 5 brand links
const brandStructures = await page.evaluate(() => {
const brandLinks = Array.from(document.querySelectorAll('a[href*="/brands/"]')).slice(0, 10);
return brandLinks.map(link => {
const href = link.getAttribute('href') || '';
const slug = href.split('/brands/')[1]?.replace(/\/$/, '') || '';
return {
slug,
innerHTML: (link as HTMLElement).innerHTML.substring(0, 300),
textContent: link.textContent?.trim(),
childElementCount: link.childElementCount,
children: Array.from(link.children).map(child => ({
tag: child.tagName.toLowerCase(),
class: child.className,
text: child.textContent?.trim()
}))
};
});
});
console.log('\n' + '='.repeat(80));
console.log('BRAND LINK STRUCTURES:');
console.log('='.repeat(80));
brandStructures.forEach((brand, idx) => {
console.log(`\n${idx + 1}. slug: ${brand.slug}`);
console.log(` textContent: "${brand.textContent}"`);
console.log(` childElementCount: ${brand.childElementCount}`);
console.log(` children:`);
brand.children.forEach((child, childIdx) => {
console.log(` ${childIdx + 1}. <${child.tag}> class="${child.class}"`);
console.log(` text: "${child.text}"`);
});
console.log(` innerHTML: ${brand.innerHTML.substring(0, 200)}`);
});
console.log('\n' + '='.repeat(80));
await browser.close();
await pool.end();
}
main().catch(console.error);
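The slug extraction above (`href.split('/brands/')[1]` with a trailing-slash trim) is easy to isolate as a pure helper:

```typescript
// Pure version of the slug extraction applied to brand link hrefs above;
// returns '' when the href is missing or has no /brands/ segment.
function brandSlugFromHref(href: string | null): string {
  const path = href ?? '';
  return path.split('/brands/')[1]?.replace(/\/$/, '') ?? '';
}

console.log(brandSlugFromHref('/dispensary/foo/brands/sol-flower/')); // sol-flower
```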


@@ -0,0 +1,77 @@
import { chromium } from 'playwright';
async function debugCuraleafButtons() {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
viewport: { width: 1280, height: 720 }
});
const page = await context.newPage();
await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-48th-street');
await page.waitForTimeout(2000);
console.log('\n=== BEFORE STATE SELECTION ===\n');
  // Collect all visible buttons, button-role elements, and links
const buttonsBefore = await page.locator('button, [role="button"], a').evaluateAll(elements => {
return elements.map(el => ({
tag: el.tagName,
text: el.textContent?.trim().substring(0, 50),
id: el.id,
class: el.className,
      visible: (el as HTMLElement).offsetParent !== null
})).filter(b => b.visible);
});
console.log('Buttons before state selection:');
buttonsBefore.forEach((b, i) => console.log(`${i + 1}. ${b.tag} - "${b.text}" [id: ${b.id}]`));
// Click state dropdown
const stateButton = page.locator('button#state').first();
await stateButton.click();
await page.waitForTimeout(1000);
// Click Arizona
const arizona = page.locator('[role="option"]').filter({ hasText: /^Arizona$/i }).first();
await arizona.click();
await page.waitForTimeout(2000);
console.log('\n=== AFTER STATE SELECTION ===\n');
const buttonsAfter = await page.locator('button, [role="button"], a').evaluateAll(elements => {
return elements.map(el => ({
tag: el.tagName,
text: el.textContent?.trim().substring(0, 50),
id: el.id,
class: el.className,
type: el.getAttribute('type'),
      visible: (el as HTMLElement).offsetParent !== null
})).filter(b => b.visible);
});
console.log('Buttons after state selection:');
buttonsAfter.forEach((b, i) => console.log(`${i + 1}. ${b.tag} - "${b.text}" [id: ${b.id}] [type: ${b.type}]`));
// Check for any form elements
const forms = await page.locator('form').count();
console.log(`\nForms on page: ${forms}`);
if (forms > 0) {
const formActions = await page.locator('form').evaluateAll(forms => {
return forms.map(f => ({
action: f.action,
method: f.method
}));
});
console.log('Form details:', formActions);
}
await page.screenshot({ path: '/tmp/curaleaf-debug-after-state.png', fullPage: true });
console.log('\n📸 Screenshot: /tmp/curaleaf-debug-after-state.png');
await browser.close();
}
debugCuraleafButtons();


@@ -0,0 +1,103 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { bypassAgeGate, detectStateFromUrl } from './src/utils/age-gate';
import { Browser, Page } from 'puppeteer';
puppeteer.use(StealthPlugin());
async function debugDutchieDetection() {
let browser: Browser | null = null;
try {
const url = 'https://curaleaf.com/stores/curaleaf-az-48th-street';
browser = await puppeteer.launch({
headless: 'new',
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled'
]
});
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
console.log('Loading page...');
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
    await new Promise((r) => setTimeout(r, 3000)); // page.waitForTimeout was removed in newer Puppeteer versions
console.log('Bypassing age gate...');
const state = detectStateFromUrl(url);
await bypassAgeGate(page, state);
console.log('\nWaiting 5 more seconds for Dutchie menu to load...');
    await new Promise((r) => setTimeout(r, 5000)); // page.waitForTimeout was removed in newer Puppeteer versions
console.log('\nChecking for Dutchie markers...');
const dutchieInfo = await page.evaluate(() => {
// Check window.reactEnv
const hasReactEnv = !!(window as any).reactEnv;
const reactEnvDetails = hasReactEnv ? JSON.stringify((window as any).reactEnv, null, 2) : null;
// Check HTML content
const htmlContent = document.documentElement.innerHTML;
const hasAdminDutchie = htmlContent.includes('admin.dutchie.com');
const hasApiDutchie = htmlContent.includes('api.dutchie.com');
const hasEmbeddedMenu = htmlContent.includes('embedded-menu');
const hasReactEnvInHtml = htmlContent.includes('window.reactEnv');
// Check for Dutchie-specific elements
const hasProductListItems = document.querySelectorAll('[data-testid="product-list-item"]').length;
const hasDutchieScript = !!document.querySelector('script[src*="dutchie"]');
// Check meta tags
const metaTags = Array.from(document.querySelectorAll('meta')).map(m => ({
name: m.getAttribute('name'),
content: m.getAttribute('content'),
property: m.getAttribute('property')
})).filter(m => m.name || m.property);
return {
hasReactEnv,
reactEnvDetails,
hasAdminDutchie,
hasApiDutchie,
hasEmbeddedMenu,
hasReactEnvInHtml,
hasProductListItems,
hasDutchieScript,
metaTags,
pageTitle: document.title,
url: window.location.href
};
});
console.log('\n=== Dutchie Detection Results ===');
console.log('window.reactEnv exists:', dutchieInfo.hasReactEnv);
if (dutchieInfo.reactEnvDetails) {
console.log('window.reactEnv contents:', dutchieInfo.reactEnvDetails);
}
console.log('Has admin.dutchie.com in HTML:', dutchieInfo.hasAdminDutchie);
console.log('Has api.dutchie.com in HTML:', dutchieInfo.hasApiDutchie);
console.log('Has "embedded-menu" in HTML:', dutchieInfo.hasEmbeddedMenu);
console.log('Has "window.reactEnv" in HTML:', dutchieInfo.hasReactEnvInHtml);
console.log('Product list items found:', dutchieInfo.hasProductListItems);
console.log('Has Dutchie script tag:', dutchieInfo.hasDutchieScript);
console.log('Page title:', dutchieInfo.pageTitle);
console.log('Current URL:', dutchieInfo.url);
await browser.close();
process.exit(0);
} catch (error) {
console.error('Error:', error);
if (browser) await browser.close();
process.exit(1);
}
}
debugDutchieDetection();
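The marker checks inside `page.evaluate` above can also be expressed as a pure function over raw HTML, which makes them testable without a browser. A sketch — the function name is illustrative, not from the codebase:

```typescript
// Sketch: the same string markers the detection script looks for,
// applied to an HTML string instead of a live page.
function looksLikeDutchieEmbed(html: string): boolean {
  return (
    html.includes('admin.dutchie.com') ||
    html.includes('api.dutchie.com') ||
    html.includes('embedded-menu') ||
    html.includes('window.reactEnv')
  );
}

console.log(looksLikeDutchieEmbed('<script src="https://api.dutchie.com/v1.js"></script>')); // true
console.log(looksLikeDutchieEmbed('<p>plain page</p>')); // false
```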


@@ -0,0 +1,134 @@
import { createStealthBrowser, createStealthContext, waitForPageLoad, isCloudflareChallenge, waitForCloudflareChallenge } from './src/utils/stealthBrowser';
import { getRandomProxy } from './src/utils/proxyManager';
import { pool } from './src/db/migrate';
import * as fs from 'fs/promises';
async function debugDutchieSelectors() {
console.log('🔍 Debugging Dutchie page structure...\n');
const url = 'https://dutchie.com/dispensary/sol-flower-dispensary';
// Get proxy
const proxy = await getRandomProxy();
console.log(`Using proxy: ${proxy?.server || 'none'}\n`);
const browser = await createStealthBrowser({ proxy: proxy || undefined, headless: true });
try {
const context = await createStealthContext(browser, { state: 'Arizona' });
const page = await context.newPage();
console.log(`Loading: ${url}`);
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
// Check for Cloudflare
if (await isCloudflareChallenge(page)) {
console.log('🛡️ Cloudflare detected, waiting...');
await waitForCloudflareChallenge(page, 60000);
}
await waitForPageLoad(page);
// Wait for content
await page.waitForTimeout(5000);
console.log('\n📸 Taking screenshot...');
await page.screenshot({ path: '/tmp/dutchie-page.png', fullPage: true });
console.log('💾 Saving HTML...');
const html = await page.content();
await fs.writeFile('/tmp/dutchie-page.html', html);
console.log('\n🔎 Looking for common React/product patterns...\n');
// Try to find product containers by various methods
const patterns = [
// React data attributes
'a[href*="/product/"]',
'[data-testid*="product"]',
'[data-cy*="product"]',
'[data-test*="product"]',
// Common class patterns
'[class*="ProductCard"]',
'[class*="product-card"]',
'[class*="Product_"]',
'[class*="MenuItem"]',
'[class*="menu-item"]',
// Semantic HTML
'article',
'[role="article"]',
'[role="listitem"]',
// Link patterns
'a[href*="/menu/"]',
'a[href*="/products/"]',
'a[href*="/item/"]',
];
for (const selector of patterns) {
const count = await page.locator(selector).count();
if (count > 0) {
console.log(`${selector}: ${count} elements`);
// Get details of first element
try {
const first = page.locator(selector).first();
const html = await first.evaluate(el => el.outerHTML.substring(0, 500));
const classes = await first.getAttribute('class');
const testId = await first.getAttribute('data-testid');
console.log(` Classes: ${classes || 'none'}`);
console.log(` Data-testid: ${testId || 'none'}`);
console.log(` HTML preview: ${html}...`);
console.log('');
} catch (e) {
console.log(` (Could not get element details)`);
}
}
}
// Try to extract actual product links
console.log('\n🔗 Looking for product links...\n');
const links = await page.locator('a[href*="/product/"], a[href*="/menu/"], a[href*="/item/"]').all();
if (links.length > 0) {
console.log(`Found ${links.length} potential product links:`);
for (let i = 0; i < Math.min(5, links.length); i++) {
const href = await links[i].getAttribute('href');
const text = await links[i].textContent();
console.log(` ${i + 1}. ${href}`);
console.log(` Text: ${text?.substring(0, 100)}`);
}
}
// Check page title and URL
console.log(`\n📄 Page title: ${await page.title()}`);
console.log(`📍 Final URL: ${page.url()}`);
// Try to find the main content container
console.log('\n🎯 Looking for main content container...\n');
const mainPatterns = ['main', '[role="main"]', '#root', '#app', '[id*="app"]'];
for (const selector of mainPatterns) {
const count = await page.locator(selector).count();
if (count > 0) {
console.log(`${selector}: found`);
const classes = await page.locator(selector).first().getAttribute('class');
console.log(` Classes: ${classes || 'none'}`);
}
}
console.log('\n✅ Debug complete!');
console.log('📸 Screenshot saved to: /tmp/dutchie-page.png');
console.log('💾 HTML saved to: /tmp/dutchie-page.html');
} catch (error) {
console.error('❌ Error:', error);
} finally {
await browser.close();
await pool.end();
}
}
debugDutchieSelectors();
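The selector sweep above prints raw counts; when comparing candidates it can help to rank them. A hedged sketch of that post-processing (`rankSelectors` is illustrative, not from the codebase):

```typescript
// Sketch: rank candidate selectors by how many elements each matched,
// dropping selectors that matched nothing.
function rankSelectors(counts: Record<string, number>): string[] {
  return Object.entries(counts)
    .filter(([, n]) => n > 0)
    .sort((a, b) => b[1] - a[1])
    .map(([selector]) => selector);
}

console.log(rankSelectors({
  'article': 0,
  'a[href*="/product/"]': 42,
  '[class*="ProductCard"]': 12,
}));
// [ 'a[href*="/product/"]', '[class*="ProductCard"]' ]
```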


@@ -0,0 +1,171 @@
import { chromium } from 'playwright';
import { pool } from './src/db/migrate';
import { getRandomProxy } from './src/utils/proxyManager';
import * as fs from 'fs';
async function debugGoogleScraper() {
console.log('🔍 Debugging Google scraper with proxy\n');
// Get a proxy
const proxy = await getRandomProxy();
if (!proxy) {
console.log('❌ No proxies available');
await pool.end();
return;
}
console.log(`🔌 Using proxy: ${proxy.server}\n`);
const browser = await chromium.launch({
headless: false, // Run in visible mode
args: ['--disable-blink-features=AutomationControlled']
});
const contextOptions: any = {
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
viewport: { width: 1920, height: 1080 },
locale: 'en-US',
timezoneId: 'America/Phoenix',
geolocation: { latitude: 33.4484, longitude: -112.0740 },
permissions: ['geolocation'],
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
};
const context = await browser.newContext(contextOptions);
// Add stealth
await context.addInitScript(() => {
Object.defineProperty(navigator, 'webdriver', { get: () => false });
(window as any).chrome = { runtime: {} };
});
const page = await context.newPage();
try {
// Test with the "All Greens Dispensary" example
const testAddress = '1035 W Main St, Quartzsite, AZ 85346';
const searchQuery = `${testAddress} dispensary`;
const searchUrl = `https://www.google.com/search?q=${encodeURIComponent(searchQuery)}`;
console.log(`🔍 Testing search: ${searchQuery}`);
console.log(`📍 URL: ${searchUrl}\n`);
await page.goto(searchUrl, { waitUntil: 'networkidle', timeout: 30000 });
await page.waitForTimeout(3000);
// Take screenshot
await page.screenshot({ path: '/tmp/google-search-debug.png', fullPage: true });
console.log('📸 Screenshot saved to /tmp/google-search-debug.png\n');
// Get the full HTML
const html = await page.content();
fs.writeFileSync('/tmp/google-search-debug.html', html);
console.log('💾 HTML saved to /tmp/google-search-debug.html\n');
// Try to find any text that looks like "All Greens"
const pageText = await page.evaluate(() => document.body.innerText);
const hasAllGreens = pageText.toLowerCase().includes('all greens');
console.log(`🔍 Page contains "All Greens": ${hasAllGreens}\n`);
if (hasAllGreens) {
console.log('✅ Google found the business!\n');
// Let's try to find where the name appears in the DOM
const nameInfo = await page.evaluate(() => {
const results: any[] = [];
const walker = document.createTreeWalker(
document.body,
NodeFilter.SHOW_TEXT,
null
);
let node;
while (node = walker.nextNode()) {
const text = node.textContent?.trim() || '';
if (text.toLowerCase().includes('all greens')) {
const element = node.parentElement;
results.push({
text: text,
tagName: element?.tagName,
className: element?.className,
id: element?.id,
dataAttrs: Array.from(element?.attributes || [])
.filter(attr => attr.name.startsWith('data-'))
.map(attr => `${attr.name}="${attr.value}"`)
});
}
}
return results;
});
console.log('📍 Found "All Greens" in these elements:');
console.log(JSON.stringify(nameInfo, null, 2));
}
// Try current selectors
console.log('\n🧪 Testing current selectors:\n');
const nameSelectors = [
'[data-attrid="title"]',
'h2[data-attrid="title"]',
'.SPZz6b h2',
'h3.LC20lb',
'.kp-header .SPZz6b'
];
for (const selector of nameSelectors) {
const element = await page.$(selector);
if (element) {
const text = await element.textContent();
console.log(`${selector}: "${text?.trim()}"`);
} else {
console.log(`${selector}: not found`);
}
}
// Look for website links
console.log('\n🔗 Looking for website links:\n');
const links = await page.evaluate(() => {
const allLinks = Array.from(document.querySelectorAll('a[href]'));
return allLinks
.filter(a => {
const href = (a as HTMLAnchorElement).href;
return href &&
!href.includes('google.com') &&
!href.includes('youtube.com') &&
!href.includes('facebook.com');
})
.slice(0, 10)
.map(a => ({
href: (a as HTMLAnchorElement).href,
text: a.textContent?.trim().substring(0, 50),
className: a.className
}));
});
console.log('First 10 non-Google links:');
console.log(JSON.stringify(links, null, 2));
// Look for phone numbers
console.log('\n📞 Looking for phone numbers:\n');
const phoneMatches = pageText.match(/\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/g);
if (phoneMatches) {
console.log('Found phone numbers:', phoneMatches);
} else {
console.log('No phone numbers found in page text');
}
console.log('\n⏸ Browser will stay open for 30 seconds for manual inspection...');
await page.waitForTimeout(30000);
} finally {
await browser.close();
await pool.end();
}
}
debugGoogleScraper().catch(console.error);
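The phone-number regex used above is easy to sanity-check in isolation. A minimal sketch on sample text (the numbers below are made up):

```typescript
// Sketch: the phone pattern from the Google debug script.
// Matches (XXX) XXX-XXXX, XXX.XXX.XXXX, XXX XXX XXXX, and similar forms.
const PHONE_RE = /\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/g;

const sample = 'Call (928) 555-3420 or 602.555.1234 for store hours.';
console.log(sample.match(PHONE_RE)); // [ '(928) 555-3420', '602.555.1234' ]
```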


@@ -0,0 +1,56 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';
async function debugProductCard() {
const proxy = await getRandomProxy();
if (!proxy) {
console.log('No proxy available');
process.exit(1);
}
const browser = await firefox.launch({ headless: true });
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
const brandUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/brands/alien-labs';
console.log(`Loading: ${brandUrl}`);
await page.goto(brandUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(3000);
// Get the first product card's full text content
const cardData = await page.evaluate(() => {
const card = document.querySelector('a[href*="/product/"]');
if (!card) return null;
return {
href: card.getAttribute('href'),
innerHTML: card.innerHTML.substring(0, 2000),
textContent: card.textContent?.substring(0, 1000)
};
});
console.log('\n' + '='.repeat(80));
console.log('FIRST PRODUCT CARD DATA:');
console.log('='.repeat(80));
console.log('\nHREF:', cardData?.href);
console.log('\nTEXT CONTENT:');
console.log(cardData?.textContent);
console.log('\nHTML (first 2000 chars):');
console.log(cardData?.innerHTML);
console.log('='.repeat(80));
await browser.close();
process.exit(0);
}
debugProductCard().catch(console.error);


@@ -0,0 +1,97 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';
import { pool } from './src/db/migrate.js';
async function debugPage() {
const proxy = await getRandomProxy();
if (!proxy) {
console.log('No proxy available');
process.exit(1);
}
const browser = await firefox.launch({
headless: true,
firefoxUserPrefs: { 'geo.enabled': true }
});
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
geolocation: { latitude: 33.4484, longitude: -112.0740 },
permissions: ['geolocation'],
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
try {
console.log('Loading page...');
await page.goto('https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/products/', {
waitUntil: 'domcontentloaded',
timeout: 60000
});
await page.waitForTimeout(5000);
// Take screenshot
await page.screenshot({ path: '/tmp/products-page.png' });
console.log('Screenshot saved to /tmp/products-page.png');
// Get HTML sample
const html = await page.content();
console.log('\n=== PAGE TITLE ===');
console.log(await page.title());
console.log('\n=== SEARCHING FOR PRODUCT ELEMENTS ===');
// Try different selectors
const tests = [
'a[href*="/product/"]',
'[class*="Product"]',
'[class*="product"]',
'[class*="card"]',
'[class*="Card"]',
'[data-testid*="product"]',
'article',
'[role="article"]',
];
for (const selector of tests) {
const count = await page.locator(selector).count();
console.log(`${selector.padEnd(35)}${count} elements`);
}
// Get all links
console.log('\n=== ALL LINKS WITH "product" IN HREF ===');
const productLinks = await page.evaluate(() => {
return Array.from(document.querySelectorAll('a'))
.filter(a => a.href.includes('/product/'))
.map(a => ({
href: a.href,
text: a.textContent?.trim().substring(0, 100),
classes: a.className
}))
.slice(0, 10);
});
console.table(productLinks);
// Get sample HTML of body
console.log('\n=== SAMPLE HTML (first 2000 chars) ===');
const bodyHtml = await page.evaluate(() => document.body.innerHTML);
console.log(bodyHtml.substring(0, 2000));
await browser.close();
await pool.end();
} catch (error) {
console.error('Error:', error);
await browser.close();
await pool.end();
process.exit(1);
}
}
debugPage();

backend/debug-sale-price.ts Normal file

@@ -0,0 +1,113 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';
async function main() {
const productUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/product/easy-tiger-live-rosin-aio-pete-s-peach';
console.log('🔍 Investigating Product with Sale Price...\n');
console.log(`URL: ${productUrl}\n`);
const proxyConfig = await getRandomProxy();
if (!proxyConfig) {
throw new Error('No proxy available');
}
console.log(`🔐 Using proxy: ${proxyConfig.server}\n`);
const browser = await firefox.launch({
headless: true,
proxy: proxyConfig
});
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0'
});
const page = await context.newPage();
try {
console.log('📄 Loading product page...');
await page.goto(productUrl, {
waitUntil: 'domcontentloaded',
timeout: 60000
});
await page.waitForTimeout(3000);
console.log('✅ Page loaded\n');
// Get the full page HTML for inspection
const html = await page.content();
// Look for price-related elements
const priceData = await page.evaluate(() => {
// Try JSON-LD structured data
const scripts = Array.from(document.querySelectorAll('script[type="application/ld+json"]'));
let jsonLdData: any = null;
for (const script of scripts) {
try {
const data = JSON.parse(script.textContent || '');
if (data['@type'] === 'Product') {
jsonLdData = data;
break;
}
} catch (e) {}
}
// Look for price elements in various ways
const priceElements = Array.from(document.querySelectorAll('[class*="price"], [class*="Price"]'));
const priceTexts = priceElements.map(el => ({
className: el.className,
textContent: el.textContent?.trim().substring(0, 100)
}));
// Get all text containing dollar signs
const pageText = document.body.textContent || '';
const priceMatches = pageText.match(/\$\d+\.?\d*/g);
// Look for strikethrough prices (often used for original price when there's a sale)
const strikethroughElements = Array.from(document.querySelectorAll('s, del, [style*="line-through"]'));
const strikethroughPrices = strikethroughElements.map(el => el.textContent?.trim());
// Look for elements with "sale", "special", "discount" in class names
const saleElements = Array.from(document.querySelectorAll('[class*="sale"], [class*="Sale"], [class*="special"], [class*="Special"], [class*="discount"], [class*="Discount"]'));
const saleTexts = saleElements.map(el => ({
className: el.className,
textContent: el.textContent?.trim().substring(0, 100)
}));
return {
jsonLdData,
priceElements: priceTexts.slice(0, 10),
priceMatches: priceMatches?.slice(0, 20) || [],
strikethroughPrices: strikethroughPrices.slice(0, 5),
saleElements: saleTexts.slice(0, 10)
};
});
console.log('💰 Price Data Found:');
console.log(JSON.stringify(priceData, null, 2));
// Take a screenshot for visual reference
await page.screenshot({ path: '/tmp/sale-price-product.png', fullPage: true });
console.log('\n📸 Screenshot saved to /tmp/sale-price-product.png');
// Save a snippet of the HTML around price elements
const priceHtmlSnippet = await page.evaluate(() => {
const priceElements = Array.from(document.querySelectorAll('[class*="price"], [class*="Price"]'));
if (priceElements.length > 0) {
return priceElements.slice(0, 3).map(el => el.outerHTML).join('\n\n');
}
return 'No price elements found';
});
console.log('\n📝 Price HTML Snippet:');
console.log(priceHtmlSnippet);
} catch (error: any) {
console.error('❌ Error:', error.message);
} finally {
await browser.close();
}
}
main().catch(console.error);
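The JSON-LD scan above can be lifted out of `page.evaluate` into a pure function over script contents, so it can be exercised without a browser. A sketch (`findProductJsonLd` is an illustrative name, not from the codebase):

```typescript
// Sketch: scan JSON-LD script bodies for the first @type === "Product" block,
// ignoring malformed JSON the same way the debug script does.
function findProductJsonLd(scriptBodies: string[]): any | null {
  for (const body of scriptBodies) {
    try {
      const data = JSON.parse(body);
      if (data['@type'] === 'Product') return data;
    } catch {
      // skip blocks that are not valid JSON
    }
  }
  return null;
}

const product = findProductJsonLd([
  'not json',
  '{"@type":"Organization"}',
  '{"@type":"Product","name":"Pete\'s Peach","offers":{"price":"35.00"}}',
]);
console.log(product?.name); // Pete's Peach
```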

backend/debug-scrape.ts Normal file

@@ -0,0 +1,91 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Pool } from 'pg';
import fs from 'fs';
puppeteer.use(StealthPlugin());
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function debug() {
let browser;
try {
// Get proxy
const proxyResult = await pool.query(`SELECT host, port, protocol FROM proxies ORDER BY RANDOM() LIMIT 1`);
const proxy = proxyResult.rows[0];
const proxyUrl = `${proxy.protocol}://${proxy.host}:${proxy.port}`;
console.log('🔌 Proxy:', proxyUrl);
browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox', `--proxy-server=${proxyUrl}`]
});
const page = await browser.newPage();
// Set Googlebot UA
await page.setUserAgent('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
// Log all requests being made
page.on('request', request => {
console.log('\n📤 REQUEST:', request.method(), request.url());
console.log(' Headers:', JSON.stringify(request.headers(), null, 2));
});
// Log all responses
page.on('response', response => {
console.log('\n📥 RESPONSE:', response.status(), response.url());
console.log(' Headers:', JSON.stringify(response.headers(), null, 2));
});
const url = 'https://curaleaf.com/stores/curaleaf-dispensary-phoenix-airport/brands';
console.log('\n🌐 Going to:', url);
await page.goto(url, { waitUntil: 'networkidle2', timeout: 60000 });
    await new Promise((r) => setTimeout(r, 3000)); // page.waitForTimeout was removed in newer Puppeteer versions
// Get what the browser sees
const pageData = await page.evaluate(() => ({
title: document.title,
url: window.location.href,
userAgent: navigator.userAgent,
bodyHTML: document.body.innerHTML,
bodyText: document.body.innerText
}));
console.log('\n📄 PAGE DATA:');
console.log('Title:', pageData.title);
console.log('URL:', pageData.url);
console.log('User Agent (browser sees):', pageData.userAgent);
console.log('Body HTML length:', pageData.bodyHTML.length, 'chars');
console.log('Body text length:', pageData.bodyText.length, 'chars');
// Save HTML to file
fs.writeFileSync('/tmp/page.html', pageData.bodyHTML);
console.log('\n💾 Saved HTML to /tmp/page.html');
// Save screenshot
await page.screenshot({ path: '/tmp/screenshot.png', fullPage: true });
console.log('📸 Saved screenshot to /tmp/screenshot.png');
// Show first 500 chars of HTML
console.log('\n📝 First 500 chars of HTML:');
console.log(pageData.bodyHTML.substring(0, 500));
// Show first 500 chars of text
console.log('\n📝 First 500 chars of text:');
console.log(pageData.bodyText.substring(0, 500));
} catch (error: any) {
console.error('❌ Error:', error.message);
} finally {
if (browser) await browser.close();
await pool.end();
}
}
debug();


@@ -0,0 +1,68 @@
import { chromium } from 'playwright';
async function debugSolFlower() {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext();
const page = await context.newPage();
try {
// Set age gate bypass cookies
await context.addCookies([
{
name: 'age_verified',
value: 'true',
domain: '.dutchie.com',
path: '/',
},
{
name: 'initial_location',
value: JSON.stringify({ state: 'Arizona' }),
domain: '.dutchie.com',
path: '/',
},
]);
console.log('🌐 Loading Sol Flower Sun City shop page...');
await page.goto('https://dutchie.com/dispensary/sol-flower-dispensary/shop', {
waitUntil: 'networkidle',
});
console.log('📸 Taking screenshot...');
await page.screenshot({ path: '/tmp/sol-flower-shop.png', fullPage: true });
// Try to find products with various selectors
console.log('\n🔍 Looking for products with different selectors:');
const selectors = [
'a[href*="/product/"]',
'[data-testid="product-card"]',
'[data-testid="product"]',
'.product-card',
'.ProductCard',
'article',
'[role="article"]',
];
for (const selector of selectors) {
const count = await page.locator(selector).count();
console.log(` ${selector}: ${count} elements`);
}
// Get the page HTML to inspect
console.log('\n📄 Page title:', await page.title());
// Check if there's any text indicating no products
const bodyText = await page.locator('body').textContent();
if (bodyText?.includes('No products') || bodyText?.includes('no items')) {
console.log('⚠️ Page indicates no products available');
}
console.log('\n✅ Screenshot saved to /tmp/sol-flower-shop.png');
} catch (error) {
console.error('❌ Error:', error);
} finally {
await browser.close();
}
}
debugSolFlower();


@@ -0,0 +1,151 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';
async function main() {
const menuUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted';
const specialsUrl = `${menuUrl}/specials`;
console.log('🔍 Investigating Specials Page...\n');
console.log(`URL: ${specialsUrl}\n`);
const proxyConfig = await getRandomProxy();
if (!proxyConfig) {
throw new Error('No proxy available');
}
console.log(`🔐 Using proxy: ${proxyConfig.server}\n`);
const browser = await firefox.launch({
headless: true,
proxy: proxyConfig
});
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0'
});
const page = await context.newPage();
try {
console.log('📄 Loading specials page...');
await page.goto(specialsUrl, {
waitUntil: 'domcontentloaded',
timeout: 60000
});
await page.waitForTimeout(3000);
// Scroll to load content
for (let i = 0; i < 10; i++) {
await page.evaluate(() => window.scrollBy(0, window.innerHeight));
await page.waitForTimeout(1000);
}
console.log('✅ Page loaded\n');
// Check for JSON-LD data
const jsonLdData = await page.evaluate(() => {
const scripts = Array.from(document.querySelectorAll('script[type="application/ld+json"]'));
return scripts.map(script => {
try {
return JSON.parse(script.textContent || '');
} catch (e) {
return null;
}
}).filter(Boolean);
});
if (jsonLdData.length > 0) {
console.log('📋 JSON-LD Data Found:');
console.log(JSON.stringify(jsonLdData, null, 2));
console.log('\n');
}
// Look for product cards
const productCards = await page.evaluate(() => {
const cards = Array.from(document.querySelectorAll('a[href*="/product/"]'));
return cards.slice(0, 5).map(card => ({
href: card.getAttribute('href'),
text: card.textContent?.trim().substring(0, 100)
}));
});
console.log('🛍️ Product Cards Found:', productCards.length);
if (productCards.length > 0) {
console.log('First 5 products:');
productCards.forEach((card, idx) => {
console.log(` ${idx + 1}. ${card.href}`);
console.log(` Text: ${card.text}\n`);
});
}
// Look for special indicators
const specialData = await page.evaluate(() => {
const pageText = document.body.textContent || '';
// Look for common special-related keywords
const hasDiscount = pageText.toLowerCase().includes('discount');
const hasSale = pageText.toLowerCase().includes('sale');
const hasOff = pageText.toLowerCase().includes('off');
const hasDeal = pageText.toLowerCase().includes('deal');
const hasPromo = pageText.toLowerCase().includes('promo');
// Look for percentage or dollar off indicators
const percentMatches = pageText.match(/(\d+)%\s*off/gi);
const dollarMatches = pageText.match(/\$(\d+)\s*off/gi);
// Try to find any special tags or badges
const badges = Array.from(document.querySelectorAll('[class*="badge"], [class*="tag"], [class*="special"], [class*="sale"], [class*="discount"]'));
const badgeTexts = badges.map(b => b.textContent?.trim()).filter(Boolean).slice(0, 10);
return {
keywords: {
hasDiscount,
hasSale,
hasOff,
hasDeal,
hasPromo
},
percentMatches: percentMatches || [],
dollarMatches: dollarMatches || [],
badgeTexts,
totalBadges: badges.length
};
});
console.log('\n🏷 Special Indicators:');
console.log(JSON.stringify(specialData, null, 2));
// Get page title and any heading text
const pageInfo = await page.evaluate(() => {
const title = document.title;
const h1 = document.querySelector('h1')?.textContent?.trim();
const h2s = Array.from(document.querySelectorAll('h2')).map(h => h.textContent?.trim()).slice(0, 3);
return { title, h1, h2s };
});
console.log('\n📰 Page Info:');
console.log(JSON.stringify(pageInfo, null, 2));
// Check if there are any price elements visible
const priceInfo = await page.evaluate(() => {
const pageText = document.body.textContent || '';
const priceMatches = pageText.match(/\$(\d+\.?\d*)/g);
return {
pricesFound: priceMatches?.length || 0,
samplePrices: priceMatches?.slice(0, 10) || []
};
});
console.log('\n💰 Price Info:');
console.log(JSON.stringify(priceInfo, null, 2));
} catch (error: any) {
console.error('❌ Error:', error.message);
} finally {
await browser.close();
}
}
main().catch(console.error);
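The discount patterns probed above can likewise be exercised on plain text. A sketch with made-up sample copy:

```typescript
// Sketch: the "% off" / "$ off" patterns from the specials debug script.
const percentOffRe = /(\d+)%\s*off/gi;
const dollarOffRe = /\$(\d+)\s*off/gi;

const text = 'Today only: 20% off carts, $5 off all edibles.';
console.log(text.match(percentOffRe)); // [ '20% off' ]
console.log(text.match(dollarOffRe)); // [ '$5 off' ]
```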


@@ -0,0 +1,17 @@
import { pool } from './src/db/migrate.js';
async function main() {
try {
const result = await pool.query(`
DELETE FROM products WHERE dispensary_id = 149
`);
console.log(`Deleted ${result.rowCount} products from dispensary 149`);
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
main().catch(console.error);


@@ -0,0 +1,71 @@
import { chromium } from 'playwright';
import { bypassAgeGatePlaywright } from './src/utils/age-gate-playwright';
async function diagnoseCuraleafPage() {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
viewport: { width: 1280, height: 720 }
});
const page = await context.newPage();
try {
console.log('Loading Curaleaf page...');
await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-48th-street', {
waitUntil: 'domcontentloaded'
});
await page.waitForTimeout(2000);
console.log('Bypassing age gate...');
const bypassed = await bypassAgeGatePlaywright(page, 'Arizona');
if (!bypassed) {
console.log('❌ Failed to bypass age gate');
await browser.close();
return;
}
console.log('✅ Age gate bypassed!');
console.log(`Current URL: ${page.url()}`);
await page.waitForTimeout(5000);
// Check page title
const title = await page.title();
console.log(`\nPage title: ${title}`);
// Check for "menu" or "shop" links
const menuLinks = await page.locator('a:has-text("menu"), a:has-text("shop"), a:has-text("order")').count();
console.log(`\nMenu/Shop links found: ${menuLinks}`);
if (menuLinks > 0) {
const links = await page.locator('a:has-text("menu"), a:has-text("shop"), a:has-text("order")').all();
console.log('\nMenu links:');
for (const link of links.slice(0, 5)) {
const text = await link.textContent();
const href = await link.getAttribute('href');
console.log(` - ${text}: ${href}`);
}
}
// Check body text
const bodyText = await page.textContent('body') || '';
console.log(`\nBody text length: ${bodyText.length} characters`);
console.log(`Contains "menu": ${bodyText.toLowerCase().includes('menu')}`);
console.log(`Contains "shop": ${bodyText.toLowerCase().includes('shop')}`);
console.log(`Contains "product": ${bodyText.toLowerCase().includes('product')}`);
console.log(`Contains "dutchie": ${bodyText.toLowerCase().includes('dutchie')}`);
// Take screenshot
await page.screenshot({ path: '/tmp/curaleaf-diagnosed.png', fullPage: true });
console.log('\n📸 Screenshot saved: /tmp/curaleaf-diagnosed.png');
} catch (error) {
console.error('Error:', error);
} finally {
await browser.close();
}
}
diagnoseCuraleafPage();

backend/dist/auth/middleware.js vendored Normal file

@@ -0,0 +1,65 @@
"use strict";
var __importDefault = (this && this.__importDefault) || function (mod) {
return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
exports.generateToken = generateToken;
exports.verifyToken = verifyToken;
exports.authenticateUser = authenticateUser;
exports.authMiddleware = authMiddleware;
exports.requireRole = requireRole;
const jsonwebtoken_1 = __importDefault(require("jsonwebtoken"));
const bcrypt_1 = __importDefault(require("bcrypt"));
const migrate_1 = require("../db/migrate");
const JWT_SECRET = process.env.JWT_SECRET || 'change_this_in_production';
function generateToken(user) {
return jsonwebtoken_1.default.sign({ id: user.id, email: user.email, role: user.role }, JWT_SECRET, { expiresIn: '7d' });
}
function verifyToken(token) {
try {
return jsonwebtoken_1.default.verify(token, JWT_SECRET);
}
catch (error) {
return null;
}
}
async function authenticateUser(email, password) {
const result = await migrate_1.pool.query('SELECT id, email, password_hash, role FROM users WHERE email = $1', [email]);
if (result.rows.length === 0) {
return null;
}
const user = result.rows[0];
const isValid = await bcrypt_1.default.compare(password, user.password_hash);
if (!isValid) {
return null;
}
return {
id: user.id,
email: user.email,
role: user.role
};
}
function authMiddleware(req, res, next) {
const authHeader = req.headers.authorization;
if (!authHeader || !authHeader.startsWith('Bearer ')) {
return res.status(401).json({ error: 'No token provided' });
}
const token = authHeader.substring(7);
const user = verifyToken(token);
if (!user) {
return res.status(401).json({ error: 'Invalid token' });
}
req.user = user;
next();
}
function requireRole(...roles) {
return (req, res, next) => {
if (!req.user) {
return res.status(401).json({ error: 'Not authenticated' });
}
if (!roles.includes(req.user.role)) {
return res.status(403).json({ error: 'Insufficient permissions' });
}
next();
};
}
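
A stand-alone sketch (not part of the codebase) of the Bearer-token extraction that `authMiddleware` performs on the `Authorization` header. `parseBearer` is a hypothetical helper name; the middleware inlines this logic, and `'Bearer '.length === 7` is why it calls `substring(7)`:

```javascript
// Hypothetical helper mirroring authMiddleware's header handling.
function parseBearer(authHeader) {
  if (!authHeader || !authHeader.startsWith('Bearer ')) {
    return null; // maps to the 401 'No token provided' branch
  }
  return authHeader.substring(7); // drop the 'Bearer ' prefix
}

console.log(parseBearer('Bearer abc123')); // 'abc123'
console.log(parseBearer('Basic xyz'));     // null
```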

backend/dist/db/add-jobs-table.js vendored Normal file

@@ -0,0 +1,41 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
const migrate_1 = require("./migrate");
async function addJobsTable() {
const client = await migrate_1.pool.connect();
try {
await client.query('BEGIN');
await client.query(`
CREATE TABLE IF NOT EXISTS jobs (
id SERIAL PRIMARY KEY,
type VARCHAR(50) NOT NULL,
status VARCHAR(50) DEFAULT 'pending',
store_id INTEGER REFERENCES stores(id) ON DELETE CASCADE,
progress INTEGER DEFAULT 0,
total_items INTEGER,
processed_items INTEGER DEFAULT 0,
error TEXT,
started_at TIMESTAMP,
completed_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_jobs_status ON jobs(status);
CREATE INDEX IF NOT EXISTS idx_jobs_type ON jobs(type);
CREATE INDEX IF NOT EXISTS idx_jobs_store_id ON jobs(store_id);
`);
await client.query('COMMIT');
console.log('✅ Jobs table created successfully');
}
catch (error) {
await client.query('ROLLBACK');
console.error('❌ Failed to create jobs table:', error);
throw error;
}
finally {
client.release();
}
}
addJobsTable()
.then(() => process.exit(0))
.catch(() => process.exit(1));

backend/dist/db/migrate.js vendored Normal file

@@ -0,0 +1,182 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.pool = void 0;
exports.runMigrations = runMigrations;
const pg_1 = require("pg");
const pool = new pg_1.Pool({
connectionString: process.env.DATABASE_URL,
});
exports.pool = pool;
async function runMigrations() {
const client = await pool.connect();
try {
await client.query('BEGIN');
// Users table
await client.query(`
CREATE TABLE IF NOT EXISTS users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
role VARCHAR(50) DEFAULT 'admin',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
`);
// Stores table
await client.query(`
CREATE TABLE IF NOT EXISTS stores (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
slug VARCHAR(255) UNIQUE NOT NULL,
dutchie_url TEXT NOT NULL,
active BOOLEAN DEFAULT true,
scrape_enabled BOOLEAN DEFAULT true,
last_scraped_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
`);
// Categories table (shop, brands, specials)
await client.query(`
CREATE TABLE IF NOT EXISTS categories (
id SERIAL PRIMARY KEY,
store_id INTEGER REFERENCES stores(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
slug VARCHAR(255) NOT NULL,
dutchie_url TEXT NOT NULL,
scrape_enabled BOOLEAN DEFAULT true,
last_scraped_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(store_id, slug)
);
`);
// Products table
await client.query(`
CREATE TABLE IF NOT EXISTS products (
id SERIAL PRIMARY KEY,
store_id INTEGER REFERENCES stores(id) ON DELETE CASCADE,
category_id INTEGER REFERENCES categories(id) ON DELETE SET NULL,
dutchie_product_id VARCHAR(255),
name VARCHAR(500) NOT NULL,
slug VARCHAR(500),
description TEXT,
price DECIMAL(10, 2),
original_price DECIMAL(10, 2),
strain_type VARCHAR(100),
thc_percentage DECIMAL(5, 2),
cbd_percentage DECIMAL(5, 2),
brand VARCHAR(255),
weight VARCHAR(100),
image_url TEXT,
local_image_path TEXT,
dutchie_url TEXT NOT NULL,
in_stock BOOLEAN DEFAULT true,
is_special BOOLEAN DEFAULT false,
metadata JSONB,
first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(store_id, dutchie_product_id)
);
`);
// Campaigns table
await client.query(`
CREATE TABLE IF NOT EXISTS campaigns (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
slug VARCHAR(255) UNIQUE NOT NULL,
description TEXT,
display_style VARCHAR(50) DEFAULT 'grid',
active BOOLEAN DEFAULT true,
start_date TIMESTAMP,
end_date TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
`);
// Campaign products (many-to-many with ordering)
await client.query(`
CREATE TABLE IF NOT EXISTS campaign_products (
id SERIAL PRIMARY KEY,
campaign_id INTEGER REFERENCES campaigns(id) ON DELETE CASCADE,
product_id INTEGER REFERENCES products(id) ON DELETE CASCADE,
display_order INTEGER DEFAULT 0,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(campaign_id, product_id)
);
`);
// Click tracking
await client.query(`
CREATE TABLE IF NOT EXISTS clicks (
id SERIAL PRIMARY KEY,
product_id INTEGER REFERENCES products(id) ON DELETE CASCADE,
campaign_id INTEGER REFERENCES campaigns(id) ON DELETE SET NULL,
ip_address VARCHAR(45),
user_agent TEXT,
referrer TEXT,
clicked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
`);
// Create index on clicked_at for analytics queries
await client.query(`
CREATE INDEX IF NOT EXISTS idx_clicks_clicked_at ON clicks(clicked_at);
CREATE INDEX IF NOT EXISTS idx_clicks_product_id ON clicks(product_id);
CREATE INDEX IF NOT EXISTS idx_clicks_campaign_id ON clicks(campaign_id);
`);
// Proxies table
await client.query(`
CREATE TABLE IF NOT EXISTS proxies (
id SERIAL PRIMARY KEY,
host VARCHAR(255) NOT NULL,
port INTEGER NOT NULL,
protocol VARCHAR(10) NOT NULL,
username VARCHAR(255),
password VARCHAR(255),
active BOOLEAN DEFAULT true,
is_anonymous BOOLEAN DEFAULT false,
last_tested_at TIMESTAMP,
test_result VARCHAR(50),
response_time_ms INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(host, port, protocol)
);
`);
// Settings table
await client.query(`
CREATE TABLE IF NOT EXISTS settings (
key VARCHAR(255) PRIMARY KEY,
value TEXT NOT NULL,
description TEXT,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
`);
// Insert default settings
await client.query(`
INSERT INTO settings (key, value, description) VALUES
('scrape_interval_hours', '4', 'How often to scrape stores (in hours)'),
('scrape_specials_time', '00:01', 'Time to scrape specials daily (HH:MM in 24h format)'),
('analytics_retention_days', '365', 'How many days to keep analytics data'),
('proxy_timeout_ms', '3000', 'Proxy timeout in milliseconds'),
('proxy_test_url', 'https://httpbin.org/ip', 'URL to test proxies against')
ON CONFLICT (key) DO NOTHING;
`);
await client.query('COMMIT');
console.log('✅ Migrations completed successfully');
}
catch (error) {
await client.query('ROLLBACK');
console.error('❌ Migration failed:', error);
throw error;
}
finally {
client.release();
}
}
// Run migrations if this file is executed directly
if (require.main === module) {
runMigrations()
.then(() => process.exit(0))
.catch(() => process.exit(1));
}
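
The migration files all follow the same BEGIN/COMMIT/ROLLBACK shape. A generic sketch of that pattern, under the assumption of a hypothetical `withTransaction` helper; the stub client below is an in-memory stand-in for a real pg `PoolClient`, so this runs without a database:

```javascript
// Generic transaction wrapper mirroring runMigrations' control flow.
async function withTransaction(client, fn) {
  await client.query('BEGIN');
  try {
    const result = await fn(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK'); // undo partial work on any failure
    throw err;
  }
}

// Stub client records statements instead of hitting Postgres.
const statements = [];
const stubClient = { query: async (sql) => { statements.push(sql); } };

withTransaction(stubClient, (c) => c.query('CREATE TABLE demo (id int)'))
  .then(() => console.log(statements.join(' | ')));
// BEGIN | CREATE TABLE demo (id int) | COMMIT
```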

backend/dist/db/seed.js vendored Normal file

@@ -0,0 +1,72 @@
"use strict";
var __importDefault = (this && this.__importDefault) || function (mod) {
return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
exports.seedDatabase = seedDatabase;
const migrate_1 = require("./migrate");
const bcrypt_1 = __importDefault(require("bcrypt"));
async function seedDatabase() {
const client = await migrate_1.pool.connect();
try {
// Create admin user
const adminEmail = process.env.ADMIN_EMAIL || 'admin@example.com';
const adminPassword = process.env.ADMIN_PASSWORD || 'password';
const passwordHash = await bcrypt_1.default.hash(adminPassword, 10);
await client.query(`
INSERT INTO users (email, password_hash, role)
VALUES ($1, $2, 'superadmin')
ON CONFLICT (email) DO UPDATE
SET password_hash = $2, role = 'superadmin'
`, [adminEmail, passwordHash]);
console.log(`✅ Admin user created: ${adminEmail}`);
// Create Deeply Rooted store
const storeResult = await client.query(`
INSERT INTO stores (name, slug, dutchie_url, active, scrape_enabled)
VALUES ('Deeply Rooted', 'AZ-Deeply-Rooted', 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted', true, true)
ON CONFLICT (slug) DO UPDATE
SET name = 'Deeply Rooted', dutchie_url = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted'
RETURNING id
`);
const storeId = storeResult.rows[0].id;
console.log(`✅ Store created: Deeply Rooted (ID: ${storeId})`);
// Create categories for the store
const categories = [
{ name: 'Shop', slug: 'shop', url: 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted' },
{ name: 'Brands', slug: 'brands', url: 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/brands' },
{ name: 'Specials', slug: 'specials', url: 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/specials/sale/66501e094faefa00079b1835' }
];
for (const cat of categories) {
await client.query(`
INSERT INTO categories (store_id, name, slug, dutchie_url, scrape_enabled)
VALUES ($1, $2, $3, $4, true)
ON CONFLICT (store_id, slug) DO UPDATE
SET name = $2, dutchie_url = $4
`, [storeId, cat.name, cat.slug, cat.url]);
}
console.log('✅ Categories created: Shop, Brands, Specials');
// Create a default "Featured Products" campaign
await client.query(`
INSERT INTO campaigns (name, slug, description, display_style, active)
VALUES ('Featured Products', 'featured', 'Default featured products campaign', 'grid', true)
ON CONFLICT (slug) DO NOTHING
`);
console.log('✅ Default campaign created: Featured Products');
console.log('\n🎉 Seeding completed successfully!');
console.log(`\n📧 Login: ${adminEmail}`);
console.log(`🔑 Password: ${adminPassword}`);
}
catch (error) {
console.error('❌ Seeding failed:', error);
throw error;
}
finally {
client.release();
}
}
// Run seed if this file is executed directly
if (require.main === module) {
seedDatabase()
.then(() => process.exit(0))
.catch(() => process.exit(1));
}


@@ -0,0 +1,48 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
const migrate_1 = require("./migrate");
async function updateCategoriesHierarchy() {
const client = await migrate_1.pool.connect();
try {
await client.query('BEGIN');
// Add parent_id for nested categories
await client.query(`
ALTER TABLE categories
ADD COLUMN IF NOT EXISTS parent_id INTEGER REFERENCES categories(id) ON DELETE CASCADE;
ALTER TABLE categories
ADD COLUMN IF NOT EXISTS display_order INTEGER DEFAULT 0;
ALTER TABLE categories
ADD COLUMN IF NOT EXISTS description TEXT;
CREATE INDEX IF NOT EXISTS idx_categories_parent_id ON categories(parent_id);
`);
// Add category_path for easy searching (e.g., 'shop/flower')
await client.query(`
ALTER TABLE categories
ADD COLUMN IF NOT EXISTS path VARCHAR(500);
CREATE INDEX IF NOT EXISTS idx_categories_path ON categories(path);
`);
// Update existing categories to have paths
await client.query(`
UPDATE categories
SET path = slug
WHERE path IS NULL;
`);
await client.query('COMMIT');
console.log('✅ Categories hierarchy updated successfully');
}
catch (error) {
await client.query('ROLLBACK');
console.error('❌ Failed to update categories:', error);
throw error;
}
finally {
client.release();
}
}
updateCategoriesHierarchy()
.then(() => process.exit(0))
.catch(() => process.exit(1));
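
The migration only backfills `path = slug` for existing rows; for nested categories the comment implies paths like `'shop/flower'`. A hypothetical sketch of how such a path could be derived from `slug` + `parent_id` (the sample rows and `pathOf` helper are illustrative, not code from this repo):

```javascript
// Illustrative rows standing in for the categories table.
const cats = [
  { id: 1, slug: 'shop', parent_id: null },
  { id: 2, slug: 'flower', parent_id: 1 },
];
const byId = new Map(cats.map((c) => [c.id, c]));

// Walk parent links upward, joining slugs with '/'.
function pathOf(cat) {
  return cat.parent_id
    ? pathOf(byId.get(cat.parent_id)) + '/' + cat.slug
    : cat.slug;
}

console.log(pathOf(byId.get(2))); // 'shop/flower'
```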

backend/dist/index.js vendored Normal file

@@ -0,0 +1,57 @@
"use strict";
var __importDefault = (this && this.__importDefault) || function (mod) {
return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
const express_1 = __importDefault(require("express"));
const cors_1 = __importDefault(require("cors"));
const dotenv_1 = __importDefault(require("dotenv"));
const minio_1 = require("./utils/minio");
const logger_1 = require("./services/logger");
dotenv_1.default.config();
const app = (0, express_1.default)();
const PORT = process.env.PORT || 3010;
app.use((0, cors_1.default)());
app.use(express_1.default.json());
app.get('/health', (req, res) => {
res.json({ status: 'ok', timestamp: new Date().toISOString() });
});
const auth_1 = __importDefault(require("./routes/auth"));
const dashboard_1 = __importDefault(require("./routes/dashboard"));
const stores_1 = __importDefault(require("./routes/stores"));
const categories_1 = __importDefault(require("./routes/categories"));
const products_1 = __importDefault(require("./routes/products"));
const campaigns_1 = __importDefault(require("./routes/campaigns"));
const analytics_1 = __importDefault(require("./routes/analytics"));
const settings_1 = __importDefault(require("./routes/settings"));
const proxies_1 = __importDefault(require("./routes/proxies"));
const logs_1 = __importDefault(require("./routes/logs"));
const scraper_monitor_1 = __importDefault(require("./routes/scraper-monitor"));
app.use('/api/auth', auth_1.default);
app.use('/api/dashboard', dashboard_1.default);
app.use('/api/stores', stores_1.default);
app.use('/api/categories', categories_1.default);
app.use('/api/products', products_1.default);
app.use('/api/campaigns', campaigns_1.default);
app.use('/api/analytics', analytics_1.default);
app.use('/api/settings', settings_1.default);
app.use('/api/proxies', proxies_1.default);
app.use('/api/logs', logs_1.default);
app.use('/api/scraper-monitor', scraper_monitor_1.default);
async function startServer() {
try {
logger_1.logger.info('system', 'Starting server...');
await (0, minio_1.initializeMinio)();
logger_1.logger.info('system', 'Minio initialized');
app.listen(PORT, () => {
logger_1.logger.info('system', `Server running on port ${PORT}`);
console.log(`🚀 Server running on port ${PORT}`);
});
}
catch (error) {
logger_1.logger.error('system', `Failed to start server: ${error}`);
console.error('Failed to start server:', error);
process.exit(1);
}
}
startServer();

backend/dist/routes/analytics.js vendored Normal file

@@ -0,0 +1,121 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
const express_1 = require("express");
const middleware_1 = require("../auth/middleware");
const migrate_1 = require("../db/migrate");
const router = (0, express_1.Router)();
router.use(middleware_1.authMiddleware);
// Get analytics overview
router.get('/overview', async (req, res) => {
try {
const { days = 30 } = req.query;
// Total clicks
const clicksResult = await migrate_1.pool.query(`
SELECT COUNT(*) as total_clicks
FROM clicks
WHERE clicked_at >= NOW() - INTERVAL '${parseInt(days)} days'
`);
// Unique products clicked
const uniqueProductsResult = await migrate_1.pool.query(`
SELECT COUNT(DISTINCT product_id) as unique_products
FROM clicks
WHERE clicked_at >= NOW() - INTERVAL '${parseInt(days)} days'
`);
// Clicks by day
const clicksByDayResult = await migrate_1.pool.query(`
SELECT DATE(clicked_at) as date, COUNT(*) as clicks
FROM clicks
WHERE clicked_at >= NOW() - INTERVAL '${parseInt(days)} days'
GROUP BY DATE(clicked_at)
ORDER BY date DESC
`);
// Top products
const topProductsResult = await migrate_1.pool.query(`
SELECT p.id, p.name, p.price, COUNT(c.id) as click_count
FROM clicks c
JOIN products p ON c.product_id = p.id
WHERE c.clicked_at >= NOW() - INTERVAL '${parseInt(days)} days'
GROUP BY p.id, p.name, p.price
ORDER BY click_count DESC
LIMIT 10
`);
res.json({
overview: {
total_clicks: parseInt(clicksResult.rows[0].total_clicks),
unique_products: parseInt(uniqueProductsResult.rows[0].unique_products)
},
clicks_by_day: clicksByDayResult.rows,
top_products: topProductsResult.rows
});
}
catch (error) {
console.error('Error fetching analytics:', error);
res.status(500).json({ error: 'Failed to fetch analytics' });
}
});
// Get product analytics
router.get('/products/:id', async (req, res) => {
try {
const { id } = req.params;
const { days = 30 } = req.query;
// Total clicks for this product
const totalResult = await migrate_1.pool.query(`
SELECT COUNT(*) as total_clicks
FROM clicks
WHERE product_id = $1
AND clicked_at >= NOW() - INTERVAL '${parseInt(days)} days'
`, [id]);
// Clicks by day
const byDayResult = await migrate_1.pool.query(`
SELECT DATE(clicked_at) as date, COUNT(*) as clicks
FROM clicks
WHERE product_id = $1
AND clicked_at >= NOW() - INTERVAL '${parseInt(days)} days'
GROUP BY DATE(clicked_at)
ORDER BY date DESC
`, [id]);
res.json({
product_id: parseInt(id),
total_clicks: parseInt(totalResult.rows[0].total_clicks),
clicks_by_day: byDayResult.rows
});
}
catch (error) {
console.error('Error fetching product analytics:', error);
res.status(500).json({ error: 'Failed to fetch product analytics' });
}
});
// Get campaign analytics
router.get('/campaigns/:id', async (req, res) => {
try {
const { id } = req.params;
const { days = 30 } = req.query;
// Total clicks for this campaign
const totalResult = await migrate_1.pool.query(`
SELECT COUNT(*) as total_clicks
FROM clicks
WHERE campaign_id = $1
AND clicked_at >= NOW() - INTERVAL '${parseInt(days)} days'
`, [id]);
// Clicks by product in this campaign
const byProductResult = await migrate_1.pool.query(`
SELECT p.id, p.name, COUNT(c.id) as clicks
FROM clicks c
JOIN products p ON c.product_id = p.id
WHERE c.campaign_id = $1
AND c.clicked_at >= NOW() - INTERVAL '${parseInt(days)} days'
GROUP BY p.id, p.name
ORDER BY clicks DESC
`, [id]);
res.json({
campaign_id: parseInt(id),
total_clicks: parseInt(totalResult.rows[0].total_clicks),
clicks_by_product: byProductResult.rows
});
}
catch (error) {
console.error('Error fetching campaign analytics:', error);
res.status(500).json({ error: 'Failed to fetch campaign analytics' });
}
});
exports.default = router;
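
These routes interpolate `days` directly into the SQL string rather than binding it as a parameter; `parseInt` is what keeps that from becoming an injection vector, since any input collapses to an integer (or `NaN`, which would fail the query rather than execute attacker text):

```javascript
// How parseInt sanitizes the interpolated `days` query param.
console.log(parseInt('30'));                 // 30
console.log(parseInt("30'; DROP TABLE x;")); // 30 (trailing text discarded)
console.log(parseInt('abc'));                // NaN (query would error, not inject)
```

A stricter alternative, if desired, would be to bind the value as a parameter, e.g. `clicked_at >= NOW() - ($1 || ' days')::interval` with `[days]` in the params array.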

backend/dist/routes/auth.js vendored Normal file

@@ -0,0 +1,43 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
const express_1 = require("express");
const middleware_1 = require("../auth/middleware");
const router = (0, express_1.Router)();
// Login
router.post('/login', async (req, res) => {
try {
const { email, password } = req.body;
if (!email || !password) {
return res.status(400).json({ error: 'Email and password required' });
}
const user = await (0, middleware_1.authenticateUser)(email, password);
if (!user) {
return res.status(401).json({ error: 'Invalid credentials' });
}
const token = (0, middleware_1.generateToken)(user);
res.json({
token,
user: {
id: user.id,
email: user.email,
role: user.role
}
});
}
catch (error) {
console.error('Login error:', error);
res.status(500).json({ error: 'Internal server error' });
}
});
// Get current user
router.get('/me', middleware_1.authMiddleware, async (req, res) => {
res.json({
user: req.user
});
});
// Refresh token
router.post('/refresh', middleware_1.authMiddleware, async (req, res) => {
const token = (0, middleware_1.generateToken)(req.user);
res.json({ token });
});
exports.default = router;

backend/dist/routes/campaigns.js vendored Normal file

@@ -0,0 +1,163 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
const express_1 = require("express");
const middleware_1 = require("../auth/middleware");
const migrate_1 = require("../db/migrate");
const router = (0, express_1.Router)();
router.use(middleware_1.authMiddleware);
// Get all campaigns
router.get('/', async (req, res) => {
try {
const result = await migrate_1.pool.query(`
SELECT c.*, COUNT(cp.product_id) as product_count
FROM campaigns c
LEFT JOIN campaign_products cp ON c.id = cp.campaign_id
GROUP BY c.id
ORDER BY c.created_at DESC
`);
res.json({ campaigns: result.rows });
}
catch (error) {
console.error('Error fetching campaigns:', error);
res.status(500).json({ error: 'Failed to fetch campaigns' });
}
});
// Get single campaign with products
router.get('/:id', async (req, res) => {
try {
const { id } = req.params;
const campaignResult = await migrate_1.pool.query(`
SELECT * FROM campaigns WHERE id = $1
`, [id]);
if (campaignResult.rows.length === 0) {
return res.status(404).json({ error: 'Campaign not found' });
}
const productsResult = await migrate_1.pool.query(`
SELECT p.*, cp.display_order
FROM products p
JOIN campaign_products cp ON p.id = cp.product_id
WHERE cp.campaign_id = $1
ORDER BY cp.display_order
`, [id]);
res.json({
campaign: campaignResult.rows[0],
products: productsResult.rows
});
}
catch (error) {
console.error('Error fetching campaign:', error);
res.status(500).json({ error: 'Failed to fetch campaign' });
}
});
// Create campaign
router.post('/', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { name, slug, description, display_style, active, start_date, end_date } = req.body;
if (!name || !slug) {
return res.status(400).json({ error: 'Name and slug required' });
}
const result = await migrate_1.pool.query(`
INSERT INTO campaigns (name, slug, description, display_style, active, start_date, end_date)
VALUES ($1, $2, $3, $4, $5, $6, $7)
RETURNING *
`, [name, slug, description, display_style || 'grid', active !== false, start_date, end_date]);
res.status(201).json({ campaign: result.rows[0] });
}
catch (error) {
console.error('Error creating campaign:', error);
if (error.code === '23505') {
return res.status(409).json({ error: 'Campaign slug already exists' });
}
res.status(500).json({ error: 'Failed to create campaign' });
}
});
// Update campaign
router.put('/:id', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { id } = req.params;
const { name, slug, description, display_style, active, start_date, end_date } = req.body;
const result = await migrate_1.pool.query(`
UPDATE campaigns
SET name = COALESCE($1, name),
slug = COALESCE($2, slug),
description = COALESCE($3, description),
display_style = COALESCE($4, display_style),
active = COALESCE($5, active),
start_date = COALESCE($6, start_date),
end_date = COALESCE($7, end_date),
updated_at = CURRENT_TIMESTAMP
WHERE id = $8
RETURNING *
`, [name, slug, description, display_style, active, start_date, end_date, id]);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'Campaign not found' });
}
res.json({ campaign: result.rows[0] });
}
catch (error) {
console.error('Error updating campaign:', error);
if (error.code === '23505') {
return res.status(409).json({ error: 'Campaign slug already exists' });
}
res.status(500).json({ error: 'Failed to update campaign' });
}
});
// Delete campaign
router.delete('/:id', (0, middleware_1.requireRole)('superadmin'), async (req, res) => {
try {
const { id } = req.params;
const result = await migrate_1.pool.query(`
DELETE FROM campaigns WHERE id = $1 RETURNING id
`, [id]);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'Campaign not found' });
}
res.json({ message: 'Campaign deleted successfully' });
}
catch (error) {
console.error('Error deleting campaign:', error);
res.status(500).json({ error: 'Failed to delete campaign' });
}
});
// Add product to campaign
router.post('/:id/products', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { id } = req.params;
const { product_id, display_order } = req.body;
if (!product_id) {
return res.status(400).json({ error: 'Product ID required' });
}
const result = await migrate_1.pool.query(`
INSERT INTO campaign_products (campaign_id, product_id, display_order)
VALUES ($1, $2, $3)
ON CONFLICT (campaign_id, product_id)
DO UPDATE SET display_order = $3
RETURNING *
`, [id, product_id, display_order || 0]);
res.status(201).json({ campaign_product: result.rows[0] });
}
catch (error) {
console.error('Error adding product to campaign:', error);
res.status(500).json({ error: 'Failed to add product to campaign' });
}
});
// Remove product from campaign
router.delete('/:id/products/:product_id', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { id, product_id } = req.params;
const result = await migrate_1.pool.query(`
DELETE FROM campaign_products
WHERE campaign_id = $1 AND product_id = $2
RETURNING *
`, [id, product_id]);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'Product not in campaign' });
}
res.json({ message: 'Product removed from campaign' });
}
catch (error) {
console.error('Error removing product from campaign:', error);
res.status(500).json({ error: 'Failed to remove product from campaign' });
}
});
exports.default = router;

backend/dist/routes/categories.js vendored Normal file

@@ -0,0 +1,84 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
const express_1 = require("express");
const middleware_1 = require("../auth/middleware");
const migrate_1 = require("../db/migrate");
const router = (0, express_1.Router)();
router.use(middleware_1.authMiddleware);
// Get categories (flat list)
router.get('/', async (req, res) => {
try {
const { store_id } = req.query;
let query = `
SELECT
c.*,
COUNT(DISTINCT p.id) as product_count,
pc.name as parent_name
FROM categories c
LEFT JOIN products p ON c.id = p.category_id
LEFT JOIN categories pc ON c.parent_id = pc.id
`;
const params = [];
if (store_id) {
query += ' WHERE c.store_id = $1';
params.push(store_id);
}
query += `
GROUP BY c.id, pc.name
ORDER BY c.display_order, c.name
`;
const result = await migrate_1.pool.query(query, params);
res.json({ categories: result.rows });
}
catch (error) {
console.error('Error fetching categories:', error);
res.status(500).json({ error: 'Failed to fetch categories' });
}
});
// Get category tree (hierarchical)
router.get('/tree', async (req, res) => {
try {
const { store_id } = req.query;
if (!store_id) {
return res.status(400).json({ error: 'store_id is required' });
}
// Get all categories for the store
const result = await migrate_1.pool.query(`
SELECT
c.*,
COUNT(DISTINCT p.id) as product_count
FROM categories c
LEFT JOIN products p ON c.id = p.category_id AND p.in_stock = true
WHERE c.store_id = $1
GROUP BY c.id
ORDER BY c.display_order, c.name
`, [store_id]);
// Build tree structure
const categories = result.rows;
const categoryMap = new Map();
const tree = [];
// First pass: create map
categories.forEach(cat => {
categoryMap.set(cat.id, { ...cat, children: [] });
});
// Second pass: build tree
categories.forEach(cat => {
const node = categoryMap.get(cat.id);
if (cat.parent_id) {
const parent = categoryMap.get(cat.parent_id);
if (parent) {
parent.children.push(node);
}
}
else {
tree.push(node);
}
});
res.json({ tree });
}
catch (error) {
console.error('Error fetching category tree:', error);
res.status(500).json({ error: 'Failed to fetch category tree' });
}
});
exports.default = router;
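
The two-pass tree build in `GET /tree` is worth isolating: first index every row by id, then attach each node to its parent or the root list. A self-contained illustration, with hypothetical sample rows standing in for the query result:

```javascript
// Sample rows standing in for the categories query result.
const rows = [
  { id: 1, parent_id: null, name: 'Shop' },
  { id: 2, parent_id: 1, name: 'Flower' },
  { id: 3, parent_id: 1, name: 'Edibles' },
];

const categoryMap = new Map();
const tree = [];

// First pass: map every row to a node with an empty children array.
rows.forEach((cat) => categoryMap.set(cat.id, { ...cat, children: [] }));

// Second pass: link each node under its parent, or into the root list.
rows.forEach((cat) => {
  const node = categoryMap.get(cat.id);
  if (cat.parent_id) {
    const parent = categoryMap.get(cat.parent_id);
    if (parent) parent.children.push(node); // orphans are silently dropped
  } else {
    tree.push(node);
  }
});

console.log(tree[0].children.map((c) => c.name)); // [ 'Flower', 'Edibles' ]
```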

backend/dist/routes/dashboard.js vendored Normal file

@@ -0,0 +1,102 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
const express_1 = require("express");
const middleware_1 = require("../auth/middleware");
const migrate_1 = require("../db/migrate");
const router = (0, express_1.Router)();
router.use(middleware_1.authMiddleware);
// Get dashboard stats
router.get('/stats', async (req, res) => {
try {
// Store stats
const storesResult = await migrate_1.pool.query(`
SELECT
COUNT(*) as total,
COUNT(*) FILTER (WHERE active = true) as active,
MIN(last_scraped_at) as oldest_scrape,
MAX(last_scraped_at) as latest_scrape
FROM stores
`);
// Product stats
const productsResult = await migrate_1.pool.query(`
SELECT
COUNT(*) as total,
COUNT(*) FILTER (WHERE in_stock = true) as in_stock,
COUNT(*) FILTER (WHERE local_image_path IS NOT NULL) as with_images
FROM products
`);
// Campaign stats
const campaignsResult = await migrate_1.pool.query(`
SELECT
COUNT(*) as total,
COUNT(*) FILTER (WHERE active = true) as active
FROM campaigns
`);
// Recent clicks (last 24 hours)
const clicksResult = await migrate_1.pool.query(`
SELECT COUNT(*) as clicks_24h
FROM clicks
WHERE clicked_at >= NOW() - INTERVAL '24 hours'
`);
// Recent products added (last 24 hours)
const recentProductsResult = await migrate_1.pool.query(`
SELECT COUNT(*) as new_products_24h
FROM products
WHERE first_seen_at >= NOW() - INTERVAL '24 hours'
`);
// Proxy stats
const proxiesResult = await migrate_1.pool.query(`
SELECT
COUNT(*) as total,
COUNT(*) FILTER (WHERE active = true) as active,
COUNT(*) FILTER (WHERE is_anonymous = true) as anonymous
FROM proxies
`);
res.json({
stores: storesResult.rows[0],
products: productsResult.rows[0],
campaigns: campaignsResult.rows[0],
clicks: clicksResult.rows[0],
recent: recentProductsResult.rows[0],
proxies: proxiesResult.rows[0]
});
}
catch (error) {
console.error('Error fetching dashboard stats:', error);
res.status(500).json({ error: 'Failed to fetch dashboard stats' });
}
});
// Get recent activity
router.get('/activity', async (req, res) => {
try {
const { limit = 20 } = req.query;
// Recent scrapes
const scrapesResult = await migrate_1.pool.query(`
SELECT s.name, s.last_scraped_at,
COUNT(p.id) as product_count
FROM stores s
LEFT JOIN products p ON s.id = p.store_id AND p.last_seen_at = s.last_scraped_at
WHERE s.last_scraped_at IS NOT NULL
GROUP BY s.id, s.name, s.last_scraped_at
ORDER BY s.last_scraped_at DESC
LIMIT $1
`, [limit]);
// Recent products
const productsResult = await migrate_1.pool.query(`
SELECT p.name, p.price, s.name as store_name, p.first_seen_at
FROM products p
JOIN stores s ON p.store_id = s.id
ORDER BY p.first_seen_at DESC
LIMIT $1
`, [limit]);
res.json({
recent_scrapes: scrapesResult.rows,
recent_products: productsResult.rows
});
}
catch (error) {
console.error('Error fetching dashboard activity:', error);
res.status(500).json({ error: 'Failed to fetch dashboard activity' });
}
});
exports.default = router;

29
backend/dist/routes/logs.js vendored Normal file

@@ -0,0 +1,29 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
const express_1 = require("express");
const middleware_1 = require("../auth/middleware");
const logger_1 = require("../services/logger");
const router = (0, express_1.Router)();
router.use(middleware_1.authMiddleware);
router.get('/', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { limit = '100', level, category } = req.query;
const logs = logger_1.logger.getLogs(parseInt(limit), level, category);
res.json({ logs });
}
catch (error) {
console.error('Error fetching logs:', error);
res.status(500).json({ error: 'Failed to fetch logs' });
}
});
router.delete('/', (0, middleware_1.requireRole)('superadmin'), async (req, res) => {
try {
logger_1.logger.clear();
res.json({ message: 'Logs cleared' });
}
catch (error) {
console.error('Error clearing logs:', error);
res.status(500).json({ error: 'Failed to clear logs' });
}
});
exports.default = router;

112
backend/dist/routes/products.js vendored Normal file

@@ -0,0 +1,112 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
const express_1 = require("express");
const middleware_1 = require("../auth/middleware");
const migrate_1 = require("../db/migrate");
const minio_1 = require("../utils/minio");
const router = (0, express_1.Router)();
router.use(middleware_1.authMiddleware);
// Get all products with filters
router.get('/', async (req, res) => {
try {
const { store_id, category_id, in_stock, search, limit = 50, offset = 0 } = req.query;
let query = `
SELECT p.*, s.name as store_name, c.name as category_name
FROM products p
LEFT JOIN stores s ON p.store_id = s.id
LEFT JOIN categories c ON p.category_id = c.id
WHERE 1=1
`;
const params = [];
let paramCount = 1;
if (store_id) {
query += ` AND p.store_id = $${paramCount}`;
params.push(store_id);
paramCount++;
}
if (category_id) {
query += ` AND p.category_id = $${paramCount}`;
params.push(category_id);
paramCount++;
}
if (in_stock !== undefined) {
query += ` AND p.in_stock = $${paramCount}`;
params.push(in_stock === 'true');
paramCount++;
}
if (search) {
query += ` AND (p.name ILIKE $${paramCount} OR p.brand ILIKE $${paramCount})`;
params.push(`%${search}%`);
paramCount++;
}
query += ` ORDER BY p.last_seen_at DESC LIMIT $${paramCount} OFFSET $${paramCount + 1}`;
params.push(limit, offset);
const result = await migrate_1.pool.query(query, params);
// Add image URLs
const products = result.rows.map(p => ({
...p,
image_url_full: p.local_image_path ? (0, minio_1.getImageUrl)(p.local_image_path) : p.image_url
}));
// Get total count
let countQuery = `SELECT COUNT(*) FROM products p WHERE 1=1`;
const countParams = [];
let countParamCount = 1;
if (store_id) {
countQuery += ` AND p.store_id = $${countParamCount}`;
countParams.push(store_id);
countParamCount++;
}
if (category_id) {
countQuery += ` AND p.category_id = $${countParamCount}`;
countParams.push(category_id);
countParamCount++;
}
if (in_stock !== undefined) {
countQuery += ` AND p.in_stock = $${countParamCount}`;
countParams.push(in_stock === 'true');
countParamCount++;
}
if (search) {
countQuery += ` AND (p.name ILIKE $${countParamCount} OR p.brand ILIKE $${countParamCount})`;
countParams.push(`%${search}%`);
countParamCount++;
}
const countResult = await migrate_1.pool.query(countQuery, countParams);
res.json({
products,
total: parseInt(countResult.rows[0].count),
limit: parseInt(limit),
offset: parseInt(offset)
});
}
catch (error) {
console.error('Error fetching products:', error);
res.status(500).json({ error: 'Failed to fetch products' });
}
});
// Get single product
router.get('/:id', async (req, res) => {
try {
const { id } = req.params;
const result = await migrate_1.pool.query(`
SELECT p.*, s.name as store_name, c.name as category_name
FROM products p
LEFT JOIN stores s ON p.store_id = s.id
LEFT JOIN categories c ON p.category_id = c.id
WHERE p.id = $1
`, [id]);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'Product not found' });
}
const product = result.rows[0];
product.image_url_full = product.local_image_path
? (0, minio_1.getImageUrl)(product.local_image_path)
: product.image_url;
res.json({ product });
}
catch (error) {
console.error('Error fetching product:', error);
res.status(500).json({ error: 'Failed to fetch product' });
}
});
exports.default = router;
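The products route above builds its WHERE clause incrementally, bumping a `paramCount` in parallel for both the main query and the count query. The same pattern can be kept in sync automatically by deriving the placeholder index from `params.length`; a minimal, self-contained sketch (the helper name and shape are illustrative, not code from this commit):

```javascript
// Sketch of the incremental-placeholder pattern used in the products
// route: each optional filter appends "AND ... $n" plus a matching
// parameter, so the SQL text and params array always stay in sync.
function buildProductQuery(filters) {
  let text = 'SELECT p.* FROM products p WHERE 1=1';
  const params = [];
  if (filters.store_id !== undefined) {
    params.push(filters.store_id);
    text += ` AND p.store_id = $${params.length}`;
  }
  if (filters.in_stock !== undefined) {
    params.push(filters.in_stock === 'true');
    text += ` AND p.in_stock = $${params.length}`;
  }
  if (filters.search) {
    params.push(`%${filters.search}%`);
    // Same placeholder reused for name and brand, as in the route above
    text += ` AND (p.name ILIKE $${params.length} OR p.brand ILIKE $${params.length})`;
  }
  params.push(filters.limit ?? 50, filters.offset ?? 0);
  text += ` ORDER BY p.last_seen_at DESC LIMIT $${params.length - 1} OFFSET $${params.length}`;
  return { text, params };
}
```

Deriving the index from `params.length` avoids the separate counter that must otherwise be incremented in two places (once for the page query, once for the count query).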

174
backend/dist/routes/proxies.js vendored Normal file

@@ -0,0 +1,174 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
const express_1 = require("express");
const middleware_1 = require("../auth/middleware");
const migrate_1 = require("../db/migrate");
const proxy_1 = require("../services/proxy");
const router = (0, express_1.Router)();
router.use(middleware_1.authMiddleware);
// Get all proxies
router.get('/', async (req, res) => {
try {
const result = await migrate_1.pool.query(`
SELECT id, host, port, protocol, active, is_anonymous,
last_tested_at, test_result, response_time_ms, created_at
FROM proxies
ORDER BY created_at DESC
`);
res.json({ proxies: result.rows });
}
catch (error) {
console.error('Error fetching proxies:', error);
res.status(500).json({ error: 'Failed to fetch proxies' });
}
});
// Get single proxy
router.get('/:id', async (req, res) => {
try {
const { id } = req.params;
const result = await migrate_1.pool.query(`
SELECT id, host, port, protocol, username, active, is_anonymous,
last_tested_at, test_result, response_time_ms, created_at
FROM proxies
WHERE id = $1
`, [id]);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'Proxy not found' });
}
res.json({ proxy: result.rows[0] });
}
catch (error) {
console.error('Error fetching proxy:', error);
res.status(500).json({ error: 'Failed to fetch proxy' });
}
});
// Add single proxy
router.post('/', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { host, port, protocol, username, password } = req.body;
if (!host || !port || !protocol) {
return res.status(400).json({ error: 'Host, port, and protocol required' });
}
// Test and add proxy
const proxyId = await (0, proxy_1.addProxy)(host, port, protocol, username, password);
const result = await migrate_1.pool.query(`
SELECT * FROM proxies WHERE id = $1
`, [proxyId]);
res.status(201).json({ proxy: result.rows[0] });
}
catch (error) {
console.error('Error adding proxy:', error);
res.status(400).json({ error: error.message || 'Failed to add proxy' });
}
});
// Add multiple proxies
router.post('/bulk', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { proxies } = req.body;
if (!proxies || !Array.isArray(proxies)) {
return res.status(400).json({ error: 'Proxies array required' });
}
const result = await (0, proxy_1.addProxiesFromList)(proxies);
res.status(201).json(result);
}
catch (error) {
console.error('Error adding proxies:', error);
res.status(500).json({ error: 'Failed to add proxies' });
}
});
// Test single proxy
router.post('/:id/test', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { id } = req.params;
const proxyResult = await migrate_1.pool.query(`
SELECT host, port, protocol, username, password
FROM proxies
WHERE id = $1
`, [id]);
if (proxyResult.rows.length === 0) {
return res.status(404).json({ error: 'Proxy not found' });
}
const proxy = proxyResult.rows[0];
const testResult = await (0, proxy_1.testProxy)(proxy.host, proxy.port, proxy.protocol, proxy.username, proxy.password);
// Update proxy with test results
await migrate_1.pool.query(`
UPDATE proxies
SET last_tested_at = CURRENT_TIMESTAMP,
test_result = $1,
response_time_ms = $2,
is_anonymous = $3,
active = $4
WHERE id = $5
`, [
testResult.success ? 'success' : 'failed',
testResult.responseTimeMs,
testResult.isAnonymous,
testResult.success,
id
]);
res.json({ test_result: testResult });
}
catch (error) {
console.error('Error testing proxy:', error);
res.status(500).json({ error: 'Failed to test proxy' });
}
});
// Test all proxies
router.post('/test-all', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
// Run in background
(0, proxy_1.testAllProxies)().catch(err => {
console.error('Background proxy testing error:', err);
});
res.json({ message: 'Proxy testing started in background' });
}
catch (error) {
console.error('Error starting proxy tests:', error);
res.status(500).json({ error: 'Failed to start proxy tests' });
}
});
// Update proxy
router.put('/:id', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { id } = req.params;
const { host, port, protocol, username, password, active } = req.body;
const result = await migrate_1.pool.query(`
UPDATE proxies
SET host = COALESCE($1, host),
port = COALESCE($2, port),
protocol = COALESCE($3, protocol),
username = COALESCE($4, username),
password = COALESCE($5, password),
active = COALESCE($6, active),
updated_at = CURRENT_TIMESTAMP
WHERE id = $7
RETURNING *
`, [host, port, protocol, username, password, active, id]);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'Proxy not found' });
}
res.json({ proxy: result.rows[0] });
}
catch (error) {
console.error('Error updating proxy:', error);
res.status(500).json({ error: 'Failed to update proxy' });
}
});
// Delete proxy
router.delete('/:id', (0, middleware_1.requireRole)('superadmin'), async (req, res) => {
try {
const { id } = req.params;
const result = await migrate_1.pool.query(`
DELETE FROM proxies WHERE id = $1 RETURNING id
`, [id]);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'Proxy not found' });
}
res.json({ message: 'Proxy deleted successfully' });
}
catch (error) {
console.error('Error deleting proxy:', error);
res.status(500).json({ error: 'Failed to delete proxy' });
}
});
exports.default = router;

130
backend/dist/routes/scraper-monitor.js vendored Normal file

@@ -0,0 +1,130 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.activeScrapers = void 0;
exports.registerScraper = registerScraper;
exports.updateScraperStats = updateScraperStats;
exports.completeScraper = completeScraper;
const express_1 = require("express");
const middleware_1 = require("../auth/middleware");
const migrate_1 = require("../db/migrate");
const router = (0, express_1.Router)();
router.use(middleware_1.authMiddleware);
exports.activeScrapers = new Map();
// Get all active scrapers
router.get('/active', async (req, res) => {
try {
const scrapers = Array.from(exports.activeScrapers.values()).map(scraper => ({
...scraper,
duration: Date.now() - scraper.startTime.getTime(),
isStale: Date.now() - scraper.lastUpdate.getTime() > 60000 // 1 minute
}));
res.json({ scrapers });
}
catch (error) {
console.error('Error fetching active scrapers:', error);
res.status(500).json({ error: 'Failed to fetch active scrapers' });
}
});
// Get scraper by ID
router.get('/active/:id', async (req, res) => {
try {
const { id } = req.params;
const scraper = exports.activeScrapers.get(id);
if (!scraper) {
return res.status(404).json({ error: 'Scraper not found' });
}
res.json({
scraper: {
...scraper,
duration: Date.now() - scraper.startTime.getTime(),
isStale: Date.now() - scraper.lastUpdate.getTime() > 60000
}
});
}
catch (error) {
console.error('Error fetching scraper:', error);
res.status(500).json({ error: 'Failed to fetch scraper' });
}
});
// Get scraper history (last 50 completed scrapes)
router.get('/history', async (req, res) => {
try {
const { limit = 50, store_id } = req.query;
let query = `
SELECT
s.id as store_id,
s.name as store_name,
c.id as category_id,
c.name as category_name,
c.last_scraped_at,
(
SELECT COUNT(*)
FROM products p
WHERE p.store_id = s.id
AND p.category_id = c.id
) as product_count
FROM stores s
LEFT JOIN categories c ON c.store_id = s.id
WHERE c.last_scraped_at IS NOT NULL
`;
const params = [];
let paramCount = 1;
if (store_id) {
query += ` AND s.id = $${paramCount}`;
params.push(store_id);
paramCount++;
}
query += ` ORDER BY c.last_scraped_at DESC LIMIT $${paramCount}`;
params.push(limit);
const result = await migrate_1.pool.query(query, params);
res.json({ history: result.rows });
}
catch (error) {
console.error('Error fetching scraper history:', error);
res.status(500).json({ error: 'Failed to fetch scraper history' });
}
});
// Helper function to register a scraper
function registerScraper(id, storeId, storeName, categoryId, categoryName) {
exports.activeScrapers.set(id, {
id,
storeId,
storeName,
categoryId,
categoryName,
startTime: new Date(),
lastUpdate: new Date(),
status: 'running',
stats: {
requestsTotal: 0,
requestsSuccess: 0,
itemsSaved: 0,
itemsDropped: 0,
errorsCount: 0
}
});
}
// Helper function to update scraper stats
function updateScraperStats(id, stats, currentActivity) {
const scraper = exports.activeScrapers.get(id);
if (scraper) {
scraper.stats = { ...scraper.stats, ...stats };
scraper.lastUpdate = new Date();
if (currentActivity) {
scraper.currentActivity = currentActivity;
}
}
}
// Helper function to mark scraper as completed
function completeScraper(id, error) {
const scraper = exports.activeScrapers.get(id);
if (scraper) {
scraper.status = error ? 'error' : 'completed';
scraper.lastUpdate = new Date();
// Remove after 5 minutes
setTimeout(() => {
exports.activeScrapers.delete(id);
}, 5 * 60 * 1000);
}
}
exports.default = router;
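The monitor above keeps scraper state in an in-memory `Map` and computes derived fields (`duration`, `isStale`) at read time from the stored timestamps rather than persisting them. A minimal, self-contained reproduction of that idea (names are illustrative, not this module's API):

```javascript
// State lives in a Map; duration and staleness are derived on read.
const STALE_MS = 60000; // matches the 1-minute threshold in the route

const registry = new Map();

function register(id) {
  registry.set(id, {
    id,
    startTime: new Date(),
    lastUpdate: new Date(),
    status: 'running'
  });
}

function snapshot(id, now = Date.now()) {
  const s = registry.get(id);
  if (!s) return null;
  return {
    ...s,
    duration: now - s.startTime.getTime(),
    isStale: now - s.lastUpdate.getTime() > STALE_MS
  };
}
```

Computing `isStale` on read means a crashed scraper that stops calling `updateScraperStats` surfaces as stale automatically, with no background sweeper needed.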

118
backend/dist/routes/settings.js vendored Normal file

@@ -0,0 +1,118 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
const express_1 = require("express");
const middleware_1 = require("../auth/middleware");
const migrate_1 = require("../db/migrate");
const scheduler_1 = require("../services/scheduler");
const router = (0, express_1.Router)();
router.use(middleware_1.authMiddleware);
// Get all settings
router.get('/', async (req, res) => {
try {
const result = await migrate_1.pool.query(`
SELECT key, value, description, updated_at
FROM settings
ORDER BY key
`);
res.json({ settings: result.rows });
}
catch (error) {
console.error('Error fetching settings:', error);
res.status(500).json({ error: 'Failed to fetch settings' });
}
});
// Get single setting
router.get('/:key', async (req, res) => {
try {
const { key } = req.params;
const result = await migrate_1.pool.query(`
SELECT key, value, description, updated_at
FROM settings
WHERE key = $1
`, [key]);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'Setting not found' });
}
res.json({ setting: result.rows[0] });
}
catch (error) {
console.error('Error fetching setting:', error);
res.status(500).json({ error: 'Failed to fetch setting' });
}
});
// Update setting
router.put('/:key', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { key } = req.params;
const { value } = req.body;
if (value === undefined) {
return res.status(400).json({ error: 'Value required' });
}
const result = await migrate_1.pool.query(`
UPDATE settings
SET value = $1, updated_at = CURRENT_TIMESTAMP
WHERE key = $2
RETURNING *
`, [value, key]);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'Setting not found' });
}
// Restart scheduler if scrape settings changed
if (key === 'scrape_interval_hours' || key === 'scrape_specials_time') {
console.log('Restarting scheduler due to setting change...');
await (0, scheduler_1.restartScheduler)();
}
res.json({ setting: result.rows[0] });
}
catch (error) {
console.error('Error updating setting:', error);
res.status(500).json({ error: 'Failed to update setting' });
}
});
// Update multiple settings at once
router.put('/', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { settings } = req.body;
if (!settings || !Array.isArray(settings)) {
return res.status(400).json({ error: 'Settings array required' });
}
const client = await migrate_1.pool.connect();
try {
await client.query('BEGIN');
const updated = [];
let needsSchedulerRestart = false;
for (const setting of settings) {
const result = await client.query(`
UPDATE settings
SET value = $1, updated_at = CURRENT_TIMESTAMP
WHERE key = $2
RETURNING *
`, [setting.value, setting.key]);
if (result.rows.length > 0) {
updated.push(result.rows[0]);
if (setting.key === 'scrape_interval_hours' || setting.key === 'scrape_specials_time') {
needsSchedulerRestart = true;
}
}
}
await client.query('COMMIT');
if (needsSchedulerRestart) {
console.log('Restarting scheduler due to setting changes...');
await (0, scheduler_1.restartScheduler)();
}
res.json({ settings: updated });
}
catch (error) {
await client.query('ROLLBACK');
throw error;
}
finally {
client.release();
}
}
catch (error) {
console.error('Error updating settings:', error);
res.status(500).json({ error: 'Failed to update settings' });
}
});
exports.default = router;
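The bulk update above wraps its per-row writes in `BEGIN`/`COMMIT` with `ROLLBACK` on error and `release()` in `finally`. That shape generalizes into a helper; a sketch under the assumption of a node-postgres-style pool (the `withTransaction` name is hypothetical, not part of this commit):

```javascript
// Generic BEGIN/COMMIT/ROLLBACK wrapper. `pool` is any object whose
// connect() resolves to a client exposing query() and release() —
// node-postgres's Pool fits this contract.
async function withTransaction(pool, fn) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    const result = await fn(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err; // rethrow so the route's outer catch still runs
  } finally {
    client.release(); // always return the client to the pool
  }
}
```

With this helper, the settings route's transaction body reduces to the loop over `settings`, and the rollback/release bookkeeping cannot be forgotten in a future route.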

257
backend/dist/routes/stores.js vendored Normal file

@@ -0,0 +1,257 @@
"use strict";
var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
if (k2 === undefined) k2 = k;
var desc = Object.getOwnPropertyDescriptor(m, k);
if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
desc = { enumerable: true, get: function() { return m[k]; } };
}
Object.defineProperty(o, k2, desc);
}) : (function(o, m, k, k2) {
if (k2 === undefined) k2 = k;
o[k2] = m[k];
}));
var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
Object.defineProperty(o, "default", { enumerable: true, value: v });
}) : function(o, v) {
o["default"] = v;
});
var __importStar = (this && this.__importStar) || (function () {
var ownKeys = function(o) {
ownKeys = Object.getOwnPropertyNames || function (o) {
var ar = [];
for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
return ar;
};
return ownKeys(o);
};
return function (mod) {
if (mod && mod.__esModule) return mod;
var result = {};
if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
__setModuleDefault(result, mod);
return result;
};
})();
Object.defineProperty(exports, "__esModule", { value: true });
const express_1 = require("express");
const middleware_1 = require("../auth/middleware");
const migrate_1 = require("../db/migrate");
const scraper_v2_1 = require("../scraper-v2");
const router = (0, express_1.Router)();
router.use(middleware_1.authMiddleware);
// Get all stores
router.get('/', async (req, res) => {
try {
const result = await migrate_1.pool.query(`
SELECT
s.*,
COUNT(DISTINCT p.id) as product_count,
COUNT(DISTINCT c.id) as category_count
FROM stores s
LEFT JOIN products p ON s.id = p.store_id
LEFT JOIN categories c ON s.id = c.store_id
GROUP BY s.id
ORDER BY s.name
`);
res.json({ stores: result.rows });
}
catch (error) {
console.error('Error fetching stores:', error);
res.status(500).json({ error: 'Failed to fetch stores' });
}
});
// Get single store
router.get('/:id', async (req, res) => {
try {
const { id } = req.params;
const result = await migrate_1.pool.query(`
SELECT
s.*,
COUNT(DISTINCT p.id) as product_count,
COUNT(DISTINCT c.id) as category_count
FROM stores s
LEFT JOIN products p ON s.id = p.store_id
LEFT JOIN categories c ON s.id = c.store_id
WHERE s.id = $1
GROUP BY s.id
`, [id]);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'Store not found' });
}
res.json(result.rows[0]);
}
catch (error) {
console.error('Error fetching store:', error);
res.status(500).json({ error: 'Failed to fetch store' });
}
});
// Create store
router.post('/', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { name, slug, dutchie_url, active, scrape_enabled } = req.body;
const result = await migrate_1.pool.query(`
INSERT INTO stores (name, slug, dutchie_url, active, scrape_enabled)
VALUES ($1, $2, $3, $4, $5)
RETURNING *
`, [name, slug, dutchie_url, active ?? true, scrape_enabled ?? true]);
res.status(201).json(result.rows[0]);
}
catch (error) {
console.error('Error creating store:', error);
res.status(500).json({ error: 'Failed to create store' });
}
});
// Update store
router.put('/:id', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { id } = req.params;
const { name, slug, dutchie_url, active, scrape_enabled } = req.body;
const result = await migrate_1.pool.query(`
UPDATE stores
SET name = COALESCE($1, name),
slug = COALESCE($2, slug),
dutchie_url = COALESCE($3, dutchie_url),
active = COALESCE($4, active),
scrape_enabled = COALESCE($5, scrape_enabled),
updated_at = CURRENT_TIMESTAMP
WHERE id = $6
RETURNING *
`, [name, slug, dutchie_url, active, scrape_enabled, id]);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'Store not found' });
}
res.json(result.rows[0]);
}
catch (error) {
console.error('Error updating store:', error);
res.status(500).json({ error: 'Failed to update store' });
}
});
// Delete store
router.delete('/:id', (0, middleware_1.requireRole)('superadmin'), async (req, res) => {
try {
const { id } = req.params;
const result = await migrate_1.pool.query('DELETE FROM stores WHERE id = $1 RETURNING *', [id]);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'Store not found' });
}
res.json({ message: 'Store deleted successfully' });
}
catch (error) {
console.error('Error deleting store:', error);
res.status(500).json({ error: 'Failed to delete store' });
}
});
// Trigger scrape for a store
router.post('/:id/scrape', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { id } = req.params;
const { parallel = 3 } = req.body; // Default to 3 parallel scrapers
const storeResult = await migrate_1.pool.query('SELECT id FROM stores WHERE id = $1', [id]);
if (storeResult.rows.length === 0) {
return res.status(404).json({ error: 'Store not found' });
}
(0, scraper_v2_1.scrapeStore)(parseInt(id), parseInt(parallel)).catch(err => {
console.error('Background scrape error:', err);
});
res.json({
message: 'Scrape started',
parallel: parseInt(parallel)
});
}
catch (error) {
console.error('Error triggering scrape:', error);
res.status(500).json({ error: 'Failed to trigger scrape' });
}
});
// Download missing images for a store
router.post('/:id/download-images', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { id } = req.params;
const storeResult = await migrate_1.pool.query('SELECT id, name FROM stores WHERE id = $1', [id]);
if (storeResult.rows.length === 0) {
return res.status(404).json({ error: 'Store not found' });
}
const store = storeResult.rows[0];
const productsResult = await migrate_1.pool.query(`
SELECT id, name, image_url
FROM products
WHERE store_id = $1
AND image_url IS NOT NULL
AND local_image_path IS NULL
`, [id]);
(async () => {
const { uploadImageFromUrl } = await Promise.resolve().then(() => __importStar(require('../utils/minio')));
let downloaded = 0;
for (const product of productsResult.rows) {
try {
console.log(`📸 Downloading image for: ${product.name}`);
const localPath = await uploadImageFromUrl(product.image_url, product.id);
await migrate_1.pool.query(`
UPDATE products
SET local_image_path = $1
WHERE id = $2
`, [localPath, product.id]);
downloaded++;
}
catch (error) {
console.error(`Failed to download image for ${product.name}:`, error);
}
}
console.log(`✅ Downloaded ${downloaded} of ${productsResult.rows.length} missing images for ${store.name}`);
})().catch(err => console.error('Background image download error:', err));
res.json({
message: 'Image download started',
total_missing: productsResult.rows.length
});
}
catch (error) {
console.error('Error triggering image download:', error);
res.status(500).json({ error: 'Failed to trigger image download' });
}
});
// Discover categories for a store
router.post('/:id/discover-categories', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { id } = req.params;
const storeResult = await migrate_1.pool.query('SELECT id FROM stores WHERE id = $1', [id]);
if (storeResult.rows.length === 0) {
return res.status(404).json({ error: 'Store not found' });
}
(0, scraper_v2_1.discoverCategories)(parseInt(id)).catch(err => {
console.error('Background category discovery error:', err);
});
res.json({ message: 'Category discovery started' });
}
catch (error) {
console.error('Error triggering category discovery:', error);
res.status(500).json({ error: 'Failed to trigger category discovery' });
}
});
// Debug scraper
router.post('/:id/debug-scrape', (0, middleware_1.requireRole)('superadmin', 'admin'), async (req, res) => {
try {
const { id } = req.params;
console.log('Debug scrape triggered for store:', id);
const categoryResult = await migrate_1.pool.query(`
SELECT c.dutchie_url, c.name
FROM categories c
WHERE c.store_id = $1 AND c.slug = 'edibles'
LIMIT 1
`, [id]);
if (categoryResult.rows.length === 0) {
return res.status(404).json({ error: 'Edibles category not found' });
}
console.log('Found category:', categoryResult.rows[0]);
const { debugDutchiePage } = await Promise.resolve().then(() => __importStar(require('../services/scraper-debug')));
debugDutchiePage(categoryResult.rows[0].dutchie_url).catch(err => {
console.error('Debug error:', err);
});
res.json({ message: 'Debug started, check logs', url: categoryResult.rows[0].dutchie_url });
}
catch (error) {
console.error('Debug endpoint error:', error);
res.status(500).json({ error: 'Failed to debug' });
}
});
exports.default = router;
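Several endpoints above (scrape, image download, category discovery) start long-running work without awaiting it, attaching `.catch` so an eventual rejection is logged instead of becoming an unhandled rejection, then respond to the HTTP client immediately. A minimal sketch of that fire-and-forget pattern (helper name is illustrative):

```javascript
// Kick off a long-running job without blocking the response. The
// .catch attached here guarantees a rejection is logged rather than
// crashing the process as an unhandled rejection.
function startBackground(job, onError = (err) => console.error('Background job error:', err)) {
  const p = Promise.resolve().then(job);
  p.catch(onError);
  return p; // callers may still await completion if they choose to
}
```

Without the `.catch`, a rejected background promise would emit `unhandledRejection`, which terminates the process by default on current Node versions.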

324
backend/dist/scraper-v2/downloader.js vendored Normal file

@@ -0,0 +1,324 @@
"use strict";
var __importDefault = (this && this.__importDefault) || function (mod) {
return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
exports.Downloader = void 0;
const puppeteer_1 = __importDefault(require("puppeteer"));
const axios_1 = __importDefault(require("axios"));
const types_1 = require("./types");
const logger_1 = require("../services/logger");
class Downloader {
browser = null;
page = null;
pageInUse = false;
/**
* Initialize browser instance (lazy initialization)
*/
async getBrowser() {
if (!this.browser || !this.browser.isConnected()) {
const launchOptions = {
headless: 'new',
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled',
'--window-size=1920,1080',
'--disable-web-security',
'--disable-features=IsolateOrigins,site-per-process'
]
};
this.browser = await puppeteer_1.default.launch(launchOptions);
logger_1.logger.info('scraper', 'Browser instance created');
}
return this.browser;
}
/**
* Get or create a page instance
*/
async getPage() {
if (!this.page || this.page.isClosed()) {
const browser = await this.getBrowser();
this.page = await browser.newPage();
await this.page.setViewport({ width: 1920, height: 1080 });
logger_1.logger.debug('scraper', 'New page created');
}
return this.page;
}
/**
* Apply stealth mode to page
*/
async makePageStealthy(page) {
await page.evaluateOnNewDocument(() => {
// @ts-ignore - runs in browser context
Object.defineProperty(navigator, 'webdriver', {
get: () => false,
});
// @ts-ignore - runs in browser context
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5],
});
// @ts-ignore - runs in browser context
Object.defineProperty(navigator, 'languages', {
get: () => ['en-US', 'en'],
});
// @ts-ignore - runs in browser context
window.chrome = {
runtime: {},
};
// @ts-ignore - runs in browser context
const originalQuery = window.navigator.permissions.query;
// @ts-ignore - runs in browser context
window.navigator.permissions.query = (parameters) => parameters.name === 'notifications'
? Promise.resolve({ state: 'denied' })
: originalQuery(parameters);
});
}
/**
* Configure proxy for browser
*/
getProxyArgs(proxy) {
if (proxy.protocol === 'socks5') {
return [`--proxy-server=socks5://${proxy.host}:${proxy.port}`];
}
else if (proxy.protocol === 'http' || proxy.protocol === 'https') {
return [`--proxy-server=${proxy.protocol}://${proxy.host}:${proxy.port}`];
}
return [];
}
/**
* HTTP-based fetch (lightweight, fast)
*/
async httpFetch(request) {
try {
const config = {
timeout: 30000,
headers: {
'User-Agent': request.metadata.userAgent || 'Mozilla/5.0',
...request.metadata.headers
},
validateStatus: () => true // Don't throw on any status
};
// Add proxy if available
if (request.metadata.proxy) {
const proxy = request.metadata.proxy;
config.proxy = {
host: proxy.host,
port: proxy.port,
protocol: proxy.protocol
};
if (proxy.username && proxy.password) {
config.proxy.auth = {
username: proxy.username,
password: proxy.password
};
}
}
const response = await axios_1.default.get(request.url, config);
return {
url: request.url,
statusCode: response.status,
content: response.data,
metadata: {
headers: response.headers,
method: 'http'
},
request
};
}
catch (error) {
const scraperError = new Error(error.message);
if (error.code === 'ETIMEDOUT' || error.code === 'ECONNABORTED') {
scraperError.type = types_1.ErrorType.TIMEOUT;
}
else if (error.code === 'ECONNREFUSED' || error.code === 'ENOTFOUND') {
scraperError.type = types_1.ErrorType.NETWORK_ERROR;
}
else {
scraperError.type = types_1.ErrorType.UNKNOWN;
}
scraperError.retryable = true;
scraperError.request = request;
throw scraperError;
}
}
/**
* Browser-based fetch (for JS-heavy sites)
*/
async browserFetch(request) {
// Wait if page is in use
while (this.pageInUse) {
await new Promise(resolve => setTimeout(resolve, 100));
}
this.pageInUse = true;
try {
const page = await this.getPage();
// Apply stealth mode if required
if (request.metadata.requiresStealth) {
await this.makePageStealthy(page);
}
// Set user agent
if (request.metadata.userAgent) {
await page.setUserAgent(request.metadata.userAgent);
}
// Navigate to page
const navigationPromise = page.goto(request.url, {
waitUntil: 'domcontentloaded',
timeout: 60000
});
const response = await navigationPromise;
if (!response) {
throw new Error('Navigation failed - no response');
}
// Wait for initial render
await page.waitForTimeout(3000);
// Check for lazy-loaded content
await this.autoScroll(page);
// Get page content
const content = await page.content();
const statusCode = response.status();
return {
url: request.url,
statusCode,
content,
metadata: {
method: 'browser',
finalUrl: page.url()
},
request
};
}
catch (error) {
const scraperError = new Error(error.message);
if (error.message.includes('timeout') || error.message.includes('Navigation timeout')) {
scraperError.type = types_1.ErrorType.TIMEOUT;
}
else if (error.message.includes('net::')) {
scraperError.type = types_1.ErrorType.NETWORK_ERROR;
}
else if (error.message.includes('404')) {
scraperError.type = types_1.ErrorType.NOT_FOUND;
}
else {
scraperError.type = types_1.ErrorType.UNKNOWN;
}
scraperError.retryable = scraperError.type !== types_1.ErrorType.NOT_FOUND;
scraperError.request = request;
throw scraperError;
}
finally {
this.pageInUse = false;
}
}
/**
* Auto-scroll to load lazy content
*/
async autoScroll(page) {
try {
await page.evaluate(async () => {
await new Promise((resolve) => {
let totalHeight = 0;
const distance = 500;
const maxScrolls = 20; // Prevent infinite scrolling
let scrollCount = 0;
const timer = setInterval(() => {
// @ts-ignore - runs in browser context
const scrollHeight = document.body.scrollHeight;
// @ts-ignore - runs in browser context
window.scrollBy(0, distance);
totalHeight += distance;
scrollCount++;
if (totalHeight >= scrollHeight || scrollCount >= maxScrolls) {
clearInterval(timer);
// Scroll back to top
// @ts-ignore - runs in browser context
window.scrollTo(0, 0);
resolve();
}
}, 200);
});
});
// Wait for any lazy-loaded content
await page.waitForTimeout(1000);
}
catch (error) {
logger_1.logger.warn('scraper', `Auto-scroll failed: ${error}`);
}
}
/**
* Main fetch method - tries HTTP first, falls back to browser
*/
async fetch(request) {
const startTime = Date.now();
try {
// Force browser mode if required
if (request.metadata.requiresBrowser) {
logger_1.logger.debug('scraper', `Browser fetch: ${request.url}`);
const response = await this.browserFetch(request);
logger_1.logger.debug('scraper', `Fetch completed in ${Date.now() - startTime}ms`);
return response;
}
// Try HTTP first (faster)
try {
logger_1.logger.debug('scraper', `HTTP fetch: ${request.url}`);
const response = await this.httpFetch(request);
// Check if we got a meaningful response
if (response.statusCode && response.statusCode >= 200 && response.statusCode < 300) {
logger_1.logger.debug('scraper', `HTTP fetch succeeded in ${Date.now() - startTime}ms`);
return response;
}
// Fall through to browser mode for non-2xx responses
logger_1.logger.debug('scraper', `HTTP got ${response.statusCode || 'unknown'}, trying browser`);
}
catch (httpError) {
logger_1.logger.debug('scraper', `HTTP failed, falling back to browser: ${httpError}`);
}
// Fall back to browser
request.metadata.requiresBrowser = true;
const response = await this.browserFetch(request);
logger_1.logger.debug('scraper', `Browser fetch completed in ${Date.now() - startTime}ms`);
return response;
}
catch (error) {
logger_1.logger.error('scraper', `Fetch failed after ${Date.now() - startTime}ms: ${error}`);
throw error;
}
}
/**
* Evaluate JavaScript in the current page context
*/
async evaluate(fn) {
if (!this.page || this.page.isClosed()) {
throw new Error('No active page for evaluation');
}
return await this.page.evaluate(fn);
}
/**
* Get the current page (for custom operations)
*/
async getCurrentPage() {
return this.page;
}
/**
* Close the browser
*/
async close() {
if (this.page && !this.page.isClosed()) {
await this.page.close();
this.page = null;
}
if (this.browser && this.browser.isConnected()) {
await this.browser.close();
this.browser = null;
logger_1.logger.info('scraper', 'Browser closed');
}
}
/**
* Clean up resources
*/
async cleanup() {
await this.close();
}
}
exports.Downloader = Downloader;
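The `fetch()` flow above (try cheap HTTP first, fall back to the browser on a thrown error or a non-2xx status, and flag the request so retries go straight to the browser) can be sketched in isolation. This is a minimal illustration of the pattern, not the class itself; `httpFetch` and `browserFetch` here are injected stand-in stubs rather than the real Downloader methods.

```javascript
// Minimal sketch of the HTTP-first / browser-fallback strategy used by
// Downloader.fetch(). Fetchers are injected so the pattern can be
// exercised without a real network or browser.
async function fetchWithFallback(request, httpFetch, browserFetch) {
  if (request.metadata.requiresBrowser) {
    // Caller already knows plain HTTP will not work for this URL
    return browserFetch(request);
  }
  try {
    const response = await httpFetch(request);
    // Only accept 2xx responses from the cheap HTTP path
    if (response.statusCode >= 200 && response.statusCode < 300) {
      return response;
    }
  } catch (httpError) {
    // Network-level failure: fall through to the browser path
  }
  // Remember the fallback so any retry skips the HTTP attempt
  request.metadata.requiresBrowser = true;
  return browserFetch(request);
}
```

With this shape, a 403 from the HTTP path costs one extra browser fetch now but none on subsequent retries, since `requiresBrowser` is sticky on the request's metadata.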

backend/dist/scraper-v2/engine.js vendored Normal file

@@ -0,0 +1,652 @@
"use strict";
var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
if (k2 === undefined) k2 = k;
var desc = Object.getOwnPropertyDescriptor(m, k);
if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
desc = { enumerable: true, get: function() { return m[k]; } };
}
Object.defineProperty(o, k2, desc);
}) : (function(o, m, k, k2) {
if (k2 === undefined) k2 = k;
o[k2] = m[k];
}));
var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
Object.defineProperty(o, "default", { enumerable: true, value: v });
}) : function(o, v) {
o["default"] = v;
});
var __importStar = (this && this.__importStar) || (function () {
var ownKeys = function(o) {
ownKeys = Object.getOwnPropertyNames || function (o) {
var ar = [];
for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
return ar;
};
return ownKeys(o);
};
return function (mod) {
if (mod && mod.__esModule) return mod;
var result = {};
if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
__setModuleDefault(result, mod);
return result;
};
})();
Object.defineProperty(exports, "__esModule", { value: true });
exports.DutchieSpider = exports.ScraperEngine = void 0;
const scheduler_1 = require("./scheduler");
const downloader_1 = require("./downloader");
const middlewares_1 = require("./middlewares");
const pipelines_1 = require("./pipelines");
const logger_1 = require("../services/logger");
const migrate_1 = require("../db/migrate");
/**
* Main Scraper Engine - orchestrates the entire scraping process
*/
class ScraperEngine {
scheduler;
downloader;
middlewareEngine;
pipelineEngine;
stats;
isRunning = false;
concurrency = 1; // Conservative default
constructor(concurrency = 1) {
this.scheduler = new scheduler_1.RequestScheduler();
this.downloader = new downloader_1.Downloader();
this.middlewareEngine = new middlewares_1.MiddlewareEngine();
this.pipelineEngine = new pipelines_1.PipelineEngine();
this.concurrency = concurrency;
// Initialize stats
this.stats = {
requestsTotal: 0,
requestsSuccess: 0,
requestsFailed: 0,
itemsScraped: 0,
itemsSaved: 0,
itemsDropped: 0,
errorsCount: 0,
startTime: new Date()
};
// Setup middlewares
this.setupMiddlewares();
// Setup pipelines
this.setupPipelines();
}
/**
* Setup middleware chain
*/
setupMiddlewares() {
this.middlewareEngine.use(new middlewares_1.UserAgentMiddleware());
this.middlewareEngine.use(new middlewares_1.ProxyMiddleware());
this.middlewareEngine.use(new middlewares_1.RateLimitMiddleware());
this.middlewareEngine.use(new middlewares_1.RetryMiddleware());
this.middlewareEngine.use(new middlewares_1.BotDetectionMiddleware());
this.middlewareEngine.use(new middlewares_1.StealthMiddleware());
}
/**
* Setup pipeline chain
*/
setupPipelines() {
this.pipelineEngine.use(new pipelines_1.ValidationPipeline());
this.pipelineEngine.use(new pipelines_1.SanitizationPipeline());
this.pipelineEngine.use(new pipelines_1.DeduplicationPipeline());
this.pipelineEngine.use(new pipelines_1.ImagePipeline());
this.pipelineEngine.use(new pipelines_1.StatsPipeline());
this.pipelineEngine.use(new pipelines_1.DatabasePipeline());
}
/**
* Add a request to the queue
*/
enqueue(request) {
this.scheduler.enqueue(request);
}
/**
* Start the scraping engine
*/
async start() {
if (this.isRunning) {
logger_1.logger.warn('scraper', 'Engine is already running');
return;
}
this.isRunning = true;
this.stats.startTime = new Date();
logger_1.logger.info('scraper', `🚀 Starting scraper engine (concurrency: ${this.concurrency})`);
// Process queue
await this.processQueue();
this.isRunning = false;
this.stats.endTime = new Date();
this.stats.duration = this.stats.endTime.getTime() - this.stats.startTime.getTime();
logger_1.logger.info('scraper', `✅ Scraper engine finished`);
this.logStats();
// Cleanup
await this.downloader.cleanup();
}
/**
* Process the request queue
*/
async processQueue() {
while (!this.scheduler.isEmpty() && this.isRunning) {
const request = this.scheduler.dequeue();
if (!request) {
// Wait a bit and check again
await new Promise(resolve => setTimeout(resolve, 100));
continue;
}
try {
await this.processRequest(request);
}
catch (error) {
logger_1.logger.error('scraper', `Failed to process request: ${error}`);
}
}
}
/**
* Process a single request
*/
async processRequest(request) {
this.stats.requestsTotal++;
try {
logger_1.logger.debug('scraper', `Processing: ${request.url}`);
// Apply request middlewares
const processedRequest = await this.middlewareEngine.processRequest(request);
// Download
let response = await this.downloader.fetch(processedRequest);
// Apply response middlewares
response = await this.middlewareEngine.processResponse(response);
// Parse response using callback
const parseResult = await request.callback(response);
// Process items through pipeline
if (parseResult.items && parseResult.items.length > 0) {
for (const item of parseResult.items) {
await this.processItem(item, 'default');
}
}
// Enqueue follow-up requests
if (parseResult.requests && parseResult.requests.length > 0) {
for (const followUpRequest of parseResult.requests) {
this.scheduler.enqueue(followUpRequest);
}
}
this.stats.requestsSuccess++;
this.scheduler.markComplete(request);
}
catch (error) {
this.stats.requestsFailed++;
this.stats.errorsCount++;
logger_1.logger.error('scraper', `Request failed: ${request.url} - ${error.message}`);
// Apply error middlewares
const handledError = await this.middlewareEngine.processError(error, request);
// If error is null, it was handled (e.g., retry)
if (handledError === null) {
this.scheduler.requeueForRetry(request);
}
else {
this.scheduler.markComplete(request);
// Call error handler if provided
if (request.errorHandler) {
await request.errorHandler(error, request);
}
}
}
}
/**
* Process an item through pipelines
*/
async processItem(item, spider) {
this.stats.itemsScraped++;
try {
const processedItem = await this.pipelineEngine.processItem(item, spider);
if (processedItem) {
this.stats.itemsSaved++;
}
else {
this.stats.itemsDropped++;
}
}
catch (error) {
logger_1.logger.error('scraper', `Failed to process item: ${error}`);
this.stats.itemsDropped++;
}
}
/**
* Log statistics
*/
logStats() {
logger_1.logger.info('scraper', '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━');
logger_1.logger.info('scraper', '📊 Scraper Statistics');
logger_1.logger.info('scraper', '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━');
logger_1.logger.info('scraper', ` Requests: ${this.stats.requestsSuccess}/${this.stats.requestsTotal} successful`);
logger_1.logger.info('scraper', ` Items: ${this.stats.itemsSaved} saved, ${this.stats.itemsDropped} dropped`);
logger_1.logger.info('scraper', ` Errors: ${this.stats.errorsCount}`);
logger_1.logger.info('scraper', ` Duration: ${Math.round((this.stats.duration || 0) / 1000)}s`);
// Get stats from StatsPipeline
const statsPipeline = this.pipelineEngine.getPipeline('StatsPipeline');
if (statsPipeline) {
const itemStats = statsPipeline.getStats();
logger_1.logger.info('scraper', ` Items with images: ${itemStats.withImages}/${itemStats.total}`);
logger_1.logger.info('scraper', ` Items with THC: ${itemStats.withThc}/${itemStats.total}`);
logger_1.logger.info('scraper', ` Items with descriptions: ${itemStats.withDescription}/${itemStats.total}`);
}
logger_1.logger.info('scraper', '━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━');
}
/**
* Stop the engine
*/
stop() {
this.isRunning = false;
logger_1.logger.info('scraper', 'Stopping scraper engine...');
}
/**
* Get current stats
*/
getStats() {
return { ...this.stats };
}
/**
* Get queue stats
*/
getQueueStats() {
return this.scheduler.getStats();
}
}
exports.ScraperEngine = ScraperEngine;
/**
* Spider for scraping Dutchie categories
*/
class DutchieSpider {
engine;
constructor(engine) {
this.engine = engine;
}
/**
* Scrape a category
*/
async scrapeCategory(storeId, categoryId) {
logger_1.logger.info('scraper', `Starting category scrape: store=${storeId}, category=${categoryId}`);
const scraperId = `scraper-${storeId}-${categoryId}-${Date.now()}`;
let registerScraper, updateScraperStats, completeScraper;
try {
// Import monitoring functions
const monitor = await Promise.resolve().then(() => __importStar(require('../routes/scraper-monitor')));
registerScraper = monitor.registerScraper;
updateScraperStats = monitor.updateScraperStats;
completeScraper = monitor.completeScraper;
}
catch (e) {
// Monitoring not available
}
try {
// Get category info
const categoryResult = await migrate_1.pool.query(`
SELECT c.*, s.slug as store_slug, s.name as store_name
FROM categories c
JOIN stores s ON c.store_id = s.id
WHERE c.id = $1
`, [categoryId]);
if (categoryResult.rows.length === 0) {
throw new Error('Category not found');
}
const category = categoryResult.rows[0];
logger_1.logger.info('scraper', `Category: ${category.name} (${category.dutchie_url})`);
// Register with monitoring system
if (registerScraper) {
registerScraper(scraperId, storeId, category.store_name, categoryId, category.name);
}
// Mark products as out of stock before scraping
await migrate_1.pool.query(`
UPDATE products
SET in_stock = false
WHERE store_id = $1 AND category_id = $2
`, [storeId, categoryId]);
if (updateScraperStats) {
updateScraperStats(scraperId, {}, 'Marking products as out of stock');
}
// Enqueue category page request
this.engine.enqueue({
url: category.dutchie_url,
priority: 100,
maxRetries: 3,
metadata: {
requiresBrowser: true,
storeId,
categoryId,
categorySlug: category.slug,
storeSlug: category.store_slug
},
callback: this.parseCategoryPage.bind(this)
});
// Start the engine
if (updateScraperStats) {
updateScraperStats(scraperId, {}, 'Scraping category page');
}
await this.engine.start();
// Update stats from engine
const engineStats = this.engine.getStats();
if (updateScraperStats) {
updateScraperStats(scraperId, {
requestsTotal: engineStats.requestsTotal,
requestsSuccess: engineStats.requestsSuccess,
itemsSaved: engineStats.itemsSaved,
itemsDropped: engineStats.itemsDropped,
errorsCount: engineStats.errorsCount
}, 'Finalizing');
}
// Update category last_scraped_at
await migrate_1.pool.query(`
UPDATE categories
SET last_scraped_at = CURRENT_TIMESTAMP
WHERE id = $1
`, [categoryId]);
logger_1.logger.info('scraper', `✅ Category scrape completed: ${category.name}`);
if (completeScraper) {
completeScraper(scraperId);
}
}
catch (error) {
logger_1.logger.error('scraper', `Category scrape failed: ${error}`);
if (completeScraper) {
completeScraper(scraperId, error.toString());
}
throw error;
}
}
/**
* Parse category page (product listing)
*/
async parseCategoryPage(response) {
const page = await this.engine['downloader'].getCurrentPage();
if (!page) {
throw new Error('No active page');
}
logger_1.logger.info('scraper', 'Parsing category page...');
// Extract product cards
const productCards = await page.evaluate(() => {
// @ts-ignore - runs in browser context
const cards = document.querySelectorAll('[data-testid="product-list-item"]');
const items = [];
cards.forEach((card) => {
try {
const allText = card.textContent || '';
// Extract name
let name = '';
const nameSelectors = ['a[href*="/product/"]', 'h1', 'h2', 'h3', 'h4'];
for (const sel of nameSelectors) {
const el = card.querySelector(sel);
if (el?.textContent?.trim()) {
name = el.textContent.trim().split('\n')[0].trim();
break;
}
}
if (!name || name.length < 2)
return;
// Extract price
let price = null;
let originalPrice = null;
const priceMatches = allText.match(/\$(\d+\.?\d*)/g);
if (priceMatches && priceMatches.length > 0) {
price = parseFloat(priceMatches[0].replace('$', ''));
if (priceMatches.length > 1) {
originalPrice = parseFloat(priceMatches[1].replace('$', ''));
}
}
// Extract link
const linkEl = card.querySelector('a[href*="/product/"]');
let href = linkEl?.getAttribute('href') || '';
if (href && href.startsWith('/')) {
// @ts-ignore - runs in browser context
href = window.location.origin + href;
}
items.push({ name, price, originalPrice, href });
}
catch (err) {
console.error('Error parsing product card:', err);
}
});
return items;
});
logger_1.logger.info('scraper', `Found ${productCards.length} products on listing page`);
// Create follow-up requests for each product
const requests = productCards.map((card, index) => ({
url: card.href,
priority: 50,
maxRetries: 3,
metadata: {
...response.request.metadata,
productName: card.name,
productPrice: card.price,
productOriginalPrice: card.originalPrice,
requiresBrowser: true
},
callback: this.parseProductPage.bind(this)
}));
return { items: [], requests };
}
/**
* Parse individual product page
*/
async parseProductPage(response) {
const page = await this.engine['downloader'].getCurrentPage();
if (!page) {
throw new Error('No active page');
}
const productName = response.request.metadata.productName;
logger_1.logger.debug('scraper', `Parsing product: ${productName}`);
// Extract product details
const details = await page.evaluate(() => {
// @ts-ignore - runs in browser context
const allText = document.body.textContent || '';
// Extract image
let fullSizeImage = null;
const mainImageSelectors = [
'img[class*="ProductImage"]',
'img[class*="product-image"]',
'[class*="ImageGallery"] img',
'main img',
'img[src*="images.dutchie.com"]'
];
for (const sel of mainImageSelectors) {
// @ts-ignore - runs in browser context
const img = document.querySelector(sel);
if (img?.src && img.src.includes('dutchie.com')) {
fullSizeImage = img.src;
break;
}
}
// Extract description
let description = '';
const descSelectors = [
'[class*="description"]',
'[class*="Description"]',
'[data-testid*="description"]',
'p[class*="product"]'
];
for (const sel of descSelectors) {
// @ts-ignore - runs in browser context
const el = document.querySelector(sel);
if (el?.textContent?.trim() && el.textContent.length > 20) {
description = el.textContent.trim();
break;
}
}
// Extract THC/CBD
let thc = null;
const thcPatterns = [
/THC[:\s]*(\d+\.?\d*)\s*%/i,
/Total\s+THC[:\s]*(\d+\.?\d*)\s*%/i,
/(\d+\.?\d*)\s*%\s+THC/i
];
for (const pattern of thcPatterns) {
const match = allText.match(pattern);
if (match) {
thc = parseFloat(match[1]);
break;
}
}
let cbd = null;
const cbdPatterns = [
/CBD[:\s]*(\d+\.?\d*)\s*%/i,
/Total\s+CBD[:\s]*(\d+\.?\d*)\s*%/i,
/(\d+\.?\d*)\s*%\s+CBD/i
];
for (const pattern of cbdPatterns) {
const match = allText.match(pattern);
if (match) {
cbd = parseFloat(match[1]);
break;
}
}
// Extract strain type
let strainType = null;
if (allText.match(/\bindica\b/i))
strainType = 'Indica';
else if (allText.match(/\bsativa\b/i))
strainType = 'Sativa';
else if (allText.match(/\bhybrid\b/i))
strainType = 'Hybrid';
// Extract brand
let brand = null;
const brandSelectors = [
'[class*="brand"]',
'[class*="Brand"]',
'[data-testid*="brand"]'
];
for (const sel of brandSelectors) {
// @ts-ignore - runs in browser context
const el = document.querySelector(sel);
if (el?.textContent?.trim()) {
brand = el.textContent.trim();
break;
}
}
// Extract metadata
const terpenes = [];
const terpeneNames = ['Myrcene', 'Limonene', 'Caryophyllene', 'Pinene', 'Linalool', 'Humulene'];
terpeneNames.forEach(terp => {
if (allText.match(new RegExp(`\\b${terp}\\b`, 'i'))) {
terpenes.push(terp);
}
});
const effects = [];
const effectNames = ['Relaxed', 'Happy', 'Euphoric', 'Uplifted', 'Creative', 'Energetic'];
effectNames.forEach(effect => {
if (allText.match(new RegExp(`\\b${effect}\\b`, 'i'))) {
effects.push(effect);
}
});
return {
fullSizeImage,
description,
thc,
cbd,
strainType,
brand,
terpenes,
effects
};
});
// Create product item
const product = {
dutchieProductId: `${response.request.metadata.storeSlug}-${response.request.metadata.categorySlug}-${Date.now()}-${Math.random()}`,
name: productName || 'Unknown Product',
description: details.description,
price: response.request.metadata.productPrice,
originalPrice: response.request.metadata.productOriginalPrice,
thcPercentage: details.thc || undefined,
cbdPercentage: details.cbd || undefined,
strainType: details.strainType || undefined,
brand: details.brand || undefined,
imageUrl: details.fullSizeImage || undefined,
dutchieUrl: response.url,
metadata: {
terpenes: details.terpenes,
effects: details.effects
},
storeId: response.request.metadata.storeId,
categoryId: response.request.metadata.categoryId
};
return { items: [product], requests: [] };
}
/**
* Scrape entire store
*/
async scrapeStore(storeId, parallel = 3) {
logger_1.logger.info('scraper', `🏪 Starting store scrape: ${storeId} (${parallel} parallel scrapers)`);
try {
// Get all leaf categories (no children)
const categoriesResult = await migrate_1.pool.query(`
SELECT c.id, c.name
FROM categories c
WHERE c.store_id = $1
AND c.scrape_enabled = true
AND NOT EXISTS (
SELECT 1 FROM categories child
WHERE child.parent_id = c.id
)
ORDER BY c.name
`, [storeId]);
const categories = categoriesResult.rows;
logger_1.logger.info('scraper', `Found ${categories.length} categories to scrape`);
if (parallel === 1) {
// Sequential scraping (original behavior)
for (const category of categories) {
try {
await this.scrapeCategory(storeId, category.id);
await new Promise(resolve => setTimeout(resolve, 3000));
}
catch (error) {
logger_1.logger.error('scraper', `Failed to scrape category ${category.name}: ${error}`);
}
}
}
else {
// Parallel scraping with concurrency limit
const results = await this.scrapeMultipleCategoriesParallel(storeId, categories, parallel);
const successful = results.filter(r => r.status === 'fulfilled').length;
const failed = results.filter(r => r.status === 'rejected').length;
logger_1.logger.info('scraper', `Parallel scrape results: ${successful} successful, ${failed} failed`);
}
// Update store last_scraped_at
await migrate_1.pool.query(`
UPDATE stores
SET last_scraped_at = CURRENT_TIMESTAMP
WHERE id = $1
`, [storeId]);
logger_1.logger.info('scraper', `🎉 Store scrape completed: ${storeId}`);
}
catch (error) {
logger_1.logger.error('scraper', `Store scrape failed: ${error}`);
throw error;
}
}
/**
* Scrape multiple categories in parallel with concurrency limit
*/
async scrapeMultipleCategoriesParallel(storeId, categories, concurrency) {
const results = [];
// Process categories in batches
for (let i = 0; i < categories.length; i += concurrency) {
const batch = categories.slice(i, i + concurrency);
logger_1.logger.info('scraper', `Scraping batch ${Math.floor(i / concurrency) + 1}: ${batch.map(c => c.name).join(', ')}`);
const batchPromises = batch.map(category => {
// Create a new spider instance for each category
const engine = new ScraperEngine(1); // 1 concurrent request per spider
const spider = new DutchieSpider(engine);
return spider.scrapeCategory(storeId, category.id)
.catch(error => {
logger_1.logger.error('scraper', `Category ${category.name} failed: ${error}`);
throw error;
});
});
const batchResults = await Promise.allSettled(batchPromises);
results.push(...batchResults);
// Delay between batches to avoid overwhelming the server
if (i + concurrency < categories.length) {
logger_1.logger.info('scraper', 'Waiting 5s before next batch...');
await new Promise(resolve => setTimeout(resolve, 5000));
}
}
return results;
}
}
exports.DutchieSpider = DutchieSpider;
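`scrapeMultipleCategoriesParallel` above processes categories in fixed-size batches, using `Promise.allSettled` so one failed category never aborts its batch. The batching itself is separable from scraping; a minimal sketch, where `worker` stands in for `spider.scrapeCategory`:

```javascript
// Run `worker` over `items` in batches of `concurrency`, collecting
// per-item outcomes without letting one rejection abort the batch.
async function runInBatches(items, concurrency, worker) {
  const results = [];
  for (let i = 0; i < items.length; i += concurrency) {
    const batch = items.slice(i, i + concurrency);
    // allSettled never rejects; each entry is {status, value|reason}
    const settled = await Promise.allSettled(batch.map(item => worker(item)));
    results.push(...settled);
  }
  return results;
}
```

The engine's version additionally sleeps 5s between batches; that delay belongs between loop iterations, not inside the worker, so all spiders in a batch still run concurrently.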

backend/dist/scraper-v2/index.js vendored Normal file

@@ -0,0 +1,108 @@
"use strict";
/**
* Scraper V2 - Scrapy-inspired web scraping framework
*
* Architecture:
* - Engine: Main orchestrator
* - Scheduler: Priority queue with deduplication
* - Downloader: HTTP + Browser hybrid fetcher
* - Middlewares: Request/response processing chain
* - Pipelines: Item processing and persistence
* - Navigation: Category discovery
*/
var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
if (k2 === undefined) k2 = k;
var desc = Object.getOwnPropertyDescriptor(m, k);
if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
desc = { enumerable: true, get: function() { return m[k]; } };
}
Object.defineProperty(o, k2, desc);
}) : (function(o, m, k, k2) {
if (k2 === undefined) k2 = k;
o[k2] = m[k];
}));
var __exportStar = (this && this.__exportStar) || function(m, exports) {
for (var p in m) if (p !== "default" && !Object.prototype.hasOwnProperty.call(exports, p)) __createBinding(exports, m, p);
};
Object.defineProperty(exports, "__esModule", { value: true });
exports.StatsPipeline = exports.DatabasePipeline = exports.ImagePipeline = exports.DeduplicationPipeline = exports.SanitizationPipeline = exports.ValidationPipeline = exports.PipelineEngine = exports.StealthMiddleware = exports.BotDetectionMiddleware = exports.RetryMiddleware = exports.RateLimitMiddleware = exports.ProxyMiddleware = exports.UserAgentMiddleware = exports.MiddlewareEngine = exports.NavigationDiscovery = exports.Downloader = exports.RequestScheduler = exports.DutchieSpider = exports.ScraperEngine = void 0;
exports.scrapeCategory = scrapeCategory;
exports.scrapeStore = scrapeStore;
exports.discoverCategories = discoverCategories;
var engine_1 = require("./engine");
Object.defineProperty(exports, "ScraperEngine", { enumerable: true, get: function () { return engine_1.ScraperEngine; } });
Object.defineProperty(exports, "DutchieSpider", { enumerable: true, get: function () { return engine_1.DutchieSpider; } });
var scheduler_1 = require("./scheduler");
Object.defineProperty(exports, "RequestScheduler", { enumerable: true, get: function () { return scheduler_1.RequestScheduler; } });
var downloader_1 = require("./downloader");
Object.defineProperty(exports, "Downloader", { enumerable: true, get: function () { return downloader_1.Downloader; } });
var navigation_1 = require("./navigation");
Object.defineProperty(exports, "NavigationDiscovery", { enumerable: true, get: function () { return navigation_1.NavigationDiscovery; } });
var middlewares_1 = require("./middlewares");
Object.defineProperty(exports, "MiddlewareEngine", { enumerable: true, get: function () { return middlewares_1.MiddlewareEngine; } });
Object.defineProperty(exports, "UserAgentMiddleware", { enumerable: true, get: function () { return middlewares_1.UserAgentMiddleware; } });
Object.defineProperty(exports, "ProxyMiddleware", { enumerable: true, get: function () { return middlewares_1.ProxyMiddleware; } });
Object.defineProperty(exports, "RateLimitMiddleware", { enumerable: true, get: function () { return middlewares_1.RateLimitMiddleware; } });
Object.defineProperty(exports, "RetryMiddleware", { enumerable: true, get: function () { return middlewares_1.RetryMiddleware; } });
Object.defineProperty(exports, "BotDetectionMiddleware", { enumerable: true, get: function () { return middlewares_1.BotDetectionMiddleware; } });
Object.defineProperty(exports, "StealthMiddleware", { enumerable: true, get: function () { return middlewares_1.StealthMiddleware; } });
var pipelines_1 = require("./pipelines");
Object.defineProperty(exports, "PipelineEngine", { enumerable: true, get: function () { return pipelines_1.PipelineEngine; } });
Object.defineProperty(exports, "ValidationPipeline", { enumerable: true, get: function () { return pipelines_1.ValidationPipeline; } });
Object.defineProperty(exports, "SanitizationPipeline", { enumerable: true, get: function () { return pipelines_1.SanitizationPipeline; } });
Object.defineProperty(exports, "DeduplicationPipeline", { enumerable: true, get: function () { return pipelines_1.DeduplicationPipeline; } });
Object.defineProperty(exports, "ImagePipeline", { enumerable: true, get: function () { return pipelines_1.ImagePipeline; } });
Object.defineProperty(exports, "DatabasePipeline", { enumerable: true, get: function () { return pipelines_1.DatabasePipeline; } });
Object.defineProperty(exports, "StatsPipeline", { enumerable: true, get: function () { return pipelines_1.StatsPipeline; } });
__exportStar(require("./types"), exports);
// Main API functions
const engine_2 = require("./engine");
const navigation_2 = require("./navigation");
const downloader_2 = require("./downloader");
const logger_1 = require("../services/logger");
/**
* Scrape a single category
*/
async function scrapeCategory(storeId, categoryId) {
const engine = new engine_2.ScraperEngine(1);
const spider = new engine_2.DutchieSpider(engine);
try {
await spider.scrapeCategory(storeId, categoryId);
}
catch (error) {
logger_1.logger.error('scraper', `scrapeCategory failed: ${error}`);
throw error;
}
}
/**
* Scrape an entire store
*/
async function scrapeStore(storeId, parallel = 3) {
const engine = new engine_2.ScraperEngine(1);
const spider = new engine_2.DutchieSpider(engine);
try {
await spider.scrapeStore(storeId, parallel);
}
catch (error) {
logger_1.logger.error('scraper', `scrapeStore failed: ${error}`);
throw error;
}
}
/**
* Discover categories for a store
*/
async function discoverCategories(storeId) {
const downloader = new downloader_2.Downloader();
const discovery = new navigation_2.NavigationDiscovery(downloader);
try {
// Discover categories (uses your existing Dutchie category structure)
await discovery.discoverCategories(storeId);
}
catch (error) {
logger_1.logger.error('scraper', `discoverCategories failed: ${error}`);
throw error;
}
finally {
await downloader.cleanup();
}
}

backend/dist/scraper-v2/middlewares.js vendored Normal file

@@ -0,0 +1,263 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.MiddlewareEngine = exports.StealthMiddleware = exports.BotDetectionMiddleware = exports.RetryMiddleware = exports.RateLimitMiddleware = exports.ProxyMiddleware = exports.UserAgentMiddleware = void 0;
const types_1 = require("./types");
const logger_1 = require("../services/logger");
const migrate_1 = require("../db/migrate");
const USER_AGENTS = [
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15'
];
function getRandomUserAgent() {
return USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];
}
function sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
/**
* User Agent Rotation Middleware
*/
class UserAgentMiddleware {
name = 'UserAgentMiddleware';
priority = 100;
async processRequest(request) {
if (!request.metadata.userAgent) {
request.metadata.userAgent = getRandomUserAgent();
}
return request;
}
}
exports.UserAgentMiddleware = UserAgentMiddleware;
/**
* Proxy Rotation Middleware
*/
class ProxyMiddleware {
name = 'ProxyMiddleware';
priority = 90;
async getActiveProxy() {
try {
const result = await migrate_1.pool.query(`
SELECT host, port, protocol, username, password
FROM proxies
WHERE active = true AND is_anonymous = true
ORDER BY RANDOM()
LIMIT 1
`);
if (result.rows.length === 0) {
return null;
}
return result.rows[0];
}
catch (error) {
logger_1.logger.error('scraper', `Failed to get proxy: ${error}`);
return null;
}
}
async processRequest(request) {
// Only add proxy if not already set
if (!request.metadata.proxy && request.retryCount > 0) {
// Use proxy on retries
request.metadata.proxy = await this.getActiveProxy();
if (request.metadata.proxy) {
logger_1.logger.debug('scraper', `Using proxy for retry: ${request.metadata.proxy.host}:${request.metadata.proxy.port}`);
}
}
return request;
}
}
exports.ProxyMiddleware = ProxyMiddleware;
/**
* Rate Limiting Middleware with Adaptive Delays
*/
class RateLimitMiddleware {
name = 'RateLimitMiddleware';
priority = 80;
requestTimes = [];
errorCount = 0;
baseDelay = 2000; // 2 seconds base delay
maxDelay = 30000; // 30 seconds max
async processRequest(request) {
await this.waitForNextRequest();
return request;
}
async processResponse(response) {
// Record success - gradually reduce error count
this.errorCount = Math.max(0, this.errorCount - 1);
return response;
}
async processError(error) {
// Record error - increase delay
this.errorCount++;
return error;
}
async waitForNextRequest() {
// Calculate adaptive delay based on error count
const errorMultiplier = Math.pow(1.5, Math.min(this.errorCount, 5));
const adaptiveDelay = Math.min(this.baseDelay * errorMultiplier, this.maxDelay);
// Add random jitter (±20%)
const jitter = (Math.random() - 0.5) * 0.4 * adaptiveDelay;
const delay = adaptiveDelay + jitter;
const now = Date.now();
const lastRequest = this.requestTimes[this.requestTimes.length - 1] || 0;
const timeSinceLast = now - lastRequest;
if (timeSinceLast < delay) {
const waitTime = delay - timeSinceLast;
logger_1.logger.debug('scraper', `Rate limiting: waiting ${Math.round(waitTime)}ms`);
await sleep(waitTime);
}
this.requestTimes.push(Date.now());
this.cleanup();
}
cleanup() {
// Keep only last minute of requests
const cutoff = Date.now() - 60000;
this.requestTimes = this.requestTimes.filter(t => t > cutoff);
}
setBaseDelay(ms) {
this.baseDelay = ms;
}
}
exports.RateLimitMiddleware = RateLimitMiddleware;
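The adaptive delay in `waitForNextRequest` grows geometrically with the recent error count and plateaus once the exponent is clamped at 5. Factoring out the random jitter, the deterministic part reduces to the small function below (constants mirror the class defaults):

```javascript
// Deterministic part of RateLimitMiddleware's adaptive delay:
// base * 1.5^min(errors, 5), capped at maxDelay. Jitter (±20%) is
// applied on top of this value in the middleware itself.
function adaptiveDelay(errorCount, baseDelay = 2000, maxDelay = 30000) {
  const errorMultiplier = Math.pow(1.5, Math.min(errorCount, 5));
  return Math.min(baseDelay * errorMultiplier, maxDelay);
}
```

With the defaults, the schedule is 2000ms with no errors, 3000ms after one, and plateaus at 2000 × 1.5⁵ = 15187.5ms from five errors onward, so the 30s cap is only reachable with a larger `baseDelay`.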
/**
* Retry Middleware with Exponential Backoff
*/
class RetryMiddleware {
name = 'RetryMiddleware';
priority = 70;
isRetryable(error) {
const retryableErrors = [
types_1.ErrorType.NETWORK_ERROR,
types_1.ErrorType.TIMEOUT,
types_1.ErrorType.SERVER_ERROR
];
if ('type' in error) {
return retryableErrors.includes(error.type);
}
// Check error message for common retryable patterns
const message = error.message.toLowerCase();
return (message.includes('timeout') ||
message.includes('network') ||
message.includes('econnreset') ||
message.includes('econnrefused') ||
message.includes('500') ||
message.includes('502') ||
message.includes('503'));
}
async processError(error, request) {
if (!this.isRetryable(error)) {
logger_1.logger.warn('scraper', `Non-retryable error for ${request.url}: ${error.message}`);
return error;
}
if (request.retryCount < request.maxRetries) {
// Calculate backoff delay
const backoffDelay = Math.min(1000 * Math.pow(2, request.retryCount), 30000);
logger_1.logger.info('scraper', `Retry ${request.retryCount + 1}/${request.maxRetries} for ${request.url} after ${backoffDelay}ms`);
await sleep(backoffDelay);
// Return null to indicate retry should happen
return null;
}
logger_1.logger.error('scraper', `Max retries exceeded for ${request.url}`);
return error;
}
}
exports.RetryMiddleware = RetryMiddleware;
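The backoff schedule in `processError` doubles a 1-second delay per completed attempt and caps it at 30 seconds. As a standalone sketch (`backoffDelay` is an illustrative name):

```javascript
// retryCount = attempts already made; delay doubles each time, capped at 30s.
function backoffDelay(retryCount) {
  return Math.min(1000 * Math.pow(2, retryCount), 30000);
}

console.log(backoffDelay(0)); // 1000
console.log(backoffDelay(4)); // 16000
console.log(backoffDelay(5)); // 30000 — 32000 hits the cap
```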
/**
* Bot Detection Middleware
*/
class BotDetectionMiddleware {
name = 'BotDetectionMiddleware';
priority = 60;
detectedCount = 0;
DETECTION_THRESHOLD = 3;
async processResponse(response) {
const content = typeof response.content === 'string'
? response.content
: JSON.stringify(response.content);
// Check for bot detection indicators
const botIndicators = [
/captcha/i,
/cloudflare/i,
/access denied/i,
/you have been blocked/i,
/unusual traffic/i,
/robot/i
];
const detected = botIndicators.some(pattern => pattern.test(content));
if (detected) {
this.detectedCount++;
logger_1.logger.warn('scraper', `Bot detection suspected (${this.detectedCount}/${this.DETECTION_THRESHOLD}): ${response.url}`);
if (this.detectedCount >= this.DETECTION_THRESHOLD) {
const error = new Error('Bot detection threshold reached');
error.type = types_1.ErrorType.BOT_DETECTION;
error.retryable = true;
error.request = response.request;
throw error;
}
}
else {
// Gradually decrease detection count on successful requests
this.detectedCount = Math.max(0, this.detectedCount - 0.5);
}
return response;
}
}
exports.BotDetectionMiddleware = BotDetectionMiddleware;
/**
* Stealth Mode Middleware
*/
class StealthMiddleware {
name = 'StealthMiddleware';
priority = 95;
async processRequest(request) {
// Flag that this request needs stealth mode
request.metadata.requiresStealth = true;
return request;
}
}
exports.StealthMiddleware = StealthMiddleware;
/**
* Middleware Engine to orchestrate all middlewares
*/
class MiddlewareEngine {
middlewares = [];
use(middleware) {
this.middlewares.push(middleware);
// Sort by priority (higher first)
this.middlewares.sort((a, b) => b.priority - a.priority);
}
async processRequest(request) {
let current = request;
for (const middleware of this.middlewares) {
if (middleware.processRequest) {
current = await middleware.processRequest(current);
}
}
return current;
}
async processResponse(response) {
let current = response;
for (const middleware of this.middlewares) {
if (middleware.processResponse) {
current = await middleware.processResponse(current);
}
}
return current;
}
async processError(error, request) {
let currentError = error;
for (const middleware of this.middlewares) {
if (middleware.processError && currentError) {
currentError = await middleware.processError(currentError, request);
if (currentError === null) {
// Middleware handled the error (e.g., retry)
break;
}
}
}
return currentError;
}
}
exports.MiddlewareEngine = MiddlewareEngine;
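In use, the engine is loaded with the middlewares above and every request flows through them in priority order. A minimal self-contained sketch of that chaining behavior (`MiniEngine` re-implements just the priority sort and the `processRequest` loop):

```javascript
// Minimal re-implementation of the priority-ordered request chain:
// higher-priority middlewares run first, each may transform the request.
class MiniEngine {
  constructor() { this.middlewares = []; }
  use(mw) {
    this.middlewares.push(mw);
    this.middlewares.sort((a, b) => b.priority - a.priority); // higher first
  }
  async processRequest(request) {
    let current = request;
    for (const mw of this.middlewares) {
      if (mw.processRequest) current = await mw.processRequest(current);
    }
    return current;
  }
}

// Registration order does not matter; priority decides execution order.
const engine = new MiniEngine();
engine.use({ priority: 10, processRequest: async r => ({ ...r, trace: [...r.trace, 'rate-limit'] }) });
engine.use({ priority: 95, processRequest: async r => ({ ...r, trace: [...r.trace, 'stealth'] }) });
engine.processRequest({ url: 'https://example.com', trace: [] })
  .then(r => console.log(r.trace)); // ['stealth', 'rate-limit']
```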

278
backend/dist/scraper-v2/navigation.js vendored Normal file

@@ -0,0 +1,278 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.NavigationDiscovery = void 0;
const migrate_1 = require("../db/migrate");
const logger_1 = require("../services/logger");
/**
* Navigation Discovery - finds and builds category structure
*/
class NavigationDiscovery {
downloader;
constructor(downloader) {
this.downloader = downloader;
}
/**
* Discover categories from a store's main page
*/
async discoverCategories(storeId) {
logger_1.logger.info('categories', `Starting category discovery for store ${storeId}`);
try {
// Get store info
const storeResult = await migrate_1.pool.query(`
SELECT id, name, slug, dutchie_url
FROM stores
WHERE id = $1
`, [storeId]);
if (storeResult.rows.length === 0) {
throw new Error('Store not found');
}
const store = storeResult.rows[0];
const baseUrl = store.dutchie_url;
// Create request to fetch the main page
const request = {
url: baseUrl,
priority: 100,
retryCount: 0,
maxRetries: 3,
metadata: {
requiresBrowser: true,
requiresStealth: true
},
callback: async () => ({ items: [], requests: [] })
};
// Fetch the page
const response = await this.downloader.fetch(request);
// Extract navigation links
const page = await this.downloader.getCurrentPage();
if (!page) {
throw new Error('No active page for navigation extraction');
}
const links = await this.extractNavigationLinks(page, baseUrl);
logger_1.logger.info('categories', `Found ${links.length} navigation links`);
// Check if it's a Dutchie menu
const isDutchie = await this.isDutchieMenu(page);
if (isDutchie) {
logger_1.logger.info('categories', 'Detected Dutchie menu - using predefined structure');
await this.createDutchieCategories(storeId, store, links);
}
else {
logger_1.logger.info('categories', 'Custom menu detected - extracting from navigation');
await this.createCustomCategories(storeId, store, links);
}
logger_1.logger.info('categories', `✅ Category discovery completed for ${store.name}`);
}
catch (error) {
logger_1.logger.error('categories', `Category discovery failed: ${error}`);
throw error;
}
}
/**
* Extract navigation links from page
*/
async extractNavigationLinks(page, baseUrl) {
return await page.evaluate((base) => {
const links = [];
// Look for navigation elements
const navSelectors = [
'nav a',
'[role="navigation"] a',
'[class*="nav"] a',
'[class*="menu"] a',
'[class*="category"] a',
'header a'
];
const foundLinks = new Set();
for (const selector of navSelectors) {
// @ts-ignore - runs in browser context
const elements = document.querySelectorAll(selector);
elements.forEach((el) => {
const text = el.textContent?.trim();
let href = el.href || el.getAttribute('href');
if (!text || !href || text.length < 2)
return;
// Normalize href
if (href.startsWith('/')) {
// @ts-ignore - runs in browser context
const url = new URL(base);
href = `${url.origin}${href}`;
}
// Skip external links and anchors
if (!href.includes(base) || href.includes('#'))
return;
// Skip duplicates
const linkKey = `${text}:${href}`;
if (foundLinks.has(linkKey))
return;
foundLinks.add(linkKey);
// Determine if it's likely a category
const categoryKeywords = [
'flower', 'pre-roll', 'vape', 'edible', 'concentrate',
'topical', 'accessory', 'brand', 'special', 'shop',
'indica', 'sativa', 'hybrid', 'cbd', 'thc'
];
const isCategory = categoryKeywords.some(kw => text.toLowerCase().includes(kw) ||
href.toLowerCase().includes(kw));
links.push({
text,
href,
isCategory
});
});
}
return links;
}, baseUrl);
}
/**
* Check if it's a Dutchie menu
*/
async isDutchieMenu(page) {
return await page.evaluate(() => {
// Check for Dutchie markers
// @ts-ignore - runs in browser context
if (window.reactEnv) {
// @ts-ignore - runs in browser context
const env = window.reactEnv;
if (env.adminUrl?.includes('dutchie.com') ||
env.apiUrl?.includes('dutchie.com') ||
env.consumerUrl?.includes('dutchie.com')) {
return true;
}
}
// @ts-ignore - runs in browser context
const htmlContent = document.documentElement.innerHTML;
return (htmlContent.includes('admin.dutchie.com') ||
htmlContent.includes('api.dutchie.com') ||
htmlContent.includes('embedded-menu') ||
htmlContent.includes('window.reactEnv'));
});
}
/**
* Create categories for Dutchie menus (predefined structure)
* Uses your existing Dutchie category structure
*/
async createDutchieCategories(storeId, store, discoveredLinks) {
const client = await migrate_1.pool.connect();
try {
await client.query('BEGIN');
logger_1.logger.info('categories', `Creating predefined Dutchie category structure`);
const baseUrl = store.dutchie_url;
// Your existing Dutchie categories structure
const DUTCHIE_CATEGORIES = [
{ name: 'Shop', slug: 'shop', parentSlug: undefined },
{ name: 'Flower', slug: 'flower', parentSlug: 'shop' },
{ name: 'Pre-Rolls', slug: 'pre-rolls', parentSlug: 'shop' },
{ name: 'Vaporizers', slug: 'vaporizers', parentSlug: 'shop' },
{ name: 'Concentrates', slug: 'concentrates', parentSlug: 'shop' },
{ name: 'Edibles', slug: 'edibles', parentSlug: 'shop' },
{ name: 'Topicals', slug: 'topicals', parentSlug: 'shop' },
{ name: 'Accessories', slug: 'accessories', parentSlug: 'shop' },
{ name: 'Brands', slug: 'brands', parentSlug: undefined },
{ name: 'Specials', slug: 'specials', parentSlug: undefined }
];
for (const category of DUTCHIE_CATEGORIES) {
let categoryUrl;
if (category.parentSlug) {
// Subcategory: /embedded-menu/{slug}/shop/flower
categoryUrl = `${baseUrl}/${category.parentSlug}/${category.slug}`;
}
else {
// Top-level: /embedded-menu/{slug}/shop
categoryUrl = `${baseUrl}/${category.slug}`;
}
const path = category.parentSlug ? `${category.parentSlug}/${category.slug}` : category.slug;
if (!category.parentSlug) {
// Create parent category
await client.query(`
INSERT INTO categories (store_id, name, slug, dutchie_url, path, scrape_enabled, parent_id)
VALUES ($1, $2, $3, $4, $5, true, NULL)
ON CONFLICT (store_id, slug)
DO UPDATE SET name = $2, dutchie_url = $4, path = $5
RETURNING id
`, [storeId, category.name, category.slug, categoryUrl, path]);
logger_1.logger.info('categories', `📁 ${category.name}`);
}
else {
// Create subcategory
const parentResult = await client.query(`
SELECT id FROM categories
WHERE store_id = $1 AND slug = $2
`, [storeId, category.parentSlug]);
if (parentResult.rows.length > 0) {
const parentId = parentResult.rows[0].id;
await client.query(`
INSERT INTO categories (store_id, name, slug, dutchie_url, path, scrape_enabled, parent_id)
VALUES ($1, $2, $3, $4, $5, true, $6)
ON CONFLICT (store_id, slug)
DO UPDATE SET name = $2, dutchie_url = $4, path = $5, parent_id = $6
`, [storeId, category.name, category.slug, categoryUrl, path, parentId]);
logger_1.logger.info('categories', ` └── ${category.name}`);
}
}
}
await client.query('COMMIT');
logger_1.logger.info('categories', `✅ Created ${DUTCHIE_CATEGORIES.length} Dutchie categories successfully`);
}
catch (error) {
await client.query('ROLLBACK');
logger_1.logger.error('categories', `Failed to create Dutchie categories: ${error}`);
throw error;
}
finally {
client.release();
}
}
/**
* Create categories from discovered links (custom menus)
*/
async createCustomCategories(storeId, store, links) {
const client = await migrate_1.pool.connect();
try {
await client.query('BEGIN');
// Filter to likely category links
const categoryLinks = links.filter(link => link.isCategory);
let displayOrder = 0;
for (const link of categoryLinks) {
// Generate slug from text
const slug = link.text
.toLowerCase()
.replace(/[^a-z0-9]+/g, '-')
.replace(/^-|-$/g, '');
// Determine path from URL
const url = new URL(link.href);
const path = url.pathname.replace(/^\//, '');
await client.query(`
INSERT INTO categories (store_id, name, slug, dutchie_url, path, scrape_enabled, display_order)
VALUES ($1, $2, $3, $4, $5, true, $6)
ON CONFLICT (store_id, slug)
DO UPDATE SET name = $2, dutchie_url = $4, path = $5, display_order = $6
`, [storeId, link.text, slug, link.href, path, displayOrder++]);
logger_1.logger.info('categories', `📁 ${link.text} -> ${link.href}`);
}
await client.query('COMMIT');
logger_1.logger.info('categories', `✅ Created ${categoryLinks.length} custom categories`);
}
catch (error) {
await client.query('ROLLBACK');
throw error;
}
finally {
client.release();
}
}
/**
* Update display_order column in categories table
*/
async ensureDisplayOrderColumn() {
try {
await migrate_1.pool.query(`
ALTER TABLE categories
ADD COLUMN IF NOT EXISTS display_order INTEGER DEFAULT 0
`);
logger_1.logger.info('categories', 'Ensured display_order column exists');
}
catch (error) {
logger_1.logger.warn('categories', `Could not add display_order column: ${error}`);
}
}
}
exports.NavigationDiscovery = NavigationDiscovery;
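The slug derivation used in `createCustomCategories` is worth seeing on its own: three chained string operations turn arbitrary link text into a URL-safe slug. A standalone sketch of the same logic (`toSlug` is an illustrative name):

```javascript
// Lowercase, collapse every run of non-alphanumerics to a single
// hyphen, then trim any hyphen left at either edge.
function toSlug(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-|-$/g, '');
}

console.log(toSlug('Pre-Rolls'));          // 'pre-rolls'
console.log(toSlug('  CBD & THC Vapes ')); // 'cbd-thc-vapes'
```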

300
backend/dist/scraper-v2/pipelines.js vendored Normal file

@@ -0,0 +1,300 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.PipelineEngine = exports.StatsPipeline = exports.DatabasePipeline = exports.ImagePipeline = exports.DeduplicationPipeline = exports.SanitizationPipeline = exports.ValidationPipeline = void 0;
const logger_1 = require("../services/logger");
const migrate_1 = require("../db/migrate");
const minio_1 = require("../utils/minio");
/**
* Validation Pipeline - ensures data quality
*/
class ValidationPipeline {
name = 'ValidationPipeline';
priority = 100;
async process(item, spider) {
// Required fields
if (!item.name || item.name.trim().length < 2) {
logger_1.logger.warn('pipeline', `Dropping product: invalid name`);
return null;
}
if (!item.dutchieUrl) {
logger_1.logger.warn('pipeline', `Dropping product ${item.name}: no URL`);
return null;
}
// Validate numeric fields
if (item.price !== undefined && (item.price < 0 || item.price > 10000)) {
logger_1.logger.warn('pipeline', `Invalid price for ${item.name}: ${item.price}`);
item.price = undefined;
}
if (item.thcPercentage !== undefined && (item.thcPercentage < 0 || item.thcPercentage > 100)) {
logger_1.logger.warn('pipeline', `Invalid THC for ${item.name}: ${item.thcPercentage}`);
item.thcPercentage = undefined;
}
if (item.cbdPercentage !== undefined && (item.cbdPercentage < 0 || item.cbdPercentage > 100)) {
logger_1.logger.warn('pipeline', `Invalid CBD for ${item.name}: ${item.cbdPercentage}`);
item.cbdPercentage = undefined;
}
return item;
}
}
exports.ValidationPipeline = ValidationPipeline;
/**
* Sanitization Pipeline - cleans and normalizes data
*/
class SanitizationPipeline {
name = 'SanitizationPipeline';
priority = 90;
async process(item, spider) {
// Truncate long strings
if (item.name) {
item.name = item.name.substring(0, 500).trim();
}
if (item.description) {
item.description = item.description.substring(0, 5000).trim();
}
if (item.brand) {
item.brand = item.brand.substring(0, 255).trim();
}
if (item.weight) {
item.weight = item.weight.substring(0, 100).trim();
}
// Normalize strain type
if (item.strainType) {
const normalized = item.strainType.toLowerCase();
if (normalized.includes('indica')) {
item.strainType = 'Indica';
}
else if (normalized.includes('sativa')) {
item.strainType = 'Sativa';
}
else if (normalized.includes('hybrid')) {
item.strainType = 'Hybrid';
}
else {
item.strainType = undefined;
}
}
// Clean up metadata
if (item.metadata) {
// Remove empty arrays
Object.keys(item.metadata).forEach(key => {
if (Array.isArray(item.metadata[key]) && item.metadata[key].length === 0) {
delete item.metadata[key];
}
});
}
return item;
}
}
exports.SanitizationPipeline = SanitizationPipeline;
/**
* Deduplication Pipeline - prevents duplicate items
*/
class DeduplicationPipeline {
name = 'DeduplicationPipeline';
priority = 80;
seen = new Set();
async process(item, spider) {
const fingerprint = `${item.dutchieProductId}`;
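// Note: when dutchieProductId is undefined this fingerprint becomes the
// literal string "undefined" for every such item, so only the first
// id-less product per run survives deduplication.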
if (this.seen.has(fingerprint)) {
logger_1.logger.debug('pipeline', `Duplicate product detected: ${item.name}`);
return null;
}
this.seen.add(fingerprint);
return item;
}
clear() {
this.seen.clear();
}
}
exports.DeduplicationPipeline = DeduplicationPipeline;
/**
* Image Processing Pipeline - handles image downloads
*/
class ImagePipeline {
name = 'ImagePipeline';
priority = 70;
extractImageId(url) {
try {
const match = url.match(/images\.dutchie\.com\/([a-f0-9]+)/i);
return match ? match[1] : null;
}
catch (e) {
return null;
}
}
getFullSizeImageUrl(imageUrl) {
const imageId = this.extractImageId(imageUrl);
if (!imageId)
return imageUrl;
return `https://images.dutchie.com/${imageId}?auto=format&fit=max&q=95&w=2000&h=2000`;
}
async process(item, spider) {
if (item.imageUrl) {
// Convert to full-size URL
item.imageUrl = this.getFullSizeImageUrl(item.imageUrl);
}
return item;
}
}
exports.ImagePipeline = ImagePipeline;
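The URL rewrite in `ImagePipeline` can be exercised standalone: the hex image id is pulled out of an `images.dutchie.com` URL and a large variant is requested via query parameters, while URLs on other hosts fall through unchanged. A sketch using the same regex and query string as above (function names illustrative):

```javascript
// Extract the hex image id from an images.dutchie.com URL.
function extractImageId(url) {
  const match = url.match(/images\.dutchie\.com\/([a-f0-9]+)/i);
  return match ? match[1] : null;
}

// Rewrite to a full-size variant; unknown hosts are left alone.
function fullSizeUrl(url) {
  const id = extractImageId(url);
  if (!id) return url;
  return `https://images.dutchie.com/${id}?auto=format&fit=max&q=95&w=2000&h=2000`;
}

console.log(fullSizeUrl('https://images.dutchie.com/abc123?w=200'));
// → https://images.dutchie.com/abc123?auto=format&fit=max&q=95&w=2000&h=2000
```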
/**
* Database Pipeline - saves items to database
*/
class DatabasePipeline {
name = 'DatabasePipeline';
priority = 10; // Low priority - runs last
async process(item, spider) {
const client = await migrate_1.pool.connect();
try {
// Extract store and category from metadata (set by spider)
const storeId = item.storeId;
const categoryId = item.categoryId;
if (!storeId || !categoryId) {
logger_1.logger.error('pipeline', `Missing storeId or categoryId for ${item.name}`);
return null;
}
// Check if product exists
const existingResult = await client.query(`
SELECT id, image_url, local_image_path
FROM products
WHERE store_id = $1 AND name = $2 AND category_id = $3
`, [storeId, item.name, categoryId]);
let localImagePath = null;
let productId;
if (existingResult.rows.length > 0) {
// Update existing product
productId = existingResult.rows[0].id;
localImagePath = existingResult.rows[0].local_image_path;
await client.query(`
UPDATE products
SET name = $1, description = $2, price = $3,
strain_type = $4, thc_percentage = $5, cbd_percentage = $6,
brand = $7, weight = $8, image_url = $9, dutchie_url = $10,
in_stock = true, metadata = $11, last_seen_at = CURRENT_TIMESTAMP,
updated_at = CURRENT_TIMESTAMP
WHERE id = $12
`, [
item.name, item.description, item.price,
item.strainType, item.thcPercentage, item.cbdPercentage,
item.brand, item.weight, item.imageUrl, item.dutchieUrl,
JSON.stringify(item.metadata || {}), productId
]);
logger_1.logger.debug('pipeline', `Updated product: ${item.name}`);
}
else {
// Insert new product
const insertResult = await client.query(`
INSERT INTO products (
store_id, category_id, dutchie_product_id, name, description,
price, strain_type, thc_percentage, cbd_percentage,
brand, weight, image_url, dutchie_url, in_stock, metadata
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, true, $14)
RETURNING id
`, [
storeId, categoryId, item.dutchieProductId, item.name, item.description,
item.price, item.strainType, item.thcPercentage, item.cbdPercentage,
item.brand, item.weight, item.imageUrl, item.dutchieUrl,
JSON.stringify(item.metadata || {})
]);
productId = insertResult.rows[0].id;
logger_1.logger.debug('pipeline', `Inserted new product: ${item.name}`);
}
// Download image if needed
if (item.imageUrl && !localImagePath) {
try {
localImagePath = await (0, minio_1.uploadImageFromUrl)(item.imageUrl, productId);
await client.query(`
UPDATE products
SET local_image_path = $1
WHERE id = $2
`, [localImagePath, productId]);
logger_1.logger.debug('pipeline', `Downloaded image for: ${item.name}`);
}
catch (error) {
logger_1.logger.error('pipeline', `Failed to download image for ${item.name}: ${error}`);
}
}
return item;
}
catch (error) {
logger_1.logger.error('pipeline', `Failed to save product ${item.name}: ${error}`);
return null;
}
finally {
client.release();
}
}
}
exports.DatabasePipeline = DatabasePipeline;
/**
* Stats Pipeline - tracks statistics
*/
class StatsPipeline {
name = 'StatsPipeline';
priority = 50;
stats = {
total: 0,
withImages: 0,
withThc: 0,
withCbd: 0,
withDescription: 0
};
async process(item, spider) {
this.stats.total++;
if (item.imageUrl)
this.stats.withImages++;
// Compare against undefined so a legitimate 0% reading still counts
if (item.thcPercentage !== undefined)
this.stats.withThc++;
if (item.cbdPercentage !== undefined)
this.stats.withCbd++;
if (item.description)
this.stats.withDescription++;
return item;
}
getStats() {
return { ...this.stats };
}
clear() {
this.stats = {
total: 0,
withImages: 0,
withThc: 0,
withCbd: 0,
withDescription: 0
};
}
}
exports.StatsPipeline = StatsPipeline;
/**
* Pipeline Engine - orchestrates all pipelines
*/
class PipelineEngine {
pipelines = [];
use(pipeline) {
this.pipelines.push(pipeline);
// Sort by priority (higher first)
this.pipelines.sort((a, b) => b.priority - a.priority);
}
async processItem(item, spider) {
let current = item;
for (const pipeline of this.pipelines) {
try {
current = await pipeline.process(current, spider);
if (!current) {
// Item was filtered out
logger_1.logger.debug('pipeline', `Item filtered by ${pipeline.name}`);
return null;
}
}
catch (error) {
logger_1.logger.error('pipeline', `Error in ${pipeline.name}: ${error}`);
// Continue with other pipelines
}
}
return current;
}
getPipeline(name) {
return this.pipelines.find(p => p.name === name);
}
}
exports.PipelineEngine = PipelineEngine;

136
backend/dist/scraper-v2/scheduler.js vendored Normal file

@@ -0,0 +1,136 @@
"use strict";
var __importDefault = (this && this.__importDefault) || function (mod) {
return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
exports.RequestScheduler = void 0;
const logger_1 = require("../services/logger");
const crypto_1 = __importDefault(require("crypto"));
class RequestScheduler {
queue = [];
inProgress = new Set();
seen = new Set();
deduplicationEnabled = true;
constructor(deduplicationEnabled = true) {
this.deduplicationEnabled = deduplicationEnabled;
}
/**
* Generate fingerprint for request deduplication
*/
generateFingerprint(request) {
if (request.fingerprint) {
return request.fingerprint;
}
// Generate fingerprint based on URL and relevant metadata
const data = {
url: request.url,
method: request.metadata?.method || 'GET',
body: request.metadata?.body
};
return crypto_1.default.createHash('md5').update(JSON.stringify(data)).digest('hex');
}
/**
* Add a request to the queue
*/
enqueue(partialRequest) {
if (!partialRequest.url) {
logger_1.logger.warn('scraper', 'Cannot enqueue request without URL');
return false;
}
const fingerprint = this.generateFingerprint(partialRequest);
// Check for duplicates
if (this.deduplicationEnabled && this.seen.has(fingerprint)) {
logger_1.logger.debug('scraper', `Request already seen: ${partialRequest.url}`);
return false;
}
// Create full request with defaults
const request = {
url: partialRequest.url,
priority: partialRequest.priority ?? 0,
retryCount: partialRequest.retryCount ?? 0,
maxRetries: partialRequest.maxRetries ?? 3,
metadata: partialRequest.metadata || {},
callback: partialRequest.callback,
errorHandler: partialRequest.errorHandler,
fingerprint
};
this.queue.push(request);
this.seen.add(fingerprint);
// Sort by priority (higher priority first)
this.queue.sort((a, b) => b.priority - a.priority);
logger_1.logger.debug('scraper', `Enqueued: ${request.url} (priority: ${request.priority})`);
return true;
}
/**
* Get the next request from the queue
*/
dequeue() {
const request = this.queue.shift();
if (request) {
this.inProgress.add(request.fingerprint);
}
return request || null;
}
/**
* Mark a request as complete
*/
markComplete(request) {
if (request.fingerprint) {
this.inProgress.delete(request.fingerprint);
}
}
/**
* Requeue a failed request (for retry)
*/
requeueForRetry(request) {
if (request.fingerprint) {
this.inProgress.delete(request.fingerprint);
this.seen.delete(request.fingerprint);
}
request.retryCount++;
if (request.retryCount > request.maxRetries) {
logger_1.logger.warn('scraper', `Max retries exceeded for: ${request.url}`);
return false;
}
// Decrease priority for retried requests
request.priority = Math.max(0, request.priority - 1);
return this.enqueue(request);
}
/**
* Get queue stats
*/
getStats() {
return {
pending: this.queue.length,
inProgress: this.inProgress.size,
total: this.seen.size
};
}
/**
* Check if queue is empty
*/
isEmpty() {
return this.queue.length === 0 && this.inProgress.size === 0;
}
/**
* Clear all queues
*/
clear() {
this.queue = [];
this.inProgress.clear();
this.seen.clear();
}
/**
* Get pending requests count
*/
getPendingCount() {
return this.queue.length;
}
/**
* Get in-progress count
*/
getInProgressCount() {
return this.inProgress.size;
}
}
exports.RequestScheduler = RequestScheduler;

13
backend/dist/scraper-v2/types.js vendored Normal file

@@ -0,0 +1,13 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.ErrorType = void 0;
var ErrorType;
(function (ErrorType) {
ErrorType["NETWORK_ERROR"] = "NETWORK_ERROR";
ErrorType["TIMEOUT"] = "TIMEOUT";
ErrorType["PARSE_ERROR"] = "PARSE_ERROR";
ErrorType["BOT_DETECTION"] = "BOT_DETECTION";
ErrorType["NOT_FOUND"] = "NOT_FOUND";
ErrorType["SERVER_ERROR"] = "SERVER_ERROR";
ErrorType["UNKNOWN"] = "UNKNOWN";
})(ErrorType || (exports.ErrorType = ErrorType = {}));


@@ -0,0 +1,168 @@
"use strict";
var __importDefault = (this && this.__importDefault) || function (mod) {
return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
exports.discoverCategories = discoverCategories;
const puppeteer_1 = __importDefault(require("puppeteer"));
const migrate_1 = require("../db/migrate");
const logger_1 = require("./logger");
const DUTCHIE_CATEGORIES = [
{ name: 'Shop', slug: 'shop' },
{ name: 'Flower', slug: 'flower', parentSlug: 'shop' },
{ name: 'Pre-Rolls', slug: 'pre-rolls', parentSlug: 'shop' },
{ name: 'Vaporizers', slug: 'vaporizers', parentSlug: 'shop' },
{ name: 'Concentrates', slug: 'concentrates', parentSlug: 'shop' },
{ name: 'Edibles', slug: 'edibles', parentSlug: 'shop' },
{ name: 'Topicals', slug: 'topicals', parentSlug: 'shop' },
{ name: 'Accessories', slug: 'accessories', parentSlug: 'shop' },
{ name: 'Brands', slug: 'brands' },
{ name: 'Specials', slug: 'specials' }
];
async function makePageStealthy(page) {
await page.evaluateOnNewDocument(() => {
Object.defineProperty(navigator, 'webdriver', { get: () => false });
Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3, 4, 5] });
Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
window.chrome = { runtime: {} };
});
}
async function isDutchieMenu(page) {
try {
// Check page source for Dutchie markers
const isDutchie = await page.evaluate(() => {
// Check for window.reactEnv with dutchie URLs
if (window.reactEnv) {
const env = window.reactEnv;
if (env.adminUrl?.includes('dutchie.com') ||
env.apiUrl?.includes('dutchie.com') ||
env.consumerUrl?.includes('dutchie.com')) {
return true;
}
}
// Check HTML source for dutchie references
const htmlContent = document.documentElement.innerHTML;
if (htmlContent.includes('admin.dutchie.com') ||
htmlContent.includes('api.dutchie.com') ||
htmlContent.includes('embedded-menu') ||
htmlContent.includes('window.reactEnv')) {
return true;
}
return false;
});
return isDutchie;
}
catch (error) {
logger_1.logger.warn('categories', `Error detecting Dutchie menu: ${error}`);
return false;
}
}
async function discoverCategories(storeId) {
let browser = null;
try {
logger_1.logger.info('categories', `Discovering categories for store ID: ${storeId}`);
const storeResult = await migrate_1.pool.query(`
SELECT id, name, slug, dutchie_url
FROM stores
WHERE id = $1
`, [storeId]);
if (storeResult.rows.length === 0) {
throw new Error('Store not found');
}
const store = storeResult.rows[0];
const baseUrl = store.dutchie_url;
// Launch browser to check page source
browser = await puppeteer_1.default.launch({
headless: 'new',
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled'
]
});
const page = await browser.newPage();
await makePageStealthy(page);
await page.setViewport({ width: 1920, height: 1080 });
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
logger_1.logger.info('categories', `Loading page to detect menu type: ${baseUrl}`);
await page.goto(baseUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(3000);
// Detect if it's a Dutchie menu by inspecting page source
const isDutchie = await isDutchieMenu(page);
await browser.close();
browser = null;
if (isDutchie) {
logger_1.logger.info('categories', `✅ Detected Dutchie menu for ${store.name}`);
await createDutchieCategories(storeId, store);
}
else {
logger_1.logger.info('categories', `⚠️ Non-Dutchie menu detected, would need custom scraping logic`);
throw new Error('Non-Dutchie menus not yet supported. Please contact support.');
}
}
catch (error) {
logger_1.logger.error('categories', `Category discovery error: ${error}`);
if (browser)
await browser.close();
throw error;
}
}
async function createDutchieCategories(storeId, store) {
const client = await migrate_1.pool.connect();
try {
await client.query('BEGIN');
logger_1.logger.info('categories', `Creating predefined Dutchie category structure`);
const baseUrl = store.dutchie_url;
for (const category of DUTCHIE_CATEGORIES) {
let categoryUrl;
if (category.parentSlug) {
// Subcategory: /embedded-menu/{slug}/shop/flower
categoryUrl = `${baseUrl}/${category.parentSlug}/${category.slug}`;
}
else {
// Top-level: /embedded-menu/{slug}/shop
categoryUrl = `${baseUrl}/${category.slug}`;
}
const path = category.parentSlug ? `${category.parentSlug}/${category.slug}` : category.slug;
if (!category.parentSlug) {
// Create parent category
await client.query(`
INSERT INTO categories (store_id, name, slug, dutchie_url, path, scrape_enabled, parent_id)
VALUES ($1, $2, $3, $4, $5, true, NULL)
ON CONFLICT (store_id, slug)
DO UPDATE SET name = $2, dutchie_url = $4, path = $5
RETURNING id
`, [storeId, category.name, category.slug, categoryUrl, path]);
logger_1.logger.info('categories', `📁 ${category.name}`);
}
else {
// Create subcategory
const parentResult = await client.query(`
SELECT id FROM categories
WHERE store_id = $1 AND slug = $2
`, [storeId, category.parentSlug]);
if (parentResult.rows.length > 0) {
const parentId = parentResult.rows[0].id;
await client.query(`
INSERT INTO categories (store_id, name, slug, dutchie_url, path, scrape_enabled, parent_id)
VALUES ($1, $2, $3, $4, $5, true, $6)
ON CONFLICT (store_id, slug)
DO UPDATE SET name = $2, dutchie_url = $4, path = $5, parent_id = $6
`, [storeId, category.name, category.slug, categoryUrl, path, parentId]);
logger_1.logger.info('categories', ` └── ${category.name}`);
}
}
}
await client.query('COMMIT');
logger_1.logger.info('categories', `✅ Created ${DUTCHIE_CATEGORIES.length} Dutchie categories successfully`);
}
catch (error) {
await client.query('ROLLBACK');
logger_1.logger.error('categories', `Failed to create Dutchie categories: ${error}`);
throw error;
}
finally {
client.release();
}
}

56
backend/dist/services/logger.js vendored Normal file

@@ -0,0 +1,56 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.logger = void 0;
class LogService {
logs = [];
maxLogs = 1000;
log(level, category, message) {
const entry = {
timestamp: new Date(),
level,
category,
message
};
this.logs.unshift(entry);
if (this.logs.length > this.maxLogs) {
this.logs = this.logs.slice(0, this.maxLogs);
}
const timestamp = entry.timestamp.toISOString();
const prefix = `[${timestamp}] [${category.toUpperCase()}] [${level.toUpperCase()}]`;
if (level === 'error') {
console.error(prefix, message);
}
else if (level === 'warn') {
console.warn(prefix, message);
}
else {
console.log(prefix, message);
}
}
info(category, message) {
this.log('info', category, message);
}
error(category, message) {
this.log('error', category, message);
}
warn(category, message) {
this.log('warn', category, message);
}
debug(category, message) {
this.log('debug', category, message);
}
getLogs(limit = 100, level, category) {
let filtered = this.logs;
if (level) {
filtered = filtered.filter(log => log.level === level);
}
if (category) {
filtered = filtered.filter(log => log.category === category);
}
return filtered.slice(0, limit);
}
clear() {
this.logs = [];
}
}
exports.logger = new LogService();
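The log store above is a simple bounded buffer: the newest entry sits at index 0 and the array is trimmed back to `maxLogs` after each push. A minimal sketch of just that behavior (`LogBuffer` is an illustrative stand-in):

```javascript
// Newest-first bounded buffer, as in LogService: unshift each entry,
// then trim the tail once the cap is exceeded.
class LogBuffer {
  constructor(maxLogs = 3) {
    this.maxLogs = maxLogs;
    this.logs = [];
  }
  push(entry) {
    this.logs.unshift(entry);
    if (this.logs.length > this.maxLogs) {
      this.logs = this.logs.slice(0, this.maxLogs);
    }
  }
}

const buf = new LogBuffer(3);
['a', 'b', 'c', 'd'].forEach(m => buf.push(m));
console.log(buf.logs); // ['d', 'c', 'b'] — oldest entry 'a' was dropped
```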

Some files were not shown because too many files have changed in this diff.